
Disclaimer

This is a fantasy architecture on which I intend to write fantasy compilers. It was born out of the "fuck around and find out" philosophy, and is a toy project. I will change a lot of stuff as I learn how it's done in the real world. For now, I'm just gonna guess and have fun.

Since I'm studying RISC-V, this will be heavily RISC-V inspired.

The GRAVEJIT virtual machine

The gravejit virtual machine sports sixteen 16-bit registers (plus the program counter!) and 16 opcodes, one of which is still unassigned.

Here is the list of registers together with their mnemonics.

0 : zero // register 0 is always 0.
1 : ra   // return address
2 : sp   // stack pointer
3 : t0   // temporary 
4 : t1
5 : t2
6 : t3
7 : a0   // function arguments
8 : a1
9 : a2
10: a3
11: s0   // saved registers
12: s1
13: s2
14: s3
15: t4 // don't know what to do with this

pc: program counter.
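For an assembler or disassembler, the register list above boils down to a simple lookup in both directions. A minimal Python sketch; `REGISTER_NAMES`, `reg_index` and `reg_name` are hypothetical helper names, not part of the spec.

```python
# Register mnemonics in index order, exactly as listed above (hypothetical helper).
REGISTER_NAMES = [
    "zero", "ra", "sp", "t0", "t1", "t2", "t3",
    "a0", "a1", "a2", "a3",
    "s0", "s1", "s2", "s3", "t4",
]
# Reverse lookup: mnemonic -> 4-bit register index.
REGISTERS = {name: index for index, name in enumerate(REGISTER_NAMES)}


def reg_index(name: str) -> int:
    """Return the 4-bit register index for a mnemonic such as 's0'."""
    return REGISTERS[name]


def reg_name(index: int) -> str:
    """Return the mnemonic for a register index in 0..15."""
    return REGISTER_NAMES[index]


print(reg_index("s0"), reg_name(2))  # prints "11 sp"
```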

ISA

| opcode | mnemonic | format | description |
|--------|----------|--------|-------------|
| 0000 | NOP | all 0s | Does nothing. |
| 0001 | ADD s0 s1 s2 | R | s0 = s1 + s2 |
| 0010 | SUB s0 s1 s2 | R | s0 = s1 - s2 |
| 0011 | AND s0 s1 s2 | R | s0 = s1 & s2 (bitwise) |
| 0100 | XOR s0 s1 s2 | R | s0 = s1 xor s2 |
| 0101 | SLL s0 s1 s2 | R | s0 = s1 << s2 |
| 0110 | SLI s0 c | I | s0 = s0 << c |
| 0111 | ADDI s0 c | I | s0 = s0 + c |
| 1000 | BEQ s0 s1 s2 | R | if (s1 == s2) -> pc = s0 |
| 1001 | BGT s0 s1 s2 | R | if (s1 > s2) -> pc = s0 |
| 1010 | JAL s0 s1 c | J | s0 = pc + 1; pc += s1 + c |
| 1011 | #TODO? | | |
| 1100 | LOAD s0 s1 s2 | R | loads into s0 the value at address s1, offset by s2 |
| 1101 | STORE s0 s1 s2 | R | stores s0 at address s1, offset by s2 |
| 1110 | CALL s0 c | I | performs a system call |
| 1111 | HALT | all 1s | halt, and possibly catch fire. |
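To make the arithmetic and logic rows concrete, here is a minimal sketch of how an interpreter might execute ADD, SUB, AND, XOR and SLL. It assumes registers hold unsigned 16-bit values, that results wrap around modulo 2^16, and that AND is bitwise; `exec_r` is a hypothetical name. The one rule taken straight from the spec is that register 0 always reads as 0, so writes to it are dropped.

```python
MASK = 0xFFFF  # registers are 16 bits wide


def exec_r(regs: list[int], opcode: int, rd: int, rs1: int, rs2: int) -> None:
    """Apply one ALU R-type instruction to the 16-entry register file in place."""
    a, b = regs[rs1], regs[rs2]
    if opcode == 0b0001:    # ADD
        result = a + b
    elif opcode == 0b0010:  # SUB
        result = a - b
    elif opcode == 0b0011:  # AND (assumed bitwise)
        result = a & b
    elif opcode == 0b0100:  # XOR
        result = a ^ b
    elif opcode == 0b0101:  # SLL
        result = a << b
    else:
        raise ValueError(f"not an ALU R-type opcode: {opcode:04b}")
    if rd != 0:  # register 0 is always 0, so writes to it are discarded
        regs[rd] = result & MASK  # ASSUMPTION: wrap around to 16 bits


regs = [0] * 16
regs[12], regs[13] = 5, 7         # s1 = 5, s2 = 7
exec_r(regs, 0b0010, 11, 12, 13)  # SUB s0 s1 s2
print(hex(regs[11]))              # prints 0xfffe, i.e. -2 in two's complement
```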

Operation formats:

Each instruction is 16 bits long. The 4 most-significant bits are the opcode. Constants (c in the table above) are always signed and written in two's complement, and they are sign-extended whenever needed. For example, to perform an immediate subtraction, one just adds a negative number.
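Sign extension of an n-bit constant is a two-liner. This is a generic sketch of the standard technique, not code from the project; `sign_extend` is a hypothetical name. The last line shows the "immediate subtraction" trick: adding the sign-extended 8-bit constant `11111110` (-2) to 10, wrapped to 16 bits (an assumption about overflow behaviour), gives 8.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1              # keep only the field's bits
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit  # flip the sign bit, then re-bias


print(sign_extend(0b11111110, 8))  # -2
print(sign_extend(0b00011100, 8))  # 28
print(sign_extend(0b1111, 4))      # -1
print((10 + sign_extend(0b11111110, 8)) & 0xFFFF)  # 8: adding -2 subtracts
```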

R-type:

- opcode: 4 bits
- dest register: 4 bits
- source 1 register: 4 bits
- source 2 register: 4 bits

example: ADD s0 s1 s2 = 0001 1011 1100 1101
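Packing and unpacking an R-type word is just shifting 4-bit fields into place, opcode in the top nibble. A minimal sketch with hypothetical helpers `encode_r` and `decode_r`; it reproduces the ADD example above, where s0, s1 and s2 are registers 11, 12 and 13.

```python
def encode_r(opcode: int, rd: int, rs1: int, rs2: int) -> int:
    """Pack four 4-bit fields into one 16-bit R-type word, opcode in the top nibble."""
    for field in (opcode, rd, rs1, rs2):
        if not 0 <= field <= 0xF:
            raise ValueError(f"field out of range: {field}")
    return (opcode << 12) | (rd << 8) | (rs1 << 4) | rs2


def decode_r(word: int) -> tuple[int, int, int, int]:
    """Split a 16-bit R-type word back into (opcode, rd, rs1, rs2)."""
    return (word >> 12) & 0xF, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF


# ADD s0 s1 s2: opcode 0001, s0 = 11, s1 = 12, s2 = 13
word = encode_r(0b0001, 11, 12, 13)
print(f"{word:016b}")  # 0001101111001101
```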

I-type:

- opcode: 4 bits
- dest register: 4 bits
- constant: 8 bits

examples:

ADDI s0 28 = 0111 1011 00011100

ADDI s0 -2 = 0111 1011 11111110
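The I-type layout can be sketched the same way; masking the constant with `0xFF` produces its 8-bit two's complement form, and decoding sign-extends it back. `encode_i` and `decode_i` are hypothetical helper names; the two prints reproduce the ADDI examples above.

```python
def encode_i(opcode: int, rd: int, constant: int) -> int:
    """Pack an I-type word; constant must be a signed 8-bit value in -128..127."""
    if not -128 <= constant <= 127:
        raise ValueError(f"constant does not fit in 8 signed bits: {constant}")
    return (opcode << 12) | (rd << 8) | (constant & 0xFF)  # & 0xFF = two's complement


def decode_i(word: int) -> tuple[int, int, int]:
    """Return (opcode, rd, constant) with the constant sign-extended."""
    raw = word & 0xFF
    constant = raw - 0x100 if raw & 0x80 else raw
    return (word >> 12) & 0xF, (word >> 8) & 0xF, constant


print(f"{encode_i(0b0111, 11, 28):016b}")  # ADDI s0 28 -> 0111101100011100
print(f"{encode_i(0b0111, 11, -2):016b}")  # ADDI s0 -2 -> 0111101111111110
```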

J-type:

- opcode: 4 bits
- dest register: 4 bits
- jump address register: 4 bits
- constant: 4 bits

The constant is added to the value of the second register argument.
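Putting the J-type layout together with the JAL row of the table (s0 = pc + 1; pc += s1 + c), an interpreter step might look like the sketch below. Assumptions, not stated in the spec: pc counts 16-bit instruction words, `pc` is the address of the JAL itself, and both pc and registers wrap at 16 bits. `exec_jal` is a hypothetical name. The jump target is computed before the link register is written, so `JAL ra ra c` still jumps relative to the old `ra`.

```python
MASK = 0xFFFF  # 16-bit registers and pc


def exec_jal(regs: list[int], pc: int, word: int) -> int:
    """Execute a JAL word at address pc and return the new pc; regs is updated in place."""
    rd = (word >> 8) & 0xF
    rs = (word >> 4) & 0xF
    raw = word & 0xF
    c = raw - 0x10 if raw & 0x8 else raw  # 4-bit signed constant: -8..7
    target = (pc + regs[rs] + c) & MASK   # pc += s1 + c
    if rd != 0:                           # writes to the zero register are ignored
        regs[rd] = (pc + 1) & MASK        # s0 = pc + 1 (return address)
    return target


regs = [0] * 16
word = (0b1010 << 12) | (1 << 8) | (0 << 4) | (-3 & 0xF)  # JAL ra zero -3
new_pc = exec_jal(regs, 10, word)
print(new_pc, regs[1])  # 7 11
```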

JIT's system calls:

What the CALL instruction does is up to the implementation. The JIT can decide what to do with the register s0 and the constant c. It could provide mechanisms to perform I/O on a real filesystem or on an emulated one, or it could do something else entirely, e.g., something web-related.
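One way an implementation might organize this is a table of host handlers keyed by the constant c, each receiving the register file and the register named in the instruction. This is only a sketch: `make_dispatcher` is a hypothetical name, and call number 0 meaning "print register as a character" is invented purely for illustration, not part of any syscall environment defined here.

```python
from typing import Callable

# Hypothetical handler signature: (register file, register index from the CALL word).
Handler = Callable[[list[int], int], None]


def make_dispatcher(handlers: dict[int, Handler]) -> Callable[[list[int], int], None]:
    """Build a CALL executor that looks up the handler for the instruction's constant."""
    def do_call(regs: list[int], word: int) -> None:
        reg = (word >> 8) & 0xF
        raw = word & 0xFF
        c = raw - 0x100 if raw & 0x80 else raw  # CALL is I-type: signed 8-bit c
        handler = handlers.get(c)
        if handler is None:
            raise RuntimeError(f"unsupported system call {c}")
        handler(regs, reg)
    return do_call


output = []
# HYPOTHETICAL call 0: append the register's value as a character to `output`.
do_call = make_dispatcher({0: lambda regs, r: output.append(chr(regs[r]))})
regs = [0] * 16
regs[7] = ord("A")                            # a0 = 'A'
do_call(regs, (0b1110 << 12) | (7 << 8) | 0)  # CALL a0 0
print("".join(output))  # A
```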

io_vec: first system call environment

Working on this, quick and dirty.

Binary executable format:

Binary files start with two 16-bit numbers, a constant and a length N, followed by a list of N pairs of 16-bit numbers. This is the header of the file.

The initial constant is currently unused and unimportant. In this draft-toy-spec, the initial constant is always 39979.

In each pair, the first number is an offset, and the second number is a size S, measured in 16-bit words.

The offset points at a null-terminated UTF-8 (yes.) string, located offset*16 bits after the end of the header in the binary file, followed by arbitrary binary content of size S*16 bits.

The UTF-8 string cannot contain the null character anywhere, as that is used as the terminator.

This represents a "symbol table" of the binary file, where functions and data can be stored.

There must exist a symbol named "main", and it must point to a function: this will be the entrypoint to our program.
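A reader for this header could look roughly like the sketch below. Several details are not pinned down by the spec, so they are loud assumptions here: 16-bit numbers are little-endian, offsets and sizes count 16-bit words, and the content starts at the next 16-bit boundary after the name's null terminator (one padding byte when needed). `parse_symbols` is a hypothetical name.

```python
import struct

MAGIC = 39979  # the currently unused initial constant


def parse_symbols(blob: bytes) -> dict[str, bytes]:
    """Parse the header and return {symbol name: content bytes}."""
    magic, count = struct.unpack_from("<HH", blob, 0)  # ASSUMPTION: little-endian
    if magic != MAGIC:
        raise ValueError(f"bad magic number {magic}")
    pairs = [struct.unpack_from("<HH", blob, 4 + 4 * i) for i in range(count)]
    body = 4 + 4 * count                    # first byte after the header
    symbols = {}
    for offset, size in pairs:
        start = body + 2 * offset           # offset counts 16-bit words
        end = blob.index(b"\0", start)      # null terminator of the name
        name = blob[start:end].decode("utf-8")
        content_start = end + 1
        content_start += content_start % 2  # ASSUMPTION: pad to a 16-bit boundary
        symbols[name] = blob[content_start:content_start + 2 * size]
    if "main" not in symbols:
        raise ValueError('missing required "main" symbol')
    return symbols


blob = (struct.pack("<HHHH", MAGIC, 1, 0, 1)  # magic, N = 1, pair (offset 0, size 1)
        + b"main\0" + b"\0"                   # name, terminator, one padding byte
        + struct.pack("<H", 0xFFFF))          # one word of code: HALT
print(parse_symbols(blob))  # {'main': b'\xff\xff'}
```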

Executable, in memory.

When loading a binary program, all the code in the binary file is placed at the start of memory, followed by the data sections, in the order they appear in the file.

The "text" (code) sections are placed in the order they appear in the binary file, except for the "main" section, which goes first, at the start of memory.
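The load order described above can be sketched as a small function. The spec does not yet say how a loader tells text sections from data sections, so this sketch simply takes that as an explicit flag per section; `layout_order` is a hypothetical name.

```python
def layout_order(sections: list[tuple[str, bool]]) -> list[str]:
    """Return section names in load order.

    sections: (name, is_text) pairs in file order. ASSUMPTION: whether a
    section is text or data is passed in, since the spec doesn't define it.
    """
    text = [name for name, is_text in sections if is_text]
    data = [name for name, is_text in sections if not is_text]
    if "main" not in text:
        raise ValueError('"main" must be a text section')
    text.remove("main")
    return ["main"] + text + data  # main first, other code in file order, then data


print(layout_order([("helper", True), ("greeting", False), ("main", True), ("util", True)]))
# ['main', 'helper', 'util', 'greeting']
```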