Dekejit/spec.md


Disclaimer

This is a fantasy architecture on which I intend to write fantasy compilers. It was born out of the "fuck around and find out" philosophy, and is a toy project. I will change a lot of stuff as I learn how it's done in the real world. For now, I'm just gonna guess and have fun.

Since I'm studying RISC-V, this will be heavily RISC-V inspired.

The GRAVEJIT virtual machine

The gravejit virtual machine sports sixteen 16-bit registers (plus the program counter!) and 16 opcodes (one of them, 1011, is still unassigned).

Here is the list of registers together with their mnemonics.

    0 : zero // register 0 is always 0.
    1 : ra   // return address
    2 : sp   // stack pointer
    3 : t0   // temporary
    4 : t1
    5 : t2
    6 : t3
    7 : a0   // function arguments
    8 : a1
    9 : a2
    10: a3
    11: s0   // saved registers
    12: s1
    13: s2
    14: s3
    15: t4   // don't know what to do with this

pc: program counter.
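
The register list above could be modelled roughly like this. It is only a sketch: `RegisterFile` and `REGISTER_NAMES` are made-up names, and two things are assumptions rather than spec: writes to register 0 are silently discarded (the RISC-V convention), and values wrap around at 16 bits.

```python
# Mnemonics indexed by register number, as listed above.
REGISTER_NAMES = [
    "zero", "ra", "sp", "t0", "t1", "t2", "t3", "a0",
    "a1", "a2", "a3", "s0", "s1", "s2", "s3", "t4",
]


class RegisterFile:
    """Sixteen 16-bit registers plus the program counter (hypothetical helper)."""

    def __init__(self):
        self.regs = [0] * 16
        self.pc = 0

    def read(self, index):
        return self.regs[index]

    def write(self, index, value):
        # Assumption: register 0 stays 0 by ignoring writes to it.
        if index == 0:
            return
        # Assumption: values wrap around at 16 bits.
        self.regs[index] = value & 0xFFFF


rf = RegisterFile()
rf.write(REGISTER_NAMES.index("zero"), 5)      # discarded
rf.write(REGISTER_NAMES.index("s0"), 0x12345)  # wraps to 0x2345
```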

ISA

    opcode | mnemonic       | format | description
    0000   | NOP            | all 0s | Does nothing.
    0001   | ADD s0 s1 s2   | R      | s0 = s1 + s2
    0010   | SUB s0 s1 s2   | R      | s0 = s1 - s2
    0011   | AND s0 s1 s2   | R      | s0 = s1 & s2 (bitwise)
    0100   | XOR s0 s1 s2   | R      | s0 = s1 xor s2
    0101   | SLL s0 s1 s2   | R      | s0 = s1 << s2
    0110   | SLI s0 c       | I      | s0 = s0 << c
    0111   | ADDI s0 c      | I      | s0 = s0 + c
    1000   | BEQ s0 s1 s2   | R      | if (s1 == s2) -> pc = s0
    1001   | BGT s0 s1 s2   | R      | if (s1 > s2) -> pc = s0
    1010   | JAL s0 s1 c    | J      | s0 = pc + 1; pc += s1 + c
    1011   | (unassigned)   |        |
    1100   | LOAD s0 s1 s2  | R      | loads into s0 the value at address s1, offset by s2
    1101   | STORE s0 s1 s2 | R      | stores s0 at address s1, offset by s2
    1110   | CALL s0 c      | I      | performs a system call
    1111   | HALT           | all 1s | Halt, and possibly catch fire.
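
To make the arithmetic rows concrete, here is a rough sketch of the five R-type ALU operations (ADD, SUB, AND, XOR, SLL). The names `ALU_OPS` and `alu` are hypothetical. It assumes that AND is a bitwise and, and that every result is truncated to 16 bits; neither is pinned down by the table.

```python
# Opcode -> operation on two register values (hypothetical helper table).
ALU_OPS = {
    0b0001: lambda a, b: a + b,   # ADD
    0b0010: lambda a, b: a - b,   # SUB
    0b0011: lambda a, b: a & b,   # AND (assumed bitwise)
    0b0100: lambda a, b: a ^ b,   # XOR
    0b0101: lambda a, b: a << b,  # SLL
}


def alu(opcode, a, b):
    """Apply an R-type arithmetic opcode and wrap the result to 16 bits."""
    return ALU_OPS[opcode](a, b) & 0xFFFF


overflow = alu(0b0001, 0xFFFF, 1)   # addition wraps around
underflow = alu(0b0010, 0, 1)       # subtraction wraps around too
```

Wrapping with `& 0xFFFF` means a subtraction that goes below zero comes out as the two's complement bit pattern of the negative result.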

Operation formats:

Each instruction is 16 bits long. The 4 most significant bits are the opcode. Constants (c in the above table) are always considered signed, and written in two's complement. Sign extension also takes place whenever needed, i.e., to make an immediate subtraction, one just needs to add a negative number.
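
As an illustration of the sign extension described above, a small sketch (the helper names `sign_extend` and `to_u16` are made up for this example):

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    # Flipping the sign bit and subtracting it maps 0b1xxx... to a negative value.
    return (value ^ sign_bit) - sign_bit


def to_u16(value):
    """Wrap a Python int back into a 16-bit register value."""
    return value & 0xFFFF


# "Immediate subtraction": adding the 8-bit constant 11111110 (-2) to 5.
result = to_u16(5 + sign_extend(0b11111110, 8))
```

Here `result` is 3, and adding the same constant to 0 would give `0xFFFE`, which is -2 written as a 16-bit two's complement value.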

R-type

    opcode:            4 bits
    dest register:     4 bits
    source 1 register: 4 bits
    source 2 register: 4 bits

example: ADD s0 s1 s2 = 0001 1011 1100 1101
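
The R-type layout can be packed with a few shifts. This is just a sketch, and `encode_r` is a hypothetical helper name; it reproduces the ADD example above.

```python
def encode_r(opcode, dest, src1, src2):
    """Pack four 4-bit fields into one 16-bit R-type instruction word."""
    for field in (opcode, dest, src1, src2):
        if not 0 <= field <= 0xF:
            raise ValueError("every R-type field must fit in 4 bits")
    return (opcode << 12) | (dest << 8) | (src1 << 4) | src2


# ADD s0 s1 s2, with s0 = 11, s1 = 12, s2 = 13 from the register list.
word = encode_r(0b0001, 11, 12, 13)
bits = format(word, "016b")  # "0001101111001101"
```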

I-type

    opcode:        4 bits
    dest register: 4 bits
    constant:      8 bits

examples:

    ADDI s0 28 = 0111 1011 00011100
    ADDI s0 -2 = 0111 1011 11111110
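
The same trick works for I-type instructions, with the signed constant masked down to its 8-bit two's complement form. Again a sketch with a hypothetical helper name (`encode_i`), checked against the two ADDI examples above.

```python
def encode_i(opcode, dest, constant):
    """Pack opcode, dest register and a signed 8-bit constant into 16 bits."""
    if not 0 <= opcode <= 0xF or not 0 <= dest <= 0xF:
        raise ValueError("opcode and register must fit in 4 bits")
    if not -128 <= constant <= 127:
        raise ValueError("constant must fit in 8 signed bits")
    # Masking with 0xFF yields the two's complement bit pattern.
    return (opcode << 12) | (dest << 8) | (constant & 0xFF)


addi_28 = format(encode_i(0b0111, 11, 28), "016b")      # "0111101100011100"
addi_minus_2 = format(encode_i(0b0111, 11, -2), "016b")  # "0111101111111110"
```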

J-type

    opcode:                4 bits
    dest register:         4 bits
    jump address register: 4 bits
    constant:              4 bits

The constant is added to the value of the second register argument.
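
A possible reading of JAL, following the ISA table (`s0 = pc+1; pc += s1 + c`). This is a sketch under assumptions: `decode_j` and `step_jal` are hypothetical names, the program counter is assumed to count 16-bit instructions and to wrap at 16 bits, and writes to register 0 are assumed to be discarded.

```python
def decode_j(word):
    """Split a J-type word into (opcode, dest, base register, signed constant)."""
    opcode = (word >> 12) & 0xF
    dest = (word >> 8) & 0xF
    base = (word >> 4) & 0xF
    constant = word & 0xF
    if constant & 0x8:       # sign-extend the 4-bit constant (range -8..7)
        constant -= 16
    return opcode, dest, base, constant


def step_jal(regs, pc, word):
    """Execute one JAL: save pc + 1 in dest, return the new pc."""
    _, dest, base, constant = decode_j(word)
    # Read the base register first, in case dest and base are the same.
    offset = regs[base] + constant
    if dest != 0:            # assumption: register 0 stays 0
        regs[dest] = (pc + 1) & 0xFFFF
    return (pc + offset) & 0xFFFF


regs = [0] * 16
# JAL ra zero -2  ->  1010 0001 0000 1110
new_pc = step_jal(regs, 10, 0b1010_0001_0000_1110)
```

With `pc = 10`, this jumps back two instructions (`new_pc` is 8) and leaves the return address 11 in `ra`.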

JIT's system calls:

The CALL instruction is a bit of a hack, because I want to load more functionality into the thing. The JIT can decide what to do with the register s0 and the number c. It should be possible to open files, write files, read stdin, write to stdout, etc...

io_vec: first system call environment

Working on this, quick and dirty.

Binary executable format:

Binary files start with two 16-bit numbers, a constant and a length N, followed by a list of N pairs of 16-bit numbers. This is the header of the file.

The initial constant is currently unused and unimportant. In this draft-toy-spec, the initial constant is always 39979.

In each pair, the first number is an offset and the second number is a size M.

The offset points at a null-terminated UTF-8 (yes.) string, located offset*16 bits past the end of the header in the binary file, followed by arbitrary binary content of size M*16 bits.

The UTF-8 string cannot contain the null character anywhere, as that is used as the terminator.

This represents a "symbol table" of the binary file, where functions and data can be stored.

There must exist a symbol named "main", and it must point to a function: this will be the entrypoint to our program.
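
A quick and dirty sketch of how a loader might read this format. Several things here are guesses, not spec: `parse_symbols` is a made-up name, the 16-bit numbers are assumed to be big-endian, the content is assumed to start right after the string's null terminator, and the size is counted in 16-bit units (following the "*16 bits" wording).

```python
import struct


def parse_symbols(blob, byteorder=">"):
    """Return {symbol name: raw content bytes} from a binary file image."""
    # Header: constant, length N, then N (offset, size) pairs.
    _constant, count = struct.unpack_from(byteorder + "HH", blob, 0)
    header_end = 4 + 4 * count  # two 16-bit numbers + N pairs, in bytes
    symbols = {}
    for i in range(count):
        offset, size = struct.unpack_from(byteorder + "HH", blob, 4 + 4 * i)
        start = header_end + offset * 2        # offset is in 16-bit units
        end = blob.index(b"\x00", start)       # null terminator of the name
        name = blob[start:end].decode("utf-8")
        # Assumption: content follows the terminator with no padding.
        symbols[name] = blob[end + 1:end + 1 + size * 2]
    if "main" not in symbols:
        raise ValueError("no entrypoint: symbol 'main' is missing")
    return symbols


# One symbol, "main", holding two instruction words.
image = struct.pack(">HHHH", 39979, 1, 0, 2) + b"main\x00" + b"\x7b\x1c\xff\xff"
table = parse_symbols(image)
```

The initial constant is read but ignored, since it is currently unused.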