Raptor64
Raptor64 is a 64-bit multi-context RISC CPU that
supports hyper-threading. There are 16 register sets that the processor
automatically switches between at high speed. The processor is fully pipelined
with a six-stage pipeline (IF/RF/EX/M1/M2/WB). Communication
with memory is via a 64-bit WISHBONE bus. The processor has a 16kB instruction cache
and a 32kB data cache.
The processor uses 32-bit instructions.
I've created two versions of the processor: a non-hyper-threaded version (sc)
in addition to the hyper-threaded multi-context (mc) one.
- 32-entry, 64-bit general register file
- 32-bit opcodes (4 per 128 bits)
- SQRT, multiply/divide, bit field, plus all the regulars
- conditional move, exec
- explicit I/O instructions (also useful for uncached access)
- immediate constants may be built using SETLO, SETMID, SETHI instructions
- two address modes: displacement (d15[ra]) and scaled indexed
(d2[ra+rb*scale])
- 16 segmentation registers
- SimpleMMU - 32 tasks supported with mapping of 128MB space into 256kB pages
- 64 single bit semaphores
- 16kiB instruction cache, 32kiB data cache
- single cycle execution of most instructions (loads stall the pipeline)
- branch prediction with a 256 entry branch history table
- return address stack prediction
- internal Harvard architecture
- communicates externally using a 64-bit WISHBONE bus
A high-level language compiler for a language similar to 'C' is currently
in the works. Several additional keywords have been added (e.g.
interrupt). I finally fed the output of the compiler through the
assembler; a couple of bug fixes later, the sieve is able to run from SD card.
There is also an assembler (also a work in progress).
Tiny Basic is available in the boot ROM; it works, with a few bugs remaining.
Currently the processor is running code in an FPGA. The
boot ROM is slowly expanding. Numerous software and processor fixes have taken
place. Still a long way to go. The processor is being revamped to use a 32-bit
ISA; it was originally a 42-bit ISA.
The core is running on an Atlys board, and is now able to load a boot program
from an SD card. Hopefully that will speed up software development. Previously,
software could only be updated by editing a Verilog source file, which required the
entire system to be rebuilt for every software update.
The ISA is still under constant review; it may change to use an 8-bit master
opcode field as opposed to 7 bits. There are lots of instructions I'd like to
add, and no room with only 7 bits.
Raptor64.zip (download not working yet)
The Raptor64 has 32 general purpose registers, although four registers have special uses. R0 always reads as the value zero. R31 is the subroutine link register (LR). The call instruction automatically updates this register with the return address of a subroutine. This register is also used implicitly by the return instruction. R30 is the stack pointer (SP) register. The return instruction automatically updates this register. R29 references the program counter for the instruction and may be used to form program counter relative addresses. R29 is a read-only register.
Register | Usage | Defined by |
r0 | zero register; always zero | hardware defined |
r1 | subroutine return value | software convention |
r2 | subroutine return value | software convention |
r3 | temporary | software convention |
r4 | temporary | software convention |
r5 | temporary | software convention |
r6 | temporary | software convention |
r7 | temporary | software convention |
r8 | temporary | software convention |
r9 | temporary | software convention |
r10 | temporary | software convention |
r11 | register variable | software convention |
r12 | register variable | software convention |
r13 | register variable | software convention |
r14 | register variable | software convention |
r15 | register variable | software convention |
r16 | register variable | software convention |
r17 | register variable | software convention |
r18 | register variable | software convention |
r19 | | software convention |
r20 | | software convention |
r21 | | software convention |
r22 | | software convention |
r23 | | software convention |
r24 | constant builder | software convention |
r25 | | software convention |
r26 | | software convention |
r27 | exception address | software convention |
r28 | base pointer | software convention |
r29 | program counter | hardware defined |
r30/sp | stack pointer | hardware defined |
r31/lr | return address/link register | hardware defined |
Raptor64 has 16 segment registers. For data addresses the segment register is chosen by the most significant four bits of the address. For example an address like 0xE000000000000010 uses segment register #14 because the upper nibble of the address is an 'E'. Note that segmentation does not apply to I/O addresses. I/O instructions 'in' and 'out' do not use segmentation; only load and store instructions use it.
For code addresses segment register #15 is always used unless the upper nibble of an address is 'F' in which case segmentation is ignored. This allows the operating system code located in the memory region 0xFxxxxxxxxxxxxxxx to run without paying attention to segmentation. An alternate name for segment register #15 is the CS (code segment) register.
There are four instructions supporting segment registers: mtseg (move to segment register), mtsegi (move to segment register indirect), mfseg (move from segment register) and mfsegi (move from segment register indirect).
The segment register is added (without a shift) to the effective address to form a final segmented address. Some of the low order bits of the segment register are always zero.
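The segment selection rules above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware implementation: the function names are made up for the example, and it assumes that the full 64-bit effective address (including the upper nibble) is added to the segment register and that the sum wraps at 64 bits.

```python
MASK64 = (1 << 64) - 1  # assumption: the segmented address wraps at 64 bits


def segment_index(addr: int) -> int:
    """Return the segment register number chosen by the upper nibble."""
    return (addr >> 60) & 0xF


def segmented_data_address(addr: int, seg_regs: list) -> int:
    """Data access: the segment register is picked by address bits 60-63."""
    # The segment register is added without a shift.
    return (addr + seg_regs[segment_index(addr)]) & MASK64


def segmented_code_address(addr: int, seg_regs: list) -> int:
    """Code access: always segment #15 (CS), unless the upper nibble is 'F'."""
    if segment_index(addr) == 0xF:
        return addr  # segmentation is ignored for 0xFxxxxxxxxxxxxxxx
    return (addr + seg_regs[15]) & MASK64


if __name__ == "__main__":
    segs = [0] * 16
    segs[14] = 0x1000
    print(hex(segmented_data_address(0xE000000000000010, segs)))
```

I/O addresses are deliberately absent from the sketch, since 'in' and 'out' bypass segmentation entirely.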
The execution pattern table is a 256-entry table that contains context IDs. The processor periodically cycles through the execution pattern table to determine which register set context to use. For the non-hyper-threading CPU, the iepp instruction is used to cycle through the table; this instruction will typically be called from a timer interrupt service routine. The hyper-threading CPU cycles through the execution pattern table automatically.
The execution pattern table can be used to control how frequently particular contexts are executed. Higher priority contexts can be given more slots in the table, which results in those contexts being executed more frequently. Note that slot #0 of the 256 slots is always zero, so context #0 is guaranteed to always execute. Additionally, the table is initialized to zero on reset, meaning that context zero is the only executing context.
The execution pattern table is read and updated using the mfep (move from execution pattern table) and mtep (move to execution pattern table) instructions.
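As a rough software model of the table described above, the following Python sketch shows how giving a context more slots raises its share of execution. The class and method names are hypothetical and merely borrow the instruction mnemonics. It assumes that writes to slot #0 are ignored, that iepp advances by one slot and wraps around, and that context IDs are four bits wide (one per register set).

```python
class ExecutionPatternTable:
    """Toy model of the 256-slot execution pattern table."""

    SLOTS = 256

    def __init__(self):
        # Reset state: every slot is zero, so only context 0 runs.
        self.slots = [0] * self.SLOTS
        self.index = 0

    def mtep(self, slot: int, context: int) -> None:
        """Model of 'move to execution pattern table'."""
        slot %= self.SLOTS
        if slot == 0:
            return  # assumption: slot #0 always reads as zero, writes are dropped
        self.slots[slot] = context & 0xF  # assumption: 4-bit context ID

    def mfep(self, slot: int) -> int:
        """Model of 'move from execution pattern table'."""
        return self.slots[slot % self.SLOTS]

    def iepp(self) -> int:
        """Advance to the next slot and return the context found there."""
        self.index = (self.index + 1) % self.SLOTS
        return self.slots[self.index]


if __name__ == "__main__":
    ept = ExecutionPatternTable()
    for slot in range(1, 65):      # give context 3 a quarter of the slots
        ept.mtep(slot, 3)
    counts = {}
    for _ in range(ExecutionPatternTable.SLOTS):
        ctx = ept.iepp()
        counts[ctx] = counts.get(ctx, 0) + 1
    print(counts)
```

In this model a context's share of execution is simply the number of slots holding its ID divided by 256.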
The SimpleMMU provides simple memory management capabilities for the Raptor64 CPU. Memory management by the SimpleMMU includes virtual to physical address mapping. The SimpleMMU divides a 128MB memory space into 512 pages of 256kB each and supports 32 tasks. Processor address bits 18 through 26 (the virtual address) are used as a nine-bit index into a map table to find the physical address page. The MMU remaps the nine address bits into a 10-bit value used as address bits 18 to 27 when accessing a physical address. The lower eighteen bits of an address pass through the MMU unchanged. Also passing through the MMU unchanged are address bits 28 to 63. It is assumed that in a system where the SimpleMMU is relevant, some or all of the high-order address bits would be left unconnected. I/O accesses are not mapped by the SimpleMMU and I/O addresses pass through the MMU unchanged.
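The bit manipulation described above can be illustrated with a short Python sketch. It is a model for one task's map table only, with hypothetical names (`translate`, `map_table`). Because the map output occupies physical bits 18 to 27, the sketch assumes that virtual address bit 27 is simply replaced by the map output.

```python
PAGE_SHIFT = 18          # 256kB pages: address bits 0-17 are the page offset
VPAGE_MASK = 0x1FF       # nine-bit virtual page index (address bits 18-26)
PPAGE_MASK = 0x3FF       # ten-bit physical page number (address bits 18-27)
HIGH_SHIFT = 28          # address bits 28-63 pass through unchanged


def translate(vaddr: int, map_table: list) -> int:
    """Map a virtual address to a physical address using one task's table."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    vpage = (vaddr >> PAGE_SHIFT) & VPAGE_MASK
    ppage = map_table[vpage] & PPAGE_MASK
    # Assumption: virtual bit 27 is discarded, since PA27 comes from the map.
    high = vaddr & ~((1 << HIGH_SHIFT) - 1)
    return high | (ppage << PAGE_SHIFT) | offset


if __name__ == "__main__":
    table = list(range(512))     # identity mapping for all 512 pages
    table[5] = 0x205             # remap virtual page 5
    print(hex(translate((5 << 18) | 0x1234, table)))
```

With an identity table every address comes out unchanged, which makes it easy to see the effect of remapping a single page.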
The mapping table for memory management is stored directly in the SimpleMMU rather than in main memory, as is commonly done. The SimpleMMU directly supports up to 32 tasks, each with its own mapping table. The mapping table for only a single task is accessible at one time. Eight MMUs may be used in a system to allow up to 256 tasks.
Access to the mapping table is controlled by an access key. The access key contains the task number for the mapping table to be accessed. The mapping table for only a single task may be accessed at one time. In order to access a map table for another task, the access key must be updated with the desired task number. The upper three bits of the access key identify the MMU to be updated or read from. The lower five bits of the access key identify the map table.
The operate key controls which map table (which task) is currently mapping addresses. The upper three bits of the operate key identify the MMU actively mapping addresses. The lower five bits select the map table within an MMU.
The key value register is used to identify the MMU and is how the MMUs are differentiated. Operations on the MMU data are only possible if the key value register matches the high order three bits of the access key.
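The key layout can be made concrete with a small Python sketch. It assumes the keys are eight bits wide (three MMU-select bits above five map-select bits, as described above); the function names are hypothetical.

```python
def split_key(key: int) -> tuple:
    """Split an 8-bit access or operate key into (mmu_number, map_number)."""
    mmu_number = (key >> 5) & 0x7   # upper three bits select one of 8 MMUs
    map_number = key & 0x1F         # lower five bits select one of 32 maps
    return mmu_number, map_number


def mmu_responds(key_value: int, access_key: int) -> bool:
    """An MMU accepts register accesses only when its key value register
    matches the upper three bits of the access key."""
    return (key_value & 0x7) == split_key(access_key)[0]


if __name__ == "__main__":
    key = 0b10100011                # MMU 5, map table 3
    print(split_key(key), mmu_responds(5, key), mmu_responds(2, key))
```

Seen this way, the eight-bit key is just a task number from 0 to 255 spread across eight MMUs of 32 tasks each.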
The register set for the MMU is based at I/O address $DC4000. The mapping table appears as a set of 1024 consecutive I/O locations. All MMUs share a common register set occupying the same I/O address range, with the exception of the key value register. Access to a particular MMU is controlled by the top three bits of the access key.
Map table entries (register offsets in hex):
Reg | D15 | D14-D10 | D9-D0 | Notes |
00 | WP | | PA27-PA18 | 512 map entries per task |
02 | WP | | PA27-PA18 | |
04 | WP | | PA27-PA18 | |
… | | | | |
3FE | WP | | PA27-PA18 | |
Control registers (each field occupies low-order data bits of its register):
Reg | Field | Notes |
400 | KV MMU0 | Only one register per MMU |
402 | KV MMU1 | |
404 | KV MMU2 | |
406 | KV MMU3 | |
408 | KV MMU4 | |
40A | KV MMU5 | |
40C | KV MMU6 | |
40E | KV MMU7 | |
410 | S | |
412 | Fuse | |
414 | Access Key | |
416 | Operate Key | |
418 | ME | |
The top three bits of the access key must match the key value register in order to read/write the register set. Also, the ‘s’ bit must be set.
The lower five bits of the access key select the map for one of thirty-two tasks.
The operate key determines which task is the task actively mapping the address space.
The MMU divides memory up into 512 pages of 256kB each. Address bits 18 through 26 index into a map table to find the physical address page.
Transitioning into Kernel mode causes the ‘s’ bit to be set. This results in the MMU using task#0 to map addresses. The processor transitions into Kernel mode when a hardware interrupt or software exception occurs. In order to allow other tasks to map addresses, the countdown fuse must be set. When the countdown expires (it has to reach -1) the ‘s’ bit is cleared, and the task identified by the operate key controls memory mapping. The only way to clear the ‘s’ bit is by setting the countdown fuse. The ‘s’ bit is contained in a read-only register.
Task #0 is assumed to be the system task. Task #1 is assumed to be the DMA task. When the CPU transitions into kernel mode, task #0 is selected as the map controller. The ‘s’ bit is set, which forces task #0 to map addresses. The task actively mapping addresses is controlled by the operate key when the ‘s’ bit is not set.
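The interaction between the ‘s’ bit, the countdown fuse and the operate key can be modelled as a small state machine. The Python sketch below is an illustration only and rests on several assumptions: what each countdown step corresponds to in hardware is not stated here, so the `tick` method is a hypothetical stand-in; entering kernel mode is assumed to cancel any countdown in progress; and the starting state is arbitrary.

```python
class MapSelector:
    """Toy model of which task's map table is actively mapping addresses."""

    def __init__(self, operate_key: int = 0):
        self.s = False          # arbitrary starting state for the sketch
        self.fuse = None        # None means no countdown is armed
        self.operate_key = operate_key

    def enter_kernel(self) -> None:
        """Hardware interrupt or software exception: the 's' bit is set."""
        self.s = True
        self.fuse = None        # assumption: a pending countdown is cancelled

    def set_fuse(self, count: int) -> None:
        """Arm the countdown fuse; the only way to get the 's' bit cleared."""
        self.fuse = count

    def tick(self) -> None:
        """One countdown step (hypothetical unit)."""
        if self.fuse is None:
            return
        self.fuse -= 1
        if self.fuse < 0:       # the countdown has to reach -1
            self.s = False
            self.fuse = None

    def active_task(self) -> int:
        """Task #0 maps while 's' is set, otherwise the operate key's task."""
        return 0 if self.s else (self.operate_key & 0x1F)


if __name__ == "__main__":
    sel = MapSelector(operate_key=7)
    sel.enter_kernel()
    sel.set_fuse(2)
    for _ in range(3):
        print(sel.active_task())
        sel.tick()
    print(sel.active_task())
```

The delay provided by the fuse gives kernel code a window in which to finish running under task #0's map before the operate key's task takes over.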
Addresses pass through the MMU unaltered until the mapping enable bit is set. Until mapping is enabled, the physical address will match the virtual address. Additionally address bits 0 to 17 pass through the MMU unaltered. Address bits 28 to 63 pass through the MMU unaltered as well.
Logical | | Arithmetic | | Shift / Rotate | Flow Control | Compare | | Branch | | Load / Store | | I/O | | BitFld | Data Move |
and | andi | add | addi | shlu | call | slt | slti | blt | blti | lb | lbx | inb | inbx | bfextu | mux |
or | ori | addu | addui | shl | jmp | sle | slei | ble | blei | lbu | lbux | inbu | inbux | bfexts | movz |
xor | xori | sub | subi | shru | jal | sgt | sgti | bgt | bgti | lc | lcx | inch | incx | bfins | movnz |
andc | | subu | subui | shr | ret | sge | sgei | bge | bgei | lcu | lcux | incu | incux | bfset | movpl |
orc | | mulu | mului | rol | trap | sltu | sltui | bltu | bltui | lh | lhx | inh | inhx | bfclr | movmi |
nand | | muls | mulsi | ror | | sleu | sleui | bleu | bleui | lhu | lhux | inhu | inhux | bfchg | mov |
nor | | divu | divui | shlui | iret | sgtu | sgtui | bgtu | bgtui | lw | lwx | inw | inwx | | min |
xnor | | divs | divsi | shli | eret | sgeu | sgeui | bgeu | bgeui | sb | sbx | outb | outbx | | max |
com | | modu | | shrui | syscall | seq | seqi | beq | beqi | sc | scx | outc | outcx | | swap |
not | | mods | | shri | exec | sne | snei | bne | bnei | sh | shx | outh | outhx | | |
| | sqrt | | roli | | cmp | cmpi | bra | | sw | swx | outw | outwx | | |
| | neg | | rori | | cmpu | cmpui | band | | | | | | | |
| | abs | | | | | | bor | | | | | | | |
| | sgn | | | | | | bnr | | | | | | | |
| | | | | | | | loop | | | | | | | |
Segment | | | | |
mtseg | mtspr | mtep | gran | setlo |
mfseg | mfspr | mfep | | setmid |
mtsegi | | iepp | | sethi |
mfsegi | | | | |