Raptor64


Description

Raptor64 is a 64-bit multi-context RISC CPU that supports hyper-threading. There are 16 register sets that the processor automatically switches between at high speed. The processor is fully pipelined with a six-stage pipeline: IF/RF/EX/M1/M2/WB. Communication with memory is via a 64-bit WISHBONE bus. The processor has a 16kB instruction cache and a 32kB data cache, and uses 32-bit instructions.

I've created two versions of the processor: a non-hyper-threaded version (sc) in addition to the hyper-threaded multi-context (mc) one.

 

Features

- 32-entry, 64-bit general register file
- 32-bit opcodes (4 per 128 bits)
- SQRT, multiply/divide and bit field instructions, plus all the regulars
- conditional move, exec
- explicit I/O instructions (also useful for uncached access)
- immediate constants may be built using the SETLO, SETMID and SETHI instructions
- two addressing modes: displacement (d15[ra]) and scaled indexed (d2[ra+rb*scale])
- 16 segmentation registers
- SimpleMMU: 32 tasks supported, with a 128MB space mapped in 256kB pages
- 64 single bit semaphores
- 16kiB instruction cache, 32kiB data cache
- single cycle execution of most instructions (loads stall the pipeline)
- branch prediction with a 256 entry branch history table
- return address stack prediction
- internal Harvard architecture
- communicates externally using a 64-bit WISHBONE bus

 

Software

A high-level language compiler for a language similar to 'C' is currently in the works. Several additional keywords have been added (e.g. interrupt). I have finally fed the output of the compiler through the assembler, and a couple of bug fixes later the sieve is able to run from SD card.
There is also an assembler (also a work in progress).
Tiny Basic is available in the boot ROM. It works, with a few bugs remaining.

 

Status

Currently the processor is running code in an FPGA. The boot ROM is slowly expanding. Numerous software and processor fixes have taken place, but there is still a long way to go. The processor is being revamped to use a 32-bit ISA; it was originally a 42-bit ISA.

The core is running on an Atlys board and is now able to load a boot program from an SD card. Hopefully that will speed up software development. Previously, software could only be updated by editing a Verilog source file, which required rebuilding the entire system for every software update.

The ISA is still under constant review; it may change to use an 8-bit master opcode field as opposed to 7 bits. There are lots of instructions I'd like to add, and no room with only 7 bits.

 

Downloads

Raptor64.zip (download not working yet)

 

General Purpose Registers

The Raptor64 has 32 general purpose registers, although four registers have special uses. R0 always reads as the value zero. R31 is the subroutine link register (LR). The call instruction automatically updates this register with the return address of a subroutine. This register is also used implicitly by the return instruction. R30 is the stack pointer (SP) register. The return instruction automatically updates this register. R29 references the program counter for the instruction and may be used to form program counter relative addresses. R29 is a read-only register.

Register   Usage                          Defined by
r0         zero register; always zero     hardware
r1         subroutine return value        software convention
r2         subroutine return value        software convention
r3         temporary                      software convention
r4         temporary                      software convention
r5         temporary                      software convention
r6         temporary                      software convention
r7         temporary                      software convention
r8         temporary                      software convention
r9         temporary                      software convention
r10        temporary                      software convention
r11        register variable              software convention
r12        register variable              software convention
r13        register variable              software convention
r14        register variable              software convention
r15        register variable              software convention
r16        register variable              software convention
r17        register variable              software convention
r18        register variable              software convention
r19                                       software convention
r20                                       software convention
r21                                       software convention
r22                                       software convention
r23                                       software convention
r24        constant builder               software convention
r25                                       software convention
r26                                       software convention
r27        exception address              software convention
r28        base pointer                   software convention
r29        program counter                hardware
r30/sp     stack pointer                  hardware
r31/lr     return address/link register   hardware

Segmentation Registers

Raptor64 has 16 segment registers. For data addresses the segment register is chosen by the most significant four bits of the address. For example an address like 0xE000000000000010 uses segment register #14 because the upper nibble of the address is an 'E'. Note that segmentation does not apply to I/O addresses. I/O instructions 'in' and 'out' do not use segmentation; only load and store instructions use it.

For code addresses segment register #15 is always used unless the upper nibble of an address is 'F' in which case segmentation is ignored. This allows the operating system code located in the memory region 0xFxxxxxxxxxxxxxxx to run without paying attention to segmentation. An alternate name for segment register #15 is the CS (code segment) register.

There are four instructions supporting segment registers: mtseg (move to segment register), mtsegi (move to segment register indirect), mfseg (move from segment register) and mfsegi (move from segment register indirect).

The segment register is added (without a shift) to the effective address to form a final segmented address. Some of the low order bits of the segment register are always zero.
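The selection and addition rules above can be sketched in a few lines of Python. This is only an illustrative model, not the core's actual logic: the function names are made up for this example, and the wrap-around of the sum at 64 bits is an assumption, since the text only says the segment register is added without a shift.

```python
from typing import List, Optional

MASK64 = (1 << 64) - 1  # addresses are 64 bits wide


def select_segment(addr: int, is_code: bool = False) -> Optional[int]:
    """Return the segment register number for addr, or None if segmentation is bypassed."""
    nibble = (addr >> 60) & 0xF  # most significant four bits of the address
    if is_code:
        # Code always uses segment register #15 (CS), except in the 0xF... region.
        return None if nibble == 0xF else 15
    return nibble  # data: the upper nibble picks the segment register


def segmented_address(addr: int, seg_regs: List[int], is_code: bool = False) -> int:
    """Add the selected segment register to the effective address (no shift)."""
    seg = select_segment(addr, is_code)
    if seg is None:
        return addr & MASK64
    # Assumption: the sum wraps at 64 bits; the text does not say.
    return (addr + seg_regs[seg]) & MASK64
```

For the example address in the text, `select_segment(0xE000000000000010)` returns 14. I/O accesses are not modelled here because they do not use segmentation at all.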

 

Execution Pattern Table

The execution pattern table is a 256-entry table that contains context IDs. The processor periodically cycles through the execution pattern table to determine which register set context to use. For the non-hyper-threading CPU the iepp instruction is used to cycle through the table; this instruction will typically be called from a timer interrupt service routine. The hyper-threading CPU cycles through the table automatically.

The execution pattern table can be used to control how frequently particular contexts are executed. Higher priority contexts can be given more slots in the table, which results in those contexts being executed more frequently.

Note that slot #0 of the 256 slots is always zero. This means that context #0 is guaranteed to always execute. Additionally, the execution pattern table is initialized to zero on reset, meaning that context zero is the only executing context. The table is read and updated using the mfep (move from execution pattern table) and mtep (move to execution pattern table) instructions.
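To make the slot-weighting idea concrete, here is a small Python model of the table. It is a sketch under stated assumptions: `set_slot` is a hypothetical stand-in for what mtep does, rejecting writes to slot #0 is a modelling choice (the hardware may simply ignore them), and each slot is assumed to get an equal share of execution time.

```python
from collections import Counter
from typing import Dict, List

TABLE_SIZE = 256    # slots in the execution pattern table
NUM_CONTEXTS = 16   # register sets (contexts) in the processor


def make_table() -> List[int]:
    """Table state after reset: every slot holds context 0."""
    return [0] * TABLE_SIZE


def set_slot(table: List[int], slot: int, context_id: int) -> None:
    """Hypothetical model of an mtep write."""
    if slot == 0:
        # Slot #0 always reads as zero; refusing the write is an assumption.
        raise ValueError("slot 0 is fixed to context 0")
    if not 0 <= context_id < NUM_CONTEXTS:
        raise ValueError("context id out of range")
    table[slot] = context_id


def context_share(table: List[int]) -> Dict[int, float]:
    """Fraction of slots owned by each context (assumes equal time per slot)."""
    counts = Counter(table)
    return {ctx: n / TABLE_SIZE for ctx, n in counts.items()}
```

Giving context 1 every odd slot, for example, leaves contexts 0 and 1 with half of the table each, while a freshly reset table gives context 0 the whole machine.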

 

Simple MMU

Overview

The SimpleMMU provides simple memory management capabilities for the Raptor64 CPU, including virtual to physical address mapping. The SimpleMMU divides a 128MB memory space into 512 pages of 256kB each and supports 32 tasks. Processor address bits 18 through 26 (the virtual page number) are used as a nine-bit index into a map table to find the physical page. The MMU remaps the nine address bits into a 10-bit value used as address bits 18 to 27 when accessing a physical address. The lower eighteen bits of an address pass through the MMU unchanged, as do address bits 28 to 63. It is assumed that in a system where the SimpleMMU is relevant, some or all of the high order bits of an address are left unconnected. I/O accesses are not mapped by the SimpleMMU; I/O addresses pass through the MMU unchanged.
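The bit slicing described above can be written out as a short Python sketch. The function name and the use of a plain list for the map table are assumptions made for illustration; the map enable bit, the 's' bit and I/O accesses are deliberately left out.

```python
from typing import List

PAGE_SHIFT = 18                       # 256kB pages: bits 0..17 are the page offset
OFFSET_MASK = (1 << PAGE_SHIFT) - 1
VPAGE_MASK = 0x1FF                    # nine-bit virtual page number (bits 18..26)
PPAGE_MASK = 0x3FF                    # ten-bit physical page number (bits 18..27)
LOW28_MASK = (1 << 28) - 1            # bits 0..27, everything the MMU may rewrite


def translate(vaddr: int, map_table: List[int]) -> int:
    """Map a virtual address to a physical address using one task's 512-entry table."""
    vpage = (vaddr >> PAGE_SHIFT) & VPAGE_MASK
    ppage = map_table[vpage] & PPAGE_MASK   # ignore any bits above PA27..PA18
    offset = vaddr & OFFSET_MASK            # bits 0..17 pass through unchanged
    high = vaddr & ~LOW28_MASK              # bits 28..63 pass through unchanged
    return high | (ppage << PAGE_SHIFT) | offset
```

With an identity table (`map_table[i] == i`) every address below 128MB maps to itself; changing a single entry relocates exactly one 256kB page.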

Map Tables

The mapping table for memory management is stored directly in the SimpleMMU rather than being stored in main memory as is commonly done. The SimpleMMU directly supports up to 32 tasks. Each task has its own mapping table. The mapping table for only a single task is accessible at one time. Mapping table access is controlled by an access key. Eight MMUs may be used in a system to allow up to 256 tasks.

Access Key

Access to the mapping table is controlled by an access key. The access key contains the task number for the mapping table to be accessed. The mapping table for only a single task may be accessed at one time. In order to access a map table for another task, the access key must be updated with the desired task number. The upper three bits of the access key identify the MMU to be updated or read from. The lower five bits of the access key identify the map table.

Operate Key

The operate key controls which map table (which task) is currently mapping addresses. The upper three bits of the operate key identify the MMU actively mapping addresses. The lower five bits select the map table within an MMU.

Key Value Register

The key value register identifies the MMU and is how the MMUs are differentiated. Operations on the MMU data are only possible if the key value register matches the high order three bits of the access key.
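The key layout and the key value match can be illustrated as follows. This is a sketch that assumes the keys are eight bits wide (three MMU-select bits above five map-select bits, as the text implies); the function names are invented for the example.

```python
from typing import Tuple


def decode_key(key: int) -> Tuple[int, int]:
    """Split an access or operate key into (mmu_number, map_table_number)."""
    key &= 0xFF                   # assumption: keys are eight bits wide
    return key >> 5, key & 0x1F   # upper three bits, lower five bits


def mmu_responds(access_key: int, key_value: int) -> bool:
    """True if the MMU whose key value register holds key_value is addressed."""
    mmu, _ = decode_key(access_key)
    return mmu == (key_value & 0x7)
```

With eight MMUs of 32 map tables each, the eight-bit key covers the 256 tasks mentioned earlier. For instance, key 0xA3 selects map table 3 in MMU 5.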

Mapping Table / Register Set:

The register set for the MMU is based at I/O address $DC4000. The mapping table appears as a set of 1024 consecutive I/O locations. All MMUs share a common register set occupying the same I/O address range, with the exception of the key value register. Access to a particular MMU is controlled by the top three bits of the access key.

Map entries (512 map entries per task):

Reg    D15   D14-D10   D9-D0
00     WP              PA27-PA18
02     WP              PA27-PA18
04     WP              PA27-PA18
...
3FE    WP              PA27-PA18

Control registers:

Reg    Contents       Notes
400    KV MMU0        Only one register per MMU
402    KV MMU1
404    KV MMU2
406    KV MMU3
408    KV MMU4
40A    KV MMU5
40C    KV MMU6
40E    KV MMU7
410    S
412    Fuse
414    Access Key
416    Operate Key
418    ME

 

The top three bits of the access key must match the key value register in order to read/write the register set. Also, the ‘s’ bit must be set.

The lower five bits of the access key select the map for one of thirty-two tasks.

The operate key determines which task is the task actively mapping the address space.
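As a rough guide to how software might address these registers, the sketch below computes I/O addresses and packs map entries in Python. It assumes the register numbers in the table are byte offsets added directly to the $DC4000 base, and it treats WP as a single flag in D15 without assuming more about its meaning than the table shows; all names are illustrative.

```python
from typing import Tuple

MMU_BASE = 0xDC4000          # I/O base address of the MMU register set

# Register offsets as listed in the register table (hexadecimal).
REG_KV_MMU0 = 0x400
REG_S = 0x410
REG_FUSE = 0x412
REG_ACCESS_KEY = 0x414
REG_OPERATE_KEY = 0x416
REG_ME = 0x418


def map_entry_address(index: int) -> int:
    """I/O address of map entry `index` (0..511) of the currently accessed task."""
    if not 0 <= index < 512:
        raise ValueError("map entry index out of range")
    return MMU_BASE + 2 * index      # entries sit at offsets 00, 02, ... 3FE


def kv_address(mmu: int) -> int:
    """I/O address of the key value register of MMU `mmu` (0..7)."""
    if not 0 <= mmu < 8:
        raise ValueError("MMU number out of range")
    return MMU_BASE + REG_KV_MMU0 + 2 * mmu


def pack_map_entry(ppage: int, wp: bool = False) -> int:
    """Build a map entry: WP in D15, PA27..PA18 in D9..D0."""
    return (int(wp) << 15) | (ppage & 0x3FF)


def unpack_map_entry(value: int) -> Tuple[int, bool]:
    """Return (physical page number, WP flag) from a map entry."""
    return value & 0x3FF, bool((value >> 15) & 1)
```

Before touching a map entry, the access key would first have to be written with the target MMU and task number, as described above.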

 

The MMU divides memory up into 512 pages of 256kB each. Address bits 18 through 26 index into a map table to find the physical address page.

Kernel Mode and the ‘s’ bit

Transitioning into Kernel mode causes the ‘s’ bit to be set. This results in the MMU using task#0 to map addresses. The processor transitions into Kernel mode when a hardware interrupt or software exception occurs. In order to allow other tasks to map addresses, the countdown fuse must be set. When the countdown expires (it has to reach -1) the ‘s’ bit is cleared, and the task identified by the operate key controls memory mapping. The only way to clear the ‘s’ bit is by setting the countdown fuse. The ‘s’ bit is contained in a read-only register.
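The interaction between the ‘s’ bit, the countdown fuse and the operate key can be modelled as a tiny state machine. This is a behavioural sketch only: what one countdown step corresponds to (a clock cycle, an instruction) is not stated in the text, the reset state of the ‘s’ bit is chosen arbitrarily, and cancelling a pending countdown on kernel entry is an assumption.

```python
from typing import Optional


class SBitModel:
    """Toy model of the 's' bit and countdown fuse."""

    def __init__(self) -> None:
        self.s = False                    # reset state is not specified; arbitrary choice
        self.fuse: Optional[int] = None   # None means no countdown is running

    def enter_kernel(self) -> None:
        """Hardware interrupt or software exception: the 's' bit is set."""
        self.s = True
        self.fuse = None                  # assumption: a pending countdown is cancelled

    def set_fuse(self, count: int) -> None:
        """Software writes the fuse register to start the countdown."""
        self.fuse = count

    def tick(self) -> None:
        """One countdown step; the 's' bit clears when the count reaches -1."""
        if self.fuse is None:
            return
        self.fuse -= 1
        if self.fuse < 0:
            self.s = False
            self.fuse = None

    def active_task(self, operate_key: int) -> int:
        """Map table currently mapping addresses (within one MMU)."""
        return 0 if self.s else operate_key & 0x1F
```

The delay built into the fuse gives kernel code a few steps to finish returning to the task before the mapping switches away from task #0.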

Tasks

Task #0 is assumed to be the system task. Task #1 is assumed to be the DMA task. When the CPU transitions into kernel mode, task #0 is selected as the map controller. The ‘s’ bit is set, which forces task #0 to map addresses. The task actively mapping addresses is controlled by the operate key when the ‘s’ bit is not set.

Address Pass-through

Addresses pass through the MMU unaltered until the mapping enable bit is set. Until mapping is enabled, the physical address will match the virtual address. Additionally address bits 0 to 17 pass through the MMU unaltered. Address bits 28 to 63 pass through the MMU unaltered as well.

 

Raptor64 Instruction Set

- Logical: and, andi, or, ori, xor, xori, andc, orc, nand, nor, xnor, com, not
- Arithmetic: add, addi, addu, addui, sub, subi, subu, subui, mulu, mului, muls, mulsi, divu, divui, divs, divsi, modu, mods, sqrt, neg, abs, sgn
- Shift / Rotate: shlu, shl, shru, shr, rol, ror, shlui, shli, shrui, shri, roli, rori
- Flow Control: call, jmp, jal, ret, trap, iret, eret, syscall, exec
- Compare: slt, slti, sle, slei, sgt, sgti, sge, sgei, sltu, sltui, sleu, sleui, sgtu, sgtui, sgeu, sgeui, seq, seqi, sne, snei, cmp, cmpi, cmpu, cmpui
- Branch: blt, blti, ble, blei, bgt, bgti, bge, bgei, bltu, bltui, bleu, bleui, bgtu, bgtui, bgeu, bgeui, beq, beqi, bne, bnei, bra, band, bor, bnr, loop
- Load / Store: lb, lbx, lbu, lbux, lc, lcx, lcu, lcux, lh, lhx, lhu, lhux, lw, lwx, sb, sbx, sc, scx, sh, shx, sw, swx
- I/O: inb, inbx, inbu, inbux, inch, incx, incu, incux, inh, inhx, inhu, inhux, inw, inwx, outb, outbx, outc, outcx, outh, outhx, outw, outwx
- BitFld: bfextu, bfexts, bfins, bfset, bfclr, bfchg
- Data Move: mux, movz, movnz, movpl, movmi, mov, min, max, swap
- Segment: mtseg, mfseg, mtsegi, mfsegi
- Special register moves: mtspr, mfspr
- Execution pattern table: mtep, mfep, iepp
- Constant building: setlo, setmid, sethi
- Other: gran