Effects on Processor Architecture / Performance
The basic idea behind this processor instruction set enhancement is to allow dynamically determined operations to be performed by the cpu. So the actual operation performed is unknown or it remains a 'Mystery' until the actual instruction is executed by the processor.
In many cases bits from an opcode are passed directly through to a cpu functional unit to determine the exact behaviour of the functional unit. To the functional unit this is nothing more than an additional constant or value that has to be incorporated into it's calculations. By comparison, operand values for the functional unit can come directly from an immediate constant encoded in the opcode or alternately they can come from registers. There's no real reason why the same kind of selection couldn't be made for the actual control signals going to the function unit as well.
This idea arose out of looking at instructions used to evaluate program conditions while working on the ultimate processor design, which includes branch, set and compare type instructions. (Plus a little bit of sleep deprivation).
Looking at a basic 'set' instruction:
slt r5,r1
This instruction sets the value of r5 to true or false (1 or 0) if the result status in r1 indicates a previous compare operation resulted in a less than (slt = set if less than).
Taking a different point of view, the 'set' instruction can be seen as an operation similar to the usual arithmetic or logical operation. Although the set instruction appears to have only a single operand, it really has two operands. Where is the other operand ?
An arithmetic operation like 'add' has two operands (A, B) and a target (T). It performs a calculation on the operands and places the result in the target.
T <= A + B
Essentially the 'set' operation does the same thing. It has two operands, performs a calculation on them and places the result in the target.
T <= A op ?
Where 'op' represents an as yet unknown operation on the operands and the '?' represents a mystery operand. What are 'op' and '?'. The 'set' instruction, just like an add instruction requires specific hardware to perform the operation - the 'op'.
Looking at sample code for some hardware used in calculating the result of the set instruction, the code also reveals what unknown operation and the secret '?' operand is.
Verilog Code:
// evaluate set condition module setEval(cond, z, n, v, c, o); input [3:0] cond; input z; input n; input v; input c; output o; reg o; always @(cond or z or n or v or c) begin case (cond) `EQ: o <= z; `NE: o <= ~z; `MI: o <= n; `PL: o <= ~n; `VS: o <= v; `VC: o <= ~v; `LT: o <= n ^ v; `GE: o <= ~(n ^ v); `LE: o <= (n ^ v ) | z; `GT: o <= ~((n ^ v ) | z); `LTU: o <= c; `GEU: o <= ~c; `LEU: o <= c | z; `GTU: o <= ~(c | z); default: o <= 1; endcase end endmodule |
The unknown operation ('op') is really a collection of operations performed on the result status bits (z, n, v, and c) followed by a sixteen-to-one multiplexor to determine a final true / false outcome. The result status bits are what comes from the 'A' operand in T <= A op ?. The mystery operand (?) is the actual condition it is desired to test for - it's the 'cond' in the above code.
Normally the desired test condition is supplied by encoding bits directly in a cpu opcode when a program is compiled. It is statically fixed and determined when the program is written. But taking a slightly different point of view, one could view it as an immediate constant, so a set instruction is really kind of an immediate operand instruction. The question that then came to my mind was: if there is an immediate operand form, then why not allow a register operand form as well ? And thus the mystery operation processor was born.
First note that the same basic idea can be applied to other instructions as well such as logical operations where it is probably most useful. While it's not really all that useful an operation in a general sense, in certain circumstances it can be very valuable.
Example 1: Graphics Processing.
Graphics processing often involves raster operations (rop's) which are a form of logical operation performed between different graphics elements such as bitmaps. The exact operation to be performed quite often needs to be determined dynamically at run-time. For instance when running a program under a GUI operating system like Windows, System program calls take care of performing graphic operations for the application program. However those function calls built into the GUI system don't (and can't) know in advance what raster operations the program will be performing. So how is code written that doesn't know in advance which operation to perform ? Quite often using a small set of constants and a case, select or if statement (or possibly an indirect function call made from a table). This is SLOW. Sometimes programmers have been known to resort to self-modifying code in order to drastically improve the performance of programs. Simply by changing that 'and' operation into an 'or' operation for the raster op directly in the program's memory, one can speed up the code drastically because the branching logic isn't required. It should be noted that graphic accelerators often include specialized logic for performing raster operations, so this sometimes aleviates the needs for slow code in the GUI. However, what if you are working with a small system without a graphics accelerator ? Having the ability to specify the logical operation to be performed using a register avoids these kinds of things.
Example 2: Search Criteria
Suppose there is a program that needs to search for numbers based on user input (a spreadsheet program for instance). The user can select to look for numbers within certain ranges using operations like "<", ">=" and so on. Because this program is based on user input and it's already compiled (cast in stone) by the time the user sees it, there is little choice when the program is written but to code the user's selection using some sort of if/else if tree or case statement like the following:
if input='<'
if a < b then
.....
else if input = '>=' then
if a >= b then
If this code has to be run in a loop thousands or millions of times, it will be slow (it could be orders of magnitude slower than it need be). Because this code is slow and bulky, sometimes self-modifying code is used instead.
However, if the actual operation to be performed could be passed as a dynamically determined value (determined when the program is run), then self modifying code wouldn't be required and the program could remain small and fast. But then this requires programming language support...
Effects on Processor Architecture / Performance
The only way this idea really works well is if it's not necessary to perform additional decoding or hardware stages for the dynamically determined operation. That means that the dynamic portion of the operation must fall within the processor's existing framework for decoding and executing instructions. In most processor designs this is in fact the case. Most processors are divided into functional units (such as an adder/subtractor, logic unit, shift unit, multiplier) and the functional unit executes the operation during the execution stage of the processor.
Quite often opcode bits used in order to determine the exact operation of instructions end up being passed verbatium to the functional unit in order to conserve hardware . The cpu's instruction decoder simply determines which class of instruction (which functional unit is used to execute the instruction) to execute using other opcode bits. To the functional unit the control bits passed originally from the opcode look no different than the operand values passed from registers; and hence the control bits could come from a register as well. The functional unit simply takes a whole bunch of bits as input and produces a whole bunch of bits of output that corresponds somehow to the incoming bits.
One aspect of dynamically determined operations is that they require an extra register read path from the registers beyond what would be required for a statically determined operation. Depending on the processor and operations supported, this may increase the size of the register file needed due to an additional register read port or alternately it may require an addition clock cycle for an additional register read. It also requires an additional small multiplexor to select between the statically determined operations and those coming from a register. (This really isn't a significant issue as it's usually only a few bits - perhaps three or four at most).
Programming in assembler doesn't present a problem. A mystery operation instruction is coded just like any other assembly language instruction. It's just another instruction in the instruction set architecture of the processor. However most code isn't programmed in assembly language these days....
How do you write code for unknown operations in a high level language like Java ? By using unknowns of course. That means that the operation is stored in a variable. Since it's basically a bit pattern that's required, this works well using integer or character variables (enumerations would be even better).
Suppose there is a line of code like this
if (a >= b) then ....
Suppose it is necessary to translate this into a dynamically determined operation. What is desired is something like the following (where a, op, and b are all variables):
if (a op b) then
Unfortunately this type of code can't be used. A couple of things are required. The compiler needs syntactic elements in order to determine how to compile the program -- also how would it know what class of instruction to compile ? The (a op b) will only confuse the compiler, (and the programmer as well when looking at the code at a later date...) Also this idea can't be applied to just any old operation. The cpu itself needs to have the operation placed into a specific category (there's a little something called instruction decoding that needs to be done before the instruction is actually executed by the processor).
So, the proposal is to use syntactic elements that help the compile and processor, and the following syntactic elements are proposed:
<[{expression}] | indicates a dynamic relational operation |
&[{expression}] | indicates a dynamic logical operation |
+[{expression}] | indicates a dynamic addition/subtraction operation |
Code using this syntax might look like the following:
if (a <[op] b) then ....
This represents a relational expression that is determined at run-time.
flag = a &[op] b;
The above represents a logical operation that is determined dynamically at run-time.
Copyright (c) 2003 Rob Finch. Last revised: September 26, 2007.