It's funny how the requirements of a processor can affect its instruction set.

Want a high clock frequency? For some applications this is the only solution that makes sense. If the application has a low level of instruction parallelism, not many instructions can execute at once. What's needed is a processor that can execute instructions sequentially, one at a time, at a high rate, and the only way to do that is with a high clock frequency. Some applications are highly complex and, because their instructions depend heavily on one another, are not suited to parallel execution. For these applications a superscalar processor is overkill: being superscalar doesn't help, because the nature of the application limits performance to sequential execution of one instruction at a time.
For some embedded applications a processor with a high clock frequency may make sense when only a limited number of clock sources is available. If the only clock available is a single 100 MHz clock, then the processor must be able to run at 100 MHz. Embedded applications also often call for a processor with a small footprint.

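The argument above can be put in numbers with the usual relation for run time: instruction count × average cycles per instruction (CPI) ÷ clock frequency. The sketch below assumes, as the paragraph argues, that the application's dependencies hold a simple high-clock core and a superscalar core to the same average CPI; the instruction count, the CPI of 1.0 and both clock rates are hypothetical values chosen only for illustration.

```python
def run_time_s(instructions: float, cpi: float, clock_hz: float) -> float:
    """Execution time in seconds = instructions * (cycles/instruction) / (cycles/second)."""
    return instructions * cpi / clock_hz


# Hypothetical workload: 1e9 instructions whose dependencies let only one
# instruction complete per cycle on either machine (CPI = 1.0 is an assumption).
N = 1e9
simple_fast = run_time_s(N, cpi=1.0, clock_hz=200e6)  # simple sequential core, 200 MHz
wide_slow = run_time_s(N, cpi=1.0, clock_hz=100e6)    # superscalar core, 100 MHz

print(f"simple core: {simple_fast:.1f} s, superscalar core: {wide_slow:.1f} s")
```

With CPI pinned by the application, run time scales directly with 1 / clock frequency, so the clock is the only lever left.
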
Need better overall performance? Can you make use of some instruction parallelism? Are the instructions to be executed somewhat independent of each other, with few changes in instruction flow? Is a somewhat lower clock frequency acceptable, and are more logic resources available? Maybe a processor with an overlapped pipeline is in order.

Do you really need maximum performance? Is the nature of the application highly parallel? Is a lower clock frequency acceptable? Maybe a superscalar processor is in order.

Want a high clock frequency? Look at using a simple sequential design. Use flag registers for branching. Don't worry too much about interactions between instructions, since they execute sequentially, one at a time. Make the instructions powerful, so that each one does more work. Use code compression techniques such as variable-length instruction encodings. Keep the design small.

Want a processor with an overlapped pipeline for better performance? Take a serious look at eliminating dependencies between instructions, in particular the flags register commonly found in sequential, non-overlapped designs. Dependent instructions slow the processor down because the pipeline must stall until each dependency is resolved. Also take a serious look at making all the instructions a fixed size, with just a few formats, to keep decoding simple.

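The cost of a flags register can be seen by counting which instruction pairs must stay in order. The sketch below assumes a hypothetical ISA in which every ALU instruction implicitly writes a single `flags` register, and a pipeline that does not rename that register; under those assumptions, three otherwise independent ALU operations become mutually ordered through write-after-write conflicts on `flags`. The helper names are illustrative, not from any real toolchain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    reads: frozenset
    writes: frozenset


def instr(text, reads, writes):
    """Convenience constructor taking ordinary iterables of register names."""
    return Instr(text, frozenset(reads), frozenset(writes))


def must_stay_ordered(a: Instr, b: Instr) -> bool:
    """True if later instruction b conflicts with earlier instruction a:
    read-after-write, write-after-write or write-after-read on any register."""
    return bool((a.writes & b.reads) or (a.writes & b.writes) or (a.reads & b.writes))


def ordered_pairs(program):
    """Count instruction pairs (i < j) that cannot be freely reordered or overlapped."""
    return sum(
        must_stay_ordered(program[i], program[j])
        for i in range(len(program))
        for j in range(i + 1, len(program))
    )


# Hypothetical ISA where every ALU op also updates the flags register.
with_flags = [
    instr("add r1,r2,r3", {"r2", "r3"}, {"r1", "flags"}),
    instr("sub r4,r5,r6", {"r5", "r6"}, {"r4", "flags"}),
    instr("and r7,r8,r9", {"r8", "r9"}, {"r7", "flags"}),
]
# The same operations on an ISA without a flags register.
without_flags = [
    instr("add r1,r2,r3", {"r2", "r3"}, {"r1"}),
    instr("sub r4,r5,r6", {"r5", "r6"}, {"r4"}),
    instr("and r7,r8,r9", {"r8", "r9"}, {"r7"}),
]

print(ordered_pairs(with_flags), ordered_pairs(without_flags))
```

A conditional branch that reads `flags` adds a true read-after-write dependency on top of this; ISAs without flags typically compare registers directly in the branch instruction, so only the registers actually involved create dependencies.
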
Want a superscalar processor for maximum performance? Take a serious look at predicated instructions, which are almost mandatory for a processor capable of fetching and executing multiple instructions at a time. A predicated instruction is guarded by a condition (its predicate) and takes effect only when that condition is true, so a short if/else can be executed as straight-line code rather than as a branch. The problem predication addresses is the penalty paid when a branch is mispredicted: by removing some branches from the instruction stream, predicated instructions also remove the branch misses those branches would have caused.
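A minimal sketch of the idea, assuming a hypothetical three-operation ISA loosely in the style of IA-64 predicate registers: `cmplt` writes a predicate and its complement, every instruction names a guarding predicate (`p0` is always true), and an instruction whose predicate is false does nothing. Both programs compute x = min(a, b); the interpreter `run` (a hypothetical helper) counts the branch instructions it executes, since each one is an opportunity for a misprediction.

```python
def run(program, regs):
    """Interpret a list of (op, guard, *operands) tuples.

    Returns the final register dict and the number of branch
    instructions executed (taken or not)."""
    regs = dict(regs)
    preds = {"p0": True}  # p0 is hard-wired true
    pc = 0
    branches = 0
    while pc < len(program):
        op, guard, *ops = program[pc]
        pc += 1
        if op == "br":
            branches += 1             # every branch must be predicted
            if preds.get(guard, False):
                pc = ops[0]           # taken: jump to target index
            continue
        if not preds.get(guard, False):
            continue                  # predicate false: instruction is a no-op
        if op == "cmplt":
            p_true, p_false, ra, rb = ops
            cond = regs[ra] < regs[rb]
            preds[p_true], preds[p_false] = cond, not cond
        elif op == "mov":
            rd, rs = ops
            regs[rd] = regs[rs]
        else:
            raise ValueError(f"unknown op {op!r}")
    return regs, branches


# if a < b: x = a  else: x = b, written with branches
BRANCHY = [
    ("cmplt", "p0", "p1", "p2", "a", "b"),  # 0
    ("br", "p2", 4),                        # 1: if not (a < b) goto 4
    ("mov", "p0", "x", "a"),                # 2
    ("br", "p0", 5),                        # 3: goto end
    ("mov", "p0", "x", "b"),                # 4
]                                           # 5: end

# The same computation with predication: no branches at all
PREDICATED = [
    ("cmplt", "p0", "p1", "p2", "a", "b"),
    ("mov", "p1", "x", "a"),
    ("mov", "p2", "x", "b"),
]

for prog in (BRANCHY, PREDICATED):
    result, n = run(prog, {"a": 9, "b": 2})
    print(result["x"], n)
```

The predicated version always fetches both `mov` instructions and discards one, so predication trades a few wasted issue slots for the removal of a hard-to-predict branch; it pays off mainly for short, unpredictable if/else blocks.
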
Branch misses are expensive because, in a superscalar processor, a number of instructions have already been fetched, queued and issued by the time the miss is detected. On a branch miss the pipeline must be flushed and a new set of instructions fetched from memory. When a branch has been removed by predication there is nothing to mispredict, so no flush is needed and performance improves.

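The cost of mispredicted branches can be estimated with the usual average-CPI model: ideal CPI plus (fraction of instructions that are branches) × (misprediction rate) × (flush penalty in cycles). The numbers below are hypothetical, and the sketch assumes the same 10-cycle penalty on both machines (real superscalar pipelines are often deeper, which would only widen the gap). What it shows is relative: the same stall cycles hurt a 4-wide machine, whose ideal CPI is 0.25, far more than a scalar one.

```python
def effective_cpi(ideal_cpi, branch_fraction, miss_rate, penalty_cycles):
    """Average cycles per instruction including branch-miss flush stalls."""
    return ideal_cpi + branch_fraction * miss_rate * penalty_cycles


# Hypothetical workload: 20% branches, 5% of them mispredicted, 10-cycle flush.
BRANCH_FRACTION, MISS_RATE, PENALTY = 0.20, 0.05, 10

for name, ideal in (("scalar", 1.0), ("4-wide superscalar", 0.25)):
    cpi = effective_cpi(ideal, BRANCH_FRACTION, MISS_RATE, PENALTY)
    print(f"{name}: CPI {cpi:.2f}, {cpi / ideal:.2f}x the ideal run time")
```

Under the same assumptions, and ignoring the extra predicated-off instructions, halving the branch fraction (for example by predicating short if/else blocks) would bring the hypothetical 4-wide figure down from 0.35 to 0.30.
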
As designs move to higher performance levels, the clock frequency of the processor seems to trend downward. The following chart is for a hypothetical 64-bit processor.