Having dealt with the transformation of RISCs in the previous article, let us now look at how CISCs have remained exactly the same, despite the urban legends that have been circulating for quite some time.
Taking the four pillars on which RISCs are based (mentioned in the second article), we know that it is enough for even one of them not to hold for us to no longer be able to speak of a RISC and, consequently, to conclude that we are dealing with a CISC.
Preserving one’s identity
At this point, verification should be quite trivial. It is enough to take any processor that is defined as CISC and check it against the definitions of the four pillars, one by one. Of course, I am not only talking about old processors, but also the very latest ones: definitions in hand, there is no difference.
So if a processor has a lot of instructions, or has instructions that require several clock cycles to complete their execution, or has instructions of variable (instead of fixed) length, or even has instructions that access memory directly (instead of delegating the task only to load/store instructions), we can definitely say that it is a CISC.
As can be seen, it does not take much technical knowledge to apply these four elementary concepts and understand to which macro-family a processor belongs. But, above all, to understand that, unlike with RISCs, coherence has always been at home in CISCs.
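As a minimal sketch of how mechanical this check is, the four pillars can be written down as four yes/no questions. The class and field names below are hypothetical, chosen only for illustration, and each flag is a judgement the reader makes with the processor's manual in hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IsaTraits:
    """Answers to the four RISC pillars for one processor (illustrative names)."""
    few_instructions: bool        # a reduced instruction set
    single_cycle_execution: bool  # instructions complete in one clock cycle
    fixed_length_encoding: bool   # every instruction has the same length
    load_store_only: bool         # only load/store instructions access memory


def classify(traits: IsaTraits) -> str:
    """Return 'RISC' only if all four pillars hold, otherwise 'CISC'."""
    pillars = (
        traits.few_instructions,
        traits.single_cycle_execution,
        traits.fixed_length_encoding,
        traits.load_store_only,
    )
    return "RISC" if all(pillars) else "CISC"


# One failed pillar is enough: here only the fixed-length encoding is missing.
example = IsaTraits(True, True, False, True)
print(classify(example))
```

The function deliberately has no notion of "partly RISC": by the definitions used in this series, a single failed pillar settles the question.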
Therefore, it makes no sense at all to claim that one can no longer speak of CISCs because there has supposedly been a convergence and the two concepts are, by now, ‘mixed up’. These far-fetched theses must certainly be rejected in the light of the oft-mentioned definitions.
It could be argued, at most, that this applies only to RISCs, which have been shown to have radically changed their nature, becoming essentially CISCs. But the reverse has never happened, whatever one may say.
Fast instructions = RISC?
Some might object that, looking at execution times, most instructions of CISC processors have long since been executed in a single clock cycle, and that this would be a clear sign of their ‘convergence’ to RISCs.
CISC processors, however, were not necessarily synonymous with slow instructions requiring several clock cycles: everything depended on the particular implementation/microarchitecture in question.
If we take one of the oldest CISC processors, the 6502 (1975 class), we can see that adding an 8-bit immediate value to the accumulator requires, yes, 2 clock cycles, but these are needed to read the two bytes of the instruction (the opcode and the immediate value): with an 8-bit interface to memory, one could certainly not expect fewer than two cycles!
With instructions consisting of only one byte, one would expect an even smaller number of clock cycles, in line with the number of memory accesses. This is not the case with the 6502: the instruction that shifts the accumulator left by one bit (ASL), for example, requires two clock cycles.
These are, however, intrinsic limitations of the original design (which, in any case, already implemented a form of pipelining, as is clearly evident) and were removed in its successors. In fact, in the 65CE02 the same instruction requires only one clock cycle. Many others have been similarly optimised, so that the execution time of the most commonly used instructions is dominated by the number of memory accesses.
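The point about memory accesses can be made concrete with a small sketch. It uses only the cycle counts quoted above and assumes, for illustration, that fetching each instruction byte over an 8-bit bus costs one clock cycle; the table and function names are hypothetical.

```python
# Cycle counts are the ones quoted in the text; sizes are instruction lengths in bytes.
# Assumption: on an 8-bit bus, fetching each instruction byte costs one clock cycle.
CYCLES = {
    ("6502", "ADC #imm"): 2,
    ("6502", "ASL A"): 2,
    ("65CE02", "ASL A"): 1,
}
SIZE_BYTES = {"ADC #imm": 2, "ASL A": 1}


def extra_cycles(cpu: str, instruction: str) -> int:
    """Cycles spent beyond the bare minimum needed to fetch the instruction."""
    fetch_floor = SIZE_BYTES[instruction]  # one cycle per byte on an 8-bit bus
    return CYCLES[(cpu, instruction)] - fetch_floor


for cpu, instruction in CYCLES:
    print(f"{cpu:7} {instruction:9} extra cycles: {extra_cycles(cpu, instruction)}")
```

Seen this way, the immediate addition on the 6502 is already as fast as the bus allows, and the 65CE02 brings the one-byte shift down to the same floor.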
And it is also normal that this should be so, because technological advances have certainly not been the preserve of RISCs alone, but have benefited any device, as can be quickly verified by checking the manuals of the various CISC processors that have had successors.
The myth of the internal RISC
But the most hammered, as well as the most common, claim in this sense remains the one that places RISC processors inside CISC ones, with the former being the actual executors of the instructions/operations of the latter.
Needless to say, we are faced with one of the most widespread logical fallacies: the red herring. It remains an attempt to divert attention from the concrete subject matter, i.e. the definition of RISC (and, consequently, that of CISC), which is the only thing that should be of interest when it comes to classifying a processor.
In fact, the aforementioned definition is sufficiently clear and intelligible to make it possible to discriminate in an absolutely precise and indisputable manner what is a RISC and what is a CISC, both from an architectural (ISA) and microarchitectural point of view. Is there a definition? Let it be applied! Sic et simpliciter…
The situation, however, would still not change even if we temporarily ignored the definition of RISC and took the concept of ‘mixed’ stuff for the two macrofamilies as true.
The reason for this is quite simple: the ‘internal RISC’ (SIGH!) works well exclusively thanks to the CISC architecture, which allows the code to be better ‘compressed’, reducing its size and, consequently, the space it takes up in the entire memory hierarchy (hence improving the famous code density we have discussed at length), as well as improving the ‘useful work’ that is performed to complete the various tasks/operations.
The fish died out of the water
A practical demonstration is not possible, as there are no processors (that I am aware of) that expose these ‘RISCs’ externally (i.e. with their ISA directly usable), but a few considerations are enough to conclude that they would be an absolute failure.
Specifically, it should be noted that such instructions are generally very large. An example that illustrates the concept well is the famous Pentium III, whose so-called µOPs are known to be 118 bits in size (just under 15 bytes). This covers the fairly common case in which an x86 instruction maps to a single µOP (i.e. most instructions), but it should be duly pointed out that there are several x86 instructions that generate more than one µOP (and they are by no means rare!).
Let’s assume that there is a processor based on these µOPs, with specially compiled programs whose instructions coincide exactly with them. To simplify matters, and without the reasoning losing its validity, let us assume that these instructions are 128 bits long (10 bits more, to reach a power of two), i.e. a little more than one additional byte.
Now let us imagine this processor at work, executing instructions that occupy 16 bytes each. Given that x86 instructions average 3 bytes in length, the code of this phantom RISC processor based on its µOPs would take up roughly five times the space of the x86 equivalent (15 / 3 = 5 for the unpadded µOPs, 16 / 3 ≈ 5.3 with the padding). Which in itself is an exorbitant value (even worse than
One might naively think that an instruction cache five times as large would be enough to ‘cure’ the problem. Leaving aside that going from 32kB to 160kB, for example, would be crazy enough to make any sane person immediately abandon the idea, such a choice would in any case ignore two factors that are far from negligible.
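The arithmetic behind these figures is simple enough to sketch. The snippet assumes the best case of exactly one µOP per x86 instruction, and the constant names are illustrative; the numbers themselves are the ones used in the text.

```python
UOP_BITS = 118      # Pentium III µOP size quoted in the text
PADDED_BITS = 128   # rounded up to a power of two, as assumed in the text
AVG_X86_BYTES = 3   # average x86 instruction length used in the text
L1I_KB = 32         # example instruction cache size used in the text

uop_bytes = UOP_BITS / 8          # size of a raw µOP in bytes
padded_bytes = PADDED_BITS // 8   # size of the padded instruction in bytes

ratio_rounded = 15 / AVG_X86_BYTES           # the rounded figure used in the text
ratio_padded = padded_bytes / AVG_X86_BYTES  # the figure for 16-byte instructions

print(f"raw µOP: {uop_bytes} bytes, padded: {padded_bytes} bytes")
print(f"code size ratio: {ratio_rounded:.1f}x (rounded), {ratio_padded:.2f}x (padded)")
print(f"naively scaled L1 instruction cache: {L1I_KB * ratio_rounded:.0f} kB")
```

Any x86 instruction that expands to more than one µOP only pushes the ratio higher, so these figures are a floor rather than an estimate.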
The first is that the fivefold increase in size affects, as anticipated, the entire memory hierarchy: from the instruction cache (L1) to the L3 caches, as well as the TLB entries, and finally system memory and its bandwidth (whose requirements would be increased fivefold by the needs of these caches).
The second, and much more important, is that the performance of the code cache does not scale linearly with code size at all, but far more steeply. I quote again briefly the results from the thesis of one of the RISC-V designers:
Waterman shows that RVC fetches 25%-30% fewer instruction bits, which reduces instruction cache misses by 20%-25%, or roughly the same performance impact as doubling the instruction cache size.
If reducing the code size by 25-30% (for RISC-V, but a similar argument applies to any architecture) has roughly the same performance impact as doubling the cache size, the reverse should also hold: code that is 33% (or so) larger would require a cache twice as large to achieve roughly the same performance.
Considering that the code for this theoretical architecture is five times the size of the x86 code, growing by a third at a time takes between five and six steps to get there, each one doubling the cache: we would have to use code caches around 32 to 64 times the size of those of x86 in order to have similar performance.
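A rough sketch of this extrapolation follows. It assumes, as the reasoning above does, that every increase in code size by a third costs one doubling of the instruction cache and that this rule can be stretched far beyond the range Waterman actually measured; the function name and its parameters are hypothetical.

```python
import math


def cache_multiplier(code_ratio: float, growth_per_doubling: float = 4 / 3) -> float:
    """Cache-size factor needed to offset code that is `code_ratio` times larger.

    Assumption: each `growth_per_doubling` increase in code size costs one
    doubling of the instruction cache (extrapolated from the RVC result).
    """
    doublings = math.log(code_ratio) / math.log(growth_per_doubling)
    return 2 ** doublings


doublings_for_5x = math.log(5) / math.log(4 / 3)
print(f"doublings needed for 5x code: {doublings_for_5x:.2f}")
print(f"cache multiplier for 5x code: {cache_multiplier(5):.0f}x")
```

Under these assumptions a fivefold code size lands between five and six doublings, i.e. inside the 32 to 64 range quoted above.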
I would say that we can safely draw a veil over the matter, and that is without even considering that x86 generates more than one µOP for some of its instructions (making the situation worse, of course)…
Contrary to the proclamations one sometimes reads around, another consequence can be drawn from this: the architecture of a processor matters, and how! By this I mean what is ‘exposed to the outside world’: to programmers, compilers, etc.
How it is implemented is purely a question of the microarchitecture, which is still an internal detail. And, as an internal detail, it also means that it is completely irrelevant to the
µOPs are not RISC instructions
Throughout this discussion, the assumption has been that µOPs are instructions of a RISC processor. This, in fact, is also denied by some x86 processor designers. One above all: Bob Colwell, who headed the team that developed the first Intel processor using µOPs, the Pentium Pro (also known as the P6):
Intel’s x86’s do NOT have a RISC engine “under the hood.” They implement the x86 instruction set architecture via a decode/execution scheme relying on mapping the x86 instructions into machine operations, or sequences of machine operations for complex instructions […] The “micro-ops” that perform this feat are over 100 bits wide, carry all sorts of odd information, cannot be directly generated by a compiler, are not necessarily single cycle. But most of all, they are a microarchitecture artifice — RISC/CISC is about the instruction set architecture. […] The micro-op idea was not “RISC-inspired”, “RISC-like”, or related to RISC at all. It was our design team finding a way to break the complexity of a very elaborate instruction set away from the microarchitecture opportunities and constraints present in a competitive microprocessor.
I do not think it is necessary to add anything else, because the above extract is quite eloquent.
As a corollary, I would add that, since these are microarchitectural elements, they are solutions adopted for the specific processors in which they are implemented: their successors will not necessarily keep them, in whole or in modified form, because new ideas may come along that overturn them.
This is a truism, but considering the junk that circulates on the subject (especially on the Internet), I think it is only right to point it out.
One example of this is the very different implementations adopted by Intel and AMD engineers. As can be seen from the microarchitecture studies by the famous Agner Fog, Intel prefers to break down complex instructions into simpler µOPs, while those used by AMD are more complex (and, therefore, fewer are generated).
Just go and check the data for the ADD instruction, for example: AMD uses only one µOP (which it calls MacroOP in its jargon) in most cases (in several of its microarchitectures, in all cases), while Intel ranges from one to four (depending on the complexity of the specific form of the instruction).
Now, it would be interesting if someone could explain to me how a processor could qualify as RISC if it were able to execute the following instruction (of no less than 12 bytes):
ADD QWORD PTR [RBX + RAX * 8 + 0xDEADBEEF], 0x0C0FFEE0
in two or even one MacroOP! The question was rhetorical, of course…
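For reference, the 12-byte figure for the ADD instruction quoted above can be checked by assembling it by hand. This is only a sketch of the encoding of this one instruction form, not an assembler; the function name is hypothetical. Note that the processor sign-extends the 32-bit displacement and immediate to 64 bits.

```python
import struct


def encode_add_mem_imm32() -> bytes:
    """Hand-encode ADD QWORD PTR [RBX + RAX*8 + 0xDEADBEEF], 0x0C0FFEE0."""
    rex_w = 0x48   # REX prefix with W=1: 64-bit operand size
    opcode = 0x81  # group-1 ALU operation on r/m64 with a 32-bit immediate
    # ModRM: mod=10 (32-bit displacement), reg=000 (selects ADD), rm=100 (SIB follows)
    modrm = (0b10 << 6) | (0b000 << 3) | 0b100
    # SIB: scale=11 (index * 8), index=000 (RAX), base=011 (RBX)
    sib = (0b11 << 6) | (0b000 << 3) | 0b011
    disp32 = struct.pack("<I", 0xDEADBEEF)  # little-endian displacement
    imm32 = struct.pack("<I", 0x0C0FFEE0)   # little-endian immediate
    return bytes([rex_w, opcode, modrm, sib]) + disp32 + imm32


encoded = encode_add_mem_imm32()
print(encoded.hex(" "), f"({len(encoded)} bytes)")
```

A single instruction thus carries a prefix, an opcode, two addressing bytes, a displacement and an immediate, and implies a memory read, an addition and a memory write.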
The last part of the article served to analyse and dismantle the heap of nonsense that has been circulating for quite some time, especially when it comes to CISCs. It was not essential since, as already pointed out, what matters is the definition of RISC and, consequently, that of CISC, which are more than sufficient to settle the matter. But I preferred to address these claims, backing the rebuttal with concrete facts, so as not to leave anything unresolved or any lingering doubts on the subject.
The next article will close the series by taking stock of the situation and adding some final thoughts on this long-running diatribe.