The final RISCs vs. CISCs – 5: How CISCs… remained CISCs!

Having dealt with the transformation of RISCs into CISCs in the previous article, let us now look at how CISCs have remained exactly the same, despite the urban legends that have been circulating for quite some time.

Taking, in fact, the four pillars on which RISCs are based (which were mentioned in the second article), we know that it is enough for even one of them to fail for us to no longer be able to speak of a RISC and, consequently, to conclude that we are dealing with a CISC.

Preserving one’s identity

At this point, verification should be quite trivial. It would be enough, in fact, to take any processor that is defined as CISC and go and check the definitions of the four pillars one by one. Of course, I am not only talking about old processors, but also the very latest ones: there is no difference, definitions in hand.

So if a processor has a lot of instructions, or has instructions that require several clock cycles to complete their execution, or if its instructions have a variable (instead of fixed) length, or even if some of them allow direct access to memory (instead of delegating that task exclusively to load/store instructions), we can definitely say that it is a CISC.

As can be seen, it does not take much technical knowledge to apply these four elementary concepts and understand which macro-family a processor belongs to. But, above all, to understand that, unlike with RISCs, consistency has always been at home with CISCs.

Therefore, it makes no sense at all to claim that one can no longer speak of RISCs and CISCs, on the grounds that there has supposedly been a convergence and that, by now, the two concepts are ‘mixed up’. These far-fetched theses must certainly be rejected, in light of the oft-mentioned definitions.

It could be argued, at most, that this could only apply to RISCs, which have been shown to have radically changed their nature, becoming essentially CISCs. But the reverse has never happened, whatever one may say.

Fast instructions = RISC?!?

Some might object that, looking at execution times, the instructions of CISC processors have long since mostly been executed in a single clock cycle, and this would be a clear sign of their ‘convergence’ to RISCs.

In reality, CISC processors were not necessarily synonymous with slow instructions requiring several clock cycles: everything depended, in fact, on the particular implementation/microarchitecture in question.

If we take one of the oldest CISC processors, the 6502 (class of 1975), we can see that adding an 8-bit immediate value to the accumulator (ADC #imm) does indeed require 2 clock cycles, but those are needed to read the two bytes involved (the instruction opcode and the immediate value): with an 8-bit interface to memory, one could certainly not expect fewer than two cycles!

With instructions consisting of only one byte, one would expect, compatibly with the number of memory accesses, an even smaller number of clock cycles. This is not the case with the 6502 because, for example, the instruction that shifts the accumulator left by one bit (ASL) requires two clock cycles.

These are, however, intrinsic limitations of the original design (which, in any case, already sported a pipelined implementation, as is clearly evident), and they were improved upon in its successors. In the 65CE02, in fact, the same instruction requires just one clock cycle. Many other instructions have been similarly optimised, with the most commonly used ones now dominated, in terms of execution times/clock cycles, by the number of memory accesses.
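
To put rough numbers on this, here is a minimal, purely illustrative Python sketch comparing clock cycles with the number of memory accesses for the two instructions mentioned above. The 6502 figures are the documented ones; the 65CE02 column simply reflects the behaviour described above (memory accesses as the lower bound), so treat it as an assumption for illustration.

timings = {
    # instruction: (bytes fetched, cycles on the NMOS 6502, cycles on the 65CE02)
    "ADC #imm": (2, 2, 2),  # two bytes to read: with an 8-bit bus, 2 cycles is the floor
    "ASL A":    (1, 2, 1),  # one byte fetched, yet 2 cycles on the original 6502
}
for insn, (fetched, nmos, ce02) in timings.items():
    bound = "memory-bound" if nmos == fetched else "design-bound"
    print(f"{insn}: {fetched} byte(s) fetched, "
          f"{nmos} cycles on the 6502 ({bound}), {ce02} on the 65CE02")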

And it is also normal that this should be so, because technological advances have certainly not been the preserve of RISCs alone, but have benefited any device, as can be quickly verified by checking the manuals of the various CISC processors that have had successors.

The myth of the RISC within CISCs

But the most hammered-on, as well as the most common, contention in this regard remains the claim that there are RISC processors inside CISC ones, with the former being the actual executors of the latter’s instructions/operations.

Needless to say, we are faced with one of the most widespread logical fallacies: the red herring. It is, in fact, an attempt to divert attention from the concrete subject matter, i.e. the definition of RISC (and, consequently, that of CISC), which is the only thing that should matter when it comes to classifying a processor.

In fact, the aforementioned definition is sufficiently clear and intelligible to make it possible to discriminate, in an absolutely precise and indisputable manner, what is a RISC and what is a CISC, both from an architectural (ISA) and a microarchitectural point of view. Is there a definition? Then let it be applied! Sic et simpliciter.

The situation, however, would still not change even if we temporarily ignored the definition of RISC and took as true the notion that the two macro-families have by now ‘mixed’.

The reason is quite simple: the ‘internal RISC’ (SIGH!) works well exclusively thanks to the CISC architecture, which allows the code to be better ‘compressed’ (reducing its size and, consequently, the space it takes up in the entire memory hierarchy, thus improving the famous code density we have discussed at length), as well as maximising the ‘useful work’ performed to complete the various tasks/operations.
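
To give a concrete flavour of this ‘compression’, here is a minimal sketch. The x86 side is a real two-byte instruction (ADD [RBX], EAX); the RISC side is hypothetical: a generic fixed-length 32-bit encoding that needs a load, an add and a store to do the same work.

# Same work: add a register to a value in memory.
x86_bytes  = 2       # ADD [RBX], EAX -> just 2 bytes (opcode + ModRM)
risc_bytes = 3 * 4   # hypothetical LOAD + ADD + STORE, 4 bytes each
print(f"x86: {x86_bytes} bytes, fixed-length RISC: {risc_bytes} bytes "
      f"({risc_bytes / x86_bytes:.0f}x the code size for this operation)")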

A fish out of water dies

A practical demonstration is not possible, as there are no processors (that I am aware of) that have implemented/exposed these ‘RISCs’ externally (i.e. with their ISA directly usable), but a few considerations are enough to conclude that they would be an absolute failure.

Specifically, it should be noted that such instructions are generally very large and take up a lot of space. An example that effectively illustrates the concept is the famous Pentium III, whose so-called µOPs are known to be 118 bits wide (just under 15 bytes). That is the fairly common case (i.e. one µOP for most instructions), but it should be duly pointed out that there are several x86 instructions that generate more µOPs (and they are by no means rare!).

Let’s assume that there is a processor based on these µOPs, and therefore with specially compiled programs whose instructions coincide exactly with them. Without the reasoning losing its validity, and to simplify matters, let us assume that these instructions are 128 bits long (10 bits more, to round up to a power of two), i.e. a little over one additional byte.

Now let us imagine this processor at work, i.e. in the act of executing these instructions that occupy 16 bytes each. Given that x86 instructions average around 3 bytes in length, this means that the space occupied by the code of this phantom RISC processor based on its µOPs would be roughly five times that of the x86 equivalent (15 / 3 = 5, using the raw 118-bit figure). Which is, in itself, an exorbitant value (even worse than Itanium!).
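
For the record, the arithmetic behind that figure, as a quick sketch using the same numbers quoted above:

uop_bits    = 118        # Pentium III µOP width, as quoted above
uop_rounded = 128 / 8    # 16 bytes, after rounding up to a power of two
x86_avg_len = 3          # average x86 instruction length, in bytes
print(uop_bits / 8 / x86_avg_len)   # ~4.9x with the raw 118-bit figure
print(uop_rounded / x86_avg_len)    # ~5.3x with the rounded 16-byte one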

One might naively think that an instruction cache five times as large would be enough to ‘cure’ the problem. Leaving aside that going from 32kB to 160kB, for example, would already be crazy enough to make any sane person abandon the idea immediately, such a choice would, in any case, ignore two factors that are far from negligible.

The first is that the fivefold increase in size involves, as anticipated, the entire memory hierarchy: from the instruction cache (L1) to the L2 and L3 caches, the TLB entries and, finally, the system memory and its bandwidth (whose consumption would likewise be increased fivefold by the needs of these caches).

The second, and much more important, is that the performance of the code cache is not at all linear with respect to code size, but exponential. I quote again briefly the results from the thesis of one of the RISC-V designers:

Waterman shows that RVC fetches 25%-30% fewer instruction bits, which reduces instruction cache misses by 20%-25%, or roughly the same performance impact as doubling the instruction cache size.

If reducing the code size by 25-30% (for RISC-V, but a similar argument applies to any architecture) has roughly the same performance impact as doubling the instruction cache, then the reverse also holds: code that is roughly 33% larger would require a cache twice as large to maintain roughly the same performance.

Considering that the code for this theoretical architecture is five times that of x86, this would mean having to use instruction caches around 32 to 64 times the size of those of x86 in order to achieve similar performance.
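
That 32-64x range comes out of a simple back-of-envelope calculation; here it is as a sketch, assuming (as above) that every ~33% of extra code costs roughly one doubling of the instruction cache:

from math import log2

code_ratio          = 5        # our hypothetical µOP-based ISA vs. x86
growth_per_doubling = 4 / 3    # ~33% more code ~ one cache doubling (from the quote above)
doublings = log2(code_ratio) / log2(growth_per_doubling)
print(doublings)       # ~5.6 cache doublings needed
print(2 ** doublings)  # ~50x the cache size, i.e. between 32x and 64x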

I would say that we can safely draw the curtain here, and that is without even considering that x86 generates more than one µOP for several of its instructions (which, of course, makes the situation even worse)…

Contrary to the proclamations one sometimes reads around, there is also another consequence to be drawn from all this: the architecture of a processor matters, and how! By this I mean, therefore, what is ‘exposed to the outside world’: to programmers, compilers, etc.

How it is implemented is purely a question of the microarchitecture, which is still an internal detail. And, as an internal detail, it also means that it is completely irrelevant to the RISCs vs. CISCs question.

But µOPs are not RISC instructions!

Throughout this discussion, the assumption has been that µOPs are instructions of a RISC processor. Which is, in fact, denied even by some x86 processor designers. First among them, Bob Colwell, who headed the team that developed the first Intel processor to use µOPs: the Pentium Pro (also known as the P6).

He stated:

Intel’s x86’s do NOT have a RISC engine “under the hood.” They implement the x86 instruction set architecture via a decode/execution scheme relying on mapping the x86 instructions into machine operations, or sequences of machine operations for complex instructions […] The “micro-ops” that perform this feat are over 100 bits wide, carry all sorts of odd information, cannot be directly generated by a compiler, are not necessarily single cycle. But most of all, they are a microarchitecture artifice — RISC/CISC is about the instruction set architecture. […] The micro-op idea was not “RISC-inspired”, “RISC-like”, or related to RISC at all. It was our design team finding a way to break the complexity of a very elaborate instruction set away from the microarchitecture opportunities and constraints present in a competitive microprocessor.

I do not think it is necessary to add anything else, because the above extract is quite eloquent.

As a corollary to this, I would add that, since these are microarchitectural elements, we are talking about solutions adopted for the specific processors in which they are implemented: their successors will not necessarily adopt them as they are, or even in modified form, because new ideas may come along that are capable of displacing them.

This is a truism, but considering the junk that circulates on the subject (especially on the Internet), I think it is only right to point it out.

One example of this is the very different implementations that Intel and AMD engineers have adopted. As can be seen from the studies on microarchitectures by the famous Agner Fog, Intel prefers to break complex instructions down into simpler µOPs, while those used by AMD are more complex (and, therefore, fewer of them are generated).

Just check the data for the ADD instruction, for example: AMD uses a single µOP (which it calls a MacroOP in its jargon) in most cases (indeed, in all cases in several of its microarchitectures), while Intel ranges from one to four (depending on the complexity of the specific version of the instruction).

Now, it would be interesting if someone could explain to me how a processor could qualify as RISC if it were able to execute the following instruction (of no less than 12 bytes):

ADD QWORD PTR [RBX + RAX * 8 + 0xDEADBEEF], 0x0C0FFEE0

in two or even one µOP/MacroOP! The question was rhetorical, of course…
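
For completeness, the 12-byte figure can be verified by breaking the instruction down into its standard x86-64 encoding components; a quick sketch:

# ADD QWORD PTR [RBX + RAX * 8 + 0xDEADBEEF], 0x0C0FFEE0
encoding = {
    "REX.W prefix": 1,   # selects the 64-bit operand size
    "opcode (0x81)": 1,  # ADD r/m64, imm32
    "ModRM": 1,          # memory operand, with a SIB byte to follow
    "SIB": 1,            # base = RBX, index = RAX, scale = 8
    "disp32": 4,         # the 0xDEADBEEF displacement
    "imm32": 4,          # the 0x0C0FFEE0 immediate
}
print(sum(encoding.values()))  # 12 bytes, as stated above

A single instruction of this kind has to encode a base register, a scaled index, a 32-bit displacement, a 32-bit immediate and a read-modify-write on memory, all at once: hardly compatible with the definitions seen so far.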

Conclusions

The last part of the article served to analyse and dismantle the heap of nonsense that has been circulating for quite some time, especially when it comes to CISCs. It was not strictly necessary since, as already pointed out, what matters is the definition of RISC and, consequently, that of CISC, which is more than sufficient to settle the matter. But I preferred to address those claims anyway, backing my points with concrete facts, so as not to leave anything unresolved or allow doubts on the subject to linger.

The next article will close the series by taking stock of the situation and adding some final thoughts on this long-running diatribe.
