488 private links
IBM found themselves in a similar predicament in the 1970s after working on a type of mainframe computer made to be a phone switch. Eventually the phone switch was abandoned in favor of a general-purpose processor but not before they stumbled onto the RISC processor which eventually became the IBM 801. //
They found that by eliminating all but a few instructions and running those without a microcode layer, the processor performance gains were much more than they would have expected at up to three times as fast for comparable hardware. //
stormwyrm says:
January 1, 2024 at 1:56 am
Oddball special-purpose instructions like that are not what makes an architecture CISC though.
Special-purpose instructions are not what makes an architecture RISC or CISC. In all cases these weird instructions operate only on registers and likely take only one processor bus cycle to execute. Contrast this with the MOVSD instruction on x86 that moves data pointed to the ESI register to the address in the EDI register and increments these registers to point to the next dword. Three bus cycles at least, one for instruction fetch, one to load data at the address of ESI, and another to store a copy of the data to the address at EDI. This is what is meant by “complex” in CISC. RISC processors in contrast have to have dedicated instructions that do load and store only so that the majority of instructions run on only one bus cycle. //
Nicholas Sargeant says:
January 1, 2024 at 4:00 am
Stormwyrm has it correct. When we started with RISC, the main benefit was that we knew how much data to pre-fetch into the pipeline – how wide an instruction was, how long the operands were – so the speed demons could operate at full memory bus capacity. The perceived problem with the brainiac CISC instruction sets was that you had to fetch the first part of the instruction to work what the operands were, how long they were and where to collect them from. Many ckock cycles would pass by to run a single instruction. RISC engines could execute any instruction in one clock cycle. So, the so-called speed demons could out-pace brainiacs, even if you had to occasionally assemble a sequence of RISC instructions to do the same as one CISC. Since it wasn’t humans working out the optimal string of RISC instructions, but a compiler, who would it trouble if reading assembler for RISC made so much less sense than reading CISC assembler?
Now, what we failed to comprehend was that CISC engines would get so fast that they could execute a complex instruction in the same, single external clock cycle – when supported by pre-fetch, heavy pipelining, out-of-order execution, branch target caching, register renaming, broadside cache loading and multiple redundant execution units. The only way that RISC could have outpaced CISC was to run massively parallel execution units in parallel (since they individually would be much simpler and more compact on the silicon). However, parallel execution was too hard for most compilers to exploit in the general case.