The Evolution Of CPU Processing Power Part 2: Rise Of The x86


If you're starting this series with this video, I highly recommend you begin by watching Part 1. It provides a brief overview of how a CPU works and offers a foundation for each successive presentation in this series. It should also be noted that this series was originally conceived as a three-part project; however, due to the enormous scope of the topic and the level of detail I wish to touch upon, I've since expanded the series plan significantly beyond three installments.

As the 1970s progressed, CPU designs grew more robust: faster clock speeds, larger address capacities, and more elaborate instruction sets were all being leveraged. By 1972, the next major offering from Intel was the 8008, the 8-bit successor to Intel's 4-bit product line. The 8008 offered 14 bits of address capacity, a larger instruction set, more registers, and a new memory access feature. But aside from its technical achievements, the 8008 was part of arguably one of the most important contributions in computing history. Shortly after the 8008 was released, two teenagers from Washington decided to build an 8008-based computer to analyze traffic data for the Washington State Highway Department. The partnership, called Traf-O-Data, was a moderate success, but it prepared the two entrepreneurs for their next endeavor. A few years later, those two entrepreneurs, Bill Gates and Paul Allen, made their next endeavor the creation of a company called Microsoft. This is The Evolution of CPU Processing Power, Part 2.

One of the more prominent additions to the 8008's feature list was the inclusion of indirect addressing. With direct addressing, a memory location is provided to an instruction, which then fetches the data contents of that address location. With indirect addressing, the contents of the referenced memory location are actually a pointer to another location, where the data actually is. Indirect addressing allows for some powerful software techniques. Say we store five sentences in memory. Each character in a sentence is
represented by a byte, and each sentence is terminated with a period; our program will treat the period as the end of the sentence. This is known as a string of data. Our program's job is to pick one of the five sentences at random and pass the address of the first character of the picked sentence to another part of the program. We know that there are five sentences, so our program first picks a random number between one and five. But how do we translate this random number into the address of one of the sentences in our list? If we create a table with five rows and fill each row with the address of each sentence respectively, we create what's known as a vector table. This vector table can now be used to map a number between one and five to a sentence by pointing to the sentence address stored in the table. If our program randomly picked three, we look at the address stored at row three in our table; this address points to the start of sentence three, which our program will now pass on. Vector tables are a common but powerful use of indirect addressing in software.

The 8008 also implemented a mechanism known as interrupts. Interrupts allow hardware signals and internal CPU events to pause program execution and jump to a small, high-priority region of code. Examples of interrupt events could be a real-time clock signal, a trigger from a piece of external hardware such as a keyboard, or a change in the CPU's internal state; even program code can trigger an interrupt. After the execution of the interrupt's service code, the original program resumes. Interrupts allow fast-responding, low-overhead code to be executed outside of the main program. For example, when a mouse moves, the hardware signals the CPU to pause its current code execution and jump to code dedicated to processing mouse events. The location of this code is typically pointed to by an interrupt vector table. That code may perform some math to update the current mouse cursor position and click status. It then triggers updates to the cursor
position on the screen. Finally, it passes the CPU back to the code that was running before the interrupt was triggered. The interrupt mechanism is a key part of interfacing external hardware to a CPU and is heavily utilized in what we call device drivers today. Another side effect of having the ability to pause a program and take control of the CPU is that the interrupt code can return control of the CPU to a completely different program in memory. This powerful concept is the basis for multitasking, a mechanism we'll explore in more detail later in this series. The features of the 8008 laid the foundation for modern desktop CPUs; its DNA can be found in the processors of today.

The fast pace of CPU evolution hit another milestone in 1974 with the launch of the first true progenitor of the modern desktop CPU: the Intel 8080. Created on NMOS semiconductor technology, the 8080 was able to operate at up to 3.1 MHz, a significant improvement over the silicon-gate technology of previous CPUs. It had 16 bits of address capacity, allowing the use of up to 64 kilobytes of memory, and despite being an 8-bit processor, it could also perform limited 16-bit operations. The 8080 was the first in Intel's product line to utilize an external bus controller. This support chip was responsible for interfacing with RAM and other system hardware components; these communications are commonly referred to as input/output, or I/O. This allowed the CPU to interface with slower memory and I/O that operated on system clock speeds slower than the CPU's clock speed, and it also enhanced overall electrical noise immunity. The 8080 was considered by many the first truly usable microprocessor. However, competing processor architectures were emerging. During the next few years, the rise of desktop computing came to be dominated by the competing Zilog Z80 CPU, which, ironically, was an enhanced extension of Intel's own 8080 and was designed by former Intel engineer Federico Faggin. Intel's answer to
this was a radically advanced 32-bit CPU called the 8800. The project proved to be overambitious and eventually became a commercial failure when it was finally launched in the early '80s as the iAPX 432. A rapid response to the growing Z80 threat was needed, and this led to a more realistic project that began in 1976. Within three months, with nothing more than four engineers and no CAD tools, the first revision of their response was created. Within two years, the complex design was refined into a final product maintaining code compatibility with the popular 8080-based product line.

The next major milestone in CPU evolution came on June 8th, 1978, with the launch of the famous Intel 8086. Securing its place in computing history with its instantly recognizable moniker of the x86 architecture, the 8086 gave rise to Intel's most successful line of processors. Taking a more software-centric approach, the 8086 offered features unseen before in Intel's flagship processor line. Designed as a fully 16-bit CPU, it offered eight general-purpose 16-bit registers, with four of those registers having the ability to be used as pairs of 8-bit registers. Implemented in NMOS technology, with an eventual migration to Intel's HMOS process, it could achieve clock rates of up to 10 MHz. Coupled with this new performance, better memory management was added. The way the 8086 managed memory was one of its most notable characteristics. Having 20 bits of address capacity, it was capable of making use of a full megabyte of memory. To make use of that larger address space in a 16-bit system, Intel employed a peculiar segmented memory structure, in which memory locations were assigned two addresses: a memory segment address and an address relative to the segment itself. This can be thought of as assigning a 16-bit viewing window to a section of system memory. The memory management mechanism of the 8086 allowed for four simultaneous segment definitions: one assigned specifically for executable code, one for data, one for
a stack, and an extra multi-purpose segment. It was considered a cumbersome system by some programmers, but it was also convenient for the then-common smaller programs, as it offered a pseudo-sandbox of a 16-bit slice of memory. If you ever ran a DOS .COM program file in the late '80s or '90s, the contents of that file were, by design, an actual binary dump of a segment of memory to be executed by the processor.

Keeping in line with the software-centric ethos, CPU support for higher-level programming languages was enhanced by the addition of more robust stack instructions. In software design, commonly used pieces of code are structured into blocks called subroutines; a subroutine may sometimes also be referred to as a function, procedure, or subprogram. To illustrate this, let's say we made a program that finds the average of thousands of pairs of numbers. To do this efficiently, we write a block of code that takes in two numbers, calculates their average, and returns it. Our program now goes through the list of number pairs, calling the subroutine to perform the calculation and returning the result back to the main program sequence. When a program jumps to a subroutine, it needs to store its current memory position so it can return to the main program after the subroutine completes. It also needs a place to store the data it's sending to the subroutine. This is solved by a memory mechanism called a stack, which is used to store this temporary information. A memory stack directly takes its name from a cafeteria's spring-loaded tray dispenser. When you push a new tray onto the stack, it's at the top; when the next person pops a tray off the top of the stack, they get the one that was pushed on last. This is exactly how a memory stack operates: data is pushed on and then popped off. The CPU instructions that deal with stack operations are actually called some variant of "push" and "pop". Data can be pushed repeatedly, but when popped, you always get the most recent data that was pushed onto the stack. This behavior
is known as last in, first out. When a program calls a subroutine, the data and return address are pushed onto the stack. The subroutine then copies the data off the stack without popping it; this is called stack peeking. It then performs its job, with the results usually being written to a register. When it's done, it pops the return address, along with the sent data, off the stack and returns to the main program. When a subroutine is called, its data requirements must be clearly defined so that the correct amount of data can be pushed onto the stack. If a bug exists in the code, it's possible to push the incorrect amount of data onto the stack for a subroutine call. This can cause the stack to change in size unpredictably, either growing beyond its allocated memory or looping over onto itself. This is known as a stack overflow, and it can cause a program to crash. Another source of stack-based crashes is popping the wrong data as a return address, causing the subroutine to return to a random location in memory. A crash occurs when a CPU jumps to an unintended address and begins executing data that isn't program code; because it's effectively executing programmatic gibberish, the machine becomes unresponsive.

The notable complexity of the 8086 and its success cemented Intel's commitment to a key characteristic of its architecture: CISC, or complex instruction set computer. Though a CISC architecture was used in the 8080 and its modestly enhanced successor, the 8085, the 8086 marked Intel's transition into the full-blown adoption of CISC architecture with its robust instruction set. With only a handful of CPUs employing it, CISC architecture is a relatively rare design choice when compared to the dominant RISC, or reduced instruction set computer, architecture. Even today, x86 CPUs remain the only mainline processors that use a CISC instruction set. The differences between a RISC CPU and a CISC CPU lie within their respective instruction sets and how they're executed. In a
RISC CPU, the instructions are kept as simple and primitive as possible. The entire instruction set is generally small and tightly structured, instruction decoding is very simple, and execution is done quickly; instructions are generally processed at a rate of one per clock cycle. However, in RISC, the burden of complexity is placed on software: complex operations must be synthesized programmatically. A good example of this in practice is the fact that many early RISC processors lacked the ability to multiply numbers together; they required several instructions to perform a simple multiplication operation in software. Because RISC must string together many simple instructions to perform complex operations, RISC-based programs require more memory. In the early days of computing, memory was extremely expensive, costing over $50,000 per megabyte. Furthermore, accessing memory was slow: every fetch of a new instruction came with a performance penalty from the memory access bottleneck. CISC architecture was the solution to these issues. In a CISC design, the instruction set is robust and supports many complex features in the hardware of the CPU. Complex programs can be made more easily, with fewer instructions and less use of memory. Our RISC multiplication code that required several instructions can now be replaced with a single multiply instruction. Because software requires fewer instructions to perform a task, the need to access memory is reduced, taking some pressure off the memory access bottleneck.

Another advantage of CISC design is its use of an orthogonal instruction set. An instruction set is considered orthogonal when its instructions can operate in all the forms of addressing the CPU supports. For example, if an instruction can operate on any register and on any direct or indirect address in memory, it's considered orthogonal; it has no limitations on its operands. This feature dramatically reduces code complexity by eliminating the need to perform data shuffling within a program. The trade-off of the CISC architecture
is that instructions execute slower overall due to their complexity; many of them take several clock cycles. The mechanisms that decode and execute instructions within a CISC CPU are very complex and trickier to design. Furthermore, techniques that allow for optimizing the throughput of a RISC processor are difficult to translate over to a CISC processor, as we'll explore in later parts of the series. Over time, the cost of memory eventually fell, and the advantages CISC offered diminished. Fast-forward to today, and the only true holdout of the CISC design has been the contemporary successors of the 8086, mostly due to their need to be backwards compatible. However, support for this vestige of the past is only skin-deep: modern CPUs blur the line between RISC and CISC by employing a RISC design at the core and wrapping it in a CISC outer layer, employing techniques from both architectural dogmas.

Aside from adopting the CISC architecture, the performance penalty of accessing memory was also combated in new ways in the 8086. Since the execution of an instruction was generally slow and required multiple clock cycles to complete, the idle time was utilized in the fetch region of the CPU. Known as prefetch, the next instruction in memory would be loaded into the CPU while the last instruction was still executing. This improved processing throughput significantly, especially with instructions that did not require memory access, as it reduced the bottleneck on the CPU's processing. Prefetching is a simple form of a technique known as pipelining, a method in which steps in the processing cycle are arranged to reduce processing bottlenecks and increase throughput. We'll explore pipelining as we get further along in this series.

The 8086's performance was further enhanced by the ability to make use of the 8087, a separate coprocessor dedicated to floating-point math. Traditional words of binary data represent integer values; CPUs handle them easily and perform arithmetic via their
arithmetic logic unit. However, the limitation of this lies in performing more advanced math containing decimals. Floating-point numbers solve this problem by encoding decimal numbers among the bits of a word. Because floating-point numbers are fundamentally incompatible with the CPU's integer arithmetic hardware, performing math on them had to be done inefficiently in software, with code designed for floating-point numbers. Pushing the limits of the technology of the time, Intel created the 8087 floating-point coprocessor to move floating-point processing into hardware. The 8087 was designed to work with the 8086 directly, intercepting instructions meant for itself from the bus and processing them. It would operate independently of the CPU, fetching memory on its own when needed. It had the capability of performing floating-point addition, subtraction, multiplication, division, and square roots. It also computed transcendental functions such as exponential, logarithmic, and trigonometric calculations. The 8087 could perform 50,000 floating-point operations per second, and because it operated independently of the CPU, it could be set to perform math operations simultaneously while the CPU executed other program instructions.

The success of the 8086 processor is synergistically linked to another runaway success in computing history. In the late 1970s, the new personal computer industry was dominated by the likes of Commodore, Atari, Apple, and the Tandy Corporation. With a projected annual growth of over 40% in the early '80s, the personal computer market gained the attention of mainframe giant IBM, and rumors began to spread of IBM's entry into the market space. After a rapid development phase of only 12 months, on August 12th, 1981, the IBM Personal Computer, or PC, debuted, powered by a variant of the 8086: the 8088. It was an immediate success. The PC was small, lightweight, and easy to use, and it was targeted as a personal computer for anyone, not just large corporations. Over 40,000 units were ordered upon its
announcement, and with skyrocketing demand, over one hundred thousand units were sold in the first year. It soon surpassed the Apple II as the best-selling personal computer of the time. The two young entrepreneurs from Traf-O-Data, now partnered as Microsoft, were awarded the contract to develop the operating system for the IBM PC, known as PC DOS. This first major product solidified Microsoft's dominance in the industry and directly led it to become one of the largest companies on Earth. The success of the PC led to a wave of companies creating IBM-compatible machines, and hardware and software packages began to flood the market in support of this new platform. The PC quickly became the standard of personal computing. Over the decades, it has evolved past its original IBM roots, but it still maintains dominance in the personal computing market today. Empowering this prevalence of PC architecture in today's personal computers and servers is the enduring legacy of the 8086.

29 thoughts on “The Evolution Of CPU Processing Power Part 2: Rise Of The x86”

  1. Excellent content, well-presented. Watching part 1 where you talked about assembly language brought back memories (no pun intended!). I cut my coding teeth on a 6502 then a Z80 and then a 68000. 9-year-old me would have loved to have seen this! Keep up the great work – looking forward to future content.

  2. got news. nowadays cpus are more risc than cisc. they are a risc cpu hidden behind a decode unit basically. you send a complex instruction, it gets translated into simple instructions. risc behind cisc.

  3. The first RISC CPU is generally seen as the IBM 801, which came out in 1980. Once there was RISC, the older instruction sets became known as CISC. Also, RISC needed 32 bits so they could fit the instructions.
    The most successful PC in the 80s was the C64 with the MOS 6510, selling a million units a year and dwarfing IBM sales. It was only in the late 80's that sales picked up, but that was not IBM but the clones. The Motorola 68K was very popular in the Mac, Atari ST, Commodore Amiga and Sun Microsystems machines. There were also the early ARM CPUs, which were RISC, created by Acorn of BBC Micro fame. Hope you can speak of those CPUs.

  4. The IBM PC was a boring business machine all throughout the 80’s. It had no graphics acceleration. Shitty colour support until EGA. Terrible sound hardware until add-in cards fixed it. Its advantage was handling large amounts of text.

    I would say it overtook its competitors as the home/gaming computer to own in 1992. Competition and clones made it a much cheaper, more capable system. VGA, Sound Blasters and the need for a hard drive due to game size (ever played Monkey Island 2 on the Amiga?) meant that it wasn’t at a huge disadvantage anymore. It was fast enough that 2D games could scroll smoothly. 3D games started using texturing and the Amiga's blitter could not help it here. Wolfenstein 3D, Ultima Underworld and Ultima 7 really showed that the PC was overtaking the Amiga. Next year, Doom etc. C64 and Amiga owners were just laughing at the PC all throughout the 80’s, and rightly so. 1992 that changed.

  5. X64 is a CISC instruction set, but ever since the pentium pro the CPU itself is RISC. Instructions are decoded into multiple RISC-like micro-ops (same instruction size, simple instructions) and then handled exactly as if they were RISC.

    The CISC frontend adds latency, and that’s bad, but it also effectively functions as memory compression. You would have needed more bandwidth to read the instructions for a RISC programme.

  6. I think it's actually only low-level binary calculations or operations that use just one clock cycle in RISC, I remember having programmed PIC's that took 4 clocks to change the state of the I/O for instance, and to do modulus.
    Anyway, still a good video.

    Edit: ah okay yes, you cover it in the next sentence : I've mostly used ANSI C, not so much assembler. But even in assembler flipping pins takes a while.

    Edit2: ah if anyone have an old Win9x or 2k computer around, the impact of interrupts can be experienced by holding down the Win and E keys for a few seconds… just a bonus fact.

  7. so THATS where the "x86" in "Program Files (x86)" came from. I had always wondered. I knew that 32 bit applications generally were located in that folder but I never understood why it wasn't named "Program files (32-bit)" or something like that.

  8. Not to pick a nit, but the PC only really saw success in the business market, but not the home computer market, until the very late 1980s, because IBM compatible PCs were stupidly expensive.

    8 bit Z80 and especially 6502 based home micros ruled the roost at home until then.

  9. Yer, you totally, … totally don't get why RISC, the newer architecture was created in response to the disadvantages of CISC.
    The only reason there is a single CISC instruction set architecture still left is because Intel can't change due to backward compatibility, and doesn't want to allow people to use better architectures either – it simply wants them on THEIR architecture, no matter how totally outdated.
    RISC architectures dominated 1) the new use in smartphones 2) all CPU's in the world (there are far, far, far more RISC CPUs made in the world today vs CISC, like orders of millions, every day) for many reasons. You need to read up on those reasons.
    "CISC a runaway success???" you sound like a total Intel shill. x86 is like an old dogged one legged man – with a stick, holding on because he's keeping people captive on his long, long outdated architecture as he beats people back in to his inefficient, slow ecosystem.
    There's a reason the new iPad is more powerful than half of the laptops sold today, at under 1/4 the power.
    And why Google run thousands of purely RISC, in house made servers when they could just "buy Intel", same with IBM, and RISC Power architecture servers. The only places people use x86 is for software compatibility and backward compatibility, and that's it, all the real powerhouses, and places freed from Intel's ecosystem run on RISC.

  10. Why did you have RISC in part one, and CISC in part 2?
    CISC was only invented as a backronym to describe the old classically Intel, classically x86 etc.. old instruction sets and architectures.
    RISC was created as an advance in computing, and is much newer than CISC.
    With CISC you can do crazy things like add 1 to the value of some number in memory, or store the result of adding two numbers in memory to another location in memory. They are totally sloppy with their memory access instructions. In RISC you can only load or store any value in memory with specific instructions that do that and only that. Once loaded into working registers, then you can work with the values, and you can only store values to memory from these working registers. i.e. a Load/Store architecture. This is what make RISC so much faster and efficient, as the CPU only does simple things so can be made fast and efficient (with techniques now much more easily possible like pipelining/superscalar execution, branch prediction/speculative execution), and the memory subsystem only has to work with known expected memory accesses. All in all it's just much neater. Modern x86 CISC even transpose all their CISC x86 instructions into RISC like "micro ops" to have a RISC like inner core… and a whole load of bloat surrounding it, to cope with the CISC instructions. It's an old, outdated architecture that they're desperately trying to make as RISC like as possible, and transpose it even closer to a RISC-like architecture (by recommending to only use a RISC like subset of CISC x86 instructions) before ARM takes over the world.

  11. I love how all the names sound Hawaiian if you don't think about them as numbers xD Thank you for this content, I've learnt so much!

  12. I can't help but feel this doesn't capture much about the general feel around the rise of x86. Not just because you got history backwards – CISC was the original style as you can see from both early mainframes and early microcomputers, because they were cheaper to engineer when you have fewer registers – but more importantly cost has been the best indicator of success from the 80s down to this day. The behemoths of the 80s and 90s failed because selling individual machines for tens of thousands of dollars means nobody develops on your system. Symbolics, DEC, SGI, and almost even Apple failed for this reason. The x86 is a bad design that has had economy of scale, an entrenched proprietary software monopoly, and a superscalar design retrofitted, and that was absolutely the right choice. Economies of scale are still the right choice.
