Development of the MOS Technology 6502: A Historical Perspective

Jason Sachs, June 18, 2022

One ubiquitous microprocessor of the late 1970s and 1980s was the MOS Technology MCS 6502. I included a section on the development of the 6502 in Part 2 of Supply Chain Games, and have posted it here as an excerpt, since I believe it deserves attention in its own right.

(Note: MOS Technology is pronounced with the individual letters M-O-S “em oh ess”,[1] not “moss”, and should not be confused with another semiconductor company, Mostek.)

Photomicrograph © Antoine Bercovici (@Siliconinsid), reproduced with permission

Semiconductor Fabrication and the 6502

The 6502 was first delivered to customers in September 1975, and it became one of a handful of iconic microprocessors of the late 1970s and early 1980s. To understand how big an impact this chip had, just look at its presence in many of the 8-bit systems of the era, which sold by the millions.

The eventual ubiquity of the 6502-based personal computer was the end result of a long process that began thanks to Motorola’s intransigence over the pricing of its 6800 processor, and to Chuck Peddle and Bill Mensch getting frustrated with Motorola. Motorola announced the 6800 in March 1974, but did not reach production until November 1974, initially selling the chip for $360 per processor in small quantities. Chuck Peddle had been giving marketing seminars to large customers in early 1974; he’d smelled opportunity and tried to convince Motorola to pursue a lower-cost version for the industrial controls market, but they weren’t interested.[1 page 24-26] By August, Peddle had hatched a plan, leaving Motorola and setting off across the country to join MOS Technology, a scrappy little integrated circuit manufacturer located near Valley Forge, Pennsylvania.

Mensch, one of the 6502 designers who went to MOS with Peddle, says this: “The environment was a small company where Mort Jaffe, John Paivinen, Don McLaughlin, the three founders, had created small teams of very capable calculator chip and system designers, a quick turn around mask shop and a high yielding large chip manufacturing team out of TI. So you go from Motorola with, relatively speaking, an unlimited budget for design and manufacturing, to an underfunded design team with very limited design tools for logic and transistor simulation. We had to manually/mentally simulate/check the logic and use very limited circuit simulation. In other words, it was really low budget. The datasheets and all documentation was done by the design team.”[2][3] Peddle persuaded Mensch and six other Motorola engineers (Harry Bawcom, Ray Hirt, Terry Holdt, Michael Janes, Wil Mathys, and Rod Orgill) to join him and a few others at MOS in designing and producing what became the MCS 6501/6502 chipset.

“At MOS John Paivinen, Walt Eisenhower, and Don Payne, head of the mask shop, and mask designer Sydney Anne Holt completed the design and manufacturing team that created the high yielding NMOS depletion mode load process,” says Mensch. “The result was the MCS 6501/6502, 6530/6532 Ram, ROM Timer and IO combo and 6520/6522 PIA/VIA microprocessor family.”[3]

Some technical details of the 6502 have become slightly fuzzy after so much time, but I have chosen to focus on the 6502 because it is such a well-known processor, and at least some details are available. Semiconductor manufacturers are notoriously secretive, and it is hard to find detailed descriptions of how modern ICs are designed and manufactured, whereas there are plenty of sources of information about the 6502.

(A word about the numbered notes: I don’t normally use such things, preferring instead a blogorrhific style of adding hyperlinks all over the place to point towards further information on various topics. But in this article, I have used notes to cite my sources a little more formally, for a few reasons. First, because there are inaccuracies about the 6502 floating about on the Internet, I’m trying to be a bit more careful. And since I’m not an expert in semiconductor manufacturing or economics, I feel like I have to point toward some specific accounts that back up my statements. Finally, a citation is a little more robust than a hyperlink in case an online publication becomes unavailable.)

EDN had a nice technical writeup of the 6502 in September 1975. BYTE magazine covered the 6502 in November 1975, with more of a focus on its instruction set than the physical aspects of the chip itself. Mind you, both these articles predated the use of the 6502 in any actual computer.

The manufacturing process for semiconductors is like printing newspapers. Sort of. Not really. Maybe more like the process for creating printed circuit boards. Well, at any rate, newspapers and printed circuit boards and semiconductors have these aspects in common:

  • Production requires a big complicated manufacturing plant with many steps.
  • Photolithography techniques are used to create many copies of a master original.
  • The master original requires creating content and layout that fits in a defined area.

Except that the semiconductor industry has been expanding for decades without any sign of letting up, whereas the newspaper industry has been struggling to survive in the age of the Internet.

Semiconductor manufacturing occurs in a fabrication plant or “fab”. The raw, unpackaged product is called a die (plural = “dice” or “dies” or “die”), and the master original is called a photomask set or mask set. Engineers cram a bunch of tiny shapes onto the photomasks in the mask set; each of the photomasks defines a separate step in the photolithography process and is used to form the various features of individual circuit elements — usually transistors, sometimes resistors or capacitors — or the conducting paths that interconnect them, or the flat squarish regions called “bonding pads” which are used to connect to the pins of a packaged chip. Ultrapure, polished circular semiconductor wafers are used; most often these are made of silicon (Si), but sometimes they consist of other semiconductor crystals such as gallium arsenide (GaAs), gallium nitride (GaN), silicon carbide (SiC) or a hodgepodge of those elements somewhere towards the right side of the periodic table: AlGaAsPSnGeInSb. These wafers are sawn as thin slivers from a monocrystalline boule, basically a big shiny circular semiconductor salami, which is typically formed by pulling a seed crystal upwards, while rotating, from molten material, using the Czochralski process, which is very hard for us to pronounce correctly.

The wafers have a bunch of die arranged in an array covering most of the wafer’s surface; these are separated into the individual die, and go through a bunch of testing and packaging steps before they end up inside a package with conductive pins or balls, through which they can connect to a printed circuit board. The packaged semiconductor is an integrated circuit (IC) or “chip”. The percentage of die on a wafer that work correctly is called the device yield. Die size and yield are vital in the semiconductor industry: they both relate directly to the cost of manufacturing. If chip designers or process engineers can reduce the die area by half, then about twice as many die can be fit on a wafer for the same cost. If the yield can be raised from 50% to 100% then twice as many die can be produced for end use, for the same cost. Yield depends on numerous processing factors, and gets worse for large die ICs: each specific manufacturing process has a characteristic defect density (defects per unit area), so a larger die size raises the chance that a defect will be present on any given die and cause it to fail.
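To put rough numbers on this, here is a minimal sketch using the simple Poisson yield model, Y = exp(-D·A), which assumes defects land independently and uniformly; real fabs use more elaborate models. The defect density (0.05 defects/mm², i.e. 5 per cm²) and the die areas are illustrative assumptions, not figures for any MOS Technology or Motorola process, and the gross die count ignores dice lost at the wafer edge.

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of dice with zero defects under the Poisson model Y = exp(-D * A)."""
    return math.exp(-defects_per_mm2 * die_area_mm2)

def good_dice(wafer_area_mm2: float, die_area_mm2: float, defects_per_mm2: float) -> float:
    """Expected good dice per wafer, ignoring dice lost at the wafer edge."""
    gross = wafer_area_mm2 / die_area_mm2
    return gross * poisson_yield(die_area_mm2, defects_per_mm2)

wafer_area = math.pi * (76.2 / 2) ** 2  # 3-inch wafer, in mm^2
d0 = 0.05                                # assumed defect density: 0.05/mm^2 = 5/cm^2
for area in (10.0, 20.0, 30.0):          # illustrative die areas in mm^2
    print(f"{area:4.1f} mm^2: yield {poisson_yield(area, d0):.0%}, "
          f"~{good_dice(wafer_area, area, d0):.0f} good dice per wafer")
```

Because yield also rises as the die shrinks, halving the die area in this model more than doubles the number of good dice per wafer.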

Think of defects as bullets that kill on contact. The figure below shows three simulated circular wafers with 40 defects in the same places, but with different die sizes. There are fewer of the larger die, and because each die presents a larger cross-sectional area which is prone to defects, the yield ends up being lower.
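A Monte Carlo version of that kind of figure is easy to sketch. The code drops the same set of randomly placed point defects onto a 3-inch (76.2 mm) circular wafer and counts how many square dice escape unharmed, for three die sizes. The uniform defect distribution, the die sizes, and the rule that only dice lying entirely on the wafer count are all simplifying assumptions of this sketch.

```python
import math
import random

def random_defects(wafer_diameter, count, seed):
    """Place `count` point defects uniformly inside a circular wafer (rejection sampling)."""
    rng = random.Random(seed)
    r = wafer_diameter / 2
    points = []
    while len(points) < count:
        x, y = rng.uniform(-r, r), rng.uniform(-r, r)
        if x * x + y * y <= r * r:   # keep only points that land on the wafer
            points.append((x, y))
    return points

def wafer_yield(wafer_diameter, die_size, defects):
    """Return (good, total) for a grid of square dice aligned at the wafer center."""
    r = wafer_diameter / 2
    n = math.ceil(r / die_size)
    dice = set()
    for i in range(-n, n):
        for j in range(-n, n):
            corners = ((i, j), (i + 1, j), (i, j + 1), (i + 1, j + 1))
            # count only dice that lie entirely on the wafer
            if all((a * die_size) ** 2 + (b * die_size) ** 2 <= r * r for a, b in corners):
                dice.add((i, j))
    # grid index of the die containing each defect; one defect kills at most one die
    hit = {(math.floor(x / die_size), math.floor(y / die_size)) for x, y in defects}
    total = len(dice)
    return total - len(dice & hit), total

defects = random_defects(76.2, 40, seed=1975)   # the same 40 defects for every die size
for size in (4.0, 8.0, 16.0):                   # die edge length in mm (assumed)
    good, total = wafer_yield(76.2, size, defects)
    print(f"{size:4.1f} mm dice: {good:3d} of {total:3d} good ({good / total:.0%})")
```

With a fixed number of defects, each defect can kill at most one die, so small dice lose only a small fraction of the wafer while large dice can lose most of it.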

The steps of the fabrication process are performed under various harsh environmental conditions (temperatures around 1000 °C, high pressure or vacuum, and sometimes toxic gases such as silane or arsine, which can react violently if exposed to the oxygen in air) and generally fall into one of the following categories:

  • depositing atoms of some element onto the wafer
  • coating the wafer with photoresist
  • exposing the photoresist to light in a particular pattern (here’s where the photomasks come in)
  • etching away material
  • annealing — which is a heating/cooling process to allow atoms in the wafer to “relax” and lower crystal stress
  • cleaning the wafer

And through the miracle of modern chemistry, we get a bunch of transistors and other things all connected together.

The term “process” in semiconductor manufacturing usually refers to the specific set of steps that are precisely controlled to form semiconductors with specific electrical characteristics and geometric tolerances. ICs are designed around a specific process with desired characteristics; the same process can be used to create many different devices. It is not a simple matter to migrate an IC design from one process to another, and this is an important contributor to today’s supply chain woes.

Let’s look at that photomicrograph again:

Photomicrograph © Antoine Bercovici (@Siliconinsid), reproduced with permission

The original 6502 manufactured in 1975 contained 3510 transistors and 1018 depletion-load pullups, in a die that was 0.168 inches × 0.183 inches (≈ 4.27mm × 4.65mm), produced on a 3” silicon wafer.[4] The process used to create the 6502 was the N-channel Silicon Gate Depletion 5 Volt Process, aka the “019” process. Developed at MOS Technology by Terry Holdt, it required seven photomasks, and consisted of approximately 50 steps to produce these layers:[5][6]

  1. Diffusion
  2. Depletion implant
  3. Buried contact (joining N+ to poly)
  4. Polysilicon
  5. Pre-ohmic contacts
  6. Metal (aluminum)
  7. Passivation (silicon dioxide coating)

You can see these layers more closely in higher-resolution photomicrographs — also called “die shots” — of the 6502. Antoine Bercovici (@Siliconinsid) and John McMaster collaborated on a project to post 6502 die shots stitched together on McMaster’s website, where you can pan and zoom around. (If you look carefully, you can find the MOS logo and the initials of mask designers Harry Bawcom and Michael Janes.) I think the most interesting area is near the part number etched into the die:

Photomicrographs © Antoine Bercovici (@Siliconinsid), stitched and hosted by John McMaster.
Annotations are mine. There is a small photo-stitching discontinuity between the top 1/3 and bottom 2/3 of this image.

The large squarish features are the bonding pads, and are connected to the pins of the 6502’s lead frame with bond wires that are attached at each end by ultrasonic welding, sometimes assisted by applying heat to the welding joint. (I got a chance to use a manual wire bonding machine in the summer of 1994. It was not easy to use, and frequently required several attempts to complete a proper connection, at least when I was the operator. I don’t remember much, aside from the frustration.)

The little cross and rectangles are registration marks, to align the masks and check line widths. The larger squares above them are test structures, which are not connected to any external pins, but can be checked for proper functioning during wafer probing.

Photomicrograph © Antoine Bercovici (@Siliconinsid), reproduced with permission.
This image is a composite of two photographs taken at the same 200× magnification: the 6502 die on the left, and a scale on the right with 10 microns per division (100 microns between major tickmarks).

In these images, the different layers have distinct visual characteristics, except for the depletion implant, which is not visible:

  • the silicon substrate is an untextured gray
  • the aluminum metal has a granular quality
    • it has a pinkish tinge when it has been covered by the passivation layer (most of the die)
    • when uncovered, as in the bonding pads and test pads, it is a more gray color
    • the small green dots represent contacts between metal and silicon
  • diffusion regions have a glassy look with discoloration around the edge
  • polysilicon shows up as light brown, except when it crosses through a diffusion region, where it is greenish and forms a MOSFET gate — Tada! instant transistor! — controlling whether current can flow between the adjacent diffusion regions. (Ken Shirriff has some more detailed explanations with images for some features of the 6502.)

How many chips are on a wafer? It’s hard to find that information for the 6502, but Wikipedia does have a description of the Motorola 6800:

In the 1970s, semiconductors were fabricated on 3 inch (75 mm) diameter silicon wafers. Each wafer could produce 100 to 200 integrated circuit chips or dies. The technical literature would state the length and width of each chip in “mils” (0.001 inch). The current industry practice is to state the chip area. Processing wafers required multiple steps and flaws would appear at various locations on the wafer during each step. The larger the chip the more likely it would encounter a defect. The percentage of working chips, or yield, declined steeply for chips larger than 160 mils (4 mm) on a side.

The target size for the 6800 was 180 mils (4.6 mm) on each side but the final size was 212 mils (5.4 mm) with an area of 29.0 mm². At 180 mils, a 3-inch (76 mm) wafer will hold about 190 chips, 212 mils reduces that to 140 chips. At this size the yield may be 20% or 28 chips per wafer. The Motorola 1975 annual report highlights the new MC6800 microprocessor but has several paragraphs on the “MOS yield problems.” The yield problem was solved with a design revision started in 1975 to use depletion mode in the M6800 family devices. The 6800 die size was reduced to 160 mils (4 mm) per side with an area of 16.5 mm². This also allowed faster clock speeds, the MC68A00 would operate at 1.5 MHz and the MC68B00 at 2.0 MHz. The new parts were available in July 1976.
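A common textbook approximation for gross dice per wafer divides the wafer area by the die area and subtracts an edge-loss term: dice ≈ π(d/2)²/S − πd/√(2S), where d is the wafer diameter and S the die area. The sketch applies it to the 6800 sizes quoted above and to the 6502 die. It ignores scribe lanes, edge exclusion, and PCM sites, so its counts won't match the quoted figures exactly; treat it as a way to compare sizes, not to reproduce them.

```python
import math

MM_PER_MIL = 0.0254

def gross_dice_per_wafer(wafer_diameter_mm: float, die_w_mm: float, die_h_mm: float) -> int:
    """Approximate whole dice per wafer: area ratio minus an edge-loss correction."""
    d = wafer_diameter_mm
    s = die_w_mm * die_h_mm
    return math.floor(math.pi * (d / 2) ** 2 / s - math.pi * d / math.sqrt(2 * s))

chips = {                        # die edges in mils, from the text above
    "6800 target": (180, 180),
    "6800 as built": (212, 212),
    "6800 depletion-load": (160, 160),
    "6502": (168, 183),
}
for name, (w_mil, h_mil) in chips.items():
    w, h = w_mil * MM_PER_MIL, h_mil * MM_PER_MIL
    n = gross_dice_per_wafer(76.2, w, h)   # 3-inch wafer
    print(f"{name:20s} {w * h:5.1f} mm^2  ~{n} dice per wafer")
```

By this estimate the 6502 die, at about 19.8 mm², fits slightly more dice on a 3-inch wafer than even the 180-mil target size for the 6800.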

The MOS Technology team seized the opportunity and beat Motorola to production with a depletion-load NMOS process (“regular” enhancement-mode N-channel MOSFETs acted as pull-down switches; depletion-mode N-channel MOSFETs were used as a load, with their gate and source tied together to act as a current source) in the 6502, which allowed the design team to achieve higher performance in a smaller die size.

Left: NAND gate with saturated enhancement-load NMOS. Right: NAND gate with depletion-load NMOS. In both cases, T1 acts as a current source, so that the output Y is low only if both A and B are high, but the depletion-mode version of T1 maintains strong current-source behavior at a higher output voltage than the enhancement-mode version, achieving faster rise times for the same static power dissipation. (Images courtesy of Wikipedia)
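The load-current difference described in the caption can be sketched with the long-channel square-law MOSFET model. The threshold voltages (+1 V enhancement, −3 V depletion), the 5 V supply, and the normalization are assumptions chosen for illustration, and the model ignores the body effect, which in a real circuit makes the enhancement load even weaker as the output rises. Each load's current is normalized to its value at Vout = 0 V as a rough stand-in for equal static power.

```python
VDD = 5.0      # supply voltage (V)
VT_ENH = 1.0   # assumed enhancement-mode threshold (V)
VT_DEP = -3.0  # assumed depletion-mode threshold (V); negative, so it conducts at VGS = 0

def drain_current(vgs: float, vds: float, vt: float, k: float = 1.0) -> float:
    """Long-channel square-law NMOS model (no body effect, no channel-length modulation)."""
    vov = vgs - vt
    if vov <= 0:
        return 0.0                               # cut off
    if vds >= vov:
        return 0.5 * k * vov ** 2                # saturation
    return k * (vov * vds - 0.5 * vds ** 2)      # linear (triode) region

def enhancement_load(vout: float) -> float:
    # Gate and drain tied to VDD, source at the output: VGS = VDS = VDD - Vout
    return drain_current(VDD - vout, VDD - vout, VT_ENH)

def depletion_load(vout: float) -> float:
    # Drain at VDD, gate tied to source (the output): VGS = 0
    return drain_current(0.0, VDD - vout, VT_DEP)

i_enh0, i_dep0 = enhancement_load(0.0), depletion_load(0.0)
for vout in (0.0, 1.0, 2.0, 3.0, 4.0, 4.5):
    print(f"Vout = {vout:.1f} V   enhancement {enhancement_load(vout) / i_enh0:4.2f}"
          f"   depletion {depletion_load(vout) / i_dep0:4.2f}")
```

In this model the enhancement load's current falls off quadratically and reaches zero at Vout = VDD − Vt = 4 V, so the output can never quite reach the supply, while the depletion load keeps sourcing a substantial fraction of its current almost all the way up to VDD.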

For the most part, design of the 6502 was paper-and-pencil, with some computer-assisted aspects of layout. Peddle was project leader, and focused on the business aspects; he also worked on the instruction set architecture — basically the abstract programmer’s model of how the chip worked, including the various opcodes — with Orgill and Mathys.[7]

To reduce this to a working circuit design, the 6502 team had to come up with a digital design of instruction decoders, arithmetic/logic unit (ALU), registers and data paths (high-level register-centric design) that could be implemented using individual gates made out of the NMOS transistors and depletion loads (low-level circuit design). Peddle, Orgill, Mathys, and Mensch worked out the register structure and other sections of the high-level design,[1 page 28][8] with Mathys translating a sequence of data transfers for each instruction into state diagrams and logic equations.[8] Mensch and Orgill completed the translation of the register-centric design from logic equations into a circuit schematic (technically known as the “650X-C Microprocessor Logic Diagram”[9]) of the NMOS transistors and depletion loads, annotated with dimensions, while Wil Mathys worked on verifying the logic.[10]

Mensch describes Orgill and himself as “semiconductor engineers”, responsible for reducing logic equations to transistor-level implementation in an IC to ensure that it meets speed, size, interface compatibility, and power specifications.[11] Orgill’s specialization was on the high-level architecture, contributing to the ISA, with “a focus on logic design and minimization”,[11] whereas Mensch had a predilection for low-level details. Mensch determined the design rules, ran circuit simulations on portions of the chip — limited to around 100 components at a time with the computation facilities available to MOS Technology in 1975 — and designed in the two-phase clock generator that would become the distinguishing factor between the 6501 and the 6502.[11][12 page 19] (The 6501 and 6502 shared all masks except for the metal layer, which had two slightly different versions: the 6501 left the two-phase clock generator disconnected so that it was pin-compatible with the Motorola 6800, whereas the 6502 connected the clock generator circuitry, breaking pin-compatibility. In 1976, MOS Technology agreed to cease production of the 6501 as a condition of a legal settlement with Motorola.[13])
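For readers unfamiliar with two-phase clocking: the 6502's on-chip generator takes a single input clock (ϕ0) and produces two output clocks, ϕ1 and ϕ2, that are never high at the same time, whereas the 6800 expected its two clock phases to be supplied from outside the chip. The sketch only generates an idealized non-overlapping sequence in discrete time steps; the tick counts are arbitrary assumptions, and it does not model the 6502's actual clock circuitry.

```python
from typing import Iterator, Tuple

def two_phase_clock(cycles: int, high_ticks: int = 4, gap_ticks: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield (phi1, phi2) samples of an idealized non-overlapping two-phase clock.

    Each cycle is: phi1 high, a dead gap with both low, phi2 high, another dead gap.
    """
    one_cycle = ([(1, 0)] * high_ticks + [(0, 0)] * gap_ticks
                 + [(0, 1)] * high_ticks + [(0, 0)] * gap_ticks)
    for _ in range(cycles):
        yield from one_cycle

samples = list(two_phase_clock(cycles=2))
print("phi1:", "".join(str(p1) for p1, _ in samples))
print("phi2:", "".join(str(p2) for _, p2 in samples))
assert not any(p1 and p2 for p1, p2 in samples)  # the phases never overlap
```

The dead gaps are what make dynamic logic safe: a latch clocked on ϕ1 is guaranteed to have closed before the stage clocked on ϕ2 opens.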

Orgill and Mensch drew the schematic on mylar, using pencils with plastic lead[14] which could be erased. I found a book that described drafting on mylar this way:[15]

Drawing on mylar for the first time can be a scary experience — both for the novice designer and the company. The surface of mylar drafting film holds drawing lead much more loosely than the fibers in paper. If you were to draw on mylar with a regular graphite “lead” pencil, the disastrous results would be like drawing on a sheet of frosted glass with a charcoal briquette. You could form the lines, but they wouldn’t be very durable against smudges.

To compensate for this lack of adhesion, special plastic lead was developed specifically for use on mylar drafting film. Instead of being made from graphite, this “lead” is made of a soft waxy plastic compound. It comes in varying degrees of hardness just like regular drafting pencils. The softest designation is E0, and they progress in hardness with E1, E2, E3, etc.

Here is part of the schematic, showing a portion of the ALU; the dashed lines each surround one bit of the ALU.[16]

Photo of a portion of the Hanson copy of the 650X-C Microprocessor Logic Diagram, taken by Jason Scott.[17]

The annotations here are of two types. The letters A-Z and AA-JJ, according to Mensch, denoted individual transistors for the purposes of checking correctness in the layout.[16] The numbers indicate transistor dimensions in mils (thousandths of an inch), and are listed in two forms:[14]

  • A single number denotes an NMOS gate width, with a standard gate length of 0.35 mil, the minimum used in this design
  • A pair of numbers W/L with a dividing line denotes gate width and length. W and L together set how much area a transistor occupies, while the current it can drive is proportional to W/L, which determines how fast it can charge or discharge the node it drives.

The transistor at the output of a gate is a depletion-mode pull-up, with the others as enhancement-mode transistor inputs[14] — so, for example, the NOR gate with transistors AA and Y as inputs had gate widths of 0.7 mil and length of 0.35 mil, and a depletion-mode pull-up of 0.3 mil width and 0.8 mil length. (In theory, someone could double-check this against Antoine Bercovici’s die photos of the 6502 rev A, by locating individual transistors and trying to find the corresponding transistors on the logic diagram… I have not, and leave this as an exercise for the industrious reader.)
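The W/L arithmetic for that NOR gate is easy to check by hand; this sketch just makes the ratios explicit. Treating drive strength as proportional to W/L is a first-order approximation, and comparing the result against a roughly 4:1 pull-down-to-pull-up rule of thumb (the ratio taught in Mead and Conway's NMOS design methodology) is my own framing, not something taken from the 6502 documentation.

```python
def aspect_ratio(width_mil: float, length_mil: float) -> float:
    """W/L of a MOSFET; to first order, drive current scales with this ratio."""
    return width_mil / length_mil

# Dimensions from the logic diagram annotations, in mils
pull_down = aspect_ratio(0.7, 0.35)  # each enhancement-mode NOR input (transistors AA and Y)
pull_up = aspect_ratio(0.3, 0.8)     # depletion-mode load

ratio = pull_down / pull_up
print(f"input W/L = {pull_down:.2f}, load W/L = {pull_up:.3f}, ratio = {ratio:.1f}:1")
# Worst case for a NOR gate is a single input pulling down against the load,
# so the ratio is evaluated for one input transistor.
print("meets 4:1 rule of thumb" if ratio >= 4 else "below 4:1 rule of thumb")
```

A strong pull-down paired with a weak, long pull-up is what lets a ratioed NMOS gate drive its output close enough to ground to be read as a valid logic low.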

The minimum gate length of 0.35 mil implies a technology node of 0.35 mil ≈ 8.9 micron for the 6502.

There are a few other interesting things visible from the schematic — the use of dynamic logic, for example. Anytime you see clock signals (ϕ1 and ϕ2 are the two-phase clock signals on the 6502) doing weird stuff, where some logic gate doesn’t have any driving input part of the time, you know you’ve got dynamic logic going on. (Wikipedia says “Dynamic logic circuits are usually faster than static counterparts, and require less surface area, but are more difficult to design.”) What caught my eye was the “T” and “B” on these AND gate inputs shown below:

Photo of a portion of the Hanson copy of the 650X-C Microprocessor Logic Diagram, taken by Jason Scott.[17]

I asked Mensch about this; he said they stood for “top” and “bottom”, specifically referring to the implementation of an AND or NAND gate in depletion-mode NMOS.[16] Here’s a transistor-level implementation of that pair of AND gates followed by a NOR gate:

Transistors Q1 (top) and Q2 (bottom) would correspond to one T/B pair of AND-gate inputs, and Q3 (top) and Q4 (bottom) the other. This matters because switching speed is different for the top and bottom MOSFETs — the top ones have drain-to-gate capacitance slowing down the switching (the Miller effect), whereas the bottom ones see a low-impedance load from the top transistors, forming a cascode configuration. As to why that is critical here, I’m out of my element — Mensch says the bottom transistor should be the first transistor to change state, and the last signal to change should be the top transistor[18] — but the point here is that digital logic design is not just a nice little abstraction layer with ones and zeros based on simple, identical, combinational logic and flip-flops. A lot of work went into choosing transistor sizes to get the 6502 to work fast under die size constraints.
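At the switch level, the AND-NOR structure just described reduces to two series pull-down stacks in parallel, with the depletion load pulling the output high when neither stack conducts. The sketch models only that Boolean behavior, with argument names following the Q1 to Q4 labels above. It deliberately cannot represent the top/bottom timing difference Mensch describes, which is the point: the same truth table hides analog details that mattered for speed.

```python
from itertools import product

def and_nor(q1: int, q2: int, q3: int, q4: int) -> int:
    """Switch-level model of a depletion-load NMOS AND-NOR gate (inputs 0/1).

    Q1 (top) in series with Q2 (bottom) forms one pull-down stack;
    Q3 (top) in series with Q4 (bottom) forms the other.
    """
    stack_a = q1 and q2          # conducts only if both series transistors are on
    stack_b = q3 and q4
    return 0 if (stack_a or stack_b) else 1   # otherwise the load pulls the output high

for inputs in product((0, 1), repeat=4):
    print(inputs, "->", and_nor(*inputs))
```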

The schematic also served as a rough layout known as a floorplan showing high-level placement, with the various gates arranged on the schematic roughly where Mensch thought they should go on the chip.[19] Bawcom, Holt, and Janes were mask designers for the 6502 chipset, taking the circuit design and placement and implementing them as individual transistors or resistors, made out of rectangular features sketched on various layers of Stabilene mylar film.

Image from Hoboken Historical Museum

The mask designers did not draw these features directly by hand. When I first started reading historical accounts of the 6502 for this article, I had a mental image of them sketching transistors on Stabilene one by one, fitting together like a puzzle until the last pieces were drawn in… and dammit, there’s only enough room for seven flip-flops, not eight, so they’d have to start over and try again. But that’s not how it worked. Instead, the design was based on “cells”, small reusable pieces of the design that could be planned separately and then fit into place in the layout, like Escher tessellations all coming together, or some kind of sadistic furniture floor plan where the room is full of tables and chairs and sofas with no space between them. Harry Bawcom, who previously worked on bipolar TTL layout at Motorola and was brought in to finish the 6800 microprocessor layout,[20] described cells this way:[21]

Cell design started with little stickies of transistors underneath clear mylar and you did the first pass with a grease pencil and a lot of iteration. That was a Bipolar technique that the MOS folks didn’t use. Probably why I was five times faster. By the time I picked up a pencil I knew where I was going.

According to Mensch, these physical representations of cells used in drafting were also called “paper dolls”, a term that shows up every now and then in accounts of that era. Joel Karp, the first MOS chip designer at Intel, also used this term when describing the rather painstaking layout process for the Intel 1102 and 1103 1024-bit DRAM ICs.[22] Another account, from New York’s Museum of Modern Art, described a Texas Instruments logic chip layout from around 1976:[23]

At the time this plot was hand drafted, it was still possible to verify the design of individual components visually. To repeat a circuit element multiple times, an engineer would trace the initial drawing of the component, photocopy it onto mylar, then cut and glue it onto the diagram. The collage technique is referred to as “paper-doll layout.” Intended for use in a military computer, this particular chip was designed to sense low-level memory signals, amplify the signals to a specific size, and then store them in a memory cell for later recall.

But the early microprocessor designs at Motorola and MOS Technology were just starting to emerge from the manual-only world. Here the computer-assisted aspects came into play: for the 6502, someone at MOS captured each cell on the Stabilene film using a Calma GDS workstation and digitizer.[24][12 page 12] (Bawcom refers to this person as the “Calma operator” but says he “did not witness this process at MOS Technology.”[21]) Where possible, the Calma workstation was used to replicate cells that could be repeated in the design.[21]
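As a rough illustration of what replicating a cell means in data terms, here is a hypothetical sketch: a cell is just a list of rectangles tagged with a layer name, and stepping it across a grid produces the repeated copies. This is not the Calma GDS data format or its software, only an assumption-laden model of the idea; the layer names, dimensions, and pitch are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Rect:
    """An axis-aligned rectangle on one mask layer (hypothetical representation)."""
    layer: str
    x0: float
    y0: float
    x1: float
    y1: float

    def moved(self, dx: float, dy: float) -> "Rect":
        return Rect(self.layer, self.x0 + dx, self.y0 + dy, self.x1 + dx, self.y1 + dy)

def step_cell(cell: List[Rect], pitch: Tuple[float, float], cols: int, rows: int) -> List[Rect]:
    """Replicate a cell on a regular grid: one copy per (row, col) position."""
    px, py = pitch
    return [r.moved(c * px, row * py) for row in range(rows) for c in range(cols) for r in cell]

# Hypothetical cell in mils: a diffusion strip crossed by a 0.35-mil-long poly gate
bit_cell = [
    Rect("diffusion", 0.0, 0.0, 0.7, 2.0),
    Rect("poly", -0.2, 0.8, 0.9, 1.15),
]
register = step_cell(bit_cell, pitch=(1.5, 2.5), cols=8, rows=1)
print(len(register), "rectangles;", sum(r.layer == "poly" for r in register), "gates")
```

Drawing one cell carefully and stepping it eight times is how a repeated structure could be laid out once rather than eight times, with any fix to the cell propagating to every copy.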

Digitizer table for capturing IC design at MOS Technology, titled "DATA INPUT FOR AUTOMATIC MASK PREPARATION" on a MOS Technology brochure from 1970, courtesy of Diana Hughes and

The digitizer was a drafting table with a precision position sensor that could record x-y coordinates of any position on the table. The workstation was a Data General Nova minicomputer[25] with a 5-megabyte hard drive and 16K of RAM. The minicomputers of that time were built mostly out of standard logic chips (like the 7400 series) in DIP packages, each typically containing an array of 2-8 components like gates, registers, or multiplexers; these were soldered onto circuit boards to make a processor and other associated sections. A cabinet-sized computer, rather than a room-sized mainframe. (If you haven’t read Tracy Kidder’s Soul of a New Machine, make a note to do so: it chronicles the design of the Data General Eclipse, the successor to the Nova.) The Calma GDS stored the layout design as polygons and could be used to draw the layout on a plotter, or to cut a photomask drawing out of a red film called Rubylith, also using the plotter, but with a precision blade used in place of a plotter pen.[12 page 12][26] Then the unwanted sections of Rubylith would be removed very carefully by hand during what MOS Technology engineers called a “peeling party”, according to Albert Charpentier.[26]

After a lot of very careful checking and revision, the set of Rubylith photomask drawings (shown in this picture from the August 25, 1975 edition of EE Times) was photographically reduced to a set of master glass reticles, one per mask, at 10 times actual size.[24] Each 10× reticle was then used to reduce the design further, producing a 1:1 mask with a step-and-repeat reduction camera (a photorepeater), which precisely places multiple copies of the die image in an array covering most of the area of a 3-inch[24][27] wafer. In early production, contact or proximity masks were used,[12][28][29] but once MOS had been able to upgrade to four-inch wafers,[29] a Perkin-Elmer Micralign projection mask aligner[27][1] was used to scan the 1:1 mask bit by bit, using a clever symmetrical optical system, for lithography steps.[30]

The Micralign projection aligner was one of several reasons the 6502 team was able to succeed, by improving yields. (Remember: die size and yield are vital!) Motorola’s NMOS process yields were poor[31][32], giving them cost disadvantages. Mensch says that Ed Armstrong, Motorola’s head of process engineering at the time, grew out his beard, waiting to shave it until they were able to get 10 good die on a wafer.[12][19][10] The MOS team was able to get much higher yield than Motorola, in part by using a projection mask system: previous-generation lithography systems used contact masks, which touched the wafers and had limited durability. Motorola had used contact masks for the 6800.[1 page 22] From Perkin-Elmer’s Micralign brochure:[33]

Historically, the manufacture of integrated circuits involved placing the photomask directly in contact with the wafer during the exposure process. Repeated just a few times, this contact soon degraded the mask surface and the photoresist layer. Each defect that resulted was then propagated through the replication cycle. Consequently, masks were considered expendable, to be used between five and fifteen times and then discarded.

These problems led to several attempts at prolonging mask life. One was to make the photomask from harder materials that were more resistant to abrasion. Another was to reduce abrasion by reducing or even eliminating the contact force. These efforts did improve mask life to a limited extent, but neither was as effective as optically projecting the photomask image onto the wafer.

A second reason for the 6502’s higher yield was something MOS Technology referred to as “spot-knocking”[12 page 18], essentially a retouching of point defects in the masks.

The third reason for higher yields was through Mensch’s design rules — constraints on transistor size and feature spacing — which were conservative and much more tolerant of process variations,[19] a technique which he had learned on his own through experiences at Motorola, along with some lessons about what was and what wasn’t possible to achieve at the company.

Mensch’s first year at Motorola in 1971 was a rotation through four different departments: Applications, Circuit Design, Process Design, and Marketing.[34] In the Marketing department, his supervisor Dick Galloway asked him to put together a quote for IBM for memory chips over a seven-year period, with pricing decreasing over time: a fairly complicated document, with lots of numbers that had to be typed accurately. So rather than having a secretary type it up and then go through the trouble of finding and correcting errors, he decided to write a FORTRAN program on the Motorola mainframe computer to take in parameters, plug the numbers into some formulas, and print out the quote on a terminal with a thermal printer, which he then copied onto better paper. The Marketing staff asked him how he did it, and when he told them, Galloway said “Bill, we want you to work in the Design Group.” “Why is that?” “None of their chips work. We want you to work there. I think if you work there, the chips will work again.”[12][19]

As the new, inexperienced engineer in the IC design group, Bill Mensch started out with a lot of what the other engineers would call grunt work. This included work on Motorola’s standard cell library in various MOS processes, and on the process control monitor for memory and microcontroller designs.[24][12] The process control monitor (PCM) is a special set of test structures used to measure the parameters of basic circuit elements such as transistors, resistors, capacitors, and inverters, not only to make sure the manufacturing process is working as expected and check for statistical variation, but also to characterize these elements for simulation purposes. Nowadays it is typical to put those test structures in the scribe lines between ICs, since they can be so small, but in earlier IC designs the PCM was placed at a few locations on the wafer in place of the product, usually forming a plus-sign pattern of five PCMs.

Early 6502 wafers from MOS Technology are apparently nowhere to be found (in 2022, at least), but occasionally some later MOS wafers show up on eBay, and I did find a creator of “digital art”, Steve Emery at ChipScapes, who had a 4-inch Rockwell R6502 wafer, apparently from the mid 1980s (Synertek and Rockwell were both licensed by MOS Technology as second sources for the 6502), on which you can see the PCMs. He was kind enough to take some photomicrographs of them for me:

Rockwell R6502 photograph © Steve Emery, reproduced with permission.

Ray Hirt designed the PCMs for the MOS 6502[24]; the Rockwell PCMs shown here are almost certainly not the ones Hirt designed in 1974-1975, but the overall concept is the same. The Rockwell R6502 wafer has two different types of PCM: three of one type in the middle row of the wafer, and two of another type at the top and bottom.

Rockwell R6502 photograph © Steve Emery, reproduced with permission.

The ones at the top and bottom look like an image-resolution test for the various layers; they have no electrical connections:

Rockwell R6502 photograph © Steve Emery, reproduced with permission.

The three others have a bunch of circuit pads connected to various test elements:

Rockwell R6502 photograph © Steve Emery, reproduced with permission.

The PCMs that Mensch and Hirt designed included transistors of various dimensions, digital inverters, and ring oscillators. The inverters could be used to measure the input-output transfer function, and the ring oscillators to measure intrinsic gate delays. The transistors typically included a minimum-size transistor (0.4 mil × 0.4 mil ≈ 10 μm × 10 μm in the early 1970s) and others with different widths and lengths, so that the transistor parameters could be characterized as a function of geometry.[2] In a 2014 interview, Mensch described the PCMs from his early days at Motorola this way:[12]

We had to make some changes to model because of things I found. And I found that narrow transistors had a higher voltage threshold than a short one, and these are things that the memory product guys didn’t use. And so they had to change their design because of what I found on the process control monitor. I put very narrow transistors, very wide transistors, very large transistors, and very short transistors, so I knew the characteristics and what the actual sizing might have an effect on.
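For readers who haven’t worked with PCM data, here is a minimal sketch (my own illustration, not Mensch’s or Hirt’s procedure) of two bits of arithmetic implied by the PCM description above: converting the mil dimensions of the era to micrometers, and backing out an average per-gate delay from a ring oscillator’s measured frequency. It assumes an ideal ring of an odd number N of identical inverters, where an edge must propagate around the ring twice per period, so T = 2·N·t_pd. The function names and the 11-stage, 5 MHz measurement are hypothetical, chosen only for illustration; they are not data from any Motorola or MOS Technology PCM.

```python
# Illustrative PCM arithmetic (hypothetical values, not real Motorola/MOS data).

MIL_TO_UM = 25.4  # 1 mil = 0.001 inch = 25.4 micrometers


def mil_to_um(mils: float) -> float:
    """Convert a layout dimension in mils to micrometers."""
    return mils * MIL_TO_UM


def stage_delay(freq_hz: float, n_stages: int) -> float:
    """Average per-inverter delay of an ideal N-stage ring oscillator.

    Assumes N identical inverters (N odd, at least 3). An edge travels
    around the ring twice per period (once as a rising edge, once as a
    falling edge), so T = 2 * N * t_pd, i.e. t_pd = 1 / (2 * N * f).
    """
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverters (>= 3)")
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / (2 * n_stages * freq_hz)


if __name__ == "__main__":
    print(f"0.4 mil = {mil_to_um(0.4):.2f} um")  # 0.4 mil = 10.16 um
    # Hypothetical measurement: an 11-stage ring oscillating at 5 MHz
    print(f"t_pd = {stage_delay(5e6, 11) * 1e9:.2f} ns")  # t_pd = 9.09 ns
```

A real PCM analysis would also have to account for the output buffer loading the ring and for any asymmetry between rising and falling delays; this sketch ignores both and reports only the average.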

When I spoke with him in March, he described his experience a bit more candidly. As a young engineer at Motorola trying to learn the best way to design ICs, Mensch wanted to know what numbers to use for a transistor simulation model, so he asked around, and each of the design engineers had different numbers they used in their calculations; a typical exchange went like this:[2]

Mensch: Why are you using those numbers?

Engineer: Well, I just think it’s the right number.

Mensch: Yeah, but… but… who’s… who’s giving out the numbers? What temperature are you simulating it at?

Engineer: Well… at room temperature.

Mensch: Well, why room temperature?

Engineer: Well, that’s what we take the data on.

Mensch: Yeah, but you know it’s gonna run at 125°C, right? And minus 55, we need to get them to work that way.

Engineer: Yeah, but… whatever.

Mensch’s voltage threshold discovery (that the gate threshold voltage on the same process varied with transistor geometry, and that a good model would have to take this into account) was not immediately well received; at first, engineers from the memory group didn’t believe him. He ended up sending around an inter-office memo to call a meeting (“MEETING TO PICK BILL’S SIMULATION NUMBERS”) and got everybody to attend by the happy accident of including Jack Haenichen on the cc: list. Haenichen was Motorola’s youngest vice-president: in 1969, at the age of 34, he was elected Vice President and Director of Operations, Services and Engineering in Motorola’s Semiconductor Products Division, and in early 1971 he was named Director of Operations for MOS.[35][36][37] Haenichen had taken an interest in Mensch’s progress during his rotation in the Marketing department, and had asked to be kept informed of how things were going. As Mensch described it: “So this interoffice memo, everybody would see, ‘Hey, Jack’s on this list! Oh, we gotta show up.’ I never realized why all these people showed up at my meeting.” He eventually chose simulation parameters that were the worst case of all the other numbers.[2]

Over the next few years, an opportunity began to emerge. Mensch was no longer a green engineer; by 1974, he had designed the 6820 Peripheral Interface Adapter, and he and Rod Orgill had worked together on design teams for two microprocessors at Motorola: the 5065, a custom microprocessor for Olivetti, and the Motorola 6800.[3] Mensch had also designed the PCM for the 6800, and put in test structures not only for the enhancement-load process of the 6800 but also for a depletion-load process, ready to help prove out the superiority of that concept with just a slight change in the masks and processing steps.[12] Meanwhile, Chuck Peddle had joined Motorola, and in 1974 was traveling the country giving seminars on the 6800 for prospective customers, who were very interested, but not at the price Motorola was offering. Peddle wanted to pursue a lower-cost version of the 6800.[1] Motorola had the advantage of financial resources; the company’s 1972 Annual Report stated proudly that its revenues exceeded a billion dollars for the first time, and “Metal-oxide-semiconductor (MOS) integrated circuit sales for Motorola during ’72 grew at a faster rate than the world industry, whose growth was an estimated 60-70%.”[38] Motorola’s 1973 Annual Report stated $1.437 billion in revenue, with the company’s Semiconductor Products Division reporting revenue “up more than 45% over the previous year”, and expressed an optimistic view of the microprocessor market:[39]

The burgeoning microprocessor market is presenting the industry with a radical opportunity to engineer into electronic systems significant benefits not previously possible. The true extent to which microprocessors will be adopted is not yet apparent, even though the current picture indicates a possibly phenomenal market whose growth rate could eclipse that of today’s fastest growing semiconductor categories. Motorola has a major commitment to the microprocessor market, and we intend to secure a significant share. Development in this area has reached an advanced stage.

Motorola had already been in the electronics business for decades — starting with car radios in 1930 and getting into the semiconductor market with mass-production of germanium power transistors in 1955 — with a well-established sales and distribution network. It had the tools and staff to design and manufacture cutting-edge microprocessors.

So why was the low-cost 8-bit microprocessor a project at MOS Technology instead of Motorola?

The Elephant and the Hare

I have struggled to understand: Why not at Motorola? Motorola had all these resources, and an opportunity to follow up on the 6800, but at first glance appears to have squandered the opportunity.

Motorola and MOS Technology were two very different companies. In Motorola’s case, being a large company gave it significant long-term advantages, in the form of product diversity — Motorola was nearly a self-contained “supermarket” for the circuit designer, with discrete, analog, and digital ICs, so it benefited from many market trends in electronics — and inertia. Its size allowed Motorola some freedom to “coast”, when necessary, on its past successes. MOS Technology was small and agile, and had to survive by being competitive in a few specific areas like MOSFET-based IC design and manufacturing technology. A business failure of a few million dollars would have been a minor setback for Motorola, but a mortal wound for MOS.

A 1970 ad campaign described “Motorola’s Ponderous Pachyderm Syndrome”,[40] something that seems like incredibly poor marketing.

Haenichen described it this way in an interview:[41]

Motorola, at the time, was called the “Ponderous Pachyderm” by the industry people. In other words, we maybe were not the “latest and greatest” but when we started making something, we wiped everybody out, because we just made them by the billions — that was our reputation, slow moving but good.

Yeah, um… okay. I get the idea. Take a little longer and become a dominant player in the industry… sure. But a “ponderous pachyderm” as high-tech corporate metaphor? Not exactly the most inspiring.

And yet, if we fast-forward to the 1980s and 1990s, Motorola did find success in its microprocessor offerings, reaching its zenith a few years after the 6800 and its follow-up, the 6801, in the form of the 68000 series, which was produced roughly from 1979 to 1994 and used in many systems, notably the Apple Macintosh. And later 6800-series ICs like the 68HC11 took a prominent position in the microcontroller market.

Even by early 1980, the 6800 and 6809 had achieved market success. While looking for historical pricing information in Byte Magazine’s January 1980 issue, I came across several ads for third-party systems and software tools for the 6800 and 6809. The chip distributor ads in the back of the magazine listed various microprocessors, almost all in the $10–$20 range, including the Zilog Z80, the 6502, the 6800, RCA’s CDP1802, and Intel’s 8080. Motorola had been able to lower the cost of the 6800.

But 1974 was a different story. With a major economic recession looming, Motorola’s Semiconductor Products Division became more risk-averse, and focused on getting the 6800 out the door successfully. Mensch, who had designed the 6800’s process control monitor and snuck in depletion-load test structures alongside the normal enhancement-load ones, was pushing to have one wafer ion-implanted to try out the depletion-load process. When he talked to Armstrong (head of process engineering), he was finally told why they wouldn’t let him investigate depletion mode: “We were afraid you wouldn’t complete the designs with enhancement mode.”[19]

Tom Bennett, who led the chip design of the 6800, described relying on depletion loads as “a little risky”:[31]

We did the ion-implant only of the substrate. There was an extra one or two process steps to do the depletion load. And it was determined that, you know, that might be a little risky. That’s why we went to all these other, you know, hardware extremes to get around that. And so we compensated for it with design.

Given the process problems Motorola was having with just getting enhancement mode NMOS to work, perhaps this was the right decision for Motorola after all.

Internal politics and friction also hampered the 6800 project.[42] The sense I get, from talking to Bill Mensch and reading other accounts of the 6502 and 6800, is that at Motorola, getting things done depended on being on good terms with other staff and with managers: the old adage, it’s not what you know, it’s who you know. Bill Lattin, a member of the Motorola design team interviewed by the Computer History Museum in 2008, described this environment’s effect on the 6800 this way:[43]

Well, the amazing thing is that it succeeded as well as it did. Having gone to Intel and seeing a very — a company that does a very structured strategic plan every year, and knows where to focus the resources, Motorola was a bottoms-up. A strength of an idea would get sold, and or Doug Powell would get it and he would push it. And then Tom would get it, and convince everybody, you know, we want to work on that. And it was, you know, having now been in management, and looking back I kind of say, “What could have happened here with 6800 had there been strategic direction from the whole company, you know, moving down this way?” And so it was a phenomenal success. I’m privileged to have worked with really bright guys pulling it off, and against chaos that was put in everybody’s way.

The 6800 team succeeded in getting management buy-in; Peddle and Mensch did not with their low-cost microprocessor. But Peddle joined the 6800 team fairly late in the project, and Mensch was a junior engineer learning the hard way that technical merit was not enough.

Aside from the recessionary climate and internal politics, there is one more significant factor that dampened Motorola’s eagerness to pursue microprocessors. Being a large, diversified company sometimes created a conflict of interest between Motorola’s different semiconductor groups. Even the 6800 team faced this: each opportunity for the company to sell an integrated microprocessor like the 6800 would compete against board-level processors designed with less-integrated logic chips. Within Motorola, that would mean less business selling standard logic chips, and among Motorola’s customers, it might put some of the minicomputer companies’ designs at risk.[44] From the Computer History Museum’s 2008 interview:[43]

Bennett: And interesting. The only one that really asked some questions which I thought were important was Bob Galvin. And his comment was when he looked at it he says, “You understand that you’re putting our customer’s chip — or system — on one of these little boards?” He said, “What’s that gonna do to my other products?” But that’s where it was at that point in time. The other thing…

Ekiss: Yeah, HP really recognized that, because I had called on them as a customer, and they quizzed me up and down about the implications for the semiconductor company to be able to make products like this. Because we were now right on their turf.

Laws: We did the recording of the 8008 oral history several months ago, and listening to the tremendous battles that went on in Intel between the memory people who were terrified that processor people were going to be treading on their [customer’s] turf and taking away their business. So it was not unique to Motorola.

There’s a gap in the historical record here: it would be nice to find a well-reasoned explanation from Motorola’s management of why they told Peddle to stop working on a low-cost microprocessor.[45] But I will hazard a guess. Consider that in 1974, Motorola was still trying to bring the 6800 to production so it could start selling it and make back some of its investment (John Ekiss related that they had been relying on income from large customers like National Cash Register, who’d been buying ROMs, to fund new engineering efforts[43]), and here’s this guy Peddle, who’s been at Motorola for less than a year, squawking about how they need to sell a lower-cost processor, which would potentially compete with the 6800 that hadn’t even been released yet, and which would earn Motorola less profit per processor sold. Oh, and yes, there’s a recession going on. So please, Chuck, stop it and help us sell the 6800.

Sometime around early 1974, management announced that the microprocessor group would be moving from Mesa, Arizona to Austin, Texas — an unpopular decision — and at about that point Peddle proposed jumping ship.[1][12][43]

In one presentation, Mensch alludes to Star Wars, describing the eight departing Motorolans as “Rebels” leaving the Motorola “Empire”[24] for MOS Technology, in order to make their vision a reality. In some ways it is not surprising that they succeeded. Working with fewer resources and fewer people, the team had to be creative to make things work, but a small team can also be an asset. In The Mythical Man-Month, Fred Brooks discusses the concept of “conceptual integrity”, of having a unified design: “I will contend that conceptual integrity is the most important consideration in system design. It is better to have a system omit certain anomalous features and improvements, but to reflect one set of design ideas.”

Mensch related how Stephen Diamond asked him about the oscillator section of the 6502:[10]

And he says, “Bill, didn’t you have trouble with that?”

I go, “No, why, why do you ask? It was just, you know, you just know what the edges needed, so that the feedback from like the X register from the output to the input when you’re loading a new value, you want to not lose what’s in there while, you know, because you don’t want to feed back the output from it or you won’t load the new value right,” and I said, “so that we all knew what the timing was.”

But he says, “Well, we had to struggle with that and we had all kinds of…”

I said, “Oh, I think I know why. How many engineers did you have on it?”

“Oh, I don’t know, maybe 20.”

I go, “That’s the problem. We didn’t have that many engineers, so… ours worked.”

Mensch describes how leaving Motorola and getting to MOS allowed him and Rod Orgill to figure out how to design what t
