Use a Simple Microprogram Controller (MPC) to Speed Development of Complex Microprogrammed State Machines

Michael Morris, April 18, 2015


This article describes a synthesizable HDL-based microprogram controller (MPC), also called a microprogram sequencer (MPS), that can be used to control a microprogrammed state machine. The microprogrammed state machines that I described in my previous two articles, "Use Microprogramming to Save Resources and Add Functionality" and "Fit Sixteen (or more) Asynchronous Serial Receivers in the Area of a Standard UART", used purpose-built next-state logic. Many microprogrammed state machines will instead benefit from a common control structure, an MPC, that provides the basic housekeeping functions and the sequencing features that state machines need. Using an MPC module lets you focus on the design and implementation of the state machine instead of on the microprogram control logic. The structure of the control logic represented by an MPC indirectly influences the implementation of a state machine, but the control logic itself does not directly contribute to the behavior of the required state machine.

The remainder of this article describes the HDL reimplementation of a microprogram controller that I have used in my more recent and more complex microprogrammed designs. My focus will be on the basic features and capabilities needed to implement complex state machines such as those found in digital signal processing (DSP) engines, proportional-integral-derivative (PID) controllers, Internet Protocol (IP) off-load engines, microprocessors/microcomputers, etc.

Desirable Microprogram Controller Features

What are the features needed in a microprogram controller that will make it a useful component for the design and implementation of microprogrammed state machines?

In the first article, I implemented the next state logic using a field within the microprogram word. That field directly provided the address of the next state. I also used a custom-designed priority encoder to map the various requests onto the microprogram ROM addresses of the state sequences that service each request. The priority encoder was enabled whenever the microprogram next state field was set to the address of the idle state, and its output was then used as the next state of the state machine. This simple scheme proved easy to implement, but mapping the requests to specific microprogram addresses made moving and/or expanding the target microprogram sequences tedious and error-prone.
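As a rough illustration of the direct-mapping scheme just described, the sketch below shows a next-address field that is overridden by a priority encoder whenever it points at the idle state. It is not the code from the first article: the module name, port names, request count, and entry-point addresses are all hypothetical.

```verilog
// Hypothetical sketch of direct-mapped next-state logic with an idle-state
// priority encoder. Names, widths, and addresses are illustrative only.
module DirectMapSeq (
    input   Rst,                // Synchronous reset
    input   Clk,                // Clock
    input   [2:0] Req,          // Service requests; Req[2] has priority
    input   [3:0] NA,           // Next-address field of the microword
    output  reg [3:0] MA        // Microprogram ROM address
);

localparam pIdle = 4'h0;        // Address of the idle state (assumed)

reg [3:0] Vec;                  // Entry point selected by the encoder

// Priority encoder: each request is tied to a fixed ROM address. These
// hard-coded entry points are what make relocating a sequence tedious.
always @(*)
begin
    casez(Req)
        3'b1?? : Vec = 4'h8;
        3'b01? : Vec = 4'h4;
        3'b001 : Vec = 4'h2;
        default: Vec = pIdle;
    endcase
end

// The encoder output is used only when the microword points back to idle;
// otherwise the next-address field is used directly.
always @(posedge Clk)
begin
    if(Rst)
        MA <= pIdle;
    else
        MA <= (NA == pIdle) ? Vec : NA;
end

endmodule
```

Every entry point (4'h8, 4'h4, 4'h2 here) appears both in the encoder and, implicitly, in the layout of the ROM, so growing one service sequence can force edits in both places.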

In the second article, I reused the block RAM receive/transmit FIFO described in my first article. Expanding that FIFO's depth from 256 x 8 to 1024 x 8 was easily handled by expanding the microprogram memory width and the widths of the FIFO's components. The project also used a second, time-multiplexed microprogrammed state machine to implement the state machine of an asynchronous serial receiver. Like the FIFO's microprogram, the asynchronous serial receiver's microprogram used direct mapping of the next state address through a dedicated field in the microprogram word. It also included a conditional two-way branch capability in order to deal with the number of stop bits.

The biggest benefit of the implementation of these two microprogrammed state machines was the simplicity and compactness of the next state logic: direct mapping. However, from a microprogramming perspective, the next state control logic for these state machines was difficult to expand beyond the small number of states used in those implementations. With the next state control methodology employed in those two designs, the housekeeping needed to manage the coupling between the microprogram and the external logic circuits that select the addresses of the various control sequences in the microprogram ROM rapidly becomes unwieldy.

Therefore, the most desirable feature for an MPC is a means of easily describing sequential and non-sequential state sequencing. This capability is readily provided by a loadable counter: a microprogram address counter. The second most desirable feature is the ability to perform conditional and non-conditional branching. Branching was common to the microprogrammed designs described in my first two articles: it was implemented using a priority encoder in the FIFO design, and using multi-way branching in the asynchronous serial receiver design. Thus, both branching mechanisms should be available in an MPC. Finally, subroutines provide a convenient way to reuse control sequences in programs, and microprogrammed state machines can benefit from reusing common control sequences in the same way. Thus, the final desirable feature in an MPC is a way to implement microprogram subroutines. The following list summarizes the most desirable features of an MPC:

  • Support for sequential and non-sequential microprogram execution
  • Support for conditional and non-conditional branching of microprogram execution
  • Support for microprogram subroutines
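The first two features in the list can be sketched with very little logic. The following is a minimal, hypothetical loadable microprogram address counter; the module and port names are assumptions made for illustration, and subroutine support is deliberately left out.

```verilog
// Hypothetical sketch of a loadable microprogram address counter.
// Sequential execution is the default; a branch loads the counter.
module LoadableCtr #(
    parameter pAddrWidth = 8
)(
    input   Rst,                        // Synchronous reset
    input   Clk,                        // Clock
    input   Br,                         // Branch request from the microword
    input   Cond,                       // Selected test condition
    input   [(pAddrWidth-1):0] BA,      // Branch address from the microword
    output  reg [(pAddrWidth-1):0] PC   // Microprogram address
);

always @(posedge Clk)
begin
    if(Rst)
        PC <= 0;
    else if(Br & Cond)
        PC <= BA;       // Non-sequential: load the branch address
    else
        PC <= PC + 1;   // Sequential: step to the next microword
end

endmodule
```

Tying Cond to 1 turns the branch into a non-conditional one, so a single load path covers both kinds of branching. With this structure, only the branch targets in the microprogram change when a sequence is moved.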

An HDL Microprogram Controller

Fairchild Semiconductor offered a number of components in its Macro Logic family that supported the development of microprogrammed controllers and processors. One component in particular, the 9408 Microprogram Sequencer, has been especially useful to me. I've re-implemented that component in Verilog and used it for several projects. It provides all three of the features that I listed above. As a consequence, I have used the HDL version of the 9408 to implement state machines of varying levels of complexity. The simplest implementation has been a controller to process the EnDat position encoder protocol, and the most complex is a controller that provides a UDP/IP based 1000BaseTX interface for the control of a two-axis industrial positioner. I've also used it to re-implement and extend the 65C02 microprocessor.

Having a microprogram controller represented as an HDL model provides ample opportunities for modification of the model. I often modify the HDL model of the 9408 provided below. For example, I frequently modify the source to implement only the pipelined version of the 9408 model. Another frequent modification is the elimination of the microsubroutine capability when it's not required. For example, in my re-implementation of the 65C02 microprocessor linked to above, I reduced the microsubroutine stack depth from 4 to 1 just to retain the capability should it be needed. In addition, I built a microcycle length controller directly into the MPC used in that project to allow the resulting microprocessor to more accurately emulate the behavior of the two-phase bus cycle of the original devices.

The following source code provides the original version of the HDL 9408 MPC that I created in October 2009. As shown below, the module has one parameter which determines the width of the address to the microprogram memory. The module supports all of the features of the Fairchild 9408 with the exception of the latches on the test inputs. The Additional Comments section of the module's header discusses the reason for not including those latches/registers on the four test inputs.

A simple incrementer provides the means for sequential execution of a microprogram. The single case statement near the end of the module clearly defines the action of each 4-bit 9408 instruction. One instruction, FTCH, is used for sequential execution; two instructions support microsubroutines; one instruction supports 8-way branching; four instructions support unconditional branching; and eight instructions support conditional branching using the four test inputs.

`timescale 1ns / 1ps
// Company:         M. A. Morris & Associates
// Engineer:        Michael A. Morris
// Create Date:     10/30/2009 
// Design Name:     F9408 Synthesizable Microprogram Sequencer
// Module Name:     C:\XProjects\ISE10.1i\F9408\F9408_MPC.v
// Project Name:    C:\XProjects\ISE10.1i\F9408
// Target Devices:  N/A 
// Tool versions:   Xilinx ISE 10.1i SP3
// Description:
// This module implements a simple microprogram sequencer based on the Fair-
// child F9408. The sequencer provides:
//          (1) 4-bit instruction input
//          (2) four-level LIFO stack;
//          (3) program counter and incrementer;
//          (4) 4-bit registered test input;
//          (5) 8-way multi-way branch control input;
//          (6) branch address input;
//          (7) 4-way branch address select output;
//          (8) next address output.
// These elements provide a relatively flexible general purpose microprogram
// controller without a complex instruction set. The sixteen instructions can
// be categorized into three classes: (1) fetch, (2) unconditional branches,
// and (3) conditional branches. The fetch instruction class, a single instruc-
// tion class, simply increments the program counter and outputs the current
// value of the program counter on the next address bus. The unconditional 
// branch instruction class provides instructions to select the next instruc-
// tion using the Via[1:0] outputs and output that value on the next address
// bus and simultaneously load the program counter. The unconditional branch
// instruction class also provides for 8-way multiway branching using an exter-
// nal (priority) encoder/branch selector, and microprogram subroutine call and 
// return instructions.
// The instruction encodings of the F9408, as provided in "Principles of Firm-
// ware Engineering in Microprogram Control" by Michael Andrews. The instruc-
// tion set and operation map for the implementation is given below:
//  I[3:0] MNEM Definition       T[3:0]      MA[m:0]      Via  Operation
//   0000  RTS  Return            xxxx      TOS[m:0]       00  PC<=MA;Pop
//   0001  BSR  Call Subroutine   xxxx       BA[m:0]       00  PC<=MA;Push
//   0010  FTCH Next Instruction  xxxx        PC+1         00  PC<=MA[m:0]
//   0011  BMW  Multi-way Branch  xxxx  {BA[m:3],MW[2:0]}  00  PC<=MA[m:0]
//   0100  BRV0 Branch Via 0      xxxx       BA[m:0]       00  PC<=MA[m:0]
//   0101  BRV1 Branch Via 1      xxxx       BA[m:0]       01  PC<=MA[m:0]
//   0110  BRV2 Branch Via 2      xxxx       BA[m:0]       10  PC<=MA[m:0]
//   0111  BRV3 Branch Via 3      xxxx       BA[m:0]       11  PC<=MA[m:0]
//   1000  BTH0 Branch T0 High    xxx1  {T0?BA[m:0]:PC+1}  00  PC<=MA[m:0]
//   1001  BTH1 Branch T1 High    xx1x  {T1?BA[m:0]:PC+1}  00  PC<=MA[m:0]
//   1010  BTH2 Branch T2 High    x1xx  {T2?BA[m:0]:PC+1}  00  PC<=MA[m:0]
//   1011  BTH3 Branch T3 High    1xxx  {T3?BA[m:0]:PC+1}  00  PC<=MA[m:0]
//   1100  BTL0 Branch T0 Low     xxx0  {T0?PC+1:BA[m:0]}  00  PC<=MA[m:0]
//   1101  BTL1 Branch T1 Low     xx0x  {T1?PC+1:BA[m:0]}  00  PC<=MA[m:0]
//   1110  BTL2 Branch T2 Low     x0xx  {T2?PC+1:BA[m:0]}  00  PC<=MA[m:0]
//   1111  BTL3 Branch T3 Low     0xxx  {T3?PC+1:BA[m:0]}  00  PC<=MA[m:0]
// Dependencies: 
// Revision: 
//  0.01    09J30   MAM     File Created
//  1.00    10G10   MAM     Stack Pop operation modified to load StkD register
//                          with 0 during subroutine returns. This will force
//                          the microprogram to restart at 0 if the stack is
//                          underflowed, i.e. POPed more than 4 times. Also made
//                          a change to the Stack Push operation so that Next
//                          is pushed instead of MA.
//  1.01    10G24   MAM     Corrected typos in the instruction table.
//  1.02    10G25   MAM     Removed Test Input Register, Strb input, and Inh
//                          output. External logic required to provide synchro-
//                          nized inputs for testing.
//  2.00    10H28   MAM     Converted the BRV3 instruction into a conditional
//                          branch to subroutine instruction. In this way the
//                          BRV3, or CBSR, instruction can be used to take a
//                          branch to an interrupt subroutine. The conditional
//                          subroutine call is taken if T[3] is a logic 1. Like
//                          the BSR instruction, the address of the subroutine
//                          is provided by BA field.
//  2.10    11C05           Simplified return stack implementation. Removed
//                          unused code, but retained code commented out that
//                          reflects original implementation of BRV3 instruc-
//                          tion.
//  2.11    11C20           Removed CBSR modification
//  3.0     11C21           Changed module and added support for pipelined op-
//                          eration per the connections of the original F9408.
//                          Included an internal Reset FF stretcher to insure
//                          that an external registered PROM has time to fetch
//                          the first microprogram word. Removed the MA_Sel
//                          input because really should have been module reset.
//                          Without tying MA to 0 with the internal reset, the
//                          module in pipelined mode was not executing the same
//                          microprogram as non-pipelined mode module and that
//                          was unexpected. With these changes, the module per-
//                          forms identically to the original F9408 MPC.
// Additional Comments: 
//  Since this component is expected to be used in a fully synchronous design,
//  the registering of the Test inputs with an external Strb signal and the Inh
//  signal is not desirable since it puts another delay in the signal path. The
//  effect will be to decrease the responsiveness of the system, and possibly
//  require that the test inputs be stretched so that pulsed signals are not
//  missed by the conditional tests in the microprogram. In the partially
//  synchronous design environment in which the original F9408 was used, incor-
//  porating a register internal to the device for the test inputs was very 
//  much a requirement to reduce the risk of metastable behaviour of the micro-
//  program. To fully support the test inputs, the microprogram should include
//  an explicit enable for the test input logic in order to control the chang-
//  ing of the test inputs relative to the microroutines.
//  This module can operate in either non-pipelined or pipelined mode as select-
//  ed by the Pipeline Mode Select, PLS, input signal.
module F9408A_MPC #(
    parameter pAddrWidth = 8            // Original F9408 => 10-bit Address
)(
    input   Rst,                        // Module Reset (Synchronous)
    input   Clk,                        // Module Clock
    input   [3:0] I,                    // Instruction (see description)
    input   [3:0] T,                    // Conditional Test Inputs
    input   [2:0] MW,                   // Multi-way Branch Address Select
    input   [(pAddrWidth-1):0] BA,      // Microprogram Branch Address Field
    output  [1:0] Via,                  // Unconditional Branch Address Select
    input   PLS,                        // Pipeline Mode Select
    output  reg [(pAddrWidth-1):0] MA   // Microprogram Address
);
//  Local Parameters
localparam RTS  =  0;   // Return from Subroutine
localparam BSR  =  1;   // Branch to Subroutine
localparam FTCH =  2;   // Fetch Next Instruction
localparam BMW  =  3;   // Multi-way Branch
localparam BRV0 =  4;   // Branch Via External Branch Address Source #0
localparam BRV1 =  5;   // Branch Via External Branch Address Source #1
localparam BRV2 =  6;   // Branch Via External Branch Address Source #2
localparam BRV3 =  7;   // Branch Via External Branch Address Source #3
localparam BTH0 =  8;   // Branch if T[0] is Logic 1, else fetch next instr.
localparam BTH1 =  9;   // Branch if T[1] is Logic 1, else fetch next instr.
localparam BTH2 = 10;   // Branch if T[2] is Logic 1, else fetch next instr.
localparam BTH3 = 11;   // Branch if T[3] is Logic 1, else fetch next instr.
localparam BTL0 = 12;   // Branch if T[0] is Logic 0, else fetch next instr.
localparam BTL1 = 13;   // Branch if T[1] is Logic 0, else fetch next instr.
localparam BTL2 = 14;   // Branch if T[2] is Logic 0, else fetch next instr.
localparam BTL3 = 15;   // Branch if T[3] is Logic 0, else fetch next instr.
//  Declarations
wire    [(pAddrWidth - 1):0] Next;        // Output Program Counter Incrementer
reg     [(pAddrWidth - 1):0] PC_In;       // Input to Program Counter
reg     [(pAddrWidth - 1):0] PC;          // Program Counter
reg     [(pAddrWidth - 1):0] A, B, C, D;  // LIFO Stack Registers
reg     dRst;                             // Reset stretcher
wire    MPC_Rst;                          // Internal MPC Reset signal
//  Implementation
//  Stretch the external reset by one clock for pipelined mode
always @(posedge Clk)
begin
    if(Rst)
        dRst <= #1 1;
    else
        dRst <= #1 0;
end
assign MPC_Rst = ((PLS) ? (Rst | dRst) : Rst);
//  Implement 4-Level LIFO Stack
always @(posedge Clk)
begin
    if(MPC_Rst)
        {D, C, B, A} <= #1 0;
    else if(I == BSR)
        {D, C, B, A} <= #1 {C, B, A, Next};
    else if(I == RTS)
        {D, C, B, A} <= #1 {{pAddrWidth{1'b0}}, D, C, B};
end
//  Program Counter Incrementer
assign Next = PC + 1;
//  Generate Unconditional Branch Address Select
assign Via = {((I == BRV2) | (I == BRV3)), ((I == BRV3) | (I == BRV1))};
//  Generate Program Counter Input Signal
//  When MPC_Rst is asserted, no instruction code matches and the default
//  case forces PC_In to 0.
always @(*)
begin
    case({MPC_Rst, I})
        RTS     : PC_In <=  A;
        BSR     : PC_In <=  BA;
        FTCH    : PC_In <=  Next;
        BMW     : PC_In <=  {BA[(pAddrWidth - 1):3], MW};
        BRV0    : PC_In <=  BA;
        BRV1    : PC_In <=  BA;
        BRV2    : PC_In <=  BA;
        BRV3    : PC_In <=  BA;
        BTH0    : PC_In <=  (T[0] ? BA   : Next);
        BTH1    : PC_In <=  (T[1] ? BA   : Next);
        BTH2    : PC_In <=  (T[2] ? BA   : Next);
        BTH3    : PC_In <=  (T[3] ? BA   : Next);
        BTL0    : PC_In <=  (T[0] ? Next : BA  );
        BTL1    : PC_In <=  (T[1] ? Next : BA  );
        BTL2    : PC_In <=  (T[2] ? Next : BA  );
        BTL3    : PC_In <=  (T[3] ? Next : BA  );
        default : PC_In <=  0;
    endcase
end
//  Generate Microprogram Address (Program Counter)
always @(posedge Clk)
begin
    if(MPC_Rst)
        PC <= #1 0;
    else
        PC <= #1 PC_In;
end
//  Assign Memory Address Bus
always @(*)
begin
    MA <= ((PLS) ? PC_In : PC);
end

endmodule
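To see the sequencer step through a microprogram, a small simulation testbench can help. The sketch below is not part of the original project: it assumes the F9408A_MPC module above is compiled alongside it, and the 12-bit microword layout {I[3:0], BA[7:0]}, the ROM contents, and the testbench name are all hypothetical. It runs the module in non-pipelined mode (PLS = 0) with an asynchronous-read ROM.

```verilog
`timescale 1ns / 1ps

// Hypothetical testbench: exercises FTCH, BSR, RTS, and BRV0 in
// non-pipelined mode. Requires the F9408A_MPC module from the listing.
module tb_F9408A_MPC;

localparam pAddrWidth = 8;

reg  Rst, Clk;
reg  [3:0] T;
reg  [2:0] MW;
wire [1:0] Via;
wire [(pAddrWidth-1):0] MA;

// Assumed microword layout: {I[3:0], BA[7:0]}
reg  [11:0] ROM [0:255];
wire [11:0] uW = ROM[MA];       // Asynchronous ROM read
wire [3:0]  I  = uW[11:8];      // Instruction field
wire [7:0]  BA = uW[ 7:0];      // Branch address field

integer i;

F9408A_MPC #(.pAddrWidth(pAddrWidth)) uut (
    .Rst(Rst), .Clk(Clk), .I(I), .T(T), .MW(MW), .BA(BA),
    .Via(Via), .PLS(1'b0), .MA(MA)
);

always #5 Clk = ~Clk;

initial begin
    for(i = 0; i < 256; i = i + 1)
        ROM[i] = {4'd2, 8'h00};     // Fill unused locations with FTCH
    ROM[8'h00] = {4'd2, 8'h00};     // 00: FTCH
    ROM[8'h01] = {4'd1, 8'h10};     // 01: BSR  10
    ROM[8'h02] = {4'd4, 8'h00};     // 02: BRV0 00 (loop back to start)
    ROM[8'h10] = {4'd2, 8'h00};     // 10: FTCH (subroutine body)
    ROM[8'h11] = {4'd0, 8'h00};     // 11: RTS

    Clk = 0; Rst = 1; T = 0; MW = 0;
    repeat(2) @(posedge Clk);
    #2 Rst = 0;

    // Sample the address bus in the middle of each microcycle
    repeat(8) begin
        @(negedge Clk);
        $display("MA = %h  I = %h", MA, I);
    end
    $finish;
end

endmodule
```

With this microprogram, the address bus should step through 00, 01, 10, 11, and 02 before looping back to 00: the BSR at address 01 pushes the return address 02, and the RTS at address 11 pops it. Setting PLS to 1 and registering the ROM output would be the corresponding experiment for pipelined mode.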


This article has provided an introduction to the concept of a microprogram controller, and followed that with the presentation of a working HDL (Verilog) model of a microprogram controller. For those with additional interest in this topic, I've used a modified version of the MPC described above to implement a version of the 65C02 microprocessor. That implementation closely mimics the instruction set of the Rockwell R65C02 and the Western Design Center (WDC) WDC65C02S microprocessors, but it also reduces the number of memory cycles required to implement many of the instructions. In that regard, my implementation most closely resembles that of the relatively unknown 65SC02 microprocessor.

I think that the microprogrammed microprocessor demonstrated in that project represents a fairly typical use of the Fairchild 9408 MPC. Although that implementation does not use any microsubroutines or conditional tests of the test inputs, T[3:0], it does demonstrate how to effectively use the multi-way branch, unconditional branch, and sequential execution instructions of the Fairchild 9408 to implement the complex addressing modes and instructions of the 6502/65C02 instruction set architecture.

I welcome any positive comments or questions regarding the subject of this article, or previous related articles.

What's Next

In my next article I will discuss ways in which the MPC shown above can be used to implement various state machines.

© 2009-2015 Michael A. Morris, All Rights Reserved. 



Comment by Passing by, June 14, 2015
I'm quite fond of micro-coded designs as well!
It is a very efficient method for implementing complex operations.

Sometimes, the operations cannot be paralleled (for example communications over a bus), so there is no benefit in implementing separate hardware blocks, that cannot be active simultaneously, thus wasting resources.

I use VHDL files with embedded microcode assembly, which is converted by a small Ruby script into initialization constants. The Ruby script's most complex task is to calculate addresses for labels and branches.

The "instruction set" of the microcode depends on the target application, there is no re-used sequencer, no standard set of registers...

For examples, look for "TEMLIB". There is a micro-coded Ethernet controller and SCSI disk imitation.
Comment by pankaj2701, October 16, 2015
Nice illuminating article on Microprogramming Controller, especially for those engineers who graduated in the last 15 years. Eagerly waiting for your next article.
