Pre-history

Before getting into the 2200 machines, a very short run-up to what preceded them is in order. For a much more in-depth understanding, read Rick Bensene's excellent article about Wang's early systems.

By the mid 1960s, Wang had started making a name for itself in the electronic calculator business. The LOCI-2 machine led the way, with the 300 series and the 700 series being the blockbuster families. Wang also had 100, 400, 500, and 600 series machines that appear to have been mainly derivations from the 300 and 700.

The 700 series itself had an interesting background. Originally Wang had intended to produce a "real" CPU, but when HP came out with their still amazing HP 9100 calculator, Wang recognized that the 300 family was doomed and abruptly redirected the CPU team to produce the 700 series programmable calculators.

The next attempt at a CPU was internally called the 800 CPU, and some of the old documents reference it, but it ultimately became known as the 2200 CPU. In some ways, the microarchitecture of the 2200 CPU was very similar to the microarchitecture of the 700 CPU. Specifically, the machine had a 4b ALU that was fed from an A bus and a B bus; a read from memory received 8b of data in the C register; a write to memory sent only 4b of data; the I/O bus was 8b wide, composed of the high and low 4b nibbles of the K register. There were also differences: 16b microinstruction address vs. 12b; eight nibble register file instead of three; 16 word auxiliary register stack vs none; tight 20b vertically encoded microword vs 43b horizontally encoded microword; 1.6 uS cycle time vs 1.25 uS (yes, the 2200 was slower).

Origin of the 2200

The 2200 project was started in 1970, with Bob Kolk as the project lead. Some of the other team members were Bruce Patterson, Dave Angel, Joe Wang, and Horace Tsaing. Bruce Patterson did much of the microcode; Dave Angel coded up the $GIO and $TRAN instructions; presumably the other members designed the hardware. The 2200 was first available for sale in May, 1973.

The 2200 was designed just before the era of integrated microprocessors, such as the 8080 and 6502. Since the designers didn't have such useful building blocks, they had to design their own CPU out of hundreds of elementary 14 and 16 pin chips (mostly 7400-series TTL parts). At the heart of the CPU was a single 4b 74181 ALU bit slice. Some people have made a big deal of this, calling it "BASIC executed in hardware," but really the CPU is just an instruction set processor implemented in TTL, and the 42.5 KB of microcode is akin to the BASIC interpreter of a conventional microcomputer. Another reason for viewing this as a CPU and not a micromachine is that there is no overlap of instruction execution and there are no hazards from one microinstruction to the next (jargon free: all side effects from microinstruction N take effect before microinstruction N+1 begins execution).

Although the ALU is only 4b wide, the power of this system is roughly equivalent to a 2 MHz 8080 CPU because microinstructions often do more than one thing. The 1.6 usec (16 cycles at 10 MHz) cycle time of the 2200 is roughly the same as the 1.5 - 2 usec (3 or 4 cycles at 2 MHz) memory access of an 8080.

2200 Microarchitecture

The following information is intended to give the flavor of the microarchitecture, but doesn't cover everything. Most of this information was derived from the 2200 Systems Maintenance Manual, although it is vague enough that some of this was figured out by reading between the lines and studying the actual 2200T microcode.

The view of the CPU presented to the microprogrammer is as follows.

Table 1: Wang 2200 CPU Register Resources
Register name [array] | Width | Function
IC | 16b | microcode instruction counter
ICSTACK[16] | 16b | microcode return stack
PC (PC4, PC3, PC2, PC1) | 16b (4b, 4b, 4b, 4b) | memory address pointer; scratch register
AUX[16] | 16b | auxiliary PC file
F[8] | 4b | nibble data registers
C (CH, CL) | 8b (4b, 4b) | memory read data
K (KH, KL) | 8b (4b, 4b) | 8b data to/from the I/O bus
ST1 | 4b | status register 1
ST2 | 4b | status register 2
ST3 | 4b | status register 3
ST4 | 4b | status register 4

IC points at the current microinstruction being executed. Each microinstruction is 20b wide. Every microinstruction takes sixteen 10 MHz clock cycles, or 1.6 usec per microinstruction. The IC can be loaded with a 16b immediate value (i.e., JUMP or CALL); its value can be saved on the next location in the ICSTACK or its value restored from the same; its value can be transferred to/from the PC.

ICSTACK holds return addresses from the microcode subroutine calls. The stack is 16 deep; if the call nesting gets deeper than 16 levels, the ICSTACK pointer just wraps around and overwrites the oldest entry. In order to keep the implementation simple, a subroutine call pushes the current IC on the ICSTACK, and the corresponding subroutine return pops that same value back into the IC. In neither the push nor the pop does the IC address get incremented. The problem with this is that the subroutine call would just be executed again after the subroutine return. However, the subroutine return instruction sets a flag which causes the next microinstruction (which invariably is the subroutine call) to be ignored. This means that effectively a return from subroutine instruction takes two microcycles, or 3.2 usec. This is actually pretty costly since the microcode consists largely of subroutine calls to short routines. Code density was more important than performance.
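
The call/return behavior described above can be sketched as a toy Python model. This is only an illustration of the mechanism as I understand it, not a cycle-accurate emulator; the class name, the `sp` stack pointer, and the `skip_next` flag name are my own inventions, and the pointer direction (decrement on push) follows the description of SB later in this article.

```python
class Sequencer:
    """Toy model of IC + ICSTACK; names here are hypothetical."""
    DEPTH = 16

    def __init__(self):
        self.ic = 0
        self.stack = [0] * self.DEPTH
        self.sp = 0              # assumed stack pointer, wraps mod 16
        self.skip_next = False   # flag set by a subroutine return

    def call(self, target):
        # Push the address of the call itself (not call + 1).
        self.stack[self.sp] = self.ic
        self.sp = (self.sp - 1) % self.DEPTH
        self.ic = target

    def ret(self):
        # Pop the call's own address; the re-fetched call must be ignored.
        self.sp = (self.sp + 1) % self.DEPTH
        self.ic = self.stack[self.sp]
        self.skip_next = True

    def step_skipped(self):
        # The ignored call costs one extra microcycle, then IC advances.
        self.skip_next = False
        self.ic = (self.ic + 1) & 0xFFFF


seq = Sequencer()
seq.ic = 0x10C6
seq.call(0x1213)      # IC now 0x1213
seq.ret()             # IC back at 0x10C6, skip flag set
seq.step_skipped()    # IC advances to 0x10C7
```

The two steps after `ret()` are the reason a subroutine return effectively costs two microcycles.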

PC contains a 16b value that can be treated as a 16b register, or four 4b registers named PC4 (ms), PC3, PC2, and PC1 (ls). This register also supplies the memory address when an instruction contains a memory access operation. The address is a nibble address, which is what limits the architecture to accessing at most 32 KB of RAM.

AUX[16] is a file of sixteen 16b registers. These are used for holding and supplying 16b values to the PC. They are required because saving/restoring the PC value to memory takes many microinstructions. When a value is transferred from the PC to an AUX register, the value can be sent directly, or it can be adjusted by +1, +2, -1, or -2. This makes advancing a pointer through memory efficient.

F[8] is a file of eight 4b values. These are used as a scratch pad for holding the results of nibble calculations from the 4b ALU.

C is an 8b register that holds two nibbles from the RAM. Although the machine is nibble wide, memory reads read two nibbles at a time. Interestingly, there are two addressing modes: horizontal and vertical. Roughly, horizontal mode causes RAM nibbles MEM[PC] and MEM[PC xor 0x0001] to be fetched, while vertical mode causes RAM nibbles MEM[PC] and MEM[PC xor 0x0010] to be fetched. Horizontal mode is useful for fetching a sequence of bytes, while vertical mode is useful for fetching corresponding nibbles from two floating point numbers that are 8 bytes apart in memory. Once fetched, the contents of C are used 4b at a time as either CH (high) or CL (low) nibbles. There is another mode bit whereby memory reads access a 2KB ROM instead of RAM. This ROM contains fixed constants, error messages, and tables of keywords. This other ROM is required because, other than immediate constants, the microarchitecture has no way of reading data from the microstore.
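
As a rough sketch of the two read addressing modes, assuming memory is modeled as a flat list of nibbles. The function name is hypothetical, and I deliberately return a plain pair rather than claiming which nibble lands in CH and which in CL.

```python
def read_pair(mem, pc, vertical=False):
    """Return the two nibbles fetched by one memory read.

    mem is a list of 4b values indexed by nibble address (an assumption
    of this sketch). Horizontal mode pairs PC with PC xor 0x0001;
    vertical mode pairs PC with PC xor 0x0010.
    """
    mask = 0x0010 if vertical else 0x0001
    return mem[pc], mem[pc ^ mask]


mem = [0] * 64
mem[0x04], mem[0x05], mem[0x14] = 1, 2, 3
horizontal = read_pair(mem, 0x04)               # adjacent nibbles
vertical = read_pair(mem, 0x04, vertical=True)  # nibbles 8 bytes apart
```

The vertical partner is 0x10 nibbles away, which is the 8 byte spacing mentioned above.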

K is another 8b register. It is used to send 8b values over the I/O bus or to capture 8b values read from the I/O bus. Microinstructions refer to its two parts as KH (high) and KL (low).

Finally, there are four 4b registers. ST1 and ST3 are collections of ad hoc status/control bits that do things like set the memory read vertical/horizontal mode and detect when I/O operations have completed. ST2 and ST4 are just simple 4b read/write registers that software uses to help guide its operation without having to access main memory.

You can see a very simple block diagram of the file, or a more detailed block diagram.

Microcode Format 1

There are a few different formats for microcode instructions. The Wang 2200 service manual contains a very helpful table of microword encodings, although the table is not 100% correct!

The first format performs 4b ALU operations.

Table 2: Microcode Format 1 -- Register Instructions
ROM Instruction Bits | R19-R15 | R14 | R13-R10 | R9-R8 | R7-R4 | R3-R0
Instruction Designators | Op Code | X | B Bus Source | Memory | A Bus Source | C Bus Dest.

Table 3: Microcode Format 1 -- Opcode Instruction Types
Function | Mnemonic if A specifies a source | Mnemonic if A is 4b immediate
bitwise OR | OR | ORI
bitwise XOR | XOR | XORI
bitwise AND | AND | ANDI
decimal subtract w/carry | DSC | ----
binary add | A | AI
binary add w/carry | AC | ACI
decimal add | DA | DAI
decimal add w/carry | DAC | DACI

All operations have the form:

(dest C) = (source A) <op> (source B).

Most operations can specify that the A source specifier either selects a source or is treated as a simple 4b immediate value. All of these have a two bit MemOp field that indicates what type of memory operation to perform. See Table 13.
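
The Format 1 field layout from Table 2 can be pulled apart with a few shifts and masks. The sketch below is only a field splitter; the function and key names are mine, and it makes no attempt to interpret the opcode value.

```python
def decode_format1(word):
    """Split a 20b Format 1 microword into its Table 2 fields."""
    return {
        "opcode": (word >> 15) & 0x1F,  # R19-R15
        "x":      (word >> 14) & 0x1,   # R14
        "b":      (word >> 10) & 0xF,   # R13-R10, B bus source
        "m":      (word >> 8) & 0x3,    # R9-R8, memory operation
        "a":      (word >> 4) & 0xF,    # R7-R4, A bus source or immediate
        "c":      word & 0xF,           # R3-R0, C bus destination
    }


# 528EA is "ANDI 0E,ST1,ST1" in the fill-memory example later on.
fields = decode_format1(0x528EA)
```

For 528EA this yields a = 0xE and b = c = 0b1010 with X = 0, which Tables 11 and 12 list as Status Register 1.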

Microcode Format 2

Table 4: Microcode Format 2 -- Unconditional Branch Instructions
ROM Instruction Bits | R19-R16 | R15-R12 | R11-R8 | R7-R4 | R3-R0
Instruction Designators | Op Code | Imm (spans R15-R0)

Table 5: Microcode Format 2 -- Unconditional Branch Types
Function | Mnemonic | Branch address
subroutine branch | SB | 16b absolute branch
unconditional branch | B | 16b absolute branch

Format 2 is used for Unconditional Branch and Subroutine Branch. The SB and B instructions jump to an arbitrary microinstruction anywhere. The SB instruction also pushes the IC of the calling instruction onto the ICSTACK and decrements the ICSTACK pointer afterward. There is nothing to prevent the ICSTACK from overfilling and wrapping around (in fact, it does get used this way).
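
One detail that is not stated in the table but that the listings later in this article suggest: the 16b target does not appear in the microword in straight nibble order. For example, "B 03B9" is encoded as B0B39 and "SB 1213" as A1123, so the two middle nibbles look swapped. The sketch below encodes that observation; treat it as my inference from the listings rather than a documented fact, and the function name as hypothetical.

```python
def absolute_target(word):
    """Recover the 16b branch target from a Format 2 microword.

    Assumed nibble order (inferred from the example listings only):
    target = {R15-R12, R7-R4, R11-R8, R3-R0}.
    """
    n_15_12 = (word >> 12) & 0xF
    n_11_8 = (word >> 8) & 0xF
    n_7_4 = (word >> 4) & 0xF
    n_3_0 = word & 0xF
    return (n_15_12 << 12) | (n_7_4 << 8) | (n_11_8 << 4) | n_3_0


target = absolute_target(0xB0B39)  # listed as "B 03B9"
```

This ordering puts the low byte of the target in the same bit positions that Format 3 uses for its 8b immediate.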

Microcode Format 3

Format 3 encodes the other branch instructions.

Table 6: Microcode Format 3 -- Conditional Branch Instructions
ROM Instruction Bits | R19-R16 | R15-R12 | R11-R8 | R7-R4 | R3-R0
Instruction Designators | Op Code | src B | Imm | src A | Imm

Table 7: Microcode Format 3 -- Conditional Branch Types
Function | Mnemonic | Branch address
branch if (src A) = (src B) | BER | 8b branch within page
branch if (src A) != (src B) | BNR | 8b branch within page
branch if (imm A & src B) = (imm A) (branch if TRUE) | BT | 8b branch within page
branch if (imm A & ~src B) = (imm A) (branch if FALSE) | BF | 8b branch within page
branch if (imm A) = (src B) | BEQ | 8b branch within page
branch if (imm A) != (src B) | BNE | 8b branch within page

Format 3 encodes the other branch instructions, and requires some explanation. For all of these instructions, if the branch condition isn't met, IC increments to the next microinstruction. If the branch is to be taken, the top eight bits of IC are kept and the bottom eight bits of IC are replaced with an immediate constant. Because of this, the microstore can be viewed as 256 "pages" of 256 instructions per page. SB and B can jump within a page or to any other page, while the other branches can only branch within page.
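
The within-page target calculation can be sketched as follows. I am assuming that the R11-R8 field is the high nibble of the 8b immediate and R3-R0 the low nibble, which matches the listings later in this article; the function names are hypothetical.

```python
def conditional_target(ic, word):
    """Target of a taken Format 3 branch located at address ic."""
    imm_high = (word >> 8) & 0xF   # R11-R8
    imm_low = word & 0xF           # R3-R0
    return (ic & 0xFF00) | (imm_high << 4) | imm_low


def page_of(ic):
    """Page number (0-255) containing a microinstruction address."""
    return ic >> 8


# FEA25 at 02A3 is "BNE 2,CL,02A5" in the skip-spaces example.
target = conditional_target(0x02A3, 0xFEA25)
```

Because only the low eight bits are replaced, the branch and its target always share a page.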

BER and BNR simply compare two operands and branch if they are equal or not. BEQ and BNE are the same, except (src A) is treated as a four bit immediate value. BT means that the "src A" field is interpreted as a 4b constant, and a branch is made if the chosen B operand has a 1 bit in at least every bit position where the immediate A operand has a 1. With this it is possible to test if a given bit is set, or if a given collection of bits is set. Likewise, BF makes sure that for every 1 bit in the immediate A operand, the corresponding bit of the B operand is zero.
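
The BT and BF conditions, as described in the paragraph above, reduce to two one-line predicates. The function names are mine.

```python
def bt_taken(imm_a, src_b):
    """BT: every bit set in the immediate is also set in the B operand."""
    return (imm_a & src_b) == imm_a


def bf_taken(imm_a, src_b):
    """BF: every bit set in the immediate is clear in the B operand."""
    return (imm_a & src_b) == 0


# "BF 1,ST1,..." branches when bit 0 of ST1 (the carry) is clear.
no_carry = bf_taken(0b0001, 0b1110)
```

Note that BT and BF are not complements of each other when the immediate has more than one bit set: with a mask of 0101 and an operand of 0100, neither branch is taken.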

"Mini Instruction" Microcode Formats

The one "missing" Format 1 operation, what would have been decimal subtract with carry immediate, is instead used to implement a collection of "Mini Instructions."

Table 8: Microcode Format 4 -- Mini Instructions
ROM Instruction Bits | R19-R10 | R9-R8 | R7-R4 | R3-R0
Instruction Designators | OpCode | MemOp | src A | R

Table 9: Mini Instructions
Mnemonic | Operation
CIO | I/O bus control
SR | Subroutine Return
TPI | Transfer PC to IC
TIP | Transfer IC to PC
TMP | Transfer Memory size to PC
TA | Transfer Aux[R] to PC
TP | Transfer PC to Aux[R]
TP+1 | Transfer PC to Aux[R]; then add 1 to Aux[R]
TP-1 | Transfer PC to Aux[R]; then sub 1 from Aux[R]
TP+2 | Transfer PC to Aux[R]; then add 2 to Aux[R]
TP-2 | Transfer PC to Aux[R]; then sub 2 from Aux[R]
XP | eXchange PC with Aux[R]
XP+1 | eXchange PC with Aux[R]; then add 1 to Aux[R]
XP-1 | eXchange PC with Aux[R]; then sub 1 from Aux[R]
XP+2 | eXchange PC with Aux[R]; then add 2 to Aux[R]
XP-2 | eXchange PC with Aux[R]; then sub 2 from Aux[R]

In these instructions, the R field picks one of the sixteen Aux registers to operate on; CIO, SR, TPI, TIP, and TMP just ignore this field.

All of these instructions have a two bit MemOp field that indicates what type of memory operation to perform (see Table 13). The memory address used is the value of the PC register at the start of the instruction. For write operations, the write data comes from the register specified by the "src A" field.

TMP is used exactly once in the 20K word microprogram. During initialization, this instruction returns the number of nibbles of memory in the system. This value isn't determined through any hardware magic; it is simply a dip switch setting on one of the CPU cards.

CIO is used to send one of three types of control strobes to the I/O bus. "-ABS" causes the contents of the K register to be clocked into an I/O address register and then the active low "Address Bus Strobe" signal is fired for about 5 microseconds. This causes the various I/O cards to select or deselect. Another type of strobe is "-OBS", or "Output Bus (Data) Strobe", which causes the value of the K register to be driven on the I/O bus followed by a ~5 microsecond data strobe signal. Finally, "-CBS" is a control bus strobe; most cards ignore this signal, but some use it like a secondary -OBS. Although there are these timing signals generated by one-shots on the I/O controller card, there are no hardware interlocks to prevent the microcode from initiating an ABS/OBS/CBS strobe before the previous sequence has finished. Instead, after any CIO instruction, the microcode invariably calls a subroutine that kills about 11 microseconds before returning.

Microcode Field Encoding

Next comes the definition of what operands are available to the A and B ALU sources and how the C dest is specified.

Table 10: ALU "A" source encoding
A Field Encoding | Mini Instructions | All Other Instructions
0000 to 0111 | F[n] | F[n] (one of eight scratch registers)
1000 | CH | CH
1001 | illegal | CH- (CH; decrement PC1 mod 16)
1010 | illegal | CH+ (CH; increment PC1 mod 16)
1011 | illegal | - (imm. 0; decrement PC1 mod 16)
1100 | CL | CL
1101 | illegal | CL- (CL; decrement PC1 mod 16)
1110 | illegal | CL+ (CL; increment PC1 mod 16)
1111 | illegal | + (imm. 0; increment PC1 mod 16)

For cases where PC1 is incremented or decremented and there is a memory operation, the update effectively happens after the original PC has been used to address memory.

Table 11: ALU "B" source encoding
B Field encoding | X Bit (R14) = 0 | X Bit (R14) = 1
0000 to 0111 | F[n] | F[n]
1000 | KH I/O Register | Status Register 3
1001 | KL I/O Register | Status Register 4
1010 | Status Register 1 | PC2
1011 | Status Register 2 | PC3
1100 | PC1 | PC4
1101 | CH Data Register | CH Data Register
1110 | CL Data Register | CL Data Register
1111 | Immediate 0 | Immediate 0

Table 12: ALU "C" destination encoding
C Field encoding | X Bit (R14) = 0 | X Bit (R14) = 1
0000 to 0111 | F[n] | F[n]
1000 | KH I/O Register | Status Register 3
1001 | KL I/O Register | Status Register 4
1010 | Status Register 1 | PC2
1011 | Status Register 2 | PC3
1100 | PC1 | PC4
1101 | illegal | illegal
1110 | illegal | illegal
1111 | ignore results | ignore results

Notice that both the B and C fields might be affected by the "X" bit. This means that certain pairs of source/destination aren't possible. For example, microcode can specify ADD F0,PC2,PC3 (add F[0] to PC2 and store the result in PC3), but ADD F0,PC1,PC3 is not possible since the X bit must be 0 to select PC1 for the B input but the X bit must be 1 to select PC3 for the destination.
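
The shared X bit constraint can be checked mechanically. The sketch below transcribes Tables 11 and 12 into Python, using short register names of my own choosing ("ZERO" for the immediate 0 source, "NONE" for the ignored result), and reports whether any X value can serve both the B source and the C destination.

```python
F_REGS = [f"F{n}" for n in range(8)]
# Operands whose encoding depends on the X bit (Tables 11 and 12).
X_DEPENDENT = {
    0: ["KH", "KL", "ST1", "ST2", "PC1"],
    1: ["ST3", "ST4", "PC2", "PC3", "PC4"],
}


def x_values(name, is_dest):
    """Set of X bit values under which this operand can be encoded."""
    x_independent = F_REGS + (["NONE"] if is_dest else ["CH", "CL", "ZERO"])
    if name in x_independent:
        return {0, 1}
    return {x for x, names in X_DEPENDENT.items() if name in names}


def encodable(b_source, c_dest):
    """True if one X bit value satisfies both the B and C fields."""
    return bool(x_values(b_source, False) & x_values(c_dest, True))


ok = encodable("PC2", "PC3")    # both need X = 1
bad = encodable("PC1", "PC3")   # B needs X = 0, C needs X = 1
```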

Table 13: M Field encoding (Memory operation)
M Field encoding | Meaning
00 | no memory operation
01 | read 8b of RAM or ROM from the address indicated by PC into the {CH,CL} register
10 | write ALU result to RAM at the nibble address indicated by PC
11 | write ALU result to RAM at the nibble address indicated by PC ^ (vertical_mode ? 0x10 : 0x01)

For a read operation, the currently selected vertical/horizontal addressing mode bit and the RAM/ROM selection bit (both controlled by STx register bits) are used with the current PC value to read two nibbles of data into the { CH, CL } registers. On a write, the vertical/horizontal addressing mode bit combines with the write 1/write 2 mode control and the PC to write one nibble of data; this nibble is selected by the src A operand for the mini-op instructions, otherwise it comes from the ALU output.
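
The write address selection in Table 13 can be written out as a small function. This is just a transcription of the table; the function name is hypothetical.

```python
def write_address(pc, m_field, vertical_mode):
    """Nibble address written for M field 10 (write 1) or 11 (write 2)."""
    if m_field == 0b10:
        return pc
    if m_field == 0b11:
        return pc ^ (0x10 if vertical_mode else 0x01)
    raise ValueError("M field 00 and 01 are not write operations")


# In horizontal mode, W1 then W2 at the same PC fills an adjacent pair.
pair = (write_address(0x0100, 0b10, False), write_address(0x0100, 0b11, False))
```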

Microarchitecture Example Code

The above description gives many details, but they are best understood by looking at real code to see how they work together. The exposition will use real code from the ROM. All comments are mine, made simply through observation of the code; thus, there is a danger that my commentary is inaccurate. The syntax is similar, but not identical, to the original syntax of the Wang microcode assembler.

In the argument field, operands always appear in the order A,B,C. If the source of A or B is the constant zero, it can be specified by a null field. If the ALU result isn't to be stored back into a register, the C field is specified as a null field. For example, ADD,W1 F3,, means that argB is implicitly 0 and argC is the null destination; this instruction is adding 0 to the scratch register F3 which appears on the C result bus; the result isn't saved to a register, but it is the data that is used for the write operation specified by the ",W1" modifier to the opcode.

uCode Example #1: Skip over spaces
IC ucode Mnemonic Behavior
02A1 58C04 TA     4 transfer the contents of AUX[4] to the PC register, wiping out the previous contents of PC
02A2 5A904 TP+2,R 4 transfer PC+2 back to AUX[4]; read the byte at RAM[PC], storing it in C. We increment by two because PC is a nibble address, and we are advancing to the next byte.
02A3 FEA25 BNE    2,CL,02A5 jump to return if low nibble isn't 2
02A4 EDA01 BEQ    0,CH,02A1 loop back if high nibble is 0; (note the nibble swap: this is seeking HEX(20), which is space)
02A5 58400 SR return to caller

uCode fragment #1 scans a line of code, skipping ahead until a non-space is found. AUX[4] contains the 16b pointer to the current byte being scanned, and returns with C containing the first non-space and AUX[4] pointing to the byte after it. Undoubtedly in the original source code the constant "4" would have been represented by a symbolic name.
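
For readers more comfortable with a high level language, here is roughly what that loop does, expressed in Python. This is a byte-level paraphrase under my own assumptions: the line is modeled as a `bytes` object, the nibble address is halved to index it, and the nibble ordering within a byte is glossed over. Like the microcode, it assumes a non-space byte eventually appears.

```python
def skip_spaces(line, aux4):
    """Return (first non-space byte, updated AUX[4] nibble pointer)."""
    while True:
        pc = aux4                # TA 4: PC = AUX[4]
        aux4 = pc + 2            # TP+2: AUX[4] = PC + 2 (next byte)
        byte = line[pc // 2]     # ,R: read the byte addressed by PC
        if byte != 0x20:         # BNE/BEQ pair: is it HEX(20)?
            return byte, aux4    # SR


byte, aux4 = skip_spaces(b"  A", 0)
```

On return the pointer has already been advanced past the byte that was found, matching the description above.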

uCode Example #2: Code fragment to fill memory with a constant
IC ucode Mnemonic Behavior
03B9 528EA ANDI  0E,ST1,ST1 clear bit 0 of ST1; this is the carry bit
03BA 680E0 ACI   0E,F0,F0 subtract two from the 16b quantity stored in {F3,F2,F1,F0}
03BB 684F1 ACI   0F,F1,F1
03BC 688F2 ACI   0F,F2,F2
03BD 68CF3 ACI   0F,F3,F3
03BE DAC14 BF    1,ST1,03C4 test bit 0 of ST1 (carry); if there is no carry, we are done
03BF 5BC01 XP-2 this and the next instruction simply decrement PC by 2 using AUX[1] as a temporary register
03C0 59001 XP    1
03C1 6160F AI,W1 0,F5, store {F4,F5} in memory at the byte pointed at by PC
03C2 6130F AI,W2 0,F4,
03C3 B0B39 B     03B9 loop back to the start of the routine
03C4 58400 SR return from subroutine

This routine uses {F3,F2,F1,F0} as a 16b count of the number of nibbles to fill with a constant byte. The byte is supplied by {F5,F4}. Note that it takes two write operations to fill a byte. The operations at 03C1 and 03C2 write the two nibbles; the ",W1" and ",W2" modifiers select the nibbles at MEM[PC] and MEM[PC^0x0001], that is, an adjacent nibble pair. The operation for those two instructions, "AI", is "add immediate." Because there is no third argument (note the final comma), the result isn't saved anywhere other than to memory.
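
The four ACI instructions at 03BA through 03BD subtract two by adding 0xFFFE one nibble at a time, with the carry rippling upward. A small Python sketch of that arithmetic, with names of my own choosing:

```python
def sub2_nibblewise(count):
    """Add 0xFFFE to a 16b count, 4 bits at a time.

    Returns (new_count, carry_out). The carry starts at 0, as the
    ANDI at 03B9 clears it before the first ACI.
    """
    addends = [0xE, 0xF, 0xF, 0xF]   # immediates of the four ACIs, ls first
    carry = 0
    result = 0
    for i, imm in enumerate(addends):
        nibble = (count >> (4 * i)) & 0xF
        total = nibble + imm + carry
        result |= (total & 0xF) << (4 * i)
        carry = total >> 4
    return result, carry


remaining, carry = sub2_nibblewise(0x0005)
```

A carry out of 1 means the count was at least 2 before the subtraction; a carry of 0 means it wrapped below zero, which is the condition the BF at 03BE uses to exit.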

The "XP-2 AUX1", "XP AUX1" pair is a fairly common idiom. Although AUX[1] is involved, its value is unaffected. This is done simply for the side effect of adjusting the PC value.
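
To see why the pair leaves AUX[1] untouched, the exchange semantics from Table 9 can be modeled directly. The helper below is a sketch with hypothetical names; `delta` stands for the optional +1, -1, +2, or -2 adjustment applied to the Aux register after the exchange.

```python
def xp(pc, aux, r, delta=0):
    """eXchange PC with Aux[r], then adjust Aux[r]; returns the new PC."""
    pc, aux[r] = aux[r], (pc + delta) & 0xFFFF
    return pc


aux = [0] * 16
aux[1] = 0x1234
pc = 0x0100
pc = xp(pc, aux, 1, delta=-2)   # PC = 0x1234, AUX[1] = 0x00FE
pc = xp(pc, aux, 1)             # PC = 0x00FE, AUX[1] = 0x1234
```

After the second exchange the PC holds its original value minus two, and AUX[1] is back to what it was.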

Although the memory interface allows writing one nibble per instruction (at sixteen clocks per instruction), this routine only writes two nibbles per eleven instructions, for 18% efficiency.

Here is one last example, presented as a simple listing with my comments. When the END command is executed, BASIC prints out the number of free bytes left. This is calculated in binary, but must be presented in ASCII. This code converts a 16b binary number in PC to a 20b BCD number in {F3,F2,F1,F0,F6}, then calls a subroutine to print out these BCD digits as ASCII. You can see that the instruction set isn't very efficient.

The calculated amount is in terms of nibbles, but the printed result is in terms of bytes. That is why the main loop is iterated 15 times, not 16, to effect a divide by 2.

uCode Example #3: Code fragment to print free memory
IC ucode Mnemonic Behavior
10C3 43CD0 ORI   0D,,F0
10C4 43CEC ORI   0E,,PC1
10C5 47C2A ORI   2,,PC2
10C6 A1123 SB    1213
10C7 43CB0 ORI   0B,,F0
10C8 47C7A ORI   7,,PC2
10C9 43C0C ORI   0,,PC1
10CA A1123 SB    1213
10CB 58C07 TA    7
10CC A0538 SB    0358 {R3,R2,R1,R0}=PC
10CD 528EA ANDI  0E,ST1,ST1
10CE 68832 ACI   3,F2,F2
10CF 68C03 ACI   0,F3,F3
10D0 DAD13 BF    1,ST1,10D3
10D1 528EA ANDI  0E,ST1,ST1
10D2 B1D05 B     10D5
10D3 58C05 TA    5
10D4 A014F SB    041F PC = PC - {R3,R2,R1,R0}
10D5 43C00 ORI   0,,F0
10D6 43C01 ORI   0,,F1
10D7 43C02 ORI   0,,F2
10D8 43C03 ORI   0,,F3
10D9 43C06 ORI   0,,F6 {F3,F2,F1,F0,F6} is a 5 bcd digit number
10DA DAE1B BF    1,ST1,10EB
10DB 43CF7 ORI   0F,,F7 step 15 times
10DC 528EA ANDI  0E,ST1,ST1 clear carry
10DD 39866 DAC   F6,F6,F6 double bcd number
10DE 38000 DAC   F0,F0,F0
10DF 38411 DAC   F1,F1,F1
10E0 38822 DAC   F2,F2,F2
10E1 38C33 DAC   F3,F3,F3
10E2 A0647 SB    0467 PC = PC*2; st1.1 is msb of old PC
10E3 DAE19 BF    1,ST1,10E9 if no carry, skip next step
10E4 79806 DACI  0,F6,F6 bcd increment by 1 (I'm not sure why we couldn't just loop back to 10DD and let any carry trickle in that way)
10E5 78000 DACI  0,F0,F0
10E6 78401 DACI  0,F1,F1
10E7 78802 DACI  0,F2,F2
10E8 78C03 DACI  0,F3,F3
10E9 61CF7 AI    0F,F7,F7
10EA F7D0D BNE   0,F7,10DD
10EC 03C69 OR    F6,,KL get ls digit
10ED A00B2 SB    0B02 output that 5th digit
10EE A072E SB    027E output 0x0D
10EF A072E SB    027E output 0x0D
10F0 DBF82 BF    8,ST2,10F2
10F1 A0F13 SB    01F3
10F2 B020A B     002A
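
The heart of that listing is a shift-and-double binary to BCD conversion. The Python sketch below shows the general technique rather than a line-for-line translation: it feeds the shifted-out bit in as the carry of the doubling step (the simplification I wondered about in the comment at 10E4), whereas the microcode doubles first and then does a separate BCD increment. All names are mine, and digits are kept least significant first.

```python
def bcd_double(digits, carry_in):
    """Double a BCD number (ls digit first), adding carry_in."""
    out = []
    carry = carry_in
    for d in digits:
        total = d + d + carry
        carry = 1 if total >= 10 else 0
        out.append(total - 10 if carry else total)
    return out, carry


def free_bytes_bcd(nibble_count, steps=15):
    """Convert a 16b nibble count to 5 BCD digits of the byte count.

    Only 15 of the 16 bits are shifted in, which halves the value.
    """
    digits = [0] * 5
    pc = nibble_count & 0xFFFF
    for _ in range(steps):
        msb = (pc >> 15) & 1          # bit shifted out of PC
        pc = (pc << 1) & 0xFFFF       # PC = PC * 2
        digits, _ = bcd_double(digits, msb)
    return digits


digits = free_bytes_bcd(200)   # 200 nibbles is 100 bytes
```

Stopping one step short of 16 drops the least significant bit, which is how the nibble count becomes a byte count without a separate divide.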