Pre-history
Before getting into the 2200 machines, a very short run up to what preceded it is in order. To get a much more in-depth understanding, read Rick Bensene's excellent article about Wang's early systems.
By the mid-1960s, Wang had started making a name for itself in the electronic calculator business. The LOCI-2 machine led the way, with the 300 series and the 700 series being the blockbuster families. Wang also had 100, 400, 500, and 600 series machines that appear to have been mainly derived from the 300 and 700.
The 700 series itself had an interesting background. Originally Wang intended to produce a "real" CPU, but when HP came out with their still-amazing HP 9100 calculator, Wang recognized that the 300 family was doomed and abruptly redirected the CPU team to produce the 700 series programmable calculators.
The next attempt at a CPU was internally called the 800 CPU, and some of the old documents reference it, but it ultimately became known as the 2200 CPU. In some ways, the microarchitecture of the 2200 CPU was very similar to the microarchitecture of the 700 CPU. Specifically, the machine had a 4b ALU that was fed from an A bus and a B bus; a read from memory received 8b of data in the C register; a write to memory sent only 4b of data; the I/O bus was 8b wide, composed of the high and low 4b nibbles of the K register. There were also differences: 16b microinstruction address vs. 12b; eight nibble register file instead of three; 16 word auxiliary register stack vs none; tight 20b vertically encoded microword vs 43b horizontally encoded microword; 1.6 uS cycle time vs 1.25 uS (yes, the 2200 was slower).
Origin of the 2200
The 2200 project was started in 1970, with Bob Kolk as the project lead. Some of the other team members were Bruce Patterson, Dave Angel, Joe Wang, and Horace Tsaing. Bruce Patterson did much of the microcode; Dave Angel coded up the $GIO and $TRAN instructions; presumably the other members designed the hardware. The 2200 was first available for sale in May, 1973.
The 2200 was designed just before the era of integrated microprocessors, such as the 8080 and 6502. Since the designers didn't have such useful building blocks, they had to design their own CPU out of hundreds of elementary 14 and 16 pin chips (mostly 7400-series TTL parts). At the heart of the CPU was a single 4b 74181 ALU bit slice. Some people have made a big deal of this, calling it "BASIC executed in hardware," but really the CPU is just an instruction set processor implemented in TTL, and the 42.5 KB of microcode is akin to the BASIC interpreter of a conventional microcomputer. Another reason for viewing this as a CPU and not a micromachine is that there is no overlap of instruction execution and there are no hazards from one microinstruction to the next (jargon free: all side effects from microinstruction N take effect before microinstruction N+1 begins execution).
Although the ALU is only 4b wide, the power of this system is roughly equivalent to a 2 MHz 8080 CPU due to the fact that microinstructions often do more than one thing. The 1.6 usec (16 cycles at 10 MHz) cycle time of the 2200 is roughly the same as the 1.5 - 2 usec (3 or 4 cycles at 2 MHz) memory access of an 8080.
2200 Microarchitecture
The following information is intended to give the flavor of the microarchitecture, but doesn't cover everything. Most of this information was derived from the 2200 Systems Maintenance Manual, although it is vague enough that some of this was figured out by reading between the lines and studying the actual 2200T microcode.
The view of the CPU presented to the microprogrammer is as follows.
Register name [array] | Width | Function |
---|---|---|
IC | 16b | microcode instruction counter |
ICSTACK[16] | 16b | microcode return stack |
PC (PC4, PC3, PC2, PC1) | 16b (4b, 4b, 4b, 4b) | memory address pointer; scratch register |
AUX[16] | 16b | auxiliary PC file |
F[8] | 4b | nibble data registers |
C (CH, CL) | 8b (4b,4b) | memory read data |
K (KH, KL) | 8b (4b,4b) | 8b data to/from the I/O bus |
ST1 | 4b | status register 1 |
ST2 | 4b | status register 2 |
ST3 | 4b | status register 3 |
ST4 | 4b | status register 4 |
IC points at the current microinstruction being executed. Each microinstruction is 20b wide. Every microinstruction takes sixteen 10 MHz clock cycles, or 1.6 usec per microinstruction. The IC can be loaded with a 16b immediate value (i.e., JUMP or CALL); its value can be saved on the next location in the ICSTACK or its value restored from the same; its value can be transferred to/from the PC.
ICSTACK holds return addresses from the microcode subroutine calls. The stack is 16 deep; if the call nesting gets deeper than 16 levels, the ICSTACK pointer just wraps around and overwrites the oldest entry. In order to keep the implementation simple, a subroutine call pushes the current IC on the ICSTACK, and the corresponding subroutine return pops that same value back into the IC. In neither the push nor the pop does the IC address get incremented. The problem with this is that the subroutine call would just be executed again after the subroutine return. However, the subroutine return instruction sets a flag which causes the next microinstruction (which invariably is the subroutine call) to be ignored. This means that effectively a return from subroutine instruction takes two microcycles, or 3.2 usec. This is actually pretty costly since the microcode consists largely of subroutine calls to short routines. Code density was more important than performance.
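The call/return behavior described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class name `MicroSequencer`, the `run` helper, and the tuple-based "program" are hypothetical and are not taken from the Wang hardware or documentation. The sketch shows why a return costs two microcycles: the popped address is the call itself, so one cycle is burned ignoring it.

```python
class MicroSequencer:
    """Toy model of IC, ICSTACK, and the skip-after-return flag (hypothetical names)."""

    def __init__(self, start=0):
        self.ic = start
        self.stack = [0] * 16     # 16-entry return stack
        self.sp = 0               # 4-bit stack pointer; wraps silently
        self.skip_next = False

    def call(self, target):
        # Push the address of the call itself, not the address after it.
        self.stack[self.sp] = self.ic
        self.sp = (self.sp - 1) & 0xF
        self.ic = target

    def ret(self):
        # Pop the same unincremented address and arm the skip flag.
        self.sp = (self.sp + 1) & 0xF
        self.ic = self.stack[self.sp]
        self.skip_next = True


def run(program, start, max_cycles=100):
    """Execute a dict of {address: (mnemonic, ...)} and return one entry per microcycle."""
    seq = MicroSequencer(start)
    trace = []
    for _ in range(max_cycles):
        op = program.get(seq.ic)
        if op is None:
            break
        if seq.skip_next:
            # Burned cycle: the fetched word (the original call) is ignored.
            seq.skip_next = False
            trace.append((seq.ic, "skipped"))
            seq.ic += 1
        elif op[0] == "SB":
            trace.append((seq.ic, "SB"))
            seq.call(op[1])
        elif op[0] == "SR":
            trace.append((seq.ic, "SR"))
            seq.ret()
        else:
            trace.append((seq.ic, op[0]))
            seq.ic += 1
    return trace


PROGRAM = {0x10: ("SB", 0x20), 0x11: ("NOP",), 0x20: ("NOP",), 0x21: ("SR",)}
TRACE = run(PROGRAM, 0x10)
```

In the resulting trace, the cycle at address 0x10 appears twice: once as the call and once as the skipped slot after the return.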
PC contains a 16b value that can be treated as a 16b register, or four 4b registers named PC4 (ms), PC3, PC2, and PC1 (ls). This register also supplies the memory address when an instruction contains a memory access operation. The address is a nibble address, which is what limits the architecture to accessing at most 32 KB of RAM.
AUX[16] is a file of sixteen 16b registers. These are used for holding and supplying 16b values to the PC. They are required because saving/restoring the PC value to memory takes many microinstructions. When a value is transferred from the PC to an AUX register, the value can be sent directly, or it can be adjusted by +1, +2, -1, or -2. This makes advancing a pointer through memory efficient.
F[8] is a file of eight 4b values. These are used as a scratch pad for holding the results of nibble calculations from the 4b ALU.
C is an 8b register that holds two nibbles from the RAM. Although the machine is nibble wide, memory reads read two nibbles at a time. Interestingly, there are two addressing modes: horizontal and vertical. Roughly, horizontal mode causes RAM nibbles MEM[PC] and MEM[PC xor 0x0001] to be fetched, while vertical mode causes RAM nibbles MEM[PC] and MEM[PC xor 0x0010] to be fetched. Horizontal mode is useful for fetching a sequence of bytes, while vertical mode is useful for fetching corresponding nibbles from two floating point numbers that are 8 bytes apart in memory. Once fetched, the contents of C are used 4b at a time as either CH (high) or CL (low) nibbles. There is another mode bit whereby memory reads access a 2KB ROM instead of RAM. This ROM contains fixed constants, error messages, and tables of keywords. This other ROM is required because, other than immediate constants, the microarchitecture has no way of reading data from the microstore.
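As a rough illustration of the two addressing modes, the helper below computes which second nibble is fetched along with MEM[PC]. The function name is hypothetical, and the sketch deliberately does not model which nibble lands in CH versus CL or the RAM/ROM select bit, since the text above only describes the pairing "roughly".

```python
def paired_nibble_address(pc, vertical):
    """Return the nibble address fetched together with MEM[pc].

    Horizontal mode flips address bit 0 (the other half of the same byte).
    Vertical mode flips address bit 4 (16 nibbles, i.e. 8 bytes, away).
    """
    return (pc ^ (0x0010 if vertical else 0x0001)) & 0xFFFF


# Because the pairing is an XOR rather than an add, it is symmetric:
# the partner of the partner is the original address.
horizontal_partner = paired_nibble_address(0x0100, vertical=False)  # 0x0101
vertical_partner = paired_nibble_address(0x0100, vertical=True)     # 0x0110
```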
K is another 8b register. It is used to send 8b values over the I/O bus or to capture 8b values read from the I/O bus. Microinstructions refer to its two parts as KH (high) and KL (low).
Finally, there are four 4b registers. ST1 and ST3 are collections of ad hoc status/control bits that do things like set the memory read vertical/horizontal mode and detect when I/O operations have completed. ST2 and ST4 are just simple 4b read/write registers that software uses to help guide its operation without having to access main memory.
You can see a very simple block diagram of the file, or a more detailed block diagram.
Microcode Format 1
There are a few different formats for microcode instructions. The Wang 2200 service manual contains a very helpful table of microword encodings, although the table is not 100% correct!
The first format performs 4b ALU operations.
ROM Instruction Bits | R19-R15 | R14 | R13-R10 | R9-R8 | R7-R4 | R3-R0 |
---|---|---|---|---|---|---|
Instruction Designators | Op Code | X | B Bus Source | Memory | A Bus Source | C Bus Dest. |
Function | Mnemonic if A specifies a source | Mnemonic if A is 4b immediate |
---|---|---|
bitwise OR | OR | ORI |
bitwise XOR | XOR | XORI |
bitwise AND | AND | ANDI |
decimal subtract w/carry | DSC | ---- |
binary add | A | AI |
binary add w/carry | AC | ACI |
decimal add | DA | DAI |
decimal add w/carry | DAC | DACI |
All operations have the form:
(dest C) = (source A) <op> (source B).
Most operations can specify that the A source specifier either selects a source or is treated as a simple 4b immediate value. All of these have a two-bit MemOp field that indicates what type of memory operation to perform. See Table 13.
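The Format 1 field layout can be expressed as a small decoder. This is a sketch based only on the bit positions in the table above; the function name and dictionary keys are made up for illustration. The two sample words are taken from the code listings later in this article (ANDI 0E,ST1,ST1 and AI,W1 0,F5,).

```python
def decode_format1(word):
    """Split a 20-bit Format 1 microword into its fields."""
    return {
        "opcode": (word >> 15) & 0x1F,  # R19-R15
        "x":      (word >> 14) & 0x1,   # R14
        "b":      (word >> 10) & 0xF,   # R13-R10
        "mem":    (word >> 8) & 0x3,    # R9-R8
        "a":      (word >> 4) & 0xF,    # R7-R4
        "c":      word & 0xF,           # R3-R0
    }


andi = decode_format1(0x528EA)  # ANDI 0E,ST1,ST1
ai_w1 = decode_format1(0x6160F)  # AI,W1 0,F5,
```

For 0x528EA this gives B = 1010 with X = 0 (Status Register 1), A = 0xE, and C = 1010 (Status Register 1), which agrees with the mnemonic.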
Microcode Format 2
ROM Instruction Bits | R19-R16 | R15-R12 | R11-R8 | R7-R4 | R3-R0 |
---|---|---|---|---|---|
Instruction Designators | Op Code | Imm |
Function | Mnemonic | Branch address |
---|---|---|
subroutine branch | SB | 16b absolute branch |
unconditional branch | B | 16b absolute branch |
Format 2 is used for Unconditional Branch and Subroutine Branch. The SB and B instructions jump to an arbitrary microinstruction anywhere. The SB instruction also pushes the IC of the calling instruction onto the ICSTACK and decrements the ICSTACK pointer afterward. There is nothing to prevent the ICSTACK from overfilling and wrapping around (in fact, it does get used this way).
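One detail is not spelled out in the table above. Comparing the ucode column with the branch targets in the listings later in this article suggests that the two middle nibbles of the 16b immediate are stored swapped (for example, ucode B0B39 is shown as "B 03B9"). The sketch below encodes that observation; it is an inference from those listings, not something confirmed by the service manual, and the function name is hypothetical.

```python
def format2_target(word):
    """Recover the 16-bit branch target from a Format 2 microword.

    Assumption (inferred from the listings in this article): the target is
    R15-R12, R7-R4, R11-R8, R3-R0, i.e. the middle two nibbles are swapped.
    """
    n3 = (word >> 12) & 0xF
    n2 = (word >> 8) & 0xF
    n1 = (word >> 4) & 0xF
    n0 = word & 0xF
    return (n3 << 12) | (n1 << 8) | (n2 << 4) | n0
```

If the inference holds, it would place the low eight address bits in R11-R8 and R3-R0, the same positions that hold the 8b immediate in Format 3.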
Microcode Format 3
Format 3 encodes the other branch instructions.
ROM Instruction Bits | R19-R16 | R15-R12 | R11-R8 | R7-R4 | R3-R0 |
---|---|---|---|---|---|
Instruction Designators | Op Code | src B | Imm | src A | Imm |
Function | Mnemonic | Branch address |
---|---|---|
branch if (src A) = (src B) | BER | 8b branch within page |
branch if (src A) != (src B) | BNR | 8b branch within page |
branch if (imm A & src B) = (imm A) (branch if TRUE) | BT | 8b branch within page |
branch if (imm A & ~src B) = (imm A) (branch if FALSE) | BF | 8b branch within page |
branch if (imm A) = (src B) | BEQ | 8b branch within page |
branch if (imm A) != (src B) | BNE | 8b branch within page |
Format 3 encodes the other branch instructions, and requires some explanation. For all of these instructions, if the branch condition isn't met, IC increments to the next microinstruction. If the branch is to be taken, the top eight bits of IC are kept and the bottom eight bits of IC are replaced with an immediate constant. Because of this, the microstore can be viewed as 256 "pages" of 256 instructions per page. SB and B can jump within a page or to any other page, while the other branches can only branch within page.
BER and BNR simply compare two operands and branch if they are equal or not. BEQ and BNE are the same, except (src A) is treated as a four-bit immediate value. For BT, the "src A" field is interpreted as a 4b constant, and the branch is taken if the chosen B operand has a 1 bit in at least every bit position where the immediate A operand has a 1. With this it is possible to test whether a given bit, or a given collection of bits, is set. Likewise, BF makes sure that for every 1 bit in the immediate A operand, the corresponding bit of the B operand is zero.
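The branch conditions and the within-page target calculation can be sketched as two small functions. The names are illustrative only; the predicates follow the prose description above (in particular, BF is taken when every bit selected by the immediate is zero in the B operand).

```python
def branch_taken(mnemonic, a, b):
    """Evaluate a Format 3 branch condition on two 4-bit operands.

    For BT, BF, BEQ, and BNE the 'a' operand is the 4-bit immediate.
    """
    a &= 0xF
    b &= 0xF
    if mnemonic in ("BER", "BEQ"):
        return a == b
    if mnemonic in ("BNR", "BNE"):
        return a != b
    if mnemonic == "BT":
        return (a & b) == a      # every selected bit is 1
    if mnemonic == "BF":
        return (a & b) == 0      # every selected bit is 0
    raise ValueError(f"unknown branch mnemonic: {mnemonic}")


def next_ic(ic, taken, imm8):
    """Taken branches keep the page (top 8 bits) and replace the low 8 bits."""
    if taken:
        return (ic & 0xFF00) | (imm8 & 0xFF)
    return (ic + 1) & 0xFFFF
```

For example, `BNE 2,CL,02A5` executed at 02A3 with CL = 0 evaluates `branch_taken("BNE", 2, 0)` as true, and `next_ic(0x02A3, True, 0xA5)` yields 0x02A5.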
"Mini Instruction" Microcode Formats
The one "missing" Format 1 operation, what would have been decimal subtract with carry immediate, is instead used to implement a collection of "Mini Instructions."
ROM Instruction Bits | R19-R10 | R9-R8 | R7-R4 | R3-R0 |
---|---|---|---|---|
Instruction Designators | OpCode | MemOp | src A | R |
Mnemonic | Operation |
---|---|
CIO | I/O bus control |
SR | Subroutine Return |
TPI | Transfer PC to IC |
TIP | Transfer IC to PC |
TMP | Transfer Memory size to PC |
TA | Transfer Aux[R] to PC |
TP | Transfer PC to Aux[R] |
TP+1 | Transfer PC to Aux[R]; then add 1 to Aux[R] |
TP-1 | Transfer PC to Aux[R]; then sub 1 from Aux[R] |
TP+2 | Transfer PC to Aux[R]; then add 2 to Aux[R] |
TP-2 | Transfer PC to Aux[R]; then sub 2 from Aux[R] |
XP | eXchange PC with Aux[R] |
XP+1 | eXchange PC with Aux[R]; then add 1 to Aux[R] |
XP-1 | eXchange PC with Aux[R]; then sub 1 from Aux[R] |
XP+2 | eXchange PC with Aux[R]; then add 2 to Aux[R] |
XP-2 | eXchange PC with Aux[R]; then sub 2 from Aux[R] |
In these instructions, the R field picks one of sixteen Aux registers to operate on; CIO, SR, TPI, TIP, and TMP just ignore this field.
All of these instructions have a two-bit MemOp field that indicates what type of memory operation to perform (see Table 13). The memory address used is the value of the PC register at the start of the instruction. For write operations, the write data comes from the register specified by the "src A" field.
TMP is used exactly once in the 20K word microprogram. During initialization, this instruction returns the number of nibbles of memory in the system. This value isn't determined through any hardware magic; it is simply a dip switch setting on one of the CPU cards.
CIO is used to send one of three types of control strobes to the I/O bus. "-ABS" causes the contents of the K register to be clocked into an I/O address register and then the active low "Address Bus Strobe" signal is fired for about 5 microseconds. This causes the various I/O cards to select or deselect. Another type of strobe is "-OBS", or "Output Bus (Data) Strobe", which causes the value of the K register to be driven on the I/O bus followed by a ~5 microsecond data strobe signal. Finally, "-CBS" is a control bus strobe; most cards ignore this signal, but some use it like a secondary -OBS. Although there are these timing signals generated by one-shots on the I/O controller card, there are no hardware interlocks to prevent the microcode from initiating an ABS/OBS/CBS strobe before the previous sequence has finished. Instead, after any CIO instruction, the microcode invariably calls a subroutine that kills about 11 microseconds before returning.
Microcode Field Encoding
Next comes the definition of what operands are available to the A and B ALU sources and how the C dest is specified.
A Field Encoding | Mini Instructions | All Other Instructions | |
---|---|---|---|
0000 to 0111 | F[n] | F[n] | one of eight scratch registers |
1000 | CH | CH | |
1001 | illegal | CH- | CH; decrement PC1 mod 16 |
1010 | illegal | CH+ | CH; increment PC1 mod 16 |
1011 | illegal | - | imm. 0; decrement PC1 mod 16 |
1100 | CL | CL | |
1101 | illegal | CL- | CL; decrement PC1 mod 16 |
1110 | illegal | CL+ | CL; increment PC1 mod 16 |
1111 | illegal | + | imm. 0; increment PC1 mod 16 |
For cases where PC1 is incremented or decremented and there is a memory operation, the update effectively happens after the original PC has been used to address memory.
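The "mod 16" qualifier matters: only the low nibble of PC changes, with no carry or borrow into PC2. A minimal sketch, using a hypothetical helper name:

```python
def bump_pc1(pc, delta):
    """Adjust only PC1 (the low nibble of PC) by delta, wrapping mod 16."""
    return (pc & 0xFFF0) | ((pc + delta) & 0xF)


wrapped_up = bump_pc1(0x12AF, +1)    # low nibble wraps F -> 0, PC2 unchanged
wrapped_down = bump_pc1(0x12A0, -1)  # low nibble wraps 0 -> F, PC2 unchanged
```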
B Field encoding | X Bit (R14) = 0 | X Bit (R14) = 1 |
---|---|---|
0000 to 0111 | F[n] | F[n] |
1000 | KH I/O Register | Status Register 3 |
1001 | KL I/O Register | Status Register 4 |
1010 | Status Register 1 | PC2 |
1011 | Status Register 2 | PC3 |
1100 | PC1 | PC4 |
1101 | CH Data Register | CH Data Register |
1110 | CL Data Register | CL Data Register |
1111 | Immediate 0 | Immediate 0 |
C Field encoding | X Bit (R14)=0 | X Bit (R14)=1 |
---|---|---|
0000 to 0111 | F[n] | F[n] |
1000 | KH I/O Register | Status Register 3 |
1001 | KL I/O Register | Status Register 4 |
1010 | Status Register 1 | PC2 |
1011 | Status Register 2 | PC3 |
1100 | PC1 | PC4 |
1101 | illegal | illegal |
1110 | illegal | illegal |
1111 | ignore results | ignore results |
Notice that both the B and C fields might be affected by the "X" bit. This means that certain pairs of source/destination aren't possible. For example, microcode can specify ADD F0,PC2,PC3 (add F[0] to PC2 and store the result in PC3), but ADD F0,PC1,PC3 is not possible, since the X bit must be 0 to select PC1 for the B input but the X bit must be 1 to select PC3 for the destination.
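The X-bit constraint can be checked mechanically. The sketch below is built from the B and C encoding tables above; the operand spellings (KH, ST1, PC3, and so on) and the function names are chosen for illustration and are not the original assembler's syntax.

```python
# Operands that are only reachable with a particular X-bit value.
X0_ONLY = {"KH", "KL", "ST1", "ST2", "PC1"}
X1_ONLY = {"ST3", "ST4", "PC2", "PC3", "PC4"}


def required_x(operand):
    """Return 0 or 1 if the operand forces the X bit, else None (don't care)."""
    if operand in X0_ONLY:
        return 0
    if operand in X1_ONLY:
        return 1
    return None  # F registers, CH, CL, immediate 0, or the null destination


def encodable(b_source, c_dest):
    """True if a single X bit can satisfy both the B source and the C destination."""
    xb, xc = required_x(b_source), required_x(c_dest)
    return xb is None or xc is None or xb == xc
```

With these definitions, `encodable("PC2", "PC3")` is true and `encodable("PC1", "PC3")` is false, matching the example in the text.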
M Field encoding | Meaning |
---|---|
00 | no memory operation |
01 | read 8b of RAM or ROM from the address indicated by PC into the {CH,CL} register |
10 | write ALU result to RAM at the nibble address indicated by PC |
11 | write ALU result to RAM at the nibble address indicated by PC ^ (vertical_mode ? 0x10 : 0x01) |
For a read operation, the currently selected vertical/horizontal addressing mode bit and the RAM/ROM selection bit (both controlled by STx register bits) are used with the current PC value to read two nibbles of data into the { CH, CL } registers. On a write, the vertical/horizontal addressing mode bit combines with the write 1/write 2 mode control and the PC to write one nibble of data; this nibble is selected by the src A operand for the mini-op instructions, otherwise it comes from the ALU output.
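A simplified model of the M field may help. It treats memory as a Python list of 4-bit values indexed by nibble address. The function name is hypothetical, the ROM select bit is not modeled, and the order in which the two nibbles land in CH and CL is left unspecified because the text does not state it.

```python
def mem_op(mem, pc, m_field, data_nibble=0, vertical=False):
    """Perform the memory operation selected by the 2-bit M field.

    Returns the fetched nibble pair for a read, otherwise None.
    """
    if m_field == 0b00:
        return None                          # no memory operation
    partner = pc ^ (0x10 if vertical else 0x01)
    if m_field == 0b01:
        return (mem[pc], mem[partner])       # pair loaded into the C register
    target = pc if m_field == 0b10 else partner
    mem[target] = data_nibble & 0xF          # write 1 uses PC, write 2 the partner
    return None


ram = [0] * 64
mem_op(ram, 0x20, 0b10, data_nibble=0x4)     # ",W1": nibble at MEM[PC]
mem_op(ram, 0x20, 0b11, data_nibble=0x1)     # ",W2": nibble at MEM[PC ^ 1]
pair = mem_op(ram, 0x20, 0b01)
```

Two writes without moving PC are enough to fill one byte, which is the pattern used by the fill routine shown later.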
Microarchitecture Example Code
The above description gives many details, but they are best understood by looking at real code to see how they work together. The exposition will use real code from the ROM. All comments are mine, made simply through observation of the code; thus, there is a danger that my commentary is inaccurate. The syntax is similar, but not identical, to the original syntax of the Wang microcode assembler.
In the argument field, operands always appear in the order A,B,C. If the source of A or B is the constant zero, it can be specified by a null field. If the ALU result isn't to be stored back into a register, the C field is specified as a null field. For example, ADD,W1 F3,, means that argB is implicitly 0 and argC is the null destination; this instruction adds 0 to the scratch register F3, and the sum appears on the C result bus. The result isn't saved to a register, but it is the data that is used for the write operation specified by the ",W1" modifier to the opcode.
IC | ucode | Mnemonic | Behavior |
---|---|---|---|
02A1 | 58C04 | TA 4 | transfer the contents of AUX[4] to the PC register, wiping out the previous contents of PC |
02A2 | 5A904 | TP+2,R 4 | transfer PC+2 back to AUX[4]; read the byte at RAM[PC], storing it in C. We increment by two because PC is a nibble address, and we are advancing to the next byte. |
02A3 | FEA25 | BNE 2,CL,02A5 | jump to return if low nibble isn't 2 |
02A4 | EDA01 | BEQ 0,CH,02A1 | loop back if high nibble is 0; (note the nibble swap: this is seeking HEX(20), which is space) |
02A5 | 58400 | SR | return to caller |
uCode fragment #1 scans a line of code, skipping ahead until a non-space is found. On entry, AUX[4] contains the 16b pointer to the current byte being scanned; the routine returns with C containing the first non-space and AUX[4] pointing to the byte after it. Undoubtedly in the original source code the constant "4" would have been represented by a symbolic name.
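For comparison, here is roughly what the same loop looks like in a high-level language. This is a loose sketch, not a cycle-accurate model: RAM is represented as a Python `bytes` object, the pointer is kept as a nibble address as in the microcode, and the function name is made up.

```python
def skip_spaces(ram, aux4):
    """Return (first non-space byte, updated AUX[4]) starting at nibble address aux4."""
    while True:
        pc = aux4              # TA 4      : PC <- AUX[4]
        aux4 = pc + 2          # TP+2,R 4  : AUX[4] <- PC + 2 (one byte = two nibbles)
        c = ram[pc // 2]       #             ...and read the byte at PC into C
        if c != 0x20:          # BNE / BEQ : loop only while C is HEX(20)
            return c, aux4     # SR


first_char, next_ptr = skip_spaces(b"  PRINT", 0)
```

Here `first_char` is the byte for "P" and `next_ptr` is nibble address 6, that is, the byte after the "P".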
IC | ucode | Mnemonic | Behavior |
---|---|---|---|
03B9 | 528EA | ANDI 0E,ST1,ST1 | clear bit 0 of ST1; this is the carry bit |
03BA | 680E0 | ACI 0E,F0,F0 | subtract two from the 16b quantity stored in {F3,F2,F1,F0} |
03BB | 684F1 | ACI 0F,F1,F1 | |
03BC | 688F2 | ACI 0F,F2,F2 | |
03BD | 68CF3 | ACI 0F,F3,F3 | |
03BE | DAC14 | BF 1,ST1,03C4 | test bit 0 of ST1 (carry); if there is no carry, we are done |
03BF | 5BC01 | XP-2 1 | this and the next instruction simply decrement PC by 2 using AUX[1] as a temporary register |
03C0 | 59001 | XP 1 | |
03C1 | 6160F | AI,W1 0,F5, | store {F4,F5} in memory at the byte pointed at by PC |
03C2 | 6130F | AI,W2 0,F4, | |
03C3 | B0B39 | B 03B9 | loop back to the start of the routine |
03C4 | 58400 | SR | return from subroutine |
This routine uses {F3,F2,F1,F0} as a 16b count of the number of nibbles to fill with a constant byte. The byte is supplied by {F5,F4}. Note that it takes two write operations to fill a byte. The operations at 03C1 and 03C2 write the two nibbles; the ",W1" and ",W2" modifiers select the nibbles at MEM[PC] and MEM[PC^0x0001], that is, an adjacent nibble pair. The operation for those two instructions, "AI", is "add immediate." Because there is no third argument (note the final comma), the result isn't saved anywhere other than to memory.
The "XP-2 AUX1", "XP AUX1" pair is a fairly common idiom. Although AUX[1] is involved, its value is unaffected. This is done simply for the side effect of adjusting the PC value.
Although the memory interface allows writing one nibble per instruction (at sixteen clocks per instruction), this routine only writes two nibbles per eleven instructions, for 18% efficiency.
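The 16b decrement at 03B9 to 03BD is worth a closer look, because it shows how a 4b ALU handles a wide value: subtracting 2 is done by adding 0xFFFE one nibble at a time, with the carry rippling through ST1. The sketch below mimics that sequence; the function name and list representation are illustrative assumptions.

```python
def sub2_nibble_serial(f):
    """Subtract 2 from a 16-bit value held as [F0, F1, F2, F3] (low nibble first).

    Returns (new nibbles, final carry). A final carry of 1 means the count was
    at least 2, so the fill loop continues; 0 means the routine returns.
    """
    carry = 0                                        # ANDI 0E,ST1,ST1
    out = []
    for reg, imm in zip(f, (0xE, 0xF, 0xF, 0xF)):    # ACI 0E, then ACI 0F x3
        total = reg + imm + carry
        out.append(total & 0xF)
        carry = total >> 4
    return out, carry


after_four = sub2_nibble_serial([4, 0, 0, 0])   # count of 4 nibbles
after_zero = sub2_nibble_serial([0, 0, 0, 0])   # count exhausted
```

Starting from a count of 4, the result is a count of 2 with carry set; starting from 0, the count wraps to 0xFFFE with carry clear, which is the exit condition tested by the BF instruction at 03BE.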
Here is one last example, presented as a simple listing with my comments. When the END command is executed, BASIC prints out the number of free bytes left. This is calculated in binary, but must be presented in ASCII. This code converts a 16b binary number in PC to a 20b BCD number in {F3,F2,F1,F0,F6}, then calls a subroutine to print out these BCD digits as ASCII. You can see that the instruction set isn't very efficient.
The calculated amount is in terms of nibbles, but the printed result is in terms of bytes. That is why the main loop is iterated 15 times, not 16, to effect a divide by 2.
IC | ucode | Mnemonic | Behavior |
---|---|---|---|
10C3 | 43CD0 | ORI 0D,,F0 | |
10C4 | 43CEC | ORI 0E,,PC1 | |
10C5 | 47C2A | ORI 2,,PC2 | |
10C6 | A1123 | SB 1213 | |
10C7 | 43CB0 | ORI 0B,,F0 | |
10C8 | 47C7A | ORI 7,,PC2 | |
10C9 | 43C0C | ORI 0,,PC1 | |
10CA | A1123 | SB 1213 | |
10CB | 58C07 | TA 7 | |
10CC | A0538 | SB 0358 | {F3,F2,F1,F0}=PC |
10CD | 528EA | ANDI 0E,ST1,ST1 | |
10CE | 68832 | ACI 3,F2,F2 | |
10CF | 68C03 | ACI 0,F3,F3 | |
10D0 | DAD13 | BF 1,ST1,10D3 | |
10D1 | 528EA | ANDI 0E,ST1,ST1 | |
10D2 | B1D05 | B 10D5 | |
10D3 | 58C05 | TA 5 | |
10D4 | A014F | SB 041F | PC = PC - {F3,F2,F1,F0} |
10D5 | 43C00 | ORI 0,,F0 | |
10D6 | 43C01 | ORI 0,,F1 | |
10D7 | 43C02 | ORI 0,,F2 | |
10D8 | 43C03 | ORI 0,,F3 | |
10D9 | 43C06 | ORI 0,,F6 | {F3,F2,F1,F0,F6} is a 5 bcd digit number |
10DA | DAE1B | BF 1,ST1,10EB | |
10DB | 43CF7 | ORI 0F,,F7 | step 15 times |
10DC | 528EA | ANDI 0E,ST1,ST1 | clear carry |
10DD | 39866 | DAC F6,F6,F6 | double bcd number |
10DE | 38000 | DAC F0,F0,F0 | |
10DF | 38411 | DAC F1,F1,F1 | |
10E0 | 38822 | DAC F2,F2,F2 | |
10E1 | 38C33 | DAC F3,F3,F3 | |
10E2 | A0647 | SB 0467 | PC = PC*2; st1.1 is msb of old PC |
10E3 | DAE19 | BF 1,ST1,10E9 | if no carry, skip next step |
10E4 | 79806 | DACI 0,F6,F6 | bcd increment by 1 (I'm not sure why we couldn't just loop back to 10DD and let any carry trickle in that way) |
10E5 | 78000 | DACI 0,F0,F0 | |
10E6 | 78401 | DACI 0,F1,F1 | |
10E7 | 78802 | DACI 0,F2,F2 | |
10E8 | 78C03 | DACI 0,F3,F3 | |
10E9 | 61CF7 | AI 0F,F7,F7 | |
10EA | F7D0D | BNE 0,F7,10DD | |
10EC | 03C69 | OR F6,,KL | get ls digit |
10ED | A00B2 | SB 0B02 | output that 5th digit |
10EE | A072E | SB 027E | output 0x0D |
10EF | A072E | SB 027E | output 0x0D |
10F0 | DBF82 | BF 8,ST2,10F2 | |
10F1 | A0F13 | SB 01F3 | |
10F2 | B020A | B 002A |
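The core of the listing (10DB through 10EA) is a shift-and-double binary-to-BCD conversion. The sketch below restates that loop in Python under some simplifying assumptions: the digit list stands in for {F6,F0,F1,F2,F3} (least significant first), the function names are hypothetical, and the "double" and "add 1" steps are merged into a single pass by feeding the shifted-out bit in as the initial carry, whereas the microcode performs them as two separate passes.

```python
def bcd_double_plus(digits, carry_in):
    """Return 2 * value + carry_in for a BCD number stored least significant digit first."""
    out = []
    carry = carry_in
    for d in digits:
        total = d + d + carry        # decimal add with carry, one digit at a time
        out.append(total % 10)
        carry = total // 10
    return out


def free_bytes_bcd(pc):
    """Convert a 16-bit nibble count to a 5-digit BCD byte count."""
    digits = [0] * 5
    for _ in range(15):              # 15 steps, not 16: drops the lsb, i.e. divides by 2
        msb = (pc >> 15) & 1
        pc = (pc << 1) & 0xFFFF      # PC = PC * 2, msb falls out as the carry
        digits = bcd_double_plus(digits, msb)
    return digits


def digits_to_int(digits):
    """Helper for checking results: interpret the digit list as an integer."""
    return int("".join(str(d) for d in reversed(digits)))
```

For instance, a nibble count of 0x8000 converts to the digits for 16384, the same value as 0x8000 >> 1.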