Wang 2200 Microarchitecture Description

Pre-history

Before getting into the 2200 machines, a very short run-up to what preceded them is in order. To get a much more in-depth understanding, read Rick Bensene's excellent article about Wang's early systems.

By the mid 1960s, Wang started making a name for itself in the electronic calculator business. The LOCI-2 machine led the way, with the 300 series and the 700 series being the blockbuster families. Wang also had 100, 400, 500, and 600 series machines that appear to have been mainly derivations from the 300 and 700.

The 700 series itself had an interesting background. Originally Wang had intended to produce a "real" CPU, but when HP came out with their still amazing HP 9100 calculator, Wang recognized that the 300 family was doomed and abruptly redirected the CPU team to produce the 700 series programmable calculators.

The next attempt at a CPU was internally called the 800 CPU, and some of the old documents reference it, but it ultimately became known as the 2200 CPU. In some ways, the microarchitecture of the 2200 CPU was very similar to the microarchitecture of the 700 CPU. Specifically, the machine had a 4b ALU that was fed from an A bus and a B bus; a read from memory received 8b of data in the C register; a write to memory sent only 4b of data; the I/O bus was 8b wide, composed of the high and low 4b nibbles of the K register. There were also differences (2200 first, 700 second): 16b microinstruction address vs. 12b; eight nibble register file instead of three; 16 word auxiliary register stack vs none; tight 20b vertically encoded microword vs 43b horizontally encoded microword; 1.6 usec cycle time vs 1.25 usec (yes, the 2200 was slower).

Origin of the 2200

The 2200 project was started in 1970, with Bob Kolk as the project lead. Some of the other team members were Bruce Patterson, Dave Angel, Joe Wang, and Horace Tsaing. Bruce Patterson did much of the microcode; Dave Angel coded up the $GIO and $TRAN instructions; presumably the other members designed the hardware. The 2200 was first available for sale in May, 1973.

The 2200 was designed just before the era of integrated microprocessors, such as the 8080 and 6502. Since the designers didn't have such useful building blocks, they had to design their own CPU out of hundreds of elementary 14 and 16 pin chips (mostly 7400-series TTL parts). At the heart of the CPU was a single 4b 74181 ALU bit slice. Some people have made a big deal of this, calling it "BASIC executed in hardware," but really the CPU is just an instruction set processor implemented in TTL and the 42.5 KB of microcode is akin to the BASIC interpreter of a conventional microcomputer. Another reason for viewing this as a CPU and not a micromachine is that there is no overlap of instruction execution and no hazards from one microinstruction to the next (jargon free: all side effects from microinstruction N take effect before microinstruction N+1 begins execution).

Although the ALU is only 4b wide, the power of this system is roughly equivalent to a 2 MHz 8080 CPU due to the fact that microinstructions often do more than one thing. The 1.6 usec (16 cycles at 10 MHz) cycle time of the 2200 is roughly the same as the 1.5 - 2 usec (3 or 4 cycles at 2 MHz) memory access of an 8080.

2200 Microarchitecture

The following information is intended to give the flavor of the microarchitecture, but doesn't cover everything. Most of this information was derived from the 2200 Systems Maintenance Manual, although it is vague enough that some of this was figured out by reading between the lines and studying the actual 2200T microcode.

The view of the CPU presented to the microprogrammer is as follows.

Table 1: Wang 2200 CPU Register Resources
Register name [array]	Width	Function
IC	16b	microcode instruction counter
ICSTACK[16]	16b	microcode return stack
PC (PC4, PC3, PC2, PC1)	16b (4b, 4b, 4b, 4b)	memory address pointer; scratch register
AUX[16]	16b	auxiliary PC file
F[8]	4b	nibble data registers
C (CH, CL)	8b (4b,4b)	memory read data
K (KH, KL)	8b (4b,4b)	8b data to/from the I/O bus
ST1	4b	status register 1
ST2	4b	status register 2
ST3	4b	status register 3
ST4	4b	status register 4

IC points at the current microinstruction being executed. Each microinstruction is 20b wide. Every microinstruction takes sixteen 10 MHz clock cycles, or 1.6 usec per microinstruction. The IC can be loaded with a 16b immediate value (i.e., JUMP or CALL); its value can be saved on the next location in the ICSTACK or its value restored from the same; its value can be transferred to/from the PC.

ICSTACK holds return addresses from the microcode subroutine calls. The stack is 16 deep; if the call nesting gets deeper than 16 levels, the ICSTACK pointer just wraps around and overwrites the oldest entry. In order to keep the implementation simple, a subroutine call pushes the current IC on the ICSTACK, and the corresponding subroutine return pops that same value back into the IC. In neither the push nor the pop does the IC address get incremented. The problem with this is that the subroutine call would just be executed again after the subroutine return. However, the subroutine return instruction sets a flag which causes the next microinstruction (which invariably is the subroutine call) to be ignored. This means that effectively a return from subroutine instruction takes two microcycles, or 3.2 usec. This is actually pretty costly since the microcode consists largely of subroutine calls to short routines. Code density was more important than performance.
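The call/return behavior described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class and method names are hypothetical, and the pop direction (increment the pointer, then read) is assumed as the mirror image of the "push, then decrement" behavior of SB.

```python
class MicroSequencer:
    """Toy model of IC + ICSTACK call/return behavior (illustrative only)."""

    def __init__(self):
        self.ic = 0
        self.stack = [0] * 16    # ICSTACK[16]
        self.sp = 0              # stack pointer; wraps modulo 16
        self.skip_next = False   # set by SR so the re-fetched SB is ignored

    def sb(self, target):
        """Subroutine branch: push the IC of the SB itself, un-incremented."""
        self.stack[self.sp] = self.ic
        self.sp = (self.sp - 1) % 16   # assumed: decrement after the push
        self.ic = target

    def sr(self):
        """Subroutine return: pop the saved IC and flag the next word as ignored."""
        self.sp = (self.sp + 1) % 16   # assumed: mirror image of the push
        self.ic = self.stack[self.sp]
        self.skip_next = True

    def step_skipped(self):
        """Consume the ignored microinstruction (the original SB) and move on."""
        assert self.skip_next
        self.skip_next = False
        self.ic = (self.ic + 1) & 0xFFFF
```

In this model a return costs two steps (sr followed by step_skipped), which corresponds to the two microcycles, or 3.2 usec, mentioned above. Nesting a seventeenth call simply overwrites the oldest entry.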

PC contains a 16b value that can be treated as a 16b register, or four 4b registers named PC4 (ms), PC3, PC2, and PC1 (ls). This register also supplies the memory address when an instruction contains a memory access operation. The address is a nibble address, which is what limits the architecture to accessing at most 32 KB of RAM.

AUX[16] is a file of sixteen 16b registers. These are used for holding and supplying 16b values to the PC. They are required because saving/restoring the PC value to memory takes many microinstructions. When a value is transferred from the PC to an AUX register, the value can be sent directly, or it can be adjusted by +1, +2, -1, or -2. This makes advancing a pointer through memory efficient.

F[8] is a file of eight 4b values. These are used as a scratch pad for holding the results of nibble calculations from the 4b ALU.

C is an 8b register that holds two nibbles from the RAM. Although the machine is nibble wide, memory reads read two nibbles at a time. Interestingly, there are two addressing modes: horizontal and vertical. Roughly, horizontal mode causes RAM nibbles MEM[PC] and MEM[PC xor 0x0001] to be fetched, while vertical mode causes RAM nibbles MEM[PC] and MEM[PC xor 0x0010] to be fetched. Horizontal mode is useful for fetching a sequence of bytes, while vertical mode is useful for fetching corresponding nibbles from two floating point numbers that are 8 bytes apart in memory. Once fetched, the contents of C are used 4b at a time as either CH (high) or CL (low) nibbles. There is another mode bit whereby memory reads access a 2KB ROM instead of RAM. This ROM contains fixed constants, error messages, and tables of keywords. This other ROM is required because, other than immediate constants, the microarchitecture has no way of reading data from the microstore.

K is another 8b register. It is used to send 8b values over the I/O bus or to capture 8b values read from the I/O bus. Microinstructions refer to its two parts as KH (high) and KL (low).

Finally, there are four 4b registers. ST1 and ST3 are collections of ad hoc status/control bits that do things like set the memory read vertical/horizontal mode and detect when I/O operations have completed. ST2 and ST4 are just simple 4b read/write registers that software uses to help guide its operation without having to access main memory.

You can see a very simple block diagram of the file, or a more detailed block diagram.

Microcode Format 1

There are a few different formats for microcode instructions. The Wang 2200 service manual contains a very helpful table of microword encodings, although the table is not 100% correct!

The first format performs 4b ALU operations.

Table 2: Microcode Format 1 -- Register Instructions
ROM Instruction Bits	R19-R15	R14	R13-R10	R9-R8	R7-R4	R3-R0
Instruction Designators	Op Code	X	B Bus Source	Memory	A Bus Source	C Bus Dest.

Table 3: Microcode Format 1 -- Opcode Instruction Types
Function	Mnemonic if A specifies a source	Mnemonic if A is 4b immediate
bitwise OR	OR	ORI
bitwise XOR	XOR	XORI
bitwise AND	AND	ANDI
decimal subtract w/carry	DSC	----
binary add	A	AI
binary add w/carry	AC	ACI
decimal add	DA	DAI
decimal add w/carry	DAC	DACI

All operations have the form:

(dest C) = (source A) <op> (source B).

For most operations, the A source specifier can either select a source register or be treated as a simple 4b immediate value. All of these instructions have a two bit MemOp field that indicates what type of memory operation to perform. See Table 13.
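As a rough illustration of the Table 2 layout, the following sketch splits a 20b microword into its Format 1 fields. The function and type names are hypothetical; only the bit boundaries come from Table 2, and no attempt is made to interpret the numeric opcode values.

```python
from typing import NamedTuple


class Format1(NamedTuple):
    opcode: int  # R19-R15
    x: int       # R14
    b: int       # R13-R10, B bus source
    m: int       # R9-R8, memory operation
    a: int       # R7-R4, A bus source or 4b immediate
    c: int       # R3-R0, C bus destination


def decode_format1(word):
    """Split a 20b microword into Format 1 fields (bit ranges per Table 2)."""
    return Format1(
        opcode=(word >> 15) & 0x1F,
        x=(word >> 14) & 0x1,
        b=(word >> 10) & 0xF,
        m=(word >> 8) & 0x3,
        a=(word >> 4) & 0xF,
        c=word & 0xF,
    )
```

For example, decoding the word 528EA (listed later as ANDI 0E,ST1,ST1) gives X = 0, B = 1010, A = 1110 and C = 1010, which matches the Status Register 1 entries of Tables 11 and 12 and the immediate 0E.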

Microcode Format 2

Table 4: Microcode Format 2 -- Unconditional Branch Instructions
ROM Instruction Bits	R19-R16	R15-R12	R11-R8	R7-R4	R3-R0
Instruction Designators	Op Code	Imm

Table 5: Microcode Format 2 -- Unconditional Branch Types
Function	Mnemonic	Branch address
subroutine branch	SB	16b absolute branch
unconditional branch	B	16b absolute branch

Format 2 is used for Unconditional Branch and Subroutine Branch. The SB and B instructions jump to an arbitrary microinstruction anywhere. The SB instruction also pushes the IC of the calling instruction onto the ICSTACK and decrements the ICSTACK pointer afterward. There is nothing to prevent the ICSTACK from overfilling and wrapping around (in fact, it does get used this way).

Microcode Format 3

Format 3 encodes the other branch instructions.

Table 6: Microcode Format 3 -- Conditional Branch Instructions
ROM Instruction Bits	R19-R16	R15-R12	R11-R8	R7-R4	R3-R0
Instruction Designators	Op Code	src B	Imm	src A	Imm

Table 7: Microcode Format 3 -- Conditional Branch Types
Function	Mnemonic	Branch address
branch if (src A) = (src B)	BER	8b branch within page
branch if (src A) != (src B)	BNR	8b branch within page
branch if (imm A & src B) = (imm A) (branch if TRUE)	BT	8b branch within page
branch if (imm A & ~src B) = (imm A) (branch if FALSE)	BF	8b branch within page
branch if (imm A) = (src B)	BEQ	8b branch within page
branch if (imm A) != (src B)	BNE	8b branch within page

The conditional branches require some explanation. For all of them, if the branch condition isn't met, IC increments to the next microinstruction. If the branch is to be taken, the top eight bits of IC are kept and the bottom eight bits of IC are replaced with an immediate constant. Because of this, the microstore can be viewed as 256 "pages" of 256 instructions per page. SB and B can jump within a page or to any other page, while the other branches can only branch within page.
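The within-page branch rule can be sketched as follows. The function names are hypothetical. The split of the 8b immediate across R11-R8 and R3-R0 follows Table 6; treating R11-R8 as the high nibble is an inference from the example listings (for instance, the word FEA25 at address 02A3 is listed as branching to 02A5), not something the table states.

```python
def decode_format3(word):
    """Split a 20b conditional-branch word per Table 6."""
    return {
        "opcode": (word >> 16) & 0xF,
        "src_b": (word >> 12) & 0xF,
        "src_a": (word >> 4) & 0xF,
        # Assumed: R11-R8 is the high nibble and R3-R0 the low nibble.
        "imm8": (((word >> 8) & 0xF) << 4) | (word & 0xF),
    }


def next_ic(ic, imm8, taken):
    """Next microinstruction address after a conditional branch at `ic`."""
    if taken:
        # Keep the page (top 8 bits), replace the offset (bottom 8 bits).
        return (ic & 0xFF00) | (imm8 & 0xFF)
    return (ic + 1) & 0xFFFF
```

Because only the low eight bits are replaced, a routine that straddles a page boundary cannot use these branches to reach code on the other side and would need B or SB instead.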

BER and BNR simply compare two operands and branch if they are equal or not equal, respectively. BEQ and BNE are the same, except that the "src A" field is treated as a four bit immediate value. For BT, the "src A" field is likewise interpreted as a 4b constant, and the branch is taken if the chosen B operand has a 1 in every bit position where the immediate A operand has a 1. With this it is possible to test whether a given bit, or a given collection of bits, is set. Likewise, BF branches only if, for every 1 bit in the immediate A operand, the corresponding bit of the B operand is zero.
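The six branch conditions can be written as small predicates over 4b values. This is a sketch that follows the prose description above; the function names simply reuse the mnemonics in lowercase and are not part of any Wang tooling.

```python
def ber(a, b):
    """Branch if register operands are equal."""
    return (a & 0xF) == (b & 0xF)


def bnr(a, b):
    """Branch if register operands differ."""
    return (a & 0xF) != (b & 0xF)


def beq(imm, b):
    """Branch if the B operand equals the 4b immediate."""
    return (imm & 0xF) == (b & 0xF)


def bne(imm, b):
    """Branch if the B operand differs from the 4b immediate."""
    return (imm & 0xF) != (b & 0xF)


def bt(imm, b):
    """Branch if every bit set in the immediate mask is also set in B."""
    return (imm & b & 0xF) == (imm & 0xF)


def bf(imm, b):
    """Branch if every bit set in the immediate mask is clear in B."""
    return (imm & b & 0xF) == 0
```

With a single-bit mask, bt and bf test one flag: bf(1, st1) is true exactly when bit 0 of st1 is clear.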

"Mini Instruction" Microcode Formats

The one "missing" Format 1 operation, what would have been decimal subtract with carry immediate, is instead used to implement a collection of "Mini Instructions."

Table 8: Microcode Format 4 -- Mini Instructions
ROM Instruction Bits	R19-R10	R9-R8	R7-R4	R3-R0
Instruction Designators	OpCode	MemOp	src A	R

Table 9: Mini Instructions
Mnemonic	Operation
CIO	I/O bus control
SR	Subroutine Return
TPI	Transfer PC to IC
TIP	Transfer IC to PC
TMP	Transfer Memory size to PC
TA	Transfer Aux[R] to PC
TP	Transfer PC to Aux[R]
TP+1	Transfer PC to Aux[R]; then add 1 to Aux[R]
TP-1	Transfer PC to Aux[R]; then sub 1 from Aux[R]
TP+2	Transfer PC to Aux[R]; then add 2 to Aux[R]
TP-2	Transfer PC to Aux[R]; then sub 2 from Aux[R]
XP	eXchange PC with Aux[R]
XP+1	eXchange PC with Aux[R]; then add 1 to Aux[R]
XP-1	eXchange PC with Aux[R]; then sub 1 from Aux[R]
XP+2	eXchange PC with Aux[R]; then add 2 to Aux[R]
XP-2	eXchange PC with Aux[R]; then sub 2 from Aux[R]

In these instructions, the R field picks one of the sixteen Aux registers to operate on; CIO, SR, TPI, TIP, and TMP just ignore this field.

All of these instructions have a two bit MemOp field that indicates what type of memory operation to perform (see Table 13). The memory address used is the value of the PC register at the start of the instruction. For write operations, the write data comes from the register specified by the "src A" field.

TMP is used exactly once in the 20K word microprogram. During initialization, this instruction returns the number of nibbles of memory in the system. This value isn't determined through any hardware magic; it is simply a dip switch setting on one of the CPU cards.

CIO is used to send one of three types of control strobes to the I/O bus. "-ABS" causes the contents of the K register to be clocked into an I/O address register and then the active low "Address Bus Strobe" signal is fired for about 5 microseconds. This causes the various I/O cards to select or deselect. Another type of strobe is "-OBS", or "Output Bus (Data) Strobe", which causes the value of the K register to be driven on the I/O bus followed by a ~5 microsecond data strobe signal. Finally, "-CBS" is a control bus strobe; most cards ignore this signal, but some use it like a secondary -OBS. Although there are these timing signals generated by one-shots on the I/O controller card, there are no hardware interlocks to prevent the microcode from initiating an ABS/OBS/CBS strobe before the previous sequence has finished. Instead, after any CIO instruction, the microcode invariably calls a subroutine that kills about 11 microseconds before returning.

Microcode Field Encoding

Next comes the definition of what operands are available to the A and B ALU sources and how the C dest is specified.

Table 10: ALU "A" source encoding
A Field Encoding	Mini Instructions	All Other Instructions	Effect
0000 to 0111	F[n]	F[n]	one of eight scratch registers
1000	CH	CH
1001	illegal	CH-	CH; decrement PC1 mod 16
1010	illegal	CH+	CH; increment PC1 mod 16
1011	illegal	-	imm. 0; decrement PC1 mod 16
1100	CL	CL
1101	illegal	CL-	CL; decrement PC1 mod 16
1110	illegal	CL+	CL; increment PC1 mod 16
1111	illegal	+	imm. 0; increment PC1 mod 16

For cases where PC1 is incremented or decremented and there is a memory operation, the update effectively happens after the original PC has been used to address memory.

Table 11: ALU "B" source encoding
B Field encoding	X Bit (R14) = 0	X Bit (R14) = 1
0000 to 0111	F[n]	F[n]
1000	KH I/O Register	Status Register 3
1001	KL I/O Register	Status Register 4
1010	Status Register 1	PC2
1011	Status Register 2	PC3
1100	PC1	PC4
1101	CH Data Register	CH Data Register
1110	CL Data Register	CL Data Register
1111	Immediate 0	Immediate 0

Table 12: ALU "C" destination encoding
C Field encoding	X Bit (R14)=0	X Bit (R14)=1
0000 to 0111	F[n]	F[n]
1000	KH I/O Register	Status Register 3
1001	KL I/O Register	Status Register 4
1010	Status Register 1	PC2
1011	Status Register 2	PC3
1100	PC1	PC4
1101	illegal	illegal
1110	illegal	illegal
1111	ignore results	ignore results

Notice that both the B and C fields might be affected by the "X" bit. This means that certain pairs of source/destination aren't possible. For example, microcode can specify ADD F0,PC2,PC3 (add F[0] to PC2 and store the result in PC3), but ADD F0,PC1,PC3 is not possible since the X bit must be 0 to select PC1 for the B input but the X bit must be 1 to select PC3 for the destination.
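The X bit restriction can be checked mechanically. The sketch below is only an illustration: the register names are taken from Tables 11 and 12, while the helper names and the "NONE" label for the ignore-results destination are hypothetical.

```python
# Names selected by field values 1000..1100, keyed by the X bit (Tables 11, 12).
BANKED = {
    0: {"KH", "KL", "ST1", "ST2", "PC1"},
    1: {"ST3", "ST4", "PC2", "PC3", "PC4"},
}
F_REGS = {f"F{n}" for n in range(8)}
B_ANY_X = F_REGS | {"CH", "CL", "0"}   # B sources that do not depend on X
C_ANY_X = F_REGS | {"NONE"}            # C dests that do not depend on X


def x_values(name, any_x):
    """Return the set of X bit values under which `name` can be selected."""
    if name in any_x:
        return {0, 1}
    return {x for x, names in BANKED.items() if name in names}


def can_encode(b_source, c_dest):
    """True if some X value selects both the B source and the C destination."""
    return bool(x_values(b_source, B_ANY_X) & x_values(c_dest, C_ANY_X))
```

Here can_encode("PC2", "PC3") holds because both need X = 1, while can_encode("PC1", "PC3") does not because the two operands need opposite X values.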

Table 13: M Field encoding (Memory operation)
M Field encoding	Meaning
00	no memory operation
01	read 8b of RAM or ROM from the address indicated by PC into the {CH,CL} register
10	write ALU result to RAM at the nibble address indicated by PC
11	write ALU result to RAM at the nibble address indicated by PC ^ (vertical_mode ? 0x10 : 0x01)

For a read operation, the currently selected vertical/horizontal addressing mode bit and the RAM/ROM selection bit (both controlled by STx register bits) are used with the current PC value to read two nibbles of data into the { CH, CL } registers. On a write, the vertical/horizontal addressing mode bit combines with the write 1/write 2 mode control and the PC to write one nibble of data; this nibble is selected by the src A operand for the mini-op instructions, otherwise it comes from the ALU output.
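A minimal sketch of the four M field cases, modelling RAM as a Python list of 4b values. The function name and signature are hypothetical, the RAM/ROM selection bit is left out, and since the text does not say which of the two fetched nibbles lands in CH and which in CL, the read simply returns them as (nibble at PC, partner nibble).

```python
def mem_op(ram, pc, m, vertical=False, data=0):
    """Apply one M field operation (Table 13) to `ram`, a list of nibbles."""
    # The partner nibble differs in address bit 0 (horizontal) or bit 4 (vertical).
    partner = pc ^ (0x10 if vertical else 0x01)
    if m == 0b01:                 # read two nibbles
        return ram[pc], ram[partner]
    if m == 0b10:                 # write 1: nibble at PC
        ram[pc] = data & 0xF
    elif m == 0b11:               # write 2: partner nibble
        ram[partner] = data & 0xF
    return None                   # 00 (no operation) and writes return nothing
```

Two consecutive writes with m = 10 and m = 11 in horizontal mode fill one byte without moving PC, which is the pattern used by the ",W1" and ",W2" modifiers in the fill-memory example.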

Microarchitecture Example Code

The above description gives many details, but they are best understood by looking at real code to see how they work together. The exposition will use real code from the ROM. All comments are mine, made simply through observation of the code; thus, there is a danger that my commentary is inaccurate. The syntax is similar, but not identical, to the original syntax of the Wang microcode assembler.

In the argument field, operands always appear in the order A,B,C. If the source of A or B is the constant zero, it can be specified by a null field. If the ALU result isn't to be stored back into a register, the C field is specified as a null field. For example, ADD,W1 F3,, means that argB is implicitly 0 and argC is the null destination; this instruction is adding 0 to the scratch register F3 which appears on the C result bus; the result isn't saved to a register, but it is the data that is used for the write operation specified by the ",W1" modifier to the opcode.

uCode Example #1: Skip over spaces
IC	ucode	Mnemonic	Behavior
02A1	58C04	TA 4	transfer the contents of AUX[4] to the PC register, wiping out the previous contents of PC
02A2	5A904	TP+2,R 4	transfer PC+2 back to AUX[4]; read the byte at RAM[PC], storing it in C. We increment by two because PC is a nibble address, and we are advancing to the next byte.
02A3	FEA25	BNE 2,CL,02A5	jump to return if low nibble isn't 2
02A4	EDA01	BEQ 0,CH,02A1	loop back if high nibble is 0; (note the nibble swap: this is seeking HEX(20), which is space)
02A5	58400	SR	return to caller

uCode fragment #1 scans a line of code, skipping ahead until a non-space is found. AUX[4] contains the 16b pointer to the current byte being scanned, and returns with C containing the first non-space and AUX[4] pointing to the byte after it. Undoubtedly in the original source code the constant "4" would have been represented by a symbolic name.

uCode Example #2: Code fragment to fill memory with a constant
IC	ucode	Mnemonic	Behavior
03B9	528EA	ANDI 0E,ST1,ST1	clear bit 0 of ST1; this is the carry bit
03BA	680E0	ACI 0E,F0,F0	subtract two from the 16b quantity stored in {F3,F2,F1,F0}
03BB	684F1	ACI 0F,F1,F1
03BC	688F2	ACI 0F,F2,F2
03BD	68CF3	ACI 0F,F3,F3
03BE	DAC14	BF 1,ST1,03C4	test bit 0 of ST1 (carry); if there is no carry, we are done
03BF	5BC01	XP-2 1	this and the next instruction simply decrement PC by 2 using AUX[1] as a temporary register
03C0	59001	XP 1
03C1	6160F	AI,W1 0,F5,	store {F4,F5} in memory at the byte pointed at by PC
03C2	6130F	AI,W2 0,F4,
03C3	B0B39	B 03B9	loop back to the start of the routine
03C4	58400	SR	return from subroutine

This routine uses {F3,F2,F1,F0} as a 16b count of the number of nibbles to fill with a constant byte. The byte is supplied by {F5,F4}. Note that it takes two write operations to fill a byte. The operations at 03C1 and 03C2 write the two nibbles; the ",W1" and ",W2" modifiers select the nibbles at MEM[PC] and MEM[PC^0x0001], that is, an adjacent nibble pair. The operation for those two instructions, "AI", is "add immediate." Because there is no third argument (note the final comma), the result isn't saved anywhere other than to memory.
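The count-down at 03B9 through 03BD works by adding 0xFFFE one nibble at a time, which is a 16b subtract of two. The sketch below mimics that arithmetic with hypothetical helper names; nibble lists are least significant first, so the count is given as [F0, F1, F2, F3].

```python
def add_nibbles_with_carry(digits, addend, carry=0):
    """Add two little-endian nibble lists; return (result, carry_out)."""
    result = []
    for d, a in zip(digits, addend):
        total = d + a + carry
        result.append(total & 0xF)   # low 4 bits stay in the register
        carry = total >> 4           # carry ripples into the next nibble
    return result, carry


def subtract_two(count):
    """Mimic 03B9-03BD: {F3,F2,F1,F0} += 0xFFFE, starting with carry clear."""
    return add_nibbles_with_carry(count, [0xE, 0xF, 0xF, 0xF], carry=0)
```

A carry out of the last nibble means the count was at least two before the subtraction, so another byte is written; no carry means the count had run out, which is the condition the BF at 03BE uses to exit.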

The "XP-2 AUX1", "XP AUX1" pair is a fairly common idiom. Although AUX[1] is involved, its value is unaffected. This is done simply for the side effect of adjusting the PC value.
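To see why the pair leaves AUX[1] untouched, here is a small sketch of the exchange semantics from Table 9. The function name and calling convention are hypothetical, and wrapping the adjusted value to 16 bits is an assumption.

```python
def xp(pc, aux, r, delta=0):
    """eXchange PC with AUX[r], then add `delta` to AUX[r]; return the new PC."""
    new_pc = aux[r]
    aux[r] = (pc + delta) & 0xFFFF   # assumed: adjustment wraps at 16 bits
    return new_pc


aux = [0] * 16
aux[1] = 0xBEEF          # arbitrary value that should survive the idiom
pc = 0x0100

pc = xp(pc, aux, 1, -2)  # XP-2 1: PC gets old AUX[1]; AUX[1] gets old PC - 2
pc = xp(pc, aux, 1)      # XP 1:   PC gets old PC - 2; AUX[1] gets its old value back
```

After the two exchanges, pc is 0x00FE and aux[1] is 0xBEEF again.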

Although the memory interface allows writing one nibble per instruction (at sixteen clocks per instruction), this routine only writes two nibbles per eleven instructions, for 18% efficiency.

Here is one last example, presented as a simple listing with my comments. When the END command is executed, BASIC prints out the number of free bytes left. This is calculated in binary, but must be presented in ASCII. This code converts a 16b binary number in PC to a 20b BCD number in {F3,F2,F1,F0,F6}, then calls a subroutine to print out these BCD digits as ASCII. You can see that the instruction set isn't very efficient.

The calculated amount is in terms of nibbles, but the printed result is in terms of bytes. That is why the main loop is iterated 15 times, not 16, to effect a divide by 2.

uCode Example #3: Code fragment to print free memory
IC	ucode	Mnemonic	Behavior
10C3	43CD0	ORI 0D,,F0
10C4	43CEC	ORI 0E,,PC1
10C5	47C2A	ORI 2,,PC2
10C6	A1123	SB 1213
10C7	43CB0	ORI 0B,,F0
10C8	47C7A	ORI 7,,PC2
10C9	43C0C	ORI 0,,PC1
10CA	A1123	SB 1213
10CB	58C07	TA 7
10CC	A0538	SB 0358	{R3,R2,R1,R0}=PC
10CD	528EA	ANDI 0E,ST1,ST1
10CE	68832	ACI 3,F2,F2
10CF	68C03	ACI 0,F3,F3
10D0	DAD13	BF 1,ST1,10D3
10D1	528EA	ANDI 0E,ST1,ST1
10D2	B1D05	B 10D5
10D3	58C05	TA 5
10D4	A014F	SB 041F	PC = PC - {R3,R2,R1,R0}
10D5	43C00	ORI 0,,F0
10D6	43C01	ORI 0,,F1
10D7	43C02	ORI 0,,F2
10D8	43C03	ORI 0,,F3
10D9	43C06	ORI 0,,F6	{F3,F2,F1,F0,F6} is a 5 bcd digit number
10DA	DAE1B	BF 1,ST1,10EB
10DB	43CF7	ORI 0F,,F7	step 15 times
10DC	528EA	ANDI 0E,ST1,ST1	clear carry
10DD	39866	DAC F6,F6,F6	double bcd number
10DE	38000	DAC F0,F0,F0
10DF	38411	DAC F1,F1,F1
10E0	38822	DAC F2,F2,F2
10E1	38C33	DAC F3,F3,F3
10E2	A0647	SB 0467	PC = PC*2; st1.1 is msb of old PC
10E3	DAE19	BF 1,ST1,10E9	if no carry, skip next step
10E4	79806	DACI 0,F6,F6	bcd increment by 1 (I'm not sure why we couldn't just loop back to 10DD and let any carry trickle in that way)
10E5	78000	DACI 0,F0,F0
10E6	78401	DACI 0,F1,F1
10E7	78802	DACI 0,F2,F2
10E8	78C03	DACI 0,F3,F3
10E9	61CF7	AI 0F,F7,F7
10EA	F7D0D	BNE 0,F7,10DD
10EC	03C69	OR F6,,KL	get ls digit
10ED	A00B2	SB 0B02	output that 5th digit
10EE	A072E	SB 027E	output 0x0D
10EF	A072E	SB 027E	output 0x0D
10F0	DBF82	BF 8,ST2,10F2
10F1	A0F13	SB 01F3
10F2	B020A	B 002A
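The conversion loop at 10DB through 10EA is a shift-and-double binary to BCD conversion. The sketch below models only that loop, not the setup code or the print subroutines; the function names are hypothetical, and the digit list is least significant first (the listing keeps the least significant digit in F6).

```python
def decimal_add(a, b, carry):
    """Add two BCD digits plus carry-in; return (digit, carry_out)."""
    total = a + b + carry
    return (total - 10, 1) if total >= 10 else (total, 0)


def nibble_count_to_byte_bcd(pc):
    """Convert a 16b nibble count to five BCD digits of the byte count."""
    digits = [0] * 5
    for _ in range(15):                      # 15 steps, not 16: divides by 2
        carry = 0
        for i in range(5):                   # double the BCD number (DAC Fn,Fn,Fn)
            digits[i], carry = decimal_add(digits[i], digits[i], carry)
        carry = (pc >> 15) & 1               # msb shifted out of PC
        pc = (pc << 1) & 0xFFFF              # PC = PC*2
        for i in range(5):                   # add the shifted-out bit (DACI 0,Fn,Fn)
            digits[i], carry = decimal_add(digits[i], 0, carry)
    return digits


def digits_to_int(digits):
    """Helper for checking results: little-endian BCD digits to an integer."""
    return sum(d * 10 ** i for i, d in enumerate(digits))
```

Stopping one step short of sixteen leaves the lowest bit of PC unshifted, so the BCD result is the nibble count divided by two, that is, the byte count.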