The same design team that produced the first-generation 2200 also produced the second-generation CPU, internally known as the 2600. The microarchitecture document for the 2600 was substantially in place by October 1974 or earlier. The architecture document "2600 Calculator Structure" was authored by Norman Lourie, Bob Kolk, and Bruce Patterson, so they were probably the chief architects. The 2600 CPU was a complete redesign, incorporating the latest technology and a much more efficient microarchitecture. The 2200 MVP architecture document was very well done, leaving little to the imagination.
In the VP microarchitecture, microinstructions could operate on 8b and 16b operands in just 600 ns, whereas the first generation CPU only operated on 4b quantities in 1600 ns. The revised microarchitecture also had a larger AUX register file and a larger subroutine stack. Finally, even though there were only 3 bits more per microword (23b vs 20b), the instruction set was far richer in the new microarchitecture. As an extreme example, loading the PC register (this is the memory pointer, not the instruction pointer) took one instruction (600 ns) in the new microarchitecture versus four instructions (6400 ns) in the old.
Although the microarchitecture retained some of the flavor of the first generation, the differences were great enough that the BASIC interpreter had to be rewritten from scratch. Wang BASIC also got a major overhaul with many new features and was dubbed BASIC-2.
Bruce Patterson and Dave Angel wrote almost all the microcode for BASIC-2. Despite the complete rewrite and all the new features, BASIC-2 was 99% upwardly compatible with the original Wang BASIC. A BASIC program running on a 2600 CPU is about 8x faster than the exact same program running on a 2200T CPU: a factor of about 2.5 came from the faster cycle time of the machine, and a further factor of about three came from the more powerful microarchitecture instruction set combined with more efficient algorithms.
Page 4 of Wang Systems Newsletter #4 has this comparison:
Q. How much faster is the "VP" than the "T" CPU?
A. That's a good question. In general, one can safely state that the VP is 6-8 times faster overall. To help compare the two CPU's, here are some timings against specific functions.
Function | 2200VP | 2200T |
---|---|---|
X+Y | 0.11 ms | 0.8 ms |
X*Y | 0.38 ms | 3.9 ms |
X/Y | 0.76 ms | 7.4 ms |
X^Y | 6.2 ms | 45.4 ms |
LOG | 3.2 ms | 23.2 ms |
SQR | 1.7 ms | 46.4 ms |
TAN | 7.7 ms | 78.5 ms |
RND | 0.27 ms | 24.0 ms |
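As a rough check on the "6-8 times" figure, the Python sketch below divides each 2200T time in the table by the corresponding 2200VP time. The timings are copied from the newsletter; the dictionary and function names are just illustrative.

```python
# Timings in milliseconds from the newsletter table:
# function -> (2200VP time, 2200T time)
TIMINGS_MS = {
    "X+Y": (0.11, 0.8),
    "X*Y": (0.38, 3.9),
    "X/Y": (0.76, 7.4),
    "X^Y": (6.2, 45.4),
    "LOG": (3.2, 23.2),
    "SQR": (1.7, 46.4),
    "TAN": (7.7, 78.5),
    "RND": (0.27, 24.0),
}


def speedups(timings):
    """Return {function: 2200T time / 2200VP time} for each entry."""
    return {name: t_time / vp_time for name, (vp_time, t_time) in timings.items()}


if __name__ == "__main__":
    for name, ratio in speedups(TIMINGS_MS).items():
        print(f"{name:>4}: {ratio:5.1f}x faster on the VP")
```

Most functions land between about 7x and 10x, while SQR (about 27x) and RND (about 89x) fall well outside the quoted 6-8x range.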
One great improvement in the 2600 CPU was that the microcode was no longer stored in ROMs; it was downloaded from disk at startup, making it much easier to fix bugs in the field. This feature also made it possible to run diagnostics on the machine periodically to make sure the hardware was operating correctly.
Although the CPU microarchitecture was entirely incompatible, the I/O structure was kept from the first-generation 2200, allowing people to upgrade to the VP without having to throw away all of their I/O cards and peripherals.
Microarchitecture Details
The following information is intended to give the flavor of the microarchitecture, but doesn't cover everything. The view of the CPU presented to the microprogrammer is as follows.
Register name [array size] | Register width | Function |
---|---|---|
IC | 16b | microcode instruction counter |
ICSTACK[96] | 16b | microcode return stack |
PH, PL | 16b (8b, 8b) | memory address pointer; scratch register |
AUX[32] | 16b | auxiliary PC file |
F[8] | 8b | scratch data registers |
CH, CL | 16b (8b, 8b) | memory read data |
K | 8b | 8b data to/from the I/O bus |
SH | 8b | high status register |
SL | 8b | low status register |
IC points at the microinstruction currently being executed. Each microinstruction is 24b wide, of which one bit is parity. Most microinstructions take six cycles of the 10 MHz clock (600 ns), although a few take eight, eleven, or sixteen clocks. The IC can be loaded with a 16b immediate value (i.e., JUMP or CALL); its value can be pushed onto the ICSTACK or restored by popping it.
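The clock counts translate directly into microinstruction times. A minimal Python sketch, assuming every clock is exactly 100 ns (10 MHz) with no extra overhead; the names are illustrative:

```python
CLOCK_HZ = 10_000_000                     # 10 MHz microinstruction clock
NS_PER_CLOCK = 1_000_000_000 // CLOCK_HZ  # 100 ns per clock

# Clock counts mentioned in the text: most microinstructions take 6,
# a few take 8, 11 or 16.
CLOCK_COUNTS = (6, 8, 11, 16)


def microinstruction_ns(clocks):
    """Execution time in nanoseconds for a microinstruction of `clocks` cycles."""
    return clocks * NS_PER_CLOCK


if __name__ == "__main__":
    for clocks in CLOCK_COUNTS:
        print(f"{clocks:2d} clocks -> {microinstruction_ns(clocks)} ns")
```

The six-clock case gives the 600 ns figure quoted earlier, and the rare sixteen-clock microinstructions take 1600 ns, the same as every microinstruction on the first-generation CPU.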
ICSTACK holds return addresses for microcode subroutine calls. It can also be used to push the current PC (with a -3 to +3 offset) or to pop the newest value into the PC. The stack is 96 deep; if call nesting gets deeper than 96 levels, the ICSTACK pointer simply wraps around and overwrites the oldest entry.
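To make the wraparound behavior concrete, here is a small Python model of a 96-entry stack whose pointer wraps instead of overflowing. The pointer conventions (advance then store on push, read then retreat on pop) and the helper name `tps` are assumptions for illustration, not taken from the architecture document.

```python
class IcStack:
    """Sketch of a 96-entry return stack whose pointer wraps around.

    Pushing more than 96 entries silently overwrites the oldest ones;
    popping past the bottom wraps to the newest surviving entry instead
    of raising an error. Pointer conventions here are assumed.
    """

    DEPTH = 96

    def __init__(self):
        self._slots = [0] * self.DEPTH
        self._top = 0  # index of the most recently pushed entry

    def push(self, value):
        self._top = (self._top + 1) % self.DEPTH
        self._slots[self._top] = value & 0xFFFF  # entries are 16b wide

    def pop(self):
        value = self._slots[self._top]
        self._top = (self._top - 1) % self.DEPTH
        return value


def tps(stack, pc, offset=0):
    """Hypothetical helper: push PC adjusted by -3..+3 onto the stack."""
    if not -3 <= offset <= 3:
        raise ValueError("offset must be in -3..+3")
    stack.push((pc + offset) & 0xFFFF)
```

After 100 pushes only the newest 96 values survive, and a 97th pop wraps back around to the newest surviving entry rather than failing.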
PH and PL are, respectively, the high and low bytes of the 16b PC register. PC supplies the memory address whenever a microinstruction performs a memory access. The address is a 16b byte address, which limits the architecture to accessing at most 64 KB of RAM. Later versions of the CPU added bank address bits (provided from SL), allowing more RAM to be addressed, although a single process never saw more than 64 KB. The PC is also often used like an accumulator to generate addresses that are then stored elsewhere.
AUX[32] is a file of thirty-two 16b registers, used to hold 16b values and supply them to the PC. They are needed because saving and restoring the PC through memory takes many microinstructions. When a value is transferred from the PC to an AUX register, it can be adjusted by -3 to +3, which makes advancing a pointer through memory efficient.
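A minimal Python sketch of how PH/PL make up the 16b PC and how a PC-to-AUX transfer can adjust the pointer by -3 to +3 on the way. The class and method names are hypothetical (loosely named after the TAP/TPA mini instructions); the 16b wraparound reflects the 64 KB limit, and the SL bank bits of later CPUs are not modeled.

```python
class PointerRegisters:
    """Sketch of the 16b PC (viewed as PH, PL) and the 32-entry AUX file."""

    def __init__(self):
        self.pc = 0
        self.aux = [0] * 32

    @property
    def ph(self):
        return self.pc >> 8          # high byte of PC

    @property
    def pl(self):
        return self.pc & 0xFF        # low byte of PC

    def set_bytes(self, ph, pl):
        self.pc = ((ph & 0xFF) << 8) | (pl & 0xFF)

    def tap(self, n):
        """Transfer AUX[n] to the PC."""
        self.pc = self.aux[n]

    def tpa(self, n, adjust=0):
        """Transfer the PC, adjusted by -3..+3, to AUX[n]; PC itself is unchanged."""
        if not -3 <= adjust <= 3:
            raise ValueError("adjust must be in -3..+3")
        self.aux[n] = (self.pc + adjust) & 0xFFFF
```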
F[8] is a file of eight 8b values. These are used as a scratch pad for holding the results of calculations from the ALU.
CH and CL are a pair of 8b registers that work together. Every memory read fetches two bytes and saves them in CH and CL. Because PC is byte addressed, it may be even or odd: the byte addressed by PC is saved in CH, and the byte addressed by (PC^0x0001) is saved in CL.
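The even/odd pairing is just an XOR of the address with 1. A short Python sketch of a read into CH,CL, plus the two write variants selected by the DD field in the encoding tables (W1 writes MEM[PC], W2 writes MEM[PC^1]); the function names are hypothetical:

```python
def read_pair(mem, pc):
    """DD = 01 read: returns (CH, CL) = (MEM[PC], MEM[PC ^ 1])."""
    return mem[pc], mem[pc ^ 1]


def write1(mem, pc, value):
    """DD = 10: MEM[PC] <= 8b C-BUS result."""
    mem[pc] = value & 0xFF


def write2(mem, pc, value):
    """DD = 11: MEM[PC ^ 1] <= 8b C-BUS result."""
    mem[pc ^ 1] = value & 0xFF
```

Reading at an odd address returns the two bytes of the aligned pair in swapped order.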
K is another 8b register. It is used to send 8b values over the I/O bus or to capture 8b values read from the I/O bus.
Finally, there are two 8b status registers. SH contains a collection of ad hoc status/control bits that do things such as holding the carry flag and detecting when I/O operations have completed. SL is simply an 8b read/write register that the microcode uses for various state control, so it doesn't have to go to memory for that state.
You can see a very simple block diagram of the microarchitecture.
Microinstruction Encoding
There are a few different formats for microcode instructions. The 2200 MVP architecture document contains a wealth of information, including everything required to write the VP CPU emulation code; it is so well written that anyone who wants the full details should consult it directly. Below are some of the most important details, enough to give an overview of what the microarchitecture was all about.
The software development manual contains a very helpful table of microword encodings, recreated below.
Mnemonic | Operation | Encoding, bit 22 (left) to bit 0 (right) |
---|---|---|
I. REGISTER INSTRUCTIONS | | OPCODE, X, Carry, DD, C-BUS, A-BUS, B-BUS |
OR | Or | 00000 X 0 CaCa DD CCCC AAAA BBBB |
XOR | Exclusive Or | 00001 X 0 CaCa DD CCCC AAAA BBBB |
AND | And | 00010 X 0 CaCa DD CCCC AAAA BBBB |
SC | Binary Subtract with Carry | 00011 X 0 CaCa DD CCCC AAAA BBBB |
DAC | Decimal Add with Carry | 00100 X 0 CaCa DD CCCC AAAA BBBB |
DSC | Decimal Subtract with Carry | 00101 X 0 CaCa DD CCCC AAAA BBBB |
AC | Binary Add with Carry | 00110 X 0 CaCa DD CCCC AAAA BBBB |
M | Binary Multiply | 00111 X 0 HbHa DD CCCC AAAA BBBB |
SHFT | Shift | 000 HbHa X 0 01 DD CCCC AAAA BBBB |
II. IMMEDIATE REGISTER INSTRUCTIONS | | OPCODE, IMMEDIATE (HIGH), DD, C-BUS, IMMEDIATE (LOW), B-BUS |
ORI | Or Immediate | 01000 IIII DD CCCC IIII BBBB |
XORI | Exclusive Or Immediate | 01001 IIII DD CCCC IIII BBBB |
ANDI | And Immediate | 01010 IIII DD CCCC IIII BBBB |
AI | Binary Add Immediate | 01011 IIII DD CCCC IIII BBBB |
DACI | Decimal Add with Carry Immediate | 01100 IIII DD CCCC IIII BBBB |
DSCI | Decimal Subtract with Carry Immediate | 01101 IIII DD CCCC IIII BBBB |
ACI | Binary Add with Carry Immediate | 01110 IIII DD CCCC IIII BBBB |
MI | Binary Multiply Immediate | 01110 0 - Hb - DD CCCC IIII BBBB |
III. MINI INSTRUCTIONS | | OPCODE, DD, B-BUS |
TAP | Transfer Aux to PC's | 00010111 - DD 0 -- AxAxAxAxAx BBBB |
TPA | Transfer PC's to Aux | 00000011 +/- DD 0 InIn AxAxAxAxAx BBBB |
XPA | Exchange PC's with Aux | 00000111 +/- DD 0 InIn AxAxAxAxAx BBBB |
TPS | Transfer PC's to Stack | 00001011 +/- DD 0 InIn ----- BBBB |
TSP | Transfer Stack to PC's | 00011011 - DD -------- BBBB |
SR,RCM | Read Control Memory + SR | 00001111 - -- 0 1 1 ----- ---- |
SR,WCM | Write Control Memory + SR | 00001111 - -- 0 1 0 ----- ---- |
SR | Subroutine Return | 00001111 - DD 0 0 ------ BBBB |
CIO | Control Input/Output | 00101111 - 0 0 S TTT TTTT ---- |
LPI | Load PC's Immediate | 0011 II 1 II DD IIII IIII IIII |
IV. MASK BRANCH INSTRUCTIONS | | OPCODE, BRANCH FIELD (LOW 10 bits), MASK, B-BUS |
BT | Branch if True | 1100 Hb RRRRRRRRRR MMMM BBBB |
BF | Branch if False | 1101 Hb RRRRRRRRRR MMMM BBBB |
BEQ | Branch if = Mask | 1110 Hb RRRRRRRRRR MMMM BBBB |
BNE | Branch if != Mask | 1111 Hb RRRRRRRRRR MMMM BBBB |
V. REGISTER BRANCH INSTRUCTIONS | | OPCODE, BRANCH FIELD (LOW 10 bits), A-BUS, B-BUS |
BLR | Branch if < Register | 1000 X RRRRRRRRRR AAAA BBBB |
BLER | Branch if <= Register | 1001 X RRRRRRRRRR AAAA BBBB |
BER | Branch if = Register | 10100 RRRRRRRRRR AAAA BBBB |
BNR | Branch if != Register | 10110 RRRRRRRRRR AAAA BBBB |
VI. BRANCH INSTRUCTIONS | | OPCODE, BRANCH FIELD (LOW 10 bits), BRANCH FIELD (HIGH 6 bits) |
SB | Subroutine Branch | 10101 RRRRRRRRRR RRRRRR -- |
B | Unconditional Branch | 10111 RRRRRRRRRR RRRRRR -- |
Field | Meaning |
---|---|
AAAA | A-BUS Register Address |
BBBB | B-BUS Register Address |
CCCC | C-BUS Register Address |
DD | Read/Write Specification: 00 = no read/write; 01 = read (CH <= MEM[PC]; CL <= MEM[PC^1]); 10 = write 1 (MEM[PC] <= C-BUS result); 11 = write 2 (MEM[PC^1] <= C-BUS result) |
Hb, Ha | High/Low 4 bits of register: Ha = 0 selects the low 4 bits of the A-BUS register; Ha = 1 selects the high 4 bits of the A-BUS register; Hb = 0 selects the low 4 bits of the B-BUS register; Hb = 1 selects the high 4 bits of the B-BUS register |
II...I | Immediate Operand |
MMMM | Immediate Mask |
AxAxAxAxAx | Address of auxiliary register |
+/- In In | Increment/decrement specification: 000 = PC's; 001 = PC's + 1; 010 = PC's + 2; 011 = PC's + 3; 100 = PC's; 101 = PC's - 1; 110 = PC's - 2; 111 = PC's - 3 |
CaCa | Set carry (SH0) specification: 00 = do not set carry; 10 = set carry to 0 before ALU operation; 11 = set carry to 1 before ALU operation |
X | Extended operation if X = 1 |
RR...R | Branch address |
S | Set IOB flip-flops if S = 1 |
TTTTTT | Strobe specification |
- | Bit ignored (0 or 1 legal) |
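Several of these fields are easy to misread, so here is a hedged Python sketch of decoding them. The bit positions for the register format are derived by counting the field widths in the encoding table down from bit 22; the function names are hypothetical.

```python
# DD read/write specification, from the field table above.
DD_MEANING = {
    0b00: "no read/write",
    0b01: "read: CH <= MEM[PC], CL <= MEM[PC^1]",
    0b10: "write 1: MEM[PC] <= C-BUS result",
    0b11: "write 2: MEM[PC^1] <= C-BUS result",
}


def decode_pc_adjust(spec):
    """Decode the 3-bit '+/- In In' field into a PC adjustment of -3..+3.

    Bit 2 is the sign and bits 1-0 the magnitude, so 000 and 100 both mean 0.
    """
    magnitude = spec & 0b011
    return -magnitude if spec & 0b100 else magnitude


def decode_caca(caca):
    """Return the carry preset before the ALU operation, or None if unchanged."""
    if caca == 0b00:
        return None
    if caca == 0b10:
        return 0
    if caca == 0b11:
        return 1
    # 01 is not a carry setting; in the encoding table it appears in SHFT's bits.
    raise ValueError("CaCa = 01 is not a carry specification")


def register_fields(word):
    """Split a 23-bit register-format microword into its fields.

    Assumed layout: OPCODE 22-18, X 17, (0) 16, CaCa 15-14, DD 13-12,
    C-BUS 11-8, A-BUS 7-4, B-BUS 3-0.
    """
    return {
        "opcode": (word >> 18) & 0b11111,
        "x": (word >> 17) & 1,
        "caca": (word >> 14) & 0b11,
        "dd": (word >> 12) & 0b11,
        "c": (word >> 8) & 0xF,
        "a": (word >> 4) & 0xF,
        "b": word & 0xF,
    }
```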
8b register encodings (normal operation, X = 0):
Binary Encoding | A-BUS | B-BUS | C-BUS |
---|---|---|---|
0000-0111 | File registers (F0-F7) | F0-F7 | F0-F7 |
1000 | CL with PC's = PC's - 1 | PL | PL |
1001 | CH with PC's = PC's - 1 | PH | PH |
1010 | CL | CL | illegal |
1011 | CH | CH | illegal |
1100 | CL with PC's = PC's + 1 | SL | SL |
1101 | CH with PC's = PC's + 1 | SH | SH |
1110 | Dummy with PC's = PC's + 1 | K | K |
1111 | Dummy with PC's = PC's - 1 | Dummy | Dummy |
When the A-BUS or B-BUS is specified as Dummy, a constant zero is supplied. When the C-BUS is specified as Dummy, it means the ALU result won't be stored to a register (although the result can still be stored to memory with a ",W1" or ",W2" specifier, if the microinstruction format has the DD field).
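The A-BUS codes above double as PC increment/decrement controls. A rough Python sketch of an 8b (X = 0) A-BUS read that returns the operand value and the adjusted PC; it does not model whether the adjustment happens before or after a memory access in the same microinstruction. Judging from the example code later on, the "+" and "-" operands in mnemonics such as `OR,R +,,` appear to select codes 1110 and 1111, but that mapping is an inference. Names are hypothetical.

```python
# 8b A-BUS codes 1000-1111: (register, PC adjustment), from the table above.
# Codes 0000-0111 select F0-F7 with no PC side effect.
A_BUS_HIGH_CODES = {
    0b1000: ("CL", -1),
    0b1001: ("CH", -1),
    0b1010: ("CL", 0),
    0b1011: ("CH", 0),
    0b1100: ("CL", +1),
    0b1101: ("CH", +1),
    0b1110: ("Dummy", +1),
    0b1111: ("Dummy", -1),
}


def read_a_bus(code, regs, pc):
    """Return (A-BUS value, updated PC) for a normal (X = 0) operation.

    `regs` maps register names ("F0".."F7", "CL", "CH") to 8b values.
    A Dummy source reads as a constant zero.
    """
    if code < 8:
        return regs[f"F{code}"], pc
    name, delta = A_BUS_HIGH_CODES[code]
    value = 0 if name == "Dummy" else regs[name]
    return value, (pc + delta) & 0xFFFF
```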
16b register-pair encodings (extended operation, X = 1), listed as high byte, low byte:
Binary Encoding | A-BUS | B-BUS | C-BUS |
---|---|---|---|
0000 | F1, F0 | F1, F0 | F1, F0 |
0001 | F2, F1 | F2, F1 | F2, F1 |
0010 | F3, F2 | F3, F2 | F3, F2 |
0011 | F4, F3 | F4, F3 | F4, F3 |
0100 | F5, F4 | F5, F4 | F5, F4 |
0101 | F6, F5 | F6, F5 | F6, F5 |
0110 | F7, F6 | F7, F6 | F7, F6 |
0111 | CL, F7 | PL, F7 | PL, F7 |
1000 | CH, CL | PH, PL | PH, PL |
1001 | CL, CH | CL, PH | illegal |
1010 | CH, CL | CH, CL | illegal |
1011 | CL, CH | SL, CH | illegal |
1100 | CH, CL | SH, SL | SH, SL |
1101 | Dummy, CH | K, SH | K, SH |
1110 | Dummy, Dummy | Dummy, K | Dummy, K |
1111 | F0, Dummy | F0, Dummy | F0, Dummy |
When a microinstruction has an X bit, X=0 means that an 8b operation is to be performed. When X=1, the instruction becomes a 16b operation: the first (low) byte acts on the registers specified by the encoding, and the second (high) byte acts on the 8b operands selected by the register encoding + 1. Table 5 specifies the possible combinations. This is a true 16b operation, not two 8b operations in a row. For example, if the CaCa field indicates that carry is to be set or cleared, that happens before the first byte operation but not before the second, and for the 16b versions of BLR and BLER the comparison covers all 16 bits, not just the top byte. When an extended microinstruction executes, the increment or decrement of the PC's that would occur in the 8b version is suppressed and the PC value is unaffected. For extended-mode instructions that specify a write to memory, only the high-order byte of the result is written. Extended-mode instructions take the same amount of time as normal-mode instructions.
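The 16b table can be regenerated mechanically from the 8b one using the "encoding + 1" rule, with code 1111 wrapping around to 0000. The Python sketch below does that and also shows what "true 16b" means for carry: the preset carry enters only the low byte, and the low byte's carry-out feeds the high byte. Names are hypothetical, and `None` marks the illegal C-BUS codes.

```python
# 8b register names selected by each 4-bit code (index = code), from the 8b table.
A_BUS = ["F0", "F1", "F2", "F3", "F4", "F5", "F6", "F7",
         "CL", "CH", "CL", "CH", "CL", "CH", "Dummy", "Dummy"]
B_BUS = ["F0", "F1", "F2", "F3", "F4", "F5", "F6", "F7",
         "PL", "PH", "CL", "CH", "SL", "SH", "K", "Dummy"]
C_BUS = ["F0", "F1", "F2", "F3", "F4", "F5", "F6", "F7",
         "PL", "PH", None, None, "SL", "SH", "K", "Dummy"]  # None = illegal


def extended_pair(bus, code):
    """Return (high, low) register names for an X = 1 operand, or None if illegal.

    The low byte is the register named by `code`; the high byte is the one
    named by `code + 1`, wrapping from 1111 back to 0000.
    """
    low = bus[code]
    high = bus[(code + 1) % 16]
    if low is None or high is None:
        return None
    return high, low


def add16_with_carry(a, b, carry):
    """16b add: carry is preset once, then propagated from the low to the high byte.

    Returns (16b result, carry out).
    """
    lo = (a & 0xFF) + (b & 0xFF) + carry
    hi = (a >> 8) + (b >> 8) + (lo >> 8)
    return ((hi & 0xFF) << 8) | (lo & 0xFF), hi >> 8
```

For example, `extended_pair(B_BUS, 0b1000)` gives `('PH', 'PL')`, and adding 0xFFFF (that is, -1) to a nonzero 16b count produces a carry-out, which is what the VP fill routine further down relies on.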
Finally, the assembler supported some pseudo-operations. There is more than one way to achieve the same effect, but the encodings the assembler chose are as follows:
Mnemonic | Actual Code | Meaning |
---|---|---|
NOP | ORI 0,, | Don't do anything (C-BUS gets zero) |
MVI imm, dst | ORI imm,,dst | Move 8b immediate to register |
MV src, dst | ORI 0,src,dst | 8b register to register move |
MVX src, dst | ORX 00,src,dst | 16b register to register move |
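The pseudo-ops are simple textual rewrites. A minimal Python sketch of the expansion, with a hypothetical function name and no operand validation:

```python
def expand_pseudo(line):
    """Expand the assembler pseudo-ops in the table into real instructions.

    Anything that is not one of the listed pseudo-ops is passed through unchanged.
    """
    mnemonic, _, operands = line.strip().partition(" ")
    args = [a.strip() for a in operands.split(",")] if operands else []
    if mnemonic == "NOP":
        return "ORI 0,,"
    if mnemonic == "MVI":
        imm, dst = args
        return f"ORI {imm},,{dst}"
    if mnemonic == "MV":
        src, dst = args
        return f"ORI 0,{src},{dst}"
    if mnemonic == "MVX":
        src, dst = args
        return f"ORX 00,{src},{dst}"
    return line.strip()
```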
Microarchitecture Example Code
The description above gives many details, but they are best understood by looking at real code to see how they work together. To compare the VP microarchitecture with that of the 2200T CPU, I've attempted to rewrite the code examples from the 2200 microarchitecture page (which were real microcode from a shipping CPU). Rather than trying to find the equivalent code buried somewhere in BASIC-2, I've written it myself; a more experienced VP microcoder could probably do a better job.
uCode fragment #1, 2200T version:
IC | Mnemonic | Behavior |
---|---|---|
02A1 | TA 4 | transfer the contents of AUX[4] to the PC register, wiping out the previous contents of PC |
02A2 | TP+2,R 4 | transfer PC+2 back to AUX[4]; read the byte at RAM[PC], storing it in C. We increment by two because PC is a nibble address, and we are advancing to the next byte. |
02A3 | BNE 2,CL,02A5 | jump to return if low nibble isn't 2 |
02A4 | BEQ 0,CH,02A1 | loop back if high nibble is 0 (note the nibble swap: together with the previous test, this checks for HEX(20), the space character) |
02A5 | SR | return to caller |
uCode fragment #1, 2200VP version:
IC | Mnemonic | Behavior |
---|---|---|
0100 | MVI 20,F0 | space character |
0101 | TAP 4 | transfer the contents of AUX[4] to the PC register, wiping out the previous contents of PC |
0102 | OR,R +,, | read RAM[PC] and save it in CH; increment PC |
0103 | BER CH,F0,0102 | if the character is a space, get the next character |
0104 | TPA 4 | CH still holds the first non-space character; AUX[4] points to the following byte |
0105 | SR | return to caller |
uCode fragment #1 scans a line of code, skipping ahead until a non-space is found. AUX[4] contains the 16b pointer to the current byte being scanned; the routine returns with C (CH in the VP version) containing the first non-space character and AUX[4] pointing to the byte after it. In the original source code the constant "4" would undoubtedly have been represented by a symbolic name.
The 2200T code takes four instructions (6.4 usec) per byte processed; the 2200VP code takes two instructions per byte (1.2 usec), a speedup of just over five times. To be fair, the 2200VP code is one instruction longer and uses F0 as a scratch register.
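To check the counts, here is a Python model of the VP version of fragment #1 that counts microinstructions as they execute. It assumes, as the text does, that each microinstruction takes 600 ns on the VP and 1600 ns on the 2200T; the function and constant names are illustrative.

```python
SPACE = 0x20


def skip_spaces(mem, aux4):
    """Model of the VP fragment.

    Returns (CH, new AUX[4], microinstructions executed).
    """
    executed = 2                     # MVI 20,F0 and TAP 4
    pc = aux4
    while True:
        ch = mem[pc]                 # OR,R +,, reads MEM[PC] into CH ...
        pc = (pc + 1) & 0xFFFF       # ... and increments PC
        executed += 2                # OR,R +,, and BER CH,F0
        if ch != SPACE:              # BER falls through on a non-space
            break
    executed += 2                    # TPA 4 and SR
    return ch, pc, executed


# Per-byte inner-loop cost in ns, from the instruction counts in the text.
T_NS_PER_BYTE = 4 * 1600
VP_NS_PER_BYTE = 2 * 600
```

For a line starting with three spaces, the routine reads four bytes and executes 12 microinstructions in total; the per-byte ratio is 6400 / 1200, about 5.33.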
uCode fragment #2, 2200T version:
IC | Mnemonic | Behavior |
---|---|---|
03B9 | ANDI 0E,ST1,ST1 | clear bit 0 of ST1; this is the carry bit |
03BA | ACI 0E,F0,F0 | subtract two from the 16b quantity stored in {F3,F2,F1,F0} |
03BB | ACI 0F,F1,F1 | |
03BC | ACI 0F,F2,F2 | |
03BD | ACI 0F,F3,F3 | |
03BE | BF 1,ST1,03C4 | test the carry bit (mask 1, i.e., bit 0 of ST1); if there is no carry, we are done |
03BF | XP-2 1 | this and the next instruction simply decrement PC by 2 using AUX[1] as a temporary register |
03C0 | XP 1 | |
03C1 | AI,W1 0,F5, | store {F4,F5} in memory at the byte pointed at by PC |
03C2 | AI,W2 0,F4, | |
03C3 | B 03B9 | loop back to the start of the routine |
03C4 | SR | return from subroutine |
This routine uses {F3,F2,F1,F0} as a 16b count of the number of nibbles to fill with a constant byte, which is supplied by {F5,F4}. The fill proceeds backwards: PC initially points one byte past the highest byte to be filled. This code takes 11 instructions (17.6 usec) per byte filled.
uCode fragment #2, 2200VP version:
IC | Mnemonic | Behavior |
---|---|---|
0100 | SCX,0 F3F2,F3F2,F3F2 | subtract {F3,F2} from itself with borrow, so that {F3,F2} = -1 |
0101 | ANDI 0FE,SH,SH | clear the carry bit |
0102 | ACX F1F0,F3F2,F1F0 | {F1,F0} = {F1,F0} + {F3,F2} |
0103 | BFL 1,SH,0107 | test carry bit; if there is no carry, we are done |
0104 | OR -,, | decrement PC by 1 |
0105 | ORI,W1 0,F4, | store F4 in memory at the byte pointed at by PC |
0106 | B 0101 | loop back to the start of the routine |
0107 | SR | return from subroutine |
In the VP version, things change a bit. Because the registers are 8b wide, let's assume {F1,F0} contains a byte count and F4 contains the fill byte. This code takes six instructions (3.6 usec) per byte filled, about five times faster. At the cost of a couple more instructions, the VP code could be brought down to five instructions per byte. With more extensive rearrangement, the inner loop could be brought down to two instructions, with {F1,F0} holding the lowest address to fill instead of a count:
IC | Mnemonic | Behavior |
---|---|---|
0100 | OR,W1 -,F4, | write F4 to MEM[PC]; PC=PC-1 |
0101 | BLERX F1F0,PHPL,*-1 | keep going while {F1,F0} <= PC |
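A Python model of the two VP fill loops, again counting microinstructions, plus the per-byte costs implied by the instruction counts above. It assumes that the write in the two-instruction loop uses PC before the decrement (as the listing's comment says), and that {F1,F0} holds a nonzero lowest address so the 16b PC never wraps; the function names are hypothetical.

```python
def vp_fill(mem, pc, count, fill):
    """Six-instruction VP loop: fill `count` bytes just below `pc` with `fill`.

    Returns the number of microinstructions executed.
    """
    minus_one = 0xFFFF                   # 0100 SCX,0 leaves {F3,F2} = -1
    executed = 1
    while True:
        # 0101 clears carry, 0102 ACX adds -1 to the count, 0103 BFL tests carry
        total = count + minus_one
        count, carry = total & 0xFFFF, total >> 16
        executed += 3
        if not carry:                    # count was already 0: exit to SR
            return executed + 1          # 0107 SR
        pc = (pc - 1) & 0xFFFF           # 0104 OR -,, decrements PC
        mem[pc] = fill & 0xFF            # 0105 stores F4 at MEM[PC]
        executed += 3                    # 0104, 0105, 0106 (B 0101)


def vp_fill_fast(mem, pc, low, fill):
    """Two-instruction loop: fill MEM[low..pc] downward; {F1,F0} holds `low`."""
    if low <= 0:
        raise ValueError("low must be > 0 in this sketch (PC would wrap)")
    executed = 0
    while True:
        mem[pc] = fill & 0xFF            # write F4 to MEM[PC] ...
        pc = (pc - 1) & 0xFFFF           # ... then PC = PC - 1
        executed += 2                    # the write and BLERX
        if not low <= pc:                # BLERX loops while {F1,F0} <= PC
            return executed


# Per-byte inner-loop cost in ns, from the instruction counts in the text.
PER_BYTE_NS = {
    "2200T": 11 * 1600,
    "2200VP, six-instruction loop": 6 * 600,
    "2200VP, two-instruction loop": 2 * 600,
}
```

Filling three bytes takes 23 microinstructions with the six-instruction loop (six per byte plus five of overhead) and 6 with the two-instruction loop. Per byte, that works out to 17.6 usec on the 2200T versus 3.6 usec and 1.2 usec on the VP, speedups of roughly 4.9x and 14.7x.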