The MCPU-1: My Second Complete CPU Design
Way back in early 2017, I demonstrated my first complete CPU and virtual machine, which I had programmed in C and named the MCPU. It was a 16-bit RISC machine with a lot of peculiarities, such as a fixed-size 16-word call stack and native support for 16 pages of memory, each of which could be assigned to different CPU functions. In spite of its retrospective oddity, it impressed the judges enough to win first place in the computer science category.
Four years later, now with more experience under my belt, I decided to go back and design another iteration of the MCPU with a more conventional architecture. Unlike the first MCPU and my other experimental designs that never fully reached completion, the MCPU-1 would be designed in such a way that it could actually be built with 74-series logic, if I so desired. This meant that all instructions should be feasible to implement with a relatively small number of ICs.
Data and Instruction Format
The MCPU-1 is an 8-bit machine with a 16-bit address space. This means that all operations on data in hardware are 8-bit, while the CPU can address a full 16-bit memory space of 65536 bytes.
Each instruction in the MCPU-1 consists of at least one byte: a 5-bit opcode along with a 3-bit value that can serve as either a register address or a jump condition, depending on the instruction. This means that there can be a total of 32 possible instructions. For instructions that also need to specify a memory address, the MCPU-1 fetches the next two successive bytes in memory and uses them as the address. For the LDI instruction, one additional byte is fetched that serves as the immediate value to load a register with. Data in the MCPU-1 is stored in little-endian format, meaning that the least significant byte of a multi-byte data entry comes first in memory.
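To make the format concrete, here is a minimal Python sketch of how an instruction byte and a little-endian address could be decoded. The bit layout is an assumption on my part for illustration: the text above only says that there is a 5-bit opcode and a 3-bit value, so placing the opcode in the upper five bits is a guess, and the function names are hypothetical.

```python
def decode_opcode_byte(byte):
    """Split one instruction byte into (opcode, operand).

    Assumption: the 5-bit opcode occupies the upper bits and the 3-bit
    register/condition field occupies the lower bits.
    """
    opcode = (byte >> 3) & 0x1F   # upper 5 bits, 32 possible values
    operand = byte & 0x07         # lower 3 bits, 8 possible values
    return opcode, operand


def read_address(memory, location):
    """Read a 16-bit little-endian address starting at `location`."""
    low = memory[location]        # least significant byte comes first
    high = memory[location + 1]
    return (high << 8) | low
```

With this layout, the two bytes 0x34 0x12 following an opcode would be read as the address 0x1234.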
The ALU
The MCPU-1’s Arithmetic Logic Unit (ALU) supports 7 total operations. There are four arithmetic operations: add (ADD), subtract (SUB), add with carry (ADC), and subtract with borrow (SBB). Additionally, there are three logic operations: logical AND (AND), logical OR (IOR), and logical NOT (NOT). The combination of these three instructions can be used to carry out any useful logical operation. The ADC and SBB instructions can be used to perform operations on data that takes up more than one byte of memory, such as numbers that lie outside the range of -128 to 127 or 0 to 255.
There are three ALU flag bits that are set after every ALU operation. All flag values default to 0 unless their conditions are met. Zero (Z) is set to 1 if the result of an ALU operation is 0. Negative/sign (N) is set to 1 if the most significant bit of an ALU result is a 1. Carry (C) is set to 1 if the result of an ALU operation is larger than what can be stored in a single byte.
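The flag rules and the role of ADC can be sketched in a few lines of Python. This is an illustration under the definitions above, not the simulator's actual code; the function names and the dictionary used for the flags are my own, and only addition is modeled.

```python
def alu_add(a, b, carry_in=0):
    """8-bit add returning (result, flags); carry_in=1 models ADC."""
    total = a + b + carry_in
    result = total & 0xFF                 # keep only the low 8 bits
    flags = {
        "Z": int(result == 0),            # result is zero
        "N": (result >> 7) & 1,           # most significant bit is set
        "C": int(total > 0xFF),           # result did not fit in one byte
    }
    return result, flags


def add16(x, y):
    """Add two 16-bit values one byte at a time, as ADD followed by ADC."""
    low, flags = alu_add(x & 0xFF, y & 0xFF)            # ADD on low bytes
    high, flags = alu_add(x >> 8, y >> 8, flags["C"])   # ADC on high bytes
    return (high << 8) | low, flags
```

For example, adding 0x01FF and 0x0001 this way overflows the low byte, and the carry is picked up by the second step to give 0x0200.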
Registers
The MCPU-1 is a load/store machine. This means that all arithmetic and logic operations must happen between registers. Therefore, it is not possible to directly add a register and a memory location, for example. This design decision means that the MCPU-1 must have a somewhat sizable number of data registers. In this case, there are 8 data registers labeled A through H that can be used for general arithmetic and other operations that will be elaborated on later. All instructions that operate on registers specify a single register to operate on, with all other registers required for the instruction being implicit. For all arithmetic and logical operations, the first operand and the result are stored in the A data register, also known as the accumulator. This is done in order to save instruction space: because there are only 32 opcodes, it avoids allocating too many of them to ALU instructions.
In addition to the 8 data registers, there is a pair of registers for I/O operations labeled as IN and OUT. The STI and LDO instructions serve to store data from the IN register into a data register or load data from a data register to the OUT register. After a read, the IN register is cleared to zero in order to avoid repeat reads of the same value. Likewise, output devices are expected to clear OUT after each read. The MCPU-1 has no hardware interrupt support, meaning that all I/O operations must be done through polling.
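As a rough model of the polling scheme, the Python sketch below treats a zero in IN as "nothing new to read", which is my inference from the clear-on-read behavior rather than something stated outright. The class and function names are hypothetical.

```python
class IOPorts:
    """Toy model of the IN and OUT registers."""

    def __init__(self):
        self.in_reg = 0
        self.out_reg = 0

    def sti(self):
        """Model STI: return the IN value, then clear IN to zero."""
        value = self.in_reg
        self.in_reg = 0
        return value

    def ldo(self, value):
        """Model LDO: place a register value into OUT."""
        self.out_reg = value & 0xFF


def poll_input(ports, max_polls=1000):
    """Poll IN until a nonzero value shows up (assumed to mean new data)."""
    for _ in range(max_polls):
        value = ports.sti()
        if value != 0:
            return value
    return None   # gave up: nothing arrived within max_polls reads
```

One consequence of this scheme, if the assumption holds, is that an input device cannot send a literal zero byte in a way the program can distinguish from "no data".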
Finally, there are two 16-bit registers: the program counter (PC) and the stack pointer (SP). The PC points to the next instruction location in memory and is not directly accessible to the programmer. The SP points to the next stack location to be pushed to or popped from, and can be set by the programmer through the LSP instruction.
The Stack
The MCPU-1 uses a single hardware stack pointed to by SP for both address and data storage. The stack grows upward in memory, meaning that a PSH operation increments the SP, while a POP operation decrements the SP. The primary use of the stack is to save the current state of the program whenever a subroutine is called. The SRD and SRX instructions automatically push the current value of the PC to the stack and the RET instruction automatically pops the top stack value into the PC. Care should be taken that all PSH and POP operations in a subroutine are matched in order to prevent erroneous return addresses when RET is called.
In order to effectively use the stack, a programmer must first call LSP. This is necessary since the PC’s first read location is the address 0x0000, which is also the initial value of the SP.
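Here is a small Python sketch of an upward-growing stack with these three operations. The exact ordering is an assumption: I model PSH as "store, then increment" and POP as "decrement, then read", which is one reasonable reading of the SP description above but is not confirmed by it. The class name is hypothetical.

```python
class Stack:
    """Toy model of the MCPU-1 stack inside a 64 KiB memory."""

    def __init__(self, memory, sp=0x0000):
        self.memory = memory
        self.sp = sp                      # SP starts at 0x0000 by default

    def lsp(self, address):
        """Model LSP: point SP at a programmer-chosen address."""
        self.sp = address & 0xFFFF

    def psh(self, value):
        """Model PSH: store a byte, then move SP up (assumed order)."""
        self.memory[self.sp] = value & 0xFF
        self.sp = (self.sp + 1) & 0xFFFF

    def pop(self):
        """Model POP: move SP down, then read the byte (assumed order)."""
        self.sp = (self.sp - 1) & 0xFFFF
        return self.memory[self.sp]
```

Used with a `bytearray(65536)` as memory, calling `lsp(0x8000)` before any pushes keeps the stack away from address 0x0000, where the program itself begins.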
Addressing Modes
The MCPU-1 has three addressing modes: immediate, direct, and indexed. The immediate addressing mode is used only by the LDI instruction, and loads the next value in memory after the opcode byte into the specified register. The direct addressing mode is used by all load, store, and jump instructions ending with -D, as well as the LSP instruction. The direct mode takes the next two bytes in memory following the opcode and uses them as the address for the instruction. This is typically used to access programmer-defined locations in memory such as subroutines and single address locations for data storage. Finally, the indexed addressing mode is used by all load, store, and jump instructions ending with -X. The indexed mode uses the data stored in the G and H data registers to form the address for the instruction. The benefit of obtaining the address from the register space instead of the CPU’s ROM is that it can be dynamically changed by the program as it runs. This is useful for accessing things such as arrays, or for dynamically determining which subroutine to take.
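The indexed address can be sketched in Python as follows. I am assuming that G holds the high byte and H the low byte, which is how I read the string-printing listing later in this post (it loads G with 0, H with 0x3, and calls H the least significant byte of the index). The function names and the register dictionary are hypothetical.

```python
def indexed_address(registers):
    """Form a 16-bit address from G (assumed high byte) and H (low byte)."""
    return (registers["G"] << 8) | registers["H"]


def increment_index(registers):
    """Step the G:H pair to the next address, carrying from H into G."""
    low = registers["H"] + 1
    registers["H"] = low & 0xFF
    # Propagate the carry out of H, the job ADC would do on the real CPU.
    registers["G"] = (registers["G"] + (low >> 8)) & 0xFF
```

Incrementing only H, without the carry into G, is enough as long as the array being walked does not cross a 256-byte boundary.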
Control Flow
The MCPU-1 has five total instructions that can change the control flow of a program. The JPD and JPX instructions are used to load the PC with a new address, therefore disrupting its normal order of execution. The SRD and SRX instructions function almost identically, but they also automatically push the current PC state to the stack. These four instructions can also use the ALU’s flag bits in order to perform conditional branches if the right conditions are met. In this case, the three bits normally used for register addressing serve to store the condition code for the instruction. If a given bit in the condition code is a 1, then the CPU will only execute the branch if its corresponding ALU flag is also 1. If the condition code sets multiple bits to 1, then the jump will be evaluated like a logical OR, meaning that the jump will be taken if any of the desired ALU flags are set. A condition code with no bits set makes the jump unconditional.
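The branch decision boils down to a bitwise AND between the condition code and the flags. In the Python sketch below, the assignment of Z, N, and C to particular bit positions is an assumption made for illustration, since the text does not say which bit maps to which flag. A condition code of 0 is treated as an unconditional jump, matching the way the string-printing listing later in this post uses jpd 0.

```python
# Assumed bit positions for the three flags (not specified in the text).
FLAG_Z = 0b001
FLAG_N = 0b010
FLAG_C = 0b100


def branch_taken(condition, flags):
    """Return True if a jump with this 3-bit condition code is taken.

    `flags` is the current ALU flag state packed with the same bit layout.
    """
    if condition == 0:
        return True                       # no bits set: unconditional jump
    return (condition & flags) != 0       # OR semantics: any match suffices
```

So a jump with condition `FLAG_Z | FLAG_N` is taken when the last ALU result was zero or negative.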
The RET instruction can be viewed as the inverse of the SRD or the SRX instruction in that it pops the top two bytes of the stack into the PC, thereby effectively restoring the CPU to the state it was in before the subroutine was called. RET must be called at the end of every subroutine in order to free the stack for new data. Failure to do so will result in improper program flow and potentially a stack overflow.
Full Instruction Set
The full instruction set of the MCPU-1 is shown below:
C++ Implementation
In order to verify the usefulness of this design and test code before running it on a hardware version, if I ever choose to build such a thing, I needed to program a simulator for the MCPU-1. C++ was the language I chose to use to program it, although the source code is almost directly portable to C as well, save for some I/O that uses C++-specific constructs. The actual code for the simulator is quite simple, taking up just 350 lines in total. Its main structure emulates the fetch/decode/execute cycle of the real CPU by fetching the opcode for the next instruction before interpreting it in a switch statement and performing the required operations. The simulator's output mode (decimal, ASCII, or hexadecimal) is set in a config file. Additionally, the config specifies the ROM/RAM split for the simulator, which also affects how much program memory is available. I set this to 32768 for a 50/50 split, although most practical programs will not require more than a few kilobytes of RAM.
Python Assembler
Now that I had a complete virtual implementation of the MCPU-1, I coded some simple test programs using a hex editor. This was fine for testing, but it would be very cumbersome to write a whole program this way if it were any more complex than adding a couple of numbers together. To make the CPU practical to program for, I needed to write an assembler. I used Python to do this as it was easier to perform the string manipulation needed to convert assembly into machine code with a higher level language.
The assembler works by making two passes through the asm file. The first pass simply counts the number of bytes that each instruction occupies in memory and assigns an address to each label based on this. This is the most crucial step in making the assembler functional as it allows us to simply write code without explicitly defining the address of each subroutine or data location. Instead, we can simply place a label before a section that we want to reference elsewhere in the code and the assembler will automatically assign an address to it at assembly time.
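The first pass can be sketched roughly like this in Python. This is not the actual assembler: it only handles instructions and plain labels, skips string data and labels with explicit addresses, and the size table lists only the multi-byte mnemonics named in this post (direct-mode load and store mnemonics would need to be added). Note that the sizes have to come from an explicit table, because a rule like "ends with D" would wrongly catch ADD.

```python
# Instruction sizes in bytes, based on the formats described earlier.
# Hypothetical and incomplete: only mnemonics named in the text are listed.
THREE_BYTE = {"jpd", "srd", "lsp"}   # opcode byte + 16-bit address
TWO_BYTE = {"ldi"}                   # opcode byte + immediate value


def assign_labels(lines, origin=0):
    """First pass: return a dict mapping each label to its address."""
    labels = {}
    address = origin
    for line in lines:
        text = line.split(";", 1)[0].strip()   # drop comments and whitespace
        if not text:
            continue
        fields = text.split()
        mnemonic = fields[0].lower()
        if mnemonic == "label":
            labels[fields[1]] = address        # label marks the next byte
        elif mnemonic in THREE_BYTE:
            address += 3
        elif mnemonic in TWO_BYTE:
            address += 2
        else:
            address += 1                       # everything else: one byte
    return labels
```

A second pass can then look up each label in the returned dictionary whenever an instruction refers to it, including labels that are defined later in the file than the instruction that uses them.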
After the first pass to assign labels, the assembler then reads each instruction sequentially and writes its corresponding machine code data into the ROM output file. If the assembler encounters an undefined instruction, incorrect format, or out-of-bounds data, it will automatically halt assembly. Finally, after assembly is complete, the entire hex data of the output file is written to the terminal so that the programmer can visually inspect the output and confirm that everything is OK. The assembly file and assembler output of a program designed to print a string are shown below:
jpd 0 start
label string 0x3
"daisy daisy, give me your answer do\n"
label halt
hcf
label print
ldi b 1
ldi c 0
ldx a
label loop
ldo a ;output char
mvh a ;move lsb of index to a
add b ;add 1
mva h ;move back
ldx a ;load next char
add c ;add to 0 to test
jpd z halt ;halt if null terminated
jpd 0 loop ;loop if not
label start
ldi g 0
ldi h 0x3 ;load index register with start of string
jpd 0 print ;go to print
Finally, we run the assembled machine code on the simulator: