Processor Implementation
Fixed Purpose Processors
- Digital circuits designed to implement a specific application, when fabricated in silicon, are Application Specific Integrated Circuits (ASICs).
- The alternative is to create FPGA bitstreams and load them into FPGAs.
- Changing the function of an FPGA is easy, whereas creating a new ASIC is expensive.
Custom datapaths for specific applications offer high performance because they are tailored to the use case and can exploit parallelism. When the same computation is repeated on a stream of data, a simple feed-forward datapath gives the best performance, and it can be pipelined to improve throughput.
The example below shows a feed-forward data path for multiplying two complex numbers, with six pipeline stages.
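As a rough illustration of the same idea, a pipelined complex multiplier could be sketched as follows. This is a simplified two-stage version, not the six-stage datapath described above. The module name `cmul_pipe`, the port names and the `WIDTH` parameter are assumptions made for this example.

```verilog
// Sketch only: computes (ar + j*ai) * (br + j*bi) in two pipeline stages.
// Names and widths are illustrative assumptions, not taken from the original design.
module cmul_pipe #(parameter WIDTH = 16) (
    input                         clk,
    input  signed [WIDTH-1:0]     ar, ai, br, bi,
    output reg signed [2*WIDTH:0] p_re, p_im
);
    // Stage 1 registers: the four partial products, computed in parallel
    reg signed [2*WIDTH-1:0] m_rr, m_ii, m_ri, m_ir;

    always @(posedge clk) begin
        // Stage 1: multiply
        m_rr <= ar * br;
        m_ii <= ai * bi;
        m_ri <= ar * bi;
        m_ir <= ai * br;
        // Stage 2: combine the products registered on the previous cycle
        p_re <= m_rr - m_ii;  // real part
        p_im <= m_ri + m_ir;  // imaginary part
    end
endmodule
```

A new pair of operands can be accepted every clock cycle, and each result appears two cycles after its inputs. Splitting the multipliers themselves into further stages would raise the achievable clock frequency at the cost of extra latency.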
Finite Impulse Response (FIR) filters are also easy to map to hardware. The delay blocks are just registers, and the arithmetic blocks are implemented directly. Using the transposed form shortens the critical path, which improves performance further.
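A minimal sketch of a transposed-form FIR filter is given here, assuming four taps. The coefficient values, the data widths and the module name `fir_transposed` are all placeholders chosen for illustration.

```verilog
// Sketch only: 4-tap transposed-form FIR filter.
// Coefficients, widths and names are illustrative assumptions.
module fir_transposed (
    input                clk,
    input  signed [15:0] x,
    output signed [33:0] y
);
    // Hypothetical coefficients h0..h3
    localparam signed [15:0] H0 = 16'sd1, H1 = 16'sd3, H2 = 16'sd3, H3 = 16'sd1;

    // The delay registers sit between the adders, so the longest
    // combinational path is one multiplier followed by one adder,
    // regardless of the number of taps.
    // No reset is included: outputs are undefined in simulation
    // until three samples have been clocked in.
    reg signed [33:0] r1, r2, r3;

    always @(posedge clk) begin
        r3 <= x * H3;
        r2 <= x * H2 + r3;
        r1 <= x * H1 + r2;
    end

    // y[n] = H0*x[n] + H1*x[n-1] + H2*x[n-2] + H3*x[n-3]
    assign y = x * H0 + r1;
endmodule
```

In the direct form, the products of all taps are summed in a single adder chain, so the critical path grows with the number of taps. The transposed arrangement avoids this.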
General Purpose Processors
General purpose processors need to support:
- A set of arithmetic operations
- Movement of data in and out of arithmetic logic
- A way of breaking down functions into discrete steps
- A way to program the circuit to carry out the steps
Each of these components can be constructed in Verilog using basic synchronous elements.
Program Counter
The program counter is just a 32-bit register with a next-value input and a current-value output.
module pc_reg(input clk, rst, input [31:0] pcnext, output reg [31:0] pc);
  always @ (posedge clk) begin
    if (rst) // point to base address on reset
      pc <= 32'd0;
    else
      pc <= pcnext;
  end
endmodule
Register File
The register file contains 32 32-bit registers, and has two read ports and one write port.
- Two read addresses, one for each port (ra1, ra2)
- A write address (wa3)
- A write data input (wd3)
- Two read outputs (rd1, rd2)
- A write enable input (we3)
module regfile (input clk, we3,
                input [4:0] ra1, ra2, wa3,
                input [31:0] wd3,
                output [31:0] rd1, rd2);
  reg [31:0] rf [0:31];
  // synchronous write
  always @ (posedge clk) begin
    if (we3) rf[wa3] <= wd3;
  end
  // combinational reads; register 0 always reads as zero
  assign rd1 = (ra1 != 5'd0) ? rf[ra1] : 32'd0;
  assign rd2 = (ra2 != 5'd0) ? rf[ra2] : 32'd0;
endmodule
RAM
- Standard memory with one read and one write port
- Reads are combinational and writes synchronous
module dmem (input clk, we,
             input [31:0] ad, wd,
             output [31:0] rd);
  reg [31:0] ram [0:65535];
  // byte address to word index: drop the two low bits and use
  // 16 bits to match the 65536-word depth (upper bits are ignored)
  always @ (posedge clk)
    if (we) ram[ad[17:2]] <= wd;
  assign rd = ram[ad[17:2]];
endmodule
Combinational elements
The processor also contains other combinational elements, such as multiplexers, incrementers and sign extension, all of which are fairly easy to implement. The ALU is more complex; a simple example is shown below. It supports 8 different functions, selected using a 3-bit function control input func[2:0].
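module alu (input [31:0] a, b, input [2:0] func,
            output reg [31:0] out);
  // func[2] selects subtraction: invert b and add 1 (two's complement)
  wire [31:0] bfin = func[2] ? ~b : b;
  wire [31:0] sumout = a + bfin + func[2];
  always @ *
    case (func[1:0])
      2'b00: out = a & bfin;
      2'b01: out = a | bfin;
      2'b10: out = sumout;               // add or subtract
      2'b11: out = {31'b0, sumout[31]};  // sign bit of the sum; set-less-than when func[2] = 1
    endcase
endmodule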
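For comparison, the simpler combinational elements mentioned above might be sketched as follows. The module names are hypothetical, and the 16-bit immediate width and 4-byte instruction size are assumptions not stated in the text.

```verilog
// Sketch only: names, the 16-bit immediate and the 4-byte
// instruction size are illustrative assumptions.

// 2:1 multiplexer with a parameterised width
module mux2 #(parameter WIDTH = 32) (
    input  [WIDTH-1:0] d0, d1,
    input              sel,
    output [WIDTH-1:0] y
);
    assign y = sel ? d1 : d0;
endmodule

// Sign extension: replicate the top bit of the immediate 16 times
module signext (input [15:0] a, output [31:0] y);
    assign y = {{16{a[15]}}, a};
endmodule

// Incrementer: address of the next sequential instruction
module pc_plus4 (input [31:0] pc, output [31:0] y);
    assign y = pc + 32'd4;
endmodule
```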
Processor control
The processor also has a control unit, which asserts the signals that tell the datapath how to process a particular instruction. It uses combinational logic to decode the instruction and then drives the signals that control the rest of the processor.
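A control unit of this kind could be sketched as a purely combinational decoder. The text does not name an instruction set, so this example assumes a 6-bit opcode field and borrows the MIPS encodings for lw, sw and addi. The output signal names are hypothetical, and the `alufunc` values follow the ALU shown earlier (3'b010 selects addition).

```verilog
// Sketch only: the instruction set is an assumption (MIPS-style opcodes);
// signal names are hypothetical.
module control (
    input      [5:0] opcode,
    output reg       regwrite,  // would drive we3 of the register file
    output reg       memwrite,  // would drive we of the data memory
    output reg       alusrc,    // 0: ALU operand b from register, 1: from immediate
    output reg       memtoreg,  // 0: write back ALU result, 1: write back memory data
    output reg [2:0] alufunc    // function select for the ALU
);
    always @ * begin
        // Safe defaults: nothing is written, ALU adds
        regwrite = 1'b0;
        memwrite = 1'b0;
        alusrc   = 1'b0;
        memtoreg = 1'b0;
        alufunc  = 3'b010;
        case (opcode)
            6'b100011: begin  // lw: address = register + immediate, load to register
                regwrite = 1'b1; alusrc = 1'b1; memtoreg = 1'b1;
            end
            6'b101011: begin  // sw: address = register + immediate, store to memory
                memwrite = 1'b1; alusrc = 1'b1;
            end
            6'b001000: begin  // addi: register + immediate, result to register
                regwrite = 1'b1; alusrc = 1'b1;
            end
            default: ;        // unknown opcodes leave the defaults in place
        endcase
    end
endmodule
```

Assigning defaults before the case statement means every output has a value on every path, which avoids inferring latches.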
Pipelining
A pipeline processor requires register stages to be added within the data and control paths.
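One way to build such a register stage is a parameterised register that is reused at each stage boundary for both data and control signals. In this sketch the module name `pipe_reg` and the optional enable input `en` are assumptions. The enable is included only so that a stage can hold its value, and can be tied high if that is not needed.

```verilog
// Sketch only: generic pipeline register; names are illustrative assumptions.
module pipe_reg #(parameter WIDTH = 32) (
    input                  clk, rst,
    input                  en,  // when low, the stage holds its current value
    input      [WIDTH-1:0] d,
    output reg [WIDTH-1:0] q
);
    always @ (posedge clk) begin
        if (rst)
            q <= {WIDTH{1'b0}};
        else if (en)
            q <= d;
    end
endmodule
```

The control signals for an instruction would need to pass through the same number of stages as its data, so that they arrive at each part of the datapath on the same cycle.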