# Combinational Logic (Cont.), Sequential Logic, CPU Datapath

**CMPT 295 Week 10.2** 

# **Simplifying Boolean Expressions**

- Logic Delay: Everything we are dealing with is just an abstraction of transistors and wires
  - Inputs propagating to the outputs are voltage signals passing through transistor networks
  - There is always some delay before a CL's output updates to reflect the inputs
  - The critical path is the longest delay from any input to any output.
     It can be expressed as "n gate delays"
- Simpler Boolean expressions 
   → smaller transistor networks 
   → smaller circuit delays 
   → faster hardware

# Simplifying Boolean Expressions: Example

$$y = ab + a + c$$
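One way to work this example, using the identity $b + 1 = 1$ (a sketch of the algebra; the original slide may have shown the steps differently):

$$y = ab + a + c = a(b + 1) + c = a(1) + c = a + c$$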

# Karnaugh Maps

- Used to simplify Boolean expressions of 2-4 variables
- Table composed of squares, each representing a unique combination of all variables (1 if true, else blank)
- Two variable Map:

|         | $y = 0$                      | $y = 1$         |
|---------|------------------------------|-----------------|
| $x = 0$ | $\overline{x}\,\overline{y}$ | $\overline{x}y$ |
| $x = 1$ | $x\overline{y}$              | $xy$            |

Example: Boolean Expression?



## **Three Variable Karnaugh Maps**



Question: Simplify  $\bar{A}C + \bar{A}B + A\bar{B}C + BC$ 

# **Example: Simplify 3-Variable Expression**

Question: Simplify  $\bar{A}C + \bar{A}B + A\bar{B}C + BC$ 



Answer:  $C + \bar{A}B$ 
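The answer can be sanity-checked by brute force over all eight input combinations. The sketch below is only a check, not the K-map method itself; the function names are illustrative.

```python
from itertools import product

def original(a, b, c):
    # A'C + A'B + AB'C + BC
    return ((not a) and c) or ((not a) and b) or (a and (not b) and c) or (b and c)

def simplified(a, b, c):
    # C + A'B
    return c or ((not a) and b)

def equivalent():
    """True if both expressions agree on every input combination."""
    return all(
        bool(original(a, b, c)) == bool(simplified(a, b, c))
        for a, b, c in product([False, True], repeat=3)
    )

def minterms():
    """Indices (A is the most significant bit) where the function is 1."""
    return [
        4 * a + 2 * b + c
        for a, b, c in product([0, 1], repeat=3)
        if original(a, b, c)
    ]

print(equivalent())
print(minterms())
```

The minterm list corresponds to the squares marked 1 in the three-variable map.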

## Four Variable Karnaugh Maps



Question: Simplify  $\overline{w}y + \overline{w}z + w\overline{x}y + wxyz$ 

## **Example: Simplify 4-Variable Expression**



## **Useful Combinational Circuits**

# Data Multiplexor (MUX)

- Multiplexor ("MUX") is a selector
  - Place one of multiple inputs onto output (N-to-1)
- Shown below is an n-bit 2-to-1 MUX
  - Input S selects between two inputs of n bits each



# Implementing a 1-bit 2-to-1 MUX

## Schematic:



• Truth Table:

| s | a | b | c |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 |

## Boolean Algebra:

$$c = \overline{s}a\overline{b} + \overline{s}ab + s\overline{a}b + sab$$

$$= \overline{s}(a\overline{b} + ab) + s(\overline{a}b + ab)$$

$$= \overline{s}(a(\overline{b} + b)) + s((\overline{a} + a)b)$$

$$= \overline{s}(a(1)) + s((1)b)$$

$$= \overline{s}a + sb$$
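As a quick check, the simplified expression can be compared against the truth table row by row. This is a minimal sketch using 0/1 integers; `mux2` and `TRUTH_TABLE` are illustrative names.

```python
def mux2(s, a, b):
    """1-bit 2-to-1 MUX: c = (NOT s AND a) OR (s AND b)."""
    return ((1 - s) & a) | (s & b)

# Rows are (s, a, b, c), copied from the truth table.
TRUTH_TABLE = [
    (0, 0, 0, 0), (0, 0, 1, 0), (0, 1, 0, 1), (0, 1, 1, 1),
    (1, 0, 0, 0), (1, 0, 1, 1), (1, 1, 0, 0), (1, 1, 1, 1),
]

matches = all(mux2(s, a, b) == c for s, a, b, c in TRUTH_TABLE)
print(matches)
```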

## Circuit Diagram:



## 1-bit 4-to-1 MUX

• Schematic:



- Truth Table: How many rows? $2^6 = 64$
- Boolean Expression:

$$E = \overline{S_1}\,\overline{S_0} A + \overline{S_1} S_0 B + S_1 \overline{S_0} C + S_1 S_0 D$$

# **Another Design for 4-to-1 MUX**

- Can we leverage what we've previously built?
  - Alternative hierarchical approach:
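A sketch of the hierarchical idea, assuming the usual arrangement (the figure is not reproduced here): two 2-to-1 MUXes select on $S_0$, and a third selects between their outputs on $S_1$. It is compared against a flat sum-of-products version in which $S_1 S_0 = 00, 01, 10, 11$ picks A, B, C, D respectively.

```python
from itertools import product

def mux2(s, a, b):
    """1-bit 2-to-1 MUX: output a when s = 0, b when s = 1."""
    return ((1 - s) & a) | (s & b)

def mux4_flat(s1, s0, a, b, c, d):
    """Flat sum-of-products 4-to-1 MUX."""
    n1, n0 = 1 - s1, 1 - s0
    return (n1 & n0 & a) | (n1 & s0 & b) | (s1 & n0 & c) | (s1 & s0 & d)

def mux4_hier(s1, s0, a, b, c, d):
    """4-to-1 MUX built from three 2-to-1 MUXes."""
    low = mux2(s0, a, b)    # candidate output when s1 = 0
    high = mux2(s0, c, d)   # candidate output when s1 = 1
    return mux2(s1, low, high)

# Compare both designs on all 2**6 input combinations.
same = all(
    mux4_flat(*bits) == mux4_hier(*bits)
    for bits in product([0, 1], repeat=6)
)
print(same)
```

The hierarchical version reuses an existing block at the cost of a longer path from the data inputs to the output.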



## Decoder

- Enable one of 2<sup>N</sup> outputs based on N inputs
- Example: 2-to-4 decoder



By BlueJester0101, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=3668293

Use case: Choose ALU operation based on instruction op-code
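A behavioral sketch of an N-to-2<sup>N</sup> decoder. The bit ordering (most significant input first, output 0 first) is an assumption made for illustration.

```python
def decoder(inputs):
    """Return a one-hot list of 2**N outputs for a list of N input bits.

    Assumes inputs are given most significant bit first.
    """
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return [1 if i == index else 0 for i in range(2 ** len(inputs))]

print(decoder([1, 0]))  # 2-to-4 decoder with input value 2
```

Exactly one output is 1 for any input, which is what makes a decoder useful for enabling a single unit among several.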

# **Demultiplexer (Demux)**

Similar to decoder with an enable signal



By BlueJester0101, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=3668293

# Single-Bit Binary Adder (Half Adder)

- Add A + B to get Sum (S) and Carry (C)
- Truth Table:
- Boolean Expressions:
  - $S = A \oplus B$ ; C = AB
- Circuit:



By inductiveload - Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=1023090

| A | B | S | C |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 1 |

# Full Adder

## What is this Circuit?



- Q. What's the propagation delay?
  - 3 gate delays (highlighted)
- Q. What does the circuit accomplish?
  - Algebra:

$$S = A \oplus B \oplus C_{in}; \quad C_{out} = AB + C_{in}(A \oplus B)$$
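The algebra can be checked against ordinary addition. This sketch builds a full adder from two half adders and an OR gate, which matches the expressions above when the carry-in and carry-out are kept as separate signals; the function names are illustrative.

```python
from itertools import product

def half_adder(a, b):
    """Return (sum, carry) for two 1-bit inputs."""
    return a ^ b, a & b

def full_adder(a, b, cin):
    """Return (sum, carry_out) for two 1-bit inputs plus a carry-in."""
    s1, c1 = half_adder(a, b)      # s1 = A xor B, c1 = AB
    s, c2 = half_adder(s1, cin)    # c2 = Cin (A xor B)
    return s, c1 | c2

# The two output bits together should equal the arithmetic sum.
correct = all(
    2 * full_adder(a, b, cin)[1] + full_adder(a, b, cin)[0] == a + b + cin
    for a, b, cin in product([0, 1], repeat=3)
)
print(correct)
```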



# **Computing with Combinational Circuits**

<u>Definition:</u> A <u>combinational circuit</u> computes a pure function, i.e., its outputs react only based on its inputs. There are no feedback loops and no state information (memory) is maintained.

<u>Theorem:</u> Every Boolean function can be implemented with NAND and NOT.

Circuits are modular

## ... a 4-bit ripple carry adder!
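A behavioral sketch of the ripple-carry idea: chain full adders so that each stage's carry-out feeds the next stage's carry-in. The function names and the default width are illustrative.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(a, b, n=4, cin=0):
    """Add two n-bit unsigned values; returns (n-bit sum, carry_out)."""
    carry = cin
    total = 0
    for i in range(n):  # bit 0 first, so the carry ripples upward
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry

print(ripple_carry_add(5, 6))
print(ripple_carry_add(9, 8))
```

Because each stage waits on the previous carry, the delay grows with the number of bits.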





## **Functional Unit**

#### Hardware circuits are fixed

- Can't adjust wires / gates while running
- → Build control wires to parametrize its function

#### **Function Unit:**



#### **Function Select:**

| FS   | func  |
|------|-------|
| 0001 | A + B |
| 0010 | A – B |
| 1000 | A * B |
| 0100 | A ^ B |
| 0101 | A + 1 |
| 1101 | B     |

## **Functional Unit: Adder-Subtractor**



- if FS == 0 then S = A + B
- if FS == 1 then
   S = A + $\overline{B}$ + 1
   = A − B (two's complement negation of B)
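A behavioral sketch of one common adder-subtractor design, assumed here since the figure is not reproduced: every bit of B is XORed with FS, and FS also drives the adder's carry-in. The 4-bit default width is illustrative.

```python
def add_sub(a, b, fs, n=4):
    """n-bit adder-subtractor: A + B when fs = 0, A - B when fs = 1."""
    mask = (1 << n) - 1
    # XOR with all ones inverts B; XOR with zero passes it through.
    b_in = (b ^ (mask if fs else 0)) & mask
    # FS doubles as the carry-in, supplying the "+ 1" of two's complement.
    return (a + b_in + fs) & mask

print(add_sub(7, 3, 0))
print(add_sub(7, 3, 1))
print(add_sub(3, 7, 1))  # negative result, shown in 4-bit two's complement
```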

# Combinational vs. Sequential Logic

- Digital Systems consist of two basic types of circuits:
  - Combinational Logic (CL)
    - Output is a function of the inputs only, not the history of its execution
    - Example: add A, B (ALUs)
  - Sequential Logic (SL)
    - Circuits that "remember" or store information
    - Also called "State Elements"
    - Example: Memory and registers

## **Accumulator Example**

An example of why we would need sequential logic



## Assume:

- Each X value is applied in succession, one per cycle
- The sum since time 1 (cycle) is present on S

# First Try: Does this work?



## No!

- 1) How to control the next iteration of the 'for' loop?
- 2) How do we say: 'S=0'?

## **Second Try: How About This?**



A *Register* is the state element that is used here to hold up the transfer of data to the adder
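A small simulation of the accumulator with a register in the feedback path. This is a sketch under simplifying assumptions: one input per clock cycle, a reset that clears the register, and an ideal adder with no delay. The class and function names are illustrative.

```python
class Register:
    """State element: output q changes only when tick() is called."""

    def __init__(self):
        self.d = 0  # value waiting at the input
        self.q = 0  # value currently held

    def tick(self, reset=False):
        self.q = 0 if reset else self.d

def accumulate(xs):
    reg = Register()
    reg.tick(reset=True)        # corresponds to 'S = 0'
    trace = []
    for x in xs:
        reg.d = reg.q + x       # combinational adder: S_i = S_(i-1) + X_i
        reg.tick()              # clock edge advances to the next iteration
        trace.append(reg.q)
    return trace

print(accumulate([1, 2, 3, 4]))
```

The register is what separates one iteration from the next: the adder output is only committed on a clock edge.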

## **Uses for State Elements**

- Place to store values for some amount of time:
  - Register files (like in RISC-V)
  - Memory (caches and main memory)
- Help control flow of information between combinational logic blocks
  - State elements are used to hold up the movement of information at the inputs to combinational logic blocks and allow for orderly passage

## Registers

## Same as registers in assembly:

Small memory storage locations



## First State Element: RS Latch

When R = 1 and  $S = 0 \rightarrow Q$  is 0

When S = 1 and  $R = 0 \rightarrow Q$  is 1

When both S and R are  $0 \rightarrow Q$  stays the same

When both S and R are  $1 \rightarrow$  Undefined
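The first three rules can be reproduced by simulating two cross-coupled NOR gates until the outputs settle. This sketch assumes the NOR-based latch, with $Q = \text{NOR}(R, \overline{Q})$ and $\overline{Q} = \text{NOR}(S, Q)$; the function names are illustrative.

```python
def nor(a, b):
    return 0 if (a or b) else 1

def rs_latch(r, s, q, q_bar):
    """Settle a cross-coupled NOR latch from state (q, q_bar); return the new state."""
    for _ in range(4):  # a few passes are enough for the defined input cases
        new_q = nor(r, q_bar)
        new_q_bar = nor(s, new_q)
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q, q_bar

state = rs_latch(r=0, s=1, q=0, q_bar=1)   # set
print(state)
state = rs_latch(r=0, s=0, q=state[0], q_bar=state[1])   # hold
print(state)
state = rs_latch(r=1, s=0, q=state[0], q_bar=state[1])   # reset
print(state)
```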



By Napalm Llama - Modification of Wikimedia Commons file R-S.gif (shown below), CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=4845402

## **RS Latch with Enable**

- Only changes state when E = 1.
- Stays the same when E = 0



By Inductiveload - Own Drawing in Inkscape 0.43, Public Domain, https://commons.wikimedia.org/w/index.php?curid=873598

## **D** Latch

- Avoids undefined state of RS Latch when R=S=1
- Q is set to D when E = 1; Q stays the same when E = 0



By Inductiveload - Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=6712572

## D Flip-Flop

- Changes state only on falling edge of Clock (i.e., Clock changes from 1 to 0)
- Use $\overline{Clock}$ (inverted Clock) to change on rising edge instead
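A behavioral sketch of a falling-edge D flip-flop, assuming the master-slave construction from two D latches (the figure is not reproduced here, so this structure is an assumption). The master latch is transparent while Clock is 1, and the slave is transparent while Clock is 0.

```python
class DLatch:
    def __init__(self):
        self.q = 0

    def update(self, d, e):
        """Q follows D while E = 1; Q holds its value while E = 0."""
        if e:
            self.q = d
        return self.q

class DFlipFlop:
    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d, clk):
        m = self.master.update(d, clk)         # open while clk = 1
        return self.slave.update(m, 1 - clk)   # open while clk = 0

ff = DFlipFlop()
inputs = [(1, 0), (1, 1), (1, 0), (0, 0), (0, 1), (0, 0)]  # (D, Clock) pairs
outputs = [ff.update(d, clk) for d, clk in inputs]
print(outputs)
```

Q only takes a new value at a step where Clock goes from 1 to 0; changing D while Clock stays low has no effect.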



## Signals and Waveforms: Clocks



- Signals transmitted over wires continuously
- Transmission is effectively instantaneous
  - Implies that any wire only contains one value at any given time

## **Dealing with Waveform Diagrams**

- Easiest to start with CLK on top
  - Solve signal by signal, from inputs to outputs
  - Can only draw the waveform for a signal if all of its input waveforms are drawn
- When does a signal update?
  - A state element updates based on CLK triggers
  - A combinational element updates ANY time ANY of its inputs changes

# **Accumulator 2<sup>nd</sup> Try: How About This?**



## **Register Internals**



- Output flips and flops between 0 and 1
- Specifically this is a "D-type Flip-Flop"
  - D is "data input", Q is "data output"
  - A group of wires when interpreted as a bit field is called a bus



# **Flip-Flop Timing Behavior**





# **Accumulator Revisited: Proper Timing**



- Reset signal shown
- In practice X<sub>i</sub> might not arrive at the adder at the same time as S<sub>i-1</sub>
  - S<sub>i</sub> temporarily is wrong, but register always captures correct value
  - In good circuits, instability never happens around rising edge of CLK



# **Timing Terms**

- Clock: Steady square wave that synchronizes system
- Register: Several bits of state that samples on rising edge of Clock (positive edge-triggered); also has RESET
- Setup Time: How long the input must be stable before the Clock trigger
- Hold Time: How long the input must be stable after the Clock trigger
- Clock-to-Q Delay: How long it takes the output to change after the Clock trigger

# **Digital State Machines**



- compute next state based on (current state, inputs)
- compute outputs based on (current state, inputs)
- Q. What does this imply about the clock period?
  - → clock period must exceed ($t_{pd}$ of combinational circuit + $t_{pd}$ of registers), where $t_{pd}$ is the propagation delay

# Waveform Example: RS Latch

| a | b | a NOR b |
|---|---|---------|
| 0 | 0 | 1       |
| 0 | 1 | 0       |
| 1 | 0 | 0       |
| 1 | 1 | 0       |



By Napalm Llama - Modification of Wikimedia Commons file R-S.gif (shown below), CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=4845402

Waveform signals: $Q$, $\overline{Q}$, $R$, $S$

### Waveform Example: D Flip-Flop



By Nolanjshettle at English Wikipedia, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=40852395

Waveform signals: $Q$, $\overline{Q}$, $D$



#### **CPU Hardware**

**Goal**: Given an instruction set architecture, construct a machine that reliably executes instructions.

Design choices will influence speed of instructions:

- → Some instructions will be faster than others
- Order of instructions may matter
- Order of memory accesses may matter

"conflicts" or "hazards"

# **Model for Synchronous Systems**



- Combinational logic blocks separated by registers
  - Clock signal connects only to sequential logic elements
  - Feedback is optional depending on application
- How do we ensure proper behavior?
  - How fast can we run our clock?

### **Maximum Clock Frequency**

- What is the max frequency of this circuit?
  - Limited by how much time needed to get correct Next State to Register ( $t_{setup}$  constraint)



#### The Critical Path

- The critical path is the longest delay between any two registers in a circuit
- The clock period must be longer than this critical path, or the signal will not propagate properly to that next register
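A small worked calculation, using one common formulation of the constraint: the clock period must cover the register's clock-to-Q delay, the slowest combinational path, and the setup time of the receiving register. The delay numbers are made up for illustration and are in picoseconds.

```python
def min_clock_period_ps(t_clk_to_q, path_delays, t_setup):
    """Smallest workable clock period, in picoseconds."""
    critical_path = max(path_delays)  # slowest combinational path between registers
    return t_clk_to_q + critical_path + t_setup

def max_frequency_ghz(period_ps):
    """Convert a period in picoseconds to a frequency in GHz."""
    return 1000.0 / period_ps

# Hypothetical delays for three register-to-register paths.
period = min_clock_period_ps(t_clk_to_q=50, path_delays=[120, 400, 250], t_setup=30)
print(period, max_frequency_ghz(period))
```

Only the slowest path matters: shortening the 120 ps or 250 ps paths would not allow a faster clock.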



# How do we go faster?

#### Pipelining!

> Split operation into smaller parts and add a register between each one.

# RISC-V CPU Datapath, Control Intro

# **CPU Design Principles**

- 1) Analyze instruction set → datapath requirements
- 2) Select set of datapath components & establish clock methodology
- 3) Assemble datapath meeting the requirements



- 4) Analyze implementation of each instruction to determine setting of control points that effects the register transfer
- 5) Assemble the control logic
  - Formulate Logic Equations
  - Design Circuits

# **RISC-V Single-Cycle CPU**

- Universal datapath
  - Capable of executing all RISC-V instructions in one cycle each
  - Not all units (hardware) used by all instructions
- 5 Phases of execution
  - IF (Instruction Fetch), ID (Instruction Decode), EX (Execute),
     MEM (Memory), WB (Write Back)
  - Not all instructions are active in all phases (except for loads!)
- Controller specifies how to execute instructions

### RISC-V CPU in two parts

- Central Processing Unit (CPU):
  - Datapath: Contains the hardware necessary to <u>perform</u> operations required by the processor
    - Reacts to what the controller tells it (i.e., "I was told to do an add, so I'll feed these arguments through an adder")
  - Control: Decides what each piece of the datapath should do
    - What operation am I performing? Do I need to get info from memory? Should I write to a register? Which register?
    - Has to make decisions based on the input instruction only.

# **Design Principles**

- Determining control signals
  - Any time a datapath element has an input that changes behavior, it requires a control signal (e.g. ALU operation, read/write)
  - Any time you need to pass a different input based on the instruction, add a MUX with a control signal as the selector (e.g. next PC, ALU input, register to write to)
- Control signals will change based on exact datapath
- Datapath will change based on ISA

# **Storage Element: Register File**

- Register File consists of 32 registers:
  - Output ports portA and portB
  - Input port portW
- Register selection
  - Place data of register RA (number) onto portA
  - Place data of register RB (number) onto portB
  - Store data on portW into register RW (number) when
     Write Enable is 1
- Clock input (CLK)
  - CLK is passed to all internal registers so they can be written to if they match RW and Write Enable is 1
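A behavioral sketch of the register file interface described above. Reads are modeled as combinational and writes happen only on a clock edge with Write Enable set. The 32-bit register width and the method names are assumptions, and RISC-V's hard-wired zero register is not modeled.

```python
class RegisterFile:
    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs

    def read(self, ra, rb):
        """Combinational read: returns (portA, portB)."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, port_w, write_enable):
        """On a clock edge, store portW into register RW if Write Enable is 1."""
        if write_enable:
            self.regs[rw] = port_w & 0xFFFFFFFF  # assumes 32-bit registers

rf = RegisterFile()
rf.clock_edge(rw=5, port_w=42, write_enable=1)
rf.clock_edge(rw=6, port_w=99, write_enable=0)  # ignored: Write Enable is 0
print(rf.read(5, 6))
```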




# **Implementing R-Types**





#### Perform operation

- New hardware: ALU (Arithmetic Logic Unit)
  - Abstraction for adders, multipliers, dividers, etc.
  - How do we know what operation to execute?
  - Our first control bit: ALUSel(ect)

# Adding addi to datapath



# Adding lw to datapath



# **Storage Element: Idealized Memory**

- Memory (idealized)
  - One input port: Data In
  - One output port: Data Out
- Memory access:
  - Read: Write Enable = 0, data at Address is placed on Data Out
  - Write: Write Enable = 1, Data In written to Address
- Clock input (CLK)
  - CLK input is a factor ONLY during write operation
  - During read, behaves as a combinational logic block:
     Address valid → Data Out valid after "access time"
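A behavioral sketch of the idealized memory: reads ignore the clock, while writes take effect only on a clock edge with Write Enable set. Backing the memory with a dictionary and returning 0 for unwritten addresses are assumptions for illustration.

```python
class IdealMemory:
    def __init__(self):
        self.cells = {}  # address -> stored value

    def data_out(self, address):
        """Combinational read: the value at Address appears on Data Out."""
        return self.cells.get(address, 0)

    def clock_edge(self, address, data_in, write_enable):
        """On a clock edge, write Data In to Address if Write Enable is 1."""
        if write_enable:
            self.cells[address] = data_in

mem = IdealMemory()
mem.clock_edge(address=0x100, data_in=7, write_enable=1)
print(mem.data_out(0x100))
```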



### **Current Datapath**



# Adding sw to datapath



# Adding branches to datapath



# Adding jalr to datapath



- Writes PC+4 to dest (return address)
- Sets PC = base + offset

# Adding jal to datapath

J-type format: `imm[20|10:1|11|19:12] | rd | opcode`



- jal saves PC+4 in register rd (the return address)
- Set PC = PC + offset (PC-relative jump)
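The scrambled immediate layout `imm[20|10:1|11|19:12]` can be unpacked with shifts and masks. This is a sketch of the decoding and of the two effects listed above; the function names and the dictionary used for registers are illustrative, and 32-bit PC wraparound is assumed.

```python
def decode_jal(inst):
    """Return (rd, signed offset) for a 32-bit jal instruction word."""
    imm = (
        (((inst >> 31) & 0x1) << 20)      # imm[20]
        | (((inst >> 21) & 0x3FF) << 1)   # imm[10:1]
        | (((inst >> 20) & 0x1) << 11)    # imm[11]
        | (((inst >> 12) & 0xFF) << 12)   # imm[19:12]
    )
    if imm & (1 << 20):                   # sign-extend from bit 20
        imm -= 1 << 21
    rd = (inst >> 7) & 0x1F
    return rd, imm

def execute_jal(inst, pc, regs):
    """Write PC+4 to rd and return the new PC."""
    rd, offset = decode_jal(inst)
    regs[rd] = (pc + 4) & 0xFFFFFFFF
    return (pc + offset) & 0xFFFFFFFF

regs = {}
new_pc = execute_jal(0x008000EF, pc=0x1000, regs=regs)  # jal x1, 8
print(hex(new_pc), hex(regs[1]))
```

Bit 0 of the offset is never stored in the instruction, so the decoded offset is always even.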



lui writes the upper 20 bits of the destination register with the immediate value, and clears the lower 12 bits



auipc adds the upper immediate value (the 20-bit immediate shifted left by 12 bits) to PC and places the result in the destination register
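A sketch of the two upper-immediate operations as arithmetic on 32-bit values, reading the second description as RISC-V's `auipc`. The function names are illustrative, and `imm20` stands for the 20-bit immediate field.

```python
MASK32 = 0xFFFFFFFF

def lui(imm20):
    """Value written to rd: immediate in the upper 20 bits, lower 12 bits zero."""
    return (imm20 << 12) & MASK32

def auipc(pc, imm20):
    """Value written to rd: PC plus the immediate shifted into the upper 20 bits."""
    return (pc + (imm20 << 12)) & MASK32

print(hex(lui(0x12345)))
print(hex(auipc(0x1000, 0x1)))
```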