# Comp 790-184: Hardware Security and Side-Channels

# **Lecture 3: Transient Execution Attacks**

January 30, 2025 Andrew Kwong



THE UNIVERSITY of NORTH CAROLINA at CHAPEL HILL





- What are transient execution attacks?
- How does Meltdown work?
  - We will connect the dots between a hardware optimization and a software optimization.
- How do Spectre and its variations work?
  - Let's try to see through these variations and understand the fundamental problem.

Slides adapted from Mengjia Yan (shd.mit.edu)

#### Impact



- Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom: Spectre Attacks: Exploiting Speculative Execution. IEEE Symposium on Security and Privacy (S&P), 2019.
- Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Anders Fogh, Jann Horn, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg: Meltdown: Reading Kernel Memory from User Space. USENIX Security Symposium, 2018.

1644 cites at Google Scholar, 1413% above average of year (last visited: Jan-2024).


# **Meltdown**



# **Meltdown Root Causes**

- Caused by the combination of a hardware optimization and a software optimization
  - Out of order execution
  - Mapping kernel memory into user space

### **Recap: 5-stage Pipeline**



- In-order execution:
  - Execute instructions according to the program order
  - What is the ideal instruction throughput, measured in instructions per cycle (IPC)?

| time         | t0     | t1     | t2     | t3     | t4     | t5     | t6     | t7     | t8     |
|--------------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| instruction1 | $IF_1$ | $ID_1$ | $EX_1$ | $MA_1$ | $WB_1$ |        |        |        |        |
| instruction2 |        | $IF_2$ | $ID_2$ | $EX_2$ | $MA_2$ | $WB_2$ |        |        |        |
| instruction3 |        |        | $IF_3$ | $ID_3$ | $EX_3$ | $MA_3$ | $WB_3$ |        |        |
| instruction4 |        |        |        | $IF_4$ | $ID_4$ | $EX_4$ | $MA_4$ | $WB_4$ |        |
| instruction5 |        |        |        |        | $IF_5$ | $ID_5$ | $EX_5$ | $MA_5$ | $WB_5$ |
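The ideal throughput can be read off the table: with no stalls, the first instruction needs all five stages and every later instruction finishes one cycle after its predecessor. The sketch below works that arithmetic out; the function names `pipeline_cycles` and `pipeline_ipc` are illustrative and not from the lecture, and the model assumes a stall-free pipeline.

```c
#include <assert.h>

/* Cycles needed to run n instructions on an ideal k-stage in-order
 * pipeline with no stalls: the first instruction takes k cycles and
 * each later one completes one cycle after its predecessor. */
unsigned long pipeline_cycles(unsigned long n, unsigned long k)
{
    assert(k > 0);
    if (n == 0)
        return 0;
    return k + (n - 1);
}

/* Instructions per cycle (IPC) for the same ideal pipeline. */
double pipeline_ipc(unsigned long n, unsigned long k)
{
    unsigned long cycles = pipeline_cycles(n, k);
    return cycles == 0 ? 0.0 : (double)n / (double)cycles;
}
```

For the five instructions in the table, `pipeline_cycles(5, 5)` gives 9 cycles (t0 through t8), so the IPC is 5/9. As the instruction count grows, the IPC approaches but never exceeds 1.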

# **Build High-Performance Processors**




#### **Technique #1: Add More Functional Units**






| Functional Unit | Busy? | Dest Reg | Src1 Reg | Src2 Reg |
|-----------------|-------|----------|----------|----------|
| Int ALU         |       |          |          |          |
| Mem             |       |          |          |          |
| Fadd            |       |          |          |          |
| Fmul            |       |          |          |          |
| Fdiv            |       |          |          |          |

| 1: | FMUL | f1, | f2, | f3 |
|----|-------------|-----|-----|----|
| 2: | ADD         | r4, | r4, | r1 |

| Functional Unit | Busy? | Dest Reg | Src1 Reg | Src2 Reg |
|-----------------|-------|----------|----------|----------|
| Int ALU         |       |          |          |          |
| Mem             |       |          |          |          |
| Fadd            |       |          |          |          |
| Fmul            | Y     | f1       | f2       | f3       |
| Fdiv            |       |          |          |          |

| 1: | FMUL | f1, | f2, | f3 |
|----|-------------|-----|-----|----|
| 2: | ADD         | r4, | r4, | r1 |

| Functional Unit | Busy? | Dest Reg | Src1 Reg | Src2 Reg |
|-----------------|-------|----------|----------|----------|
| Int ALU         |       |          |          |          |
| Mem             |       |          |          |          |
| Fadd            | Y     | r4       | r4       | r1       |
| Fmul            | Y     | f1       | f2       | f3       |
| Fdiv            |       |          |          |          |

| 1: | FMUL | f1, | f2, | f3 |
|----|------|-----|-----|----|
| 2: | ADD  | r4, | r4, | r1 |

| Functional Unit | Busy? | Dest Reg | Src1 Reg | Src2 Reg |
|-----------------|-------|----------|----------|----------|
| Int ALU         |       |          |          |          |
| Mem             |       |          |          |          |
| Fadd            |       |          |          |          |
| Fmul            |       |          |          |          |
| Fdiv            |       |          |          |          |

| 1: | FMUL | f1, | f2, | f3 |
|----|------|-----|-----|----|
| 2: | FDIV | f5, | f1, | f4 |

| Functional Unit | Busy? | Dest Reg | Src1 Reg | Src2 Reg |
|-----------------|-------|----------|----------|----------|
| Int ALU         |       |          |          |          |
| Mem             |       |          |          |          |
| Fadd            |       |          |          |          |
| Fmul            | Y     | f1       | f2       | f3       |
| Fdiv            |       |          |          |          |

| 1: | FMUL | f1, | f2, | f3 |
|----|------|-----|-----|----|
| 2: | FDIV | f5, | f1, | f4 |

| Functional Unit | Busy? | Dest Reg | Src1 Reg | Src2 Reg |
|-----------------|-------|----------|----------|----------|
| Int ALU         |       |          |          |          |
| Mem             |       |          |          |          |
| Fadd            |       |          |          |          |
| Fmul            | Y     | f1       | f2       | f3       |
| Fdiv            | Y     | f5       | f1       | f4       |

| 1: | FMUL | f1, | f2, | f3 |
|----|------|-----|-----|----|
| 2: | FDIV | f5, | f1, | f4 |

Data Hazard!

| Functional Unit | Busy? | Dest Reg | Src1 Reg | Src2 Reg |
|-----------------|-------|----------|----------|----------|
| Int ALU         |       |          |          |          |
| Mem             |       |          |          |          |
| Fadd            |       |          |          |          |
| Fmul            |       |          |          |          |
| Fdiv            |       |          |          |          |

1: FMUL f1, f2, f3 ; 10 cycles

2: FADD f1, f4, f5 ; 4 cycles

# Technique #2: Scoreboard

- Upon issue of an instruction, check:
  1. Whether any ongoing instructions will generate values for my source registers
  2. Whether any ongoing instructions will modify my destination register
- We call such a processor: in-order issue, out-of-order completion.
- A problem: how to handle interrupts/exceptions?
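The two issue-time checks can be sketched against the scoreboard table used in the earlier slides. This is a minimal model, not the lecture's implementation: the struct layout, the register numbering, and the name `can_issue` are assumptions, and the extra check that the target unit itself is free is added here because the table has a "Busy?" column.

```c
#include <assert.h>
#include <stdbool.h>

/* Functional units, in the same order as the scoreboard table. */
enum fu { INT_ALU, MEM, FADD, FMUL, FDIV, NUM_UNITS };

/* One scoreboard row: Busy?, Dest Reg, Src1 Reg, Src2 Reg.
 * Registers are plain integer IDs (hypothetical numbering). */
struct fu_entry {
    bool busy;
    int  dest;
    int  src1;
    int  src2;
};

/* Returns true if an instruction may issue to unit `fu` right now. */
bool can_issue(const struct fu_entry sb[NUM_UNITS], enum fu fu,
               int dest, int src1, int src2)
{
    assert(fu >= 0 && fu < NUM_UNITS);
    if (sb[fu].busy)
        return false;              /* the unit itself is occupied */
    for (int i = 0; i < NUM_UNITS; i++) {
        if (!sb[i].busy)
            continue;
        if (sb[i].dest == src1 || sb[i].dest == src2)
            return false;          /* check 1: a source is still being produced */
        if (sb[i].dest == dest)
            return false;          /* check 2: destination is still being written */
    }
    return true;
}
```

With FMUL f1, f2, f3 in flight, this model refuses to issue FDIV f5, f1, f4 (check 1) and FADD f1, f4, f5 (check 2), while an independent ADD r4, r4, r1 may issue and complete ahead of the FMUL.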

#### **Exception in OoO Processors: Example #1**



#### **Exception in OoO Processors: Example #2**



# **Technique #3: In-order Commit**



# Another Way to Draw It



#### **Re-examine Examples With In-order Commit**

```
1: LD r3, 0(r2) ; Exception in 3 cycles
2: ADD r4, r4, r1 ; 1 cycle
```

```
1: FMUL f1, f2, f3 ; 10 cycles
2: LD r3, 0(r2) ; Exception in 1 cycle
```
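One way to reason about the two examples is a toy reorder buffer that retires strictly in program order. The sketch below is a simplification under stated assumptions: each entry records only when execution finishes and whether it faults, commit bandwidth is ignored, and the names `rob_entry` and `commit_in_order` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One reorder-buffer (ROB) entry; index 0 is the oldest instruction. */
struct rob_entry {
    int  done_cycle;   /* cycle at which execution finishes */
    bool faults;       /* true if the instruction raises an exception */
};

/* Retires entries strictly in program order. Returns how many
 * instructions became architecturally visible; the first faulting
 * instruction and everything younger are squashed.
 * *trap_cycle receives the cycle the exception is taken, or -1. */
size_t commit_in_order(const struct rob_entry *rob, size_t n, int *trap_cycle)
{
    int cycle = 0;
    assert(trap_cycle != NULL);
    *trap_cycle = -1;
    for (size_t i = 0; i < n; i++) {
        if (rob[i].done_cycle > cycle)
            cycle = rob[i].done_cycle;   /* ROB head waits until it is done */
        if (rob[i].faults) {
            *trap_cycle = cycle;         /* exception taken at the ROB head */
            return i;                    /* younger entries are discarded */
        }
    }
    return n;
}
```

In the first example the ADD finishes executing at cycle 1 but is never committed, because the older LD faults at cycle 3. In the second, the LD detects its fault at cycle 1, yet the exception is only taken at cycle 10, once the FMUL ahead of it has committed. That window between executing and committing is where transient instructions run.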

# **Recap: Page Mapping**



# **Mapping Kernel Pages**



# Jumping Between User and Kernel Space

- Key challenge: we need to make sure the correct page table is in use
  - CR3 (in x86) or satp (in RISC-V) holds the physical address of the root page table



#### **A Performance Optimization**

- Context switch overhead:
  - The page table changes, so on many processors we need to flush the TLB
- But sometimes we enter the kernel only to do something simple, e.g., getpid()
- The optimization: map kernel pages into every user address space, marked supervisor-only so that user-mode code cannot access them

# Map Kernel Pages Into User Space



- A user-mode access to one of these kernel pages raises a protection fault

#### **Meltdown**

- Put the two optimizations together and we have Meltdown
  - Hardware optimization: out-of-order execution
  - Software optimization: mapping kernel addresses into user space
- Attack outcome: user space applications can read arbitrary kernel data

Ld1: uint8\_t secret = \*kernel\_address;

Ld2: uint8\_t dummy = probe\_array[secret\*64];



**ROB** head

The 2<sup>nd</sup> line of code can transiently execute before the exception from the 1<sup>st</sup> is taken!

#### Meltdown w/ Flush+Reload

- 1. Setup: Attacker allocates probe\_array, spanning 256 cache lines (one per possible byte value), and flushes all of them from the cache
- 2. Transmit: Attacker executes:

```
.....
Ld1: uint8_t secret = *kernel_address;
Ld2: uint8_t dummy = probe_array[secret*64];
```

- 3. Receive: After handling the protection fault, the attacker performs a cache side-channel attack to find out which line of probe\_array was accessed; the index of that line is the secret byte
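The receive step reduces to finding the one probe\_array line that reloads quickly. The sketch below covers only that decision logic and assumes the 256 reload latencies have already been measured (on x86 this is typically done with a timestamp counter around each access); the name `recover_byte` and the threshold parameter are illustrative, not from the lecture.

```c
#include <assert.h>
#include <stdint.h>

#define PROBE_LINES 256

/* Given one measured reload latency per probe_array line, return the
 * index of the single line that looks cached (latency below
 * `threshold`), or -1 if no line or more than one line qualifies. */
int recover_byte(const uint64_t latency[PROBE_LINES], uint64_t threshold)
{
    int hit = -1;
    assert(latency != 0);
    for (int i = 0; i < PROBE_LINES; i++) {
        if (latency[i] < threshold) {
            if (hit != -1)
                return -1;   /* ambiguous result, e.g. noise or prefetching */
            hit = i;
        }
    }
    return hit;
}
```

A real attack usually repeats the flush, transmit, and receive steps many times and takes the most frequent result, since a single round can return no hit or a spurious one.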

# **Meltdown Mitigations**

- Disabling either one of the optimizations should be sufficient
  - SW: Do not let user and kernel share an address space (KPTI) -> side-stepped by several groups (e.g., *EntryBleed*, which leaks the kernel base address despite KPTI)
  - HW: Stall speculation; register poisoning

```
.....
Ld1: uint8_t secret = *kernel_address;
Ld2: uint8_t dummy = probe_array[secret*64];
```

- We generally consider Meltdown a design bug
  - However, similar "bugs" followed

Will Liu, EntryBleed, https://www.willsroot.io/2022/12/entrybleed.html?m=1

# **Meltdown Followups**

- MDS: microarchitectural data sampling
  - RIDL
  - CacheOut
  - ZombieLoad
- CrossTalk
- Downfall
- Reptar
- LVI: load value injection



# **Spectre Variant 1 – Exploit Branch Condition**



Attack to read arbitrary memory:

1. Setup: Train branch predictor

2. Transmit: Trigger branch misprediction; *&array1[x]* maps to some desired kernel address

3. Receive: Attacker probes cache to infer which line of *array2* was fetched
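The three steps target a bounds-checked array access. A minimal sketch of such a victim, in the style of the well-known Spectre v1 gadget, is shown below; `array1`, `array2`, and `x` come from the slide, while `array1_size`, `victim_function`, `malicious_index`, the array contents, and the 64-byte stride (chosen to match the Meltdown example) are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE 64   /* assumed cache line size in bytes */

uint8_t array1[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
size_t  array1_size = 16;
uint8_t array2[256 * LINE];

/* Victim: architecturally safe thanks to the bounds check. If the
 * branch is mispredicted as taken, both loads can still execute
 * transiently with an out-of-bounds x. */
uint8_t victim_function(size_t x)
{
    uint8_t y = 0;
    if (x < array1_size)                 /* branch the attacker mistrains */
        y = array2[array1[x] * LINE];    /* secret-dependent cache access */
    return y;
}

/* Attacker: the out-of-bounds index x for which &array1[x] == target. */
size_t malicious_index(const uint8_t *target)
{
    assert(target != NULL);
    return (size_t)((uintptr_t)target - (uintptr_t)array1);
}
```

Calling `victim_function` repeatedly with in-bounds values trains the predictor; a later call with `malicious_index(target)` returns 0 architecturally, yet may leave the line `array2[secret * LINE]` in the cache for the receive step to find.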

- Most BTBs store partial tags and targets...
  - <last n bits of current PC, target PC>
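Because only the last n bits of the PC are kept, two branches at different addresses can share one BTB entry. The sketch below models just that aliasing condition; real BTBs may hash or fold address bits in ways that vary by microarchitecture, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Partial tag: only the last n bits of the branch PC are kept. */
uint64_t btb_partial_tag(uint64_t pc, unsigned n)
{
    assert(n > 0 && n < 64);
    return pc & ((UINT64_C(1) << n) - 1);
}

/* Two branches alias when their partial tags match, so training one
 * also sets the predicted target that the other will use. */
bool btb_aliases(uint64_t pc_a, uint64_t pc_b, unsigned n)
{
    return btb_partial_tag(pc_a, n) == btb_partial_tag(pc_b, n);
}
```

Under this model an attacker-controlled branch whose address matches a victim branch in the low n bits can plant a target of the attacker's choosing, which is what lets the victim speculatively jump to an arbitrary gadget.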



Train BTB properly 

Execute arbitrary gadgets speculatively

#### **General Attack Schema**



#### **Apply the General Attack Schema**



# **General Attack Schema**



- Transient attacks: can leak data-at-rest
  - Meltdown = transient execution + deferred exception handling ("easy" to fix)
  - Spectre = transient execution on wrong paths (hard to fix)


