# Intel SGX Explained
Victor Costan and Srinivas Devadas of [[MIT]], [[2016]]
Links: [[SGX]]
**[[2016-086 Intel SGX Explained.pdf]]**
https://eprint.iacr.org/2016/086
https://eprint.iacr.org/2016/086.pdf
## Overview
![[Screen Shot 2022-02-18 at 5.05.43 PM.png]]
SGX relies on software attestation, like its predecessors, the TPM [71] and TXT [70]. Attestation (Figure 3) proves to a user that she is communicating with a specific piece of software running in a secure container hosted by the trusted hardware. ==The proof is a **cryptographic signature that certifies the hash of the secure container’s contents**.== It follows that the remote computer’s owner can load any software in a secure container, but the remote computation service user will refuse to load her data into a secure container whose contents’ hash does not match the expected value.
![[Screen Shot 2022-02-18 at 5.06.21 PM.png]]
- Why is there "Trusted Hardware"? Shouldn't it be untrusted? And only the secure container is trusted
==The remote computation service user verifies the **attestation key** used to produce the signature against an **endorsement certificate** created by the trusted hardware’s manufacturer==. The certificate states that the attestation key is only known to the trusted hardware, and only used for the purpose of attestation.
![[Screen Shot 2022-02-18 at 5.08.25 PM.png]]
- How does the user obtain `AK`, the Attestation Key?
SGX stands out from its predecessors by the amount of code covered by the attestation, which is in the Trusted Computing Base (TCB) for the system using hardware protection. The attestations produced by the original TPM design covered all the software running on a computer, and TXT attestations covered the code inside a VMX [181] virtual machine. In SGX, an enclave (secure container) only contains the private data in a computation, and the code that operates on it.
### 1.1 SGX Lightning Tour
SGX sets aside a memory region, called the **Processor Reserved Memory** (**PRM**, § 5.1). The CPU protects the PRM from all non-enclave memory accesses, including kernel, hypervisor and SMM (§ 2.3) accesses, and DMA accesses (§ 2.9.1) from peripherals.
The PRM holds the **Enclave Page Cache** (**EPC**, § 5.1.1), which consists of **4 KB pages** that store enclave code and data. The system software, which is untrusted, is in charge of assigning EPC pages to enclaves. The CPU tracks each EPC page’s state in the Enclave Page Cache Metadata (EPCM, § 5.1.2), to ensure that each EPC page belongs to exactly one enclave.
**Loading software**
The initial code and data in an enclave is loaded by untrusted system software. **During the loading stage (§ 5.3), the system software asks the CPU to copy data from unprotected memory (outside PRM) into EPC pages, and assigns the pages to the enclave being set up** (§ 5.1.2). It follows that the initial enclave state is known to the system software.
After all the enclave’s pages are loaded into EPC, the system software asks the CPU to mark the enclave as initialized (§ 5.3), at which point application software can run the code inside the enclave. After an enclave is initialized, the loading method described above is disabled.
While an enclave is loaded, its contents are cryptographically hashed by the CPU. When the enclave is initialized, the hash is finalized, and becomes the enclave's **measurement hash** (§ 5.6).
- **Measurement hash**: hash of contents of enclave after initialization
**Attesting to the initial state**
A remote party can undergo a software attestation process (§ 5.8) to convince itself that it is communicating with an enclave that has a specific measurement hash, and is running in a secure environment.
**Entering the enclave**
Execution flow can only enter an enclave via special CPU instructions (§ 5.4), which are similar to the mechanism for switching from user mode to kernel mode. ==Enclave execution always happens in protected mode, at ring 3==, and uses the address translation set up by the OS kernel and hypervisor.
- Ring 3 is Application mode (least trusted)
- [Wiki on Protection rings](https://en.wikipedia.org/wiki/Protection_ring)
![[Screen Shot 2022-02-16 at 10.27.39 PM.png]]
To avoid leaking private data, a CPU that is executing enclave code does not directly service an interrupt, fault (e.g., a page fault) or VM exit. Instead, the CPU first performs an **Asynchronous Enclave Exit** (§ 5.4.3) to switch from enclave code to ring 3 code, and then services the interrupt, fault, or VM exit. The CPU performs an AEX by saving the CPU state into a predefined area inside the enclave and transfers control to a pre-specified instruction outside the enclave, replacing CPU registers with synthetic values.
- Asynchronous Enclave Exit is also **AEX**
The allocation of EPC pages to enclaves is delegated to the OS kernel (or hypervisor). The OS communicates its allocation decisions to the SGX implementation via special ring 0 CPU instructions (§ 5.3). The OS can also evict EPC pages into untrusted DRAM and later load them back, using dedicated CPU instructions. SGX uses cryptographic protections to assure the confidentiality, integrity and freshness of the evicted EPC pages while they are stored in untrusted memory.
### 1.2 Outline and Troubling Findings
That being said, ==perhaps the most troubling finding in our security analysis is that Intel added a launch control feature to SGX that forces each computer's owner to gain approval from a third party (which is currently Intel) for any enclave that the owner wishes to use on the computer==. § 5.9 explains that the only publicly documented intended use for this launch control feature is a licensing mechanism that requires software developers to enter a (yet unspecified) business agreement with Intel to be able to author software that takes advantage of SGX's protections. All the official documentation carefully sidesteps this issue, offering only minimal hints that lead to Intel's patents on SGX. Only these patents disclose the existence of licensing plans.
==The licensing issue might not bear much relevance right now, because our security analysis reveals that the limitations in SGX’s guarantees mean that a security-conscious software developer cannot in good conscience rely on SGX for secure remote computation.==
- Yikes, hopefully this has changed significantly since [[2016]]
## 2 Computer Architecture Background
Analyzing the security of a software system requires understanding the interactions between all the parts of the software’s execution environment, so this section is quite long. We do refrain from introducing any security concepts here, so readers familiar with x86’s intricacies can safely skip this section and refer back to it when necessary.
In this paper, the term Intel architecture refers to the x86 architecture described in Intel's SDM. The x86 architecture is overly complex, mostly due to the need to support executing legacy software dating back to 1990 directly on the CPU, without the overhead of software interpretation. We only cover the parts of the architecture visible to modern 64-bit software, also in the interest of space and mental sanity.
The 64-bit version of the x86 architecture, covered in this section, was actually invented by Advanced Micro Devices (AMD), and is also known as AMD64, `x86_64`, and x64. The term “Intel architecture” highlights our interest in the architecture’s implementation in Intel’s chips, and our desire to understand the mindsets of Intel SGX’s designers.
### 2.1 Overview
A computer's main resources (§ 2.2) are memory and processors. On Intel computers, Dynamic Random-Access Memory (DRAM) chips (§ 2.9.1) provide the memory, and one or more CPU chips expose logical processors (§ 2.9.4). These resources are managed by system software. An Intel computer typically runs two kinds of system software, namely operating systems and hypervisors.
Server computers, especially in cloud environments, may run multiple operating system instances at the same time. This is accomplished by having a hypervisor (§ 2.3) partition the computer's resources between the operating system instances running on the computer.
Modern operating systems implement preemptive multithreading, where the logical processors are rotated between all the threads on a system every few milliseconds. Changing the thread assigned to a logical processor is accomplished by an execution context switch (§ 2.6).
Hypervisors expose a fixed number of virtual processors (vCPUs) to each operating system, and also use context switching to multiplex the logical CPUs on a computer between the vCPUs presented to the guest operating systems.
The execution core in a logical processor can execute instructions and consume data at a much faster rate than DRAM can supply them. Many of the complexities in modern computer architectures stem from the need to cover this speed gap. Recent Intel CPUs rely on hyper-threading (§ 2.9.4), out-of-order execution (§ 2.10), and caching (§ 2.11), all of which have security implications.
An Intel processor contains many levels of intermediate memories that are much faster than DRAM, but also orders of magnitude smaller. The fastest intermediate memory is the logical processor's register file (§ 2.2, § 2.4, § 2.6). The other intermediate memories are called caches (§ 2.11). The Intel architecture requires application software to explicitly manage the register file, which serves as a high-speed scratch space. At the same time, caches transparently accelerate DRAM requests, and are mostly invisible to software.
Intel computers have multiple logical processors. As a consequence, they also have multiple caches distributed across the CPU chip. On multi-socket systems, the caches are distributed across multiple CPU chips. Therefore, Intel systems use a cache coherence mechanism (§ 2.11.3), ensuring that all the caches have the same view of DRAM. Thanks to cache coherence, programmers can build software that is unaware of caching, and still runs correctly in the presence of distributed caches. However, cache coherence does not cover the dedicated caches used by address translation (§ 2.11.5), and system software must take special measures to keep these caches consistent.
CPUs communicate with the outside world via I/O devices (also known as peripherals), such as network interface cards and display adapters (§ 2.9). Conceptually, the CPU communicates with the DRAM chips and the I/O devices via a system bus that connects all these components.
Software written for the Intel architecture communicates with I/O devices via the I/O address space (§ 2.4) and via the memory address space, which is primarily used to access DRAM. System software must configure the CPU's caches (§ 2.11.4) to recognize the memory address ranges used by I/O devices. Devices can notify the CPU of the occurrence of events by dispatching interrupts (§ 2.12), which cause a logical processor to stop executing its current thread, and invoke a special handler in the system software (§ 2.8.2).
Intel systems have a highly complex computer initialization sequence (§ 2.13), due to the need to support a large variety of peripherals, as well as a multitude of operating systems targeting different versions of the architecture. The initialization sequence is a challenge to any attempt to secure an Intel computer, and has facilitated many security compromises (§ 2.3).
Intel's engineers use the processor's microcode facility (§ 2.14) to implement the more complicated aspects of the Intel architecture, which greatly helps manage the hardware's complexity. The microcode is completely invisible to software developers, and its design is mostly undocumented. However, in order to evaluate the feasibility of any architectural change proposals, one must be able to distinguish changes that can be implemented in microcode from changes that can only be accomplished by modifying the hardware.
### 2.2 Computational Model
This section pieces together a highly simplified model for a computer that implements the Intel architecture, illustrated in Figure 4. This simplified model is intended to help the reader's intuition process the fundamental concepts used by the rest of the paper. The following sections gradually refine the simplified model into a detailed description of the Intel architecture.
![[Screen Shot 2022-02-18 at 5.54.45 PM.png]]
The memory is an array of storage cells, addressed using natural numbers starting from 0, and implements the abstraction depicted in Figure 5. Its salient feature is that the result of reading a memory cell at an address must equal the most recent value written to that memory cell.
A logical processor repeatedly reads instructions from the computer’s memory and executes them, according to the flowchart in Figure 6.
The processor has an internal memory, referred to as the register file. The register file consists of Static Random Access Memory (SRAM) cells, generally known as registers, which are significantly faster than DRAM cells, but also a lot more expensive.
An instruction performs a simple computation on its inputs and stores the result in an output location. The processor's registers make up an execution context that provides the inputs and stores the outputs for most instructions. For example, `ADD RDX, RAX, RBX` performs an integer addition, where the inputs are the registers `RAX` and `RBX`, and the result is stored in the output register `RDX`.
The registers mentioned in Figure 6 are the instruction pointer (RIP), which stores the memory address of the next instruction to be executed by the processor, and the stack pointer (RSP), which stores the memory address of the topmost element in the call stack used by the processor’s procedural programming support. The other execution context registers are described in § 2.4 and § 2.6.
Under normal circumstances, the processor repeatedly reads an instruction from the memory address stored in RIP, executes the instruction, and updates RIP to point to the following instruction. Unlike many RISC architectures, the Intel architecture uses a variable-size instruction encoding, so the size of an instruction is not known until the instruction has been read from memory.
While executing an instruction, the processor may encounter a fault, which is a situation where the instruction's preconditions are not met. When a fault occurs, the instruction does not store a result in the output location. Instead, the instruction's result is considered to be the fault that occurred. For example, an integer division instruction DIV where the divisor is zero results in a Division Fault (#DIV).
When an instruction results in a fault, the processor stops its normal execution flow, and performs the fault handler process documented in § 2.8.2. In a nutshell, the processor first looks up the address of the code that will handle the fault, based on the fault’s nature, and sets up the execution environment in preparation to execute the fault handler.
![[Screen Shot 2022-02-18 at 6.01.44 PM.png]]
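Not from the paper: a minimal C sketch of how the division fault above surfaces to user code on a commodity OS. The hardware fault handling described above happens in the kernel; on Linux the kernel's fault handler typically reports it to the faulting process as SIGFPE, which we can observe here.

```c
/* Minimal sketch: observing a division fault from ring 3 on Linux. */
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void fpe_handler(int sig) {
    (void)sig;
    /* write() is async-signal-safe; returning would re-run the faulting DIV. */
    const char msg[] = "caught SIGFPE (hardware division fault)\n";
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);
    _exit(1);
}

int main(void) {
    signal(SIGFPE, fpe_handler);

    volatile int divisor = 0;   /* volatile keeps the compiler from folding this away */
    volatile int dividend = 42;
    printf("%d\n", dividend / divisor);   /* executes DIV with a zero divisor */
    return 0;
}
```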
**System bus**
The processors are connected to each other and to the memory via a system bus, which is a broadcast network that implements the abstraction in Figure 7.
![[Screen Shot 2022-02-18 at 6.02.41 PM.png]]
During each clock cycle, at most one of the devices connected to the system bus can send a message, which is received by all the other devices connected to the bus. Each device attached to the bus decodes the operation codes and addresses of all the messages sent on the bus and ignores the messages that do not require its involvement.
For example, when the processor wishes to read a memory location, it sends a message with the operation code READ-REQUEST and the bus address corresponding to the desired memory location. The memory sees the message on the bus and performs the READ operation. At a later time, the memory responds by sending a message with the operation code READ-RESPONSE, the same address as the request, and the data value set to the result of the READ operation.
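Not in the paper: a toy C model of the broadcast bus abstraction in Figure 7, just to make the READ-REQUEST / READ-RESPONSE exchange concrete. The message layout and the `memory_device` function are made up for illustration.

```c
/* Toy model of the broadcast system bus: one message per cycle, every device
 * snoops it, and the memory device answers READ-REQUEST with READ-RESPONSE. */
#include <stdint.h>
#include <stdio.h>

typedef enum { READ_REQUEST, READ_RESPONSE, WRITE_REQUEST } op_t;

typedef struct {
    op_t     op;
    uint64_t addr;
    uint64_t data;   /* meaningful for READ_RESPONSE / WRITE_REQUEST */
} bus_msg_t;

static uint64_t dram[16];   /* a very small "memory device" */

/* The memory device sees every message but only reacts to the ones that
 * require its involvement, as described in the text. */
static int memory_device(const bus_msg_t *in, bus_msg_t *out) {
    if (in->op == READ_REQUEST) {
        out->op   = READ_RESPONSE;
        out->addr = in->addr;
        out->data = dram[in->addr % 16];
        return 1;               /* a response message will be broadcast */
    }
    if (in->op == WRITE_REQUEST)
        dram[in->addr % 16] = in->data;
    return 0;
}

int main(void) {
    bus_msg_t req = { WRITE_REQUEST, 3, 0xCAFE }, resp;
    memory_device(&req, &resp);

    req = (bus_msg_t){ READ_REQUEST, 3, 0 };
    if (memory_device(&req, &resp))
        printf("READ-RESPONSE addr=%llu data=0x%llx\n",
               (unsigned long long)resp.addr, (unsigned long long)resp.data);
    return 0;
}
```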
The computer communicates with the outside world via I/O devices, such as keyboards, displays, and network cards, which are connected to the system bus. Devices mostly respond to requests issued by the processor. However, devices also have the ability to issue interrupt requests that notify the processor of outside events, such as the user pressing a key on a keyboard.
Interrupt triggering is discussed in § 2.12. On modern systems, devices send interrupt requests by issuing writes to special bus addresses. Interrupts are considered to be hardware exceptions, just like faults, and are handled in a similar manner.
### 2.3 Software Privilege Levels
In an Infrastructure-as-a-Service (IaaS) cloud environment, such as Amazon EC2, commodity CPUs run software at four different privilege levels, shown in Figure 8.
![[Screen Shot 2022-02-18 at 6.18.46 PM.png]]
Each privilege level is strictly more powerful than the ones below it, so a piece of software can freely read and modify the code and data running at less privileged levels. Therefore, a software module can be compromised by any piece of software running at a higher privilege level. It follows that a software module implicitly trusts all the software running at more privileged levels, and a system’s security analysis must take into account the software at all privilege levels.
**System Management Mode (SMM)** is intended for use by the motherboard manufacturers to implement features such as fan control and deep sleep, and/or to emulate missing hardware. Therefore, the bootstrapping software (§ 2.13) in the computer's firmware is responsible for setting up a continuous subset of DRAM as System Management RAM (SMRAM), and for loading all the code that needs to run in SMM mode into SMRAM. The SMRAM enjoys special hardware protections that prevent less privileged software from accessing the SMM code.
IaaS cloud providers allow their customers to run their operating system of choice in a virtualized environment. Hardware virtualization [181], called **Virtual Machine Extensions (VMX)** by Intel, adds support for a **hypervisor**, also called a **Virtual Machine Monitor (VMM)** in the Intel documentation. The hypervisor runs at a higher privilege level (VMX root mode) than the operating system, and is responsible for allocating hardware resources across multiple operating systems that share the same physical machine. The hypervisor uses the CPU's hardware virtualization features to make each operating system believe it is running in its own computer, called a virtual machine (VM). Hypervisor code generally runs at ring 0 in VMX root mode.
Hypervisors that run in VMX root mode and take advantage of hardware virtualization generally have better performance and a smaller codebase than hypervisors based on binary translation [161].
The systems research literature recommends breaking up an operating system into a small kernel, which runs at a high privilege level, known as the kernel mode or supervisor mode and, in the Intel architecture, as ring 0. The kernel allocates the computer's resources to the other system components, such as device drivers and services, which run at lower privilege levels. However, for performance reasons, mainstream operating systems have large amounts of code running at ring 0. Their monolithic kernels include device drivers, filesystem code, networking stacks, and video rendering functionality.
Application code, such as a Web server or a game client, runs at the lowest privilege level, referred to as user mode (ring 3 in the Intel architecture). In IaaS cloud environments, the virtual machine images provided by customers run in VMX non-root mode, so the kernel runs in VMX non-root ring 0, and the application code runs in VMX non-root ring 3.
### 2.4 Address Spaces
Software written for the Intel architecture accesses the computer’s resources using four distinct physical address spaces, shown in Figure 9. The address spaces overlap partially, in both purpose and contents, which can lead to confusion. This section gives a high-level overview of the physical address spaces defined by the Intel architecture, with an emphasis on their purpose and the methods used to manage them.
![[Screen Shot 2022-02-18 at 6.25.25 PM.png]]
The register space consists of names that are used to access the CPU’s register file, which is the only memory that operates at the CPU’s clock frequency and can be used without any latency penalty. The register space is defined by the CPU’s architecture, and documented in the SDM.
Some registers, such as the Control Registers (CRs), play specific roles in configuring the CPU's operation. For example, CR3 plays a central role in address translation (§ 2.5). These registers can only be accessed by system software. The rest of the registers make up an application's **execution context** (§ 2.6), which is essentially a high-speed scratch space. These registers can be accessed at all privilege levels, and their allocation is managed by the software's compiler. Many CPU instructions only operate on data in registers, and only place their results in registers.
The **memory** space, generally referred to as the **address space**, or the **physical address space**, consists of $2^{36}$ (64 GB) to $2^{40}$ (1 TB) addresses. The memory space is primarily used to access DRAM, but it is also used to communicate with **memory-mapped devices** that read memory requests off a system bus and write replies for the CPU. Some CPU instructions can read their inputs from the memory space, or store the results using the memory space.
- See: [wiki page for Memory map](https://en.wikipedia.org/wiki/Memory_map)
A better-known example of memory mapping is that at computer startup, memory addresses 0xFFFF0000 - 0xFFFFFFFF (the 64 KB of memory right below the 4 GB mark) are mapped to a flash memory device that holds the first stage of the code that bootstraps the computer.
The memory space is partitioned between devices and DRAM by the computer's firmware during the bootstrapping process. Sometimes, system software includes motherboard-specific code that modifies the memory space partitioning. The OS kernel relies on address translation, described in § 2.5, to control the applications' access to the memory space. The hypervisor relies on the same mechanism to control the guest OSs.
The **input/output (I/O)** space consists of $2^{16}$ I/O addresses, usually called ports. The I/O ports are used exclusively to communicate with devices. The CPU provides specific instructions for reading from and writing to the I/O space. I/O ports are allocated to devices by formal or de-facto standards. For example, ports 0xCF8 and 0xCFC are always used to access the PCI express (§ 2.9.1) configuration space.
The CPU implements a mechanism for system software to provide fine-grained I/O access to applications. However, all modern kernels restrict application software from accessing the I/O space directly, in order to limit the damage potential of application bugs.
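A hedged, Linux/x86-specific sketch (not from the paper) of how the 0xCF8 / 0xCFC port pair mentioned above is used to read PCI configuration space. It needs port permissions granted by the kernel (`ioperm`, hence root), which is exactly the kind of restriction the previous paragraph describes.

```c
/* Sketch: reading a PCI configuration register through the legacy
 * 0xCF8/0xCFC port pair. Linux/x86-specific; requires root for ioperm(). */
#include <stdio.h>
#include <stdint.h>
#include <sys/io.h>     /* ioperm, outl, inl (glibc, x86) */

#define PCI_CONFIG_ADDRESS 0xCF8
#define PCI_CONFIG_DATA    0xCFC

static uint32_t pci_config_read32(uint8_t bus, uint8_t dev,
                                  uint8_t func, uint8_t offset) {
    /* Bit 31 enables the access; bus/device/function/offset select
     * the 32-bit configuration register. */
    uint32_t address = (1u << 31) | ((uint32_t)bus << 16) |
                       ((uint32_t)dev << 11) | ((uint32_t)func << 8) |
                       (offset & 0xFC);
    outl(address, PCI_CONFIG_ADDRESS);
    return inl(PCI_CONFIG_DATA);
}

int main(void) {
    if (ioperm(PCI_CONFIG_ADDRESS, 8, 1) != 0) {
        perror("ioperm (run as root)");
        return 1;
    }
    /* Vendor and device ID of bus 0, device 0, function 0 (typically the host bridge). */
    uint32_t id = pci_config_read32(0, 0, 0, 0);
    printf("vendor=0x%04x device=0x%04x\n",
           (unsigned)(id & 0xFFFF), (unsigned)(id >> 16));
    return 0;
}
```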
The **Model-Specific Register (MSR)** space consists of $2^{32}$ MSRs, which are used to configure the CPU's operation. The MSR space was initially intended for the use of CPU model-specific firmware, but some MSRs have been promoted to **architectural MSR** status, making their semantics a part of the Intel architecture. For example, architectural MSR 0x10 holds a high-resolution monotonically increasing time-stamp counter.
The CPU provides instructions for reading from and writing to the MSR space. The instructions can only be used by system software. Some MSRs are also exposed by instructions accessible to applications. For example, applications can read the time-stamp counter via the RDTSC and RDTSCP instructions, which are very useful for benchmarking and optimizing software.
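A small sketch (mine, not the paper's) showing the ring-3 time-stamp counter access the text mentions, via the GCC/Clang `__rdtscp` intrinsic.

```c
/* Sketch: reading the time-stamp counter from application code.
 * Cycle counts are only meaningful relative to each other; x86 only. */
#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>   /* __rdtscp */

int main(void) {
    unsigned aux;
    uint64_t start = __rdtscp(&aux);   /* waits for earlier instructions to complete */

    volatile uint64_t sum = 0;
    for (int i = 0; i < 1000000; i++)
        sum += i;

    uint64_t end = __rdtscp(&aux);
    printf("elapsed ~%llu reference cycles\n",
           (unsigned long long)(end - start));
    return 0;
}
```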
### 2.5 Address Translation
System software relies on the CPU's address translation mechanism for implementing isolation among less privileged pieces of software (applications or operating systems). Virtually all secure architecture designs bring changes to address translation. We summarize the Intel architecture's address translation features that are most relevant when establishing a system's security properties, and refer the reader to [108] for a more general presentation of address translation concepts and its other uses.
#### 2.5.1 Address Translation Concepts
From a systems perspective, address translation is a layer of indirection (shown in Figure 10) between the virtual addresses, which are used by a program's memory load and store instructions, and the physical addresses, which reference the physical address space (§ 2.4). The mapping between virtual and physical addresses is defined by page tables, which are managed by the system software.
![[Screen Shot 2022-02-18 at 7.36.32 PM.png]]
Operating systems use address translation to implement the virtual memory abstraction, illustrated by Figure 11. The virtual memory abstraction exposes the same interface as the memory abstraction in § 2.2, but each process uses a separate virtual address space that only references the memory allocated to that process. From an application developer standpoint, virtual memory can be modeled by pretending that each process runs on a separate computer and has its own DRAM.
![[Screen Shot 2022-02-18 at 7.37.49 PM.png]]
Address translation is used by the operating system to multiplex DRAM among multiple application processes, isolate the processes from each other, and prevent application code from accessing memory-mapped devices directly. The latter two protection measures prevent an application's bugs from impacting other applications or the OS kernel itself. Hypervisors also use address translation, to divide the DRAM among operating systems that run concurrently, and to virtualize memory-mapped devices.
The address translation mode used by 64-bit operating systems, called IA-32e by Intel's documentation, maps **48-bit** **virtual addresses** to **physical addresses** of at most **52 bits** $^2$. The translation process, illustrated in Figure 12, is carried out by dedicated hardware in the CPU, which is referred to as the address translation unit or the **memory management unit (MMU)**.
- $^2$ The size of a physical address is CPU-dependent, and is 40 bits for recent desktop CPUs and 44 bits for recent high-end server CPUs.
![[Screen Shot 2022-02-18 at 7.43.43 PM.png]]
The bottom 12 bits of a virtual address are not changed by the translation. The top 36 bits are grouped into four 9-bit indexes, which are used to index into the page tables. **Despite its name, the page tables data structure closely resembles a full 512-ary search tree where nodes have fixed keys**. Each node is represented in DRAM as an array of 512 8-byte entries that contain the physical addresses of the next-level children as well as some flags. The physical address of the root node is stored in the CR3 register. The arrays in the last-level nodes contain the physical addresses that are the result of the address translation.
The address translation function, which does not change the bottom bits of addresses, partitions the memory address space into pages. A page is the set of all memory locations that only differ in the bottom bits which are not impacted by address translation, so all the memory addresses in a virtual page translate to corresponding addresses in the same physical page. From this perspective, the address translation function can be seen as a mapping between **Virtual Page Numbers (VPN)** and **Physical Page Numbers (PPN)**, as shown in Figure 13.
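Not from the paper: a C sketch of how an IA-32e virtual address splits into the four 9-bit table indexes and the 12-bit offset described above, and how a (hypothetical) PPN recombines with the offset. The level names PML4/PDPT/PD/PT follow Intel's usual terminology; no real page tables are walked here.

```c
/* Pure bit manipulation illustrating the 48-bit virtual address layout. */
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t vaddr = 0x00007f3a12345678ULL;   /* an example virtual address */

    uint64_t offset = vaddr & 0xFFF;          /* bottom 12 bits, unchanged */
    uint64_t vpn    = vaddr >> 12;            /* virtual page number */

    /* The four 9-bit indexes used to walk the 512-ary tree, root to leaf. */
    unsigned pml4_idx = (vaddr >> 39) & 0x1FF;
    unsigned pdpt_idx = (vaddr >> 30) & 0x1FF;
    unsigned pd_idx   = (vaddr >> 21) & 0x1FF;
    unsigned pt_idx   = (vaddr >> 12) & 0x1FF;

    printf("VPN=0x%09llx offset=0x%03llx\n",
           (unsigned long long)vpn, (unsigned long long)offset);
    printf("indexes: PML4=%u PDPT=%u PD=%u PT=%u\n",
           pml4_idx, pdpt_idx, pd_idx, pt_idx);

    /* If the walk produced physical page number ppn, the translated address
     * keeps the same 12-bit offset. */
    uint64_t ppn   = 0x123456;                /* hypothetical PPN */
    uint64_t paddr = (ppn << 12) | offset;
    printf("PPN=0x%llx -> physical address 0x%llx\n",
           (unsigned long long)ppn, (unsigned long long)paddr);
    return 0;
}
```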
In addition to isolating application processes, operating systems also use the address translation feature to run applications whose collective memory demands exceed the amount of DRAM installed in the computer. The OS evicts infrequently used memory pages from DRAM to a larger (but slower) memory, such as a hard disk drive (HDD) or solid-state drive (SSD). For historical reasons, this slower memory is referred to as the disk.
The OS's ability to over-commit DRAM is often called page **swapping**, for the following reason. When an application process attempts to access a page that has been evicted, the OS "steps in" and reads the missing page back into DRAM. In order to do this, the OS might have to evict a different page from DRAM, effectively swapping the contents of a DRAM page with a disk page. The details behind this high-level description are covered in the following sections.
The CPU’s address translation is also referred to as “**paging**”, which is a shorthand for “**page swapping**”.
# Excerpts
The **Platform Controller Hub (PCH)** houses (relatively) low-speed I/O controllers driving the slower buses in the system, like SATA, used by storage devices, and USB, used by input peripherals. The PCH is also known as the **chipset**. At a first approximation, the **south bridge** term in older documentation can also be considered as a synonym for PCH.
#### The Intel Management Engine (ME)
Intel's Management Engine (ME) is an embedded computer that was initially designed for remote system management and troubleshooting of server-class systems that are often hosted in data centers. However, all of Intel's recent PCHs contain an ME [80], and it currently plays a crucial role in platform bootstrapping, which is described in detail in § 2.13. Most of the information in this section is obtained from an Intel-sponsored book [162].
The ME is part of Intel's Active Management Technology (AMT), which is marketed as a convenient way for IT administrators to troubleshoot and fix situations such as failing hardware, or a corrupted OS installation, without having to gain physical access to the impacted computer.
==The Intel ME, shown in Figure 21, remains functional during most hardware failures because it is an entire embedded computer featuring its own execution core, bootstrap ROM, and internal RAM. The ME can be used for troubleshooting effectively thanks to an array of abilities that include overriding the CPU’s boot vector and a DMA engine that can access the computer’s DRAM. The ME provides remote access to the computer without any CPU support because it can use the System Management bus (SMBus) to access the motherboard’s Ethernet PHY or an AMT-compatible NIC [100].==
==The Intel ME is connected to the motherboard's power supply using a power rail that stays active even when the host computer is in the Soft Off mode [100], known as ACPI G2/S5, where most of the computer's components are powered off [87], including the CPU and DRAM. For all practical purposes, this means that the ME's execution core is active as long as the power supply is still connected to a power source.==
- Yikes
In S5, the ME cannot access the DRAM, but it can still use its own internal memories. The ME can also still communicate with a remote party, as it can access the motherboard’s Ethernet PHY via SMBus. This enables applications such as AMT’s theft prevention, where a laptop equipped with a cellular modem can be tracked and permanently disabled as long as it has power and signal.
As the ME remains active in deep power-saving modes, its design must rely on low-power components. The execution core is an Argonaut RISC Core (ARC) clocked at 200-400 MHz, which is typically used in low-power embedded designs. On a very recent PCH [100], the internal SRAM has 640 KB, and is shared with the Integrated Sensor Hub (ISH)'s core. The SMBus runs at 1 MHz and, without CPU support, the motherboard's Ethernet PHY runs at 10 Mbps.
#### 2.9.3 The Processor Die
![[Screen Shot 2022-02-18 at 11.50.59 PM.png]]
Security extensions to the Intel architecture, such as **Trusted Execution Technology (TXT)** [70] and **Software Guard Extensions (SGX)** [14, 139], rely on the fact that the processor die includes the memory and I/O controller, and thus can prevent any device from accessing protected memory areas via **Direct Memory Access (DMA)** transfers. § 2.11.3 takes a deeper look at the uncore organization and at the machinery used to prevent unauthorized DMA transfers.
#### 2.9.4 The Core
Most Intel CPUs feature **hyper-threading**, which means that a core (shown in Figure 23) has two copies of the register files backing the execution context described in § 2.6, and can execute two separate streams of instructions simultaneously. Hyper-threading reduces the impact of memory stalls on the utilization of the fetch, decode and execution units.
![[Screen Shot 2022-02-18 at 11.51.11 PM.png]]
A hyper-threaded core is exposed to system software as two **logical processors (LPs)**, also named **hardware threads** in the Intel documentation. The logical processor abstraction allows the code used to distribute work across processors in a multi-processor system to function without any change on multi-core hyper-threaded processors.
The high level of resource sharing introduced by hyper-threading introduces a security vulnerability. Software running on one logical processor can use the high-resolution performance counter (`RDTSCP`, § 2.4) [152] to get information about the instructions and memory access patterns of another piece of software that is executed on the other logical processor on the same core.
That being said, the biggest downside of hyper-threading might be the fact that writing about Intel processors in a rigorous manner requires the use of the cumbersome term Logical Processor instead of the shorter and more intuitive “CPU core”, which can often be abbreviated to “core”.
### 2.10 Out-of-Order and Speculative Execution
CPU cores can execute instructions orders of magnitude faster than DRAM can read data. Computer architects attempt to bridge this gap by using hyper-threading (§ 2.9.4), out-of-order and speculative execution, and caching, which is described in § 2.11. In CPUs that use out-of-order execution, the order in which the CPU carries out a program's instructions (**execution order**) is not necessarily the same as the order in which the instructions would be executed by a sequential evaluation system (**program order**).
An analysis of a system’s information leakage must take out-of-order execution into consideration. Any CPU actions observed by an attacker match the execution order, so the attacker may learn some information by comparing the observed execution order with a known program order. At the same time, attacks that try to infer a victim’s program order based on actions taken by the CPU must account for out-of-order execution as a source of noise.
#### 2.10.1 Out-of-Order Execution
![[Screen Shot 2022-02-19 at 12.04.30 AM.png]]
- It's just mind-bendingly efficient and complex. Great explanation of out-of-order execution
### 2.11 Cache Memories
At the time of this writing, CPU cores can process data ≈ 200× faster than DRAM can supply it. This gap is bridged by a hierarchy of cache memories, which are orders of magnitude smaller and an order of magnitude faster than DRAM. While caching is transparent to application software, the system software is responsible for managing and coordinating the caches that store address translation (§ 2.5) results.
Caches impact the security of a software system in two ways. First, the Intel architecture relies on system software to manage address translation caches, which becomes an issue in a threat model where the system software is untrusted. Second, caches in the Intel architecture are shared by all the software running on the computer. This opens up the way for cache timing attacks, an entire class of software attacks that rely on observing the time differences between accessing a cached memory location and an uncached memory location.
This section summarizes the caching concepts and implementation details needed to reason about both classes of security problems mentioned above. [170], [150] and [76] provide a good background on low-level cache implementation concepts. § 3.8 describes cache timing attacks.
#### 2.11.1 Caching Principles
At a high level, caches exploit the high locality in the memory access patterns of most applications to hide the main memory’s (relatively) high latency. By **caching** (storing a copy of) the most recently accessed code and data, these relatively small memories can be used to satisfy 90%-99% of an application’s memory accesses.
In an Intel processor, the first-level (L1) cache consists of a separate data cache (D-cache) and an instruction cache (I-cache). The instruction fetch and decode stage is directly connected to the L1 I-cache, and uses it to read the streams of instructions for the core's logical processors. Micro-ops that read from or write to memory are executed by the memory unit (MEM in Figure 23), which is connected to the L1 D-cache and forwards memory accesses to it.
Figure 25 illustrates the steps taken by a cache when it receives a memory access. First, a **cache lookup** uses the memory address to determine if the corresponding data exists in the cache. A **cache hit** occurs when the address is found, and the cache can resolve the memory access quickly. Conversely, if the address is not found, a **cache miss** occurs, and a **cache fill** is required to resolve the memory access. When doing a fill, the cache forwards the memory access to the next level of the memory hierarchy and caches the response. Under most circumstances, a cache fill also triggers a **cache eviction**, in which some data is removed from the cache to make room for the data coming from the fill. If the data that is evicted has been modified since it was loaded in the cache, it must be **written back** to the next level of the memory hierarchy.
![[Screen Shot 2022-02-19 at 12.20.21 AM.png]]
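A toy C model (mine, not the paper's) of the lookup / hit / miss / fill / eviction / write-back steps from Figure 25, using a direct-mapped cache for simplicity; real Intel caches are set-associative.

```c
/* Minimal direct-mapped cache illustrating hit, miss, fill, eviction, write-back. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define NUM_SETS   8
#define LINE_BYTES 64

typedef struct {
    bool     valid;
    bool     dirty;                 /* modified since the fill? */
    uint64_t tag;
    uint8_t  data[LINE_BYTES];
} cache_line_t;

static cache_line_t cache[NUM_SETS];

static void access_cache(uint64_t addr, bool is_write) {
    uint64_t set = (addr / LINE_BYTES) % NUM_SETS;   /* cache lookup */
    uint64_t tag = (addr / LINE_BYTES) / NUM_SETS;
    cache_line_t *line = &cache[set];

    if (line->valid && line->tag == tag) {
        printf("0x%06llx: hit  (set %llu)\n",
               (unsigned long long)addr, (unsigned long long)set);
    } else {
        printf("0x%06llx: miss (set %llu)",
               (unsigned long long)addr, (unsigned long long)set);
        if (line->valid && line->dirty)
            printf(", write-back of evicted line");   /* eviction of dirty data */
        printf(", fill from next level\n");           /* cache fill */
        line->valid = true;
        line->dirty = false;
        line->tag   = tag;
        memset(line->data, 0, LINE_BYTES);            /* pretend we fetched data */
    }
    if (is_write)
        line->dirty = true;
}

int main(void) {
    access_cache(0x1000, false);   /* miss, fill */
    access_cache(0x1010, false);   /* hit: same 64-byte line */
    access_cache(0x1000, true);    /* hit, marks line dirty */
    access_cache(0x3000, false);   /* same set, different tag: eviction + write-back */
    return 0;
}
```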
Table 8 shows the key characteristics of the memory hierarchy implemented by modern Intel CPUs. Each core has its own L1 and L2 cache (see Figure 23), while the L3 cache is in the CPU’s uncore (see Figure 22), and is shared by all the cores in the package.
The numbers in Table 8 suggest that cache placement can have a large impact on an application's execution time. Because of this, the Intel architecture includes an assortment of instructions that give performance-sensitive applications some control over the caching of their working sets. PREFETCH instructs the CPU's prefetcher to cache a specific memory address, in preparation for a future memory access. The memory writes performed by the MOVNT instruction family bypass the cache if a fill would be required. CLFLUSH evicts any cache lines storing a specific address from the entire cache hierarchy.
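Not from the paper: how the three ring-3 cache control instructions above are reached from C via compiler intrinsics. Performance effects are microarchitecture-dependent; this only shows the mechanism.

```c
#include <stdint.h>
#include <immintrin.h>   /* _mm_prefetch, _mm_stream_si32, _mm_clflush, _mm_sfence */

static int buffer[1024];

void cache_control_demo(void) {
    /* PREFETCH: ask the CPU to pull this line into the cache hierarchy. */
    _mm_prefetch((const char *)&buffer[0], _MM_HINT_T0);

    /* MOVNT family: non-temporal store that bypasses the cache when a
     * fill would otherwise be required. */
    _mm_stream_si32(&buffer[256], 42);

    /* CLFLUSH: evict the line holding this address from every cache level. */
    _mm_clflush(&buffer[0]);

    /* Make the non-temporal store globally visible before continuing. */
    _mm_sfence();
}
```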
The methods mentioned above are available to software running at all privilege levels, because they were designed for high-performance workloads with large working sets, which are usually executed at ring 3 (§ 2.3). For comparison, the instructions used by system software to manage the address translation caches, described in § 2.11.5 below, can only be executed at ring 0.
![[Screen Shot 2022-02-19 at 12.22.34 AM.png]]
#### 2.11.2 Cache Organization
![[Screen Shot 2022-02-19 at 12.26.50 AM.png]]
- Again, ridiculously complex, but really cool
#### 2.11.3 Cache Coherence
The Intel architecture was designed to support application software that was not written with caches in mind. One aspect of this support is the Total Store Order (TSO) [147] memory model, which promises that all the logical processors in a computer see the same order of DRAM writes.
The same memory location might be simultaneously cached by different cores’ caches, or even by caches on separate chips, so providing the TSO guarantees requires a **cache coherence protocol** that synchronizes all the cache lines in a computer that reference the same memory address.
The cache coherence mechanism is not visible to software, so it is only briefly mentioned in the SDM. Fortunately, Intel's optimization reference [96] and the datasheets referenced in § 2.9.3 provide more information. Intel processors use variations of the MESIF [66] protocol, which is implemented in the CPU and in the protocol layer of the QPI bus.
The SDM and the `CPUID` instruction output indicate that the **L3 cache**, also known as the **last-level cache (LLC)**, is **inclusive**, meaning that any location cached by an L1 or L2 cache must also be cached in the LLC. This design decision reduces complexity in many implementation aspects. We estimate that the bulk of the cache coherence implementation is in the CPU's uncore, thanks to the fact that cache synchronization can be achieved without having to communicate to the lower cache levels that are inside execution cores.
The QPI protocol defines cache agents, which are connected to the last-level cache in a processor, and home agents, which are connected to memory controllers. Cache agents make requests to home agents for cache line data on cache misses, while home agents keep track of cache line ownership, and obtain the cache line data from other cache agents, or from the memory controller. The QPI routing layer supports multiple agents per socket, and each processor has its own caching agents, and at least one home agent.
Figure 27 shows that the CPU uncore has a bidirectional ring interconnect, which is used for communication between execution cores and the other uncore components. The execution cores are connected to the ring by CBoxes, which route their LLC accesses. The routing is static, as the LLC is divided into same-size slices (common slice sizes are 1.5 MB and 2.5 MB), and an undocumented hashing scheme maps each possible physical address to exactly one LLC slice.
![[Screen Shot 2022-02-19 at 12.41.38 AM.png]]
Intel's documentation states that the hashing scheme mapping physical addresses to LLC slices was designed to avoid having a slice become a hotspot, but stops short of providing any technical details. Fortunately, independent researchers have reverse-engineered the hash functions for recent processors [85, 135, 197].
The hashing scheme described above is the reason why the L3 cache is documented as having a “complex” indexing scheme, as opposed to the direct indexing used in the L1 and L2 caches.
The number of LLC slices matches the number of cores in the CPU, and each LLC slice shares a CBox with a core. The CBoxes implement the cache coherence engine, so each CBox acts as the QPI cache agent for its LLC slice. CBoxes use a **Source Address Decoder (SAD)** to route DRAM requests to the appropriate home agents. Conceptually, the SAD takes in a memory address and access type, and outputs a transaction type (coherent, non-coherent, IO) and a node ID. Each CBox contains a SAD replica, and the configurations of all SADs in a package are identical.
The SAD configurations are kept in sync by the **UBox**, which is the uncore configuration controller, and connects the **System agent** to the ring. The UBox is responsible for reading and writing physically distributed registers across the uncore. The UBox also receives interrupts from the system and dispatches them to the appropriate core.
On recent Intel processors, the uncore also contains at least one memory controller. Each integrated memory controller (iMC or MBox in Intel’s documentation) is connected to the ring by a **home agent** (HA or **BBox** in Intel’s datasheets). Each home agent contains a **Target Address Decoder (TAD)**, which maps each DRAM address to an address suitable for use by the DRAM chips, namely a DRAM channel, bank, rank, and a DIMM address. The mapping in the TAD is not documented by Intel, but it has been reverse-engineered [151].
The integration of the memory controller on the CPU brings the ability to filter DMA transfers. Accesses from a peripheral connected to the PCIe bus are handled by the integrated I/O controller (IIO), placed on the ring interconnect via the UBox, and then reach the iMC. Therefore, on modern systems, DMA transfers go through both the SAD and TAD, which can be configured to abort DMA transfers targeting protected DRAM ranges.
#### 2.11.4 Caching and Memory-Mapped Devices
Caches rely on the assumption that the underlying memory implements the memory abstraction in § 2.2. However, the physical addresses that map to memory-mapped I/O devices usually deviate from the memory abstraction. For example, some devices expose command registers that trigger certain operations when written, and always return a zero value. Caching addresses that map to such memory-mapped I/O devices will lead to incorrect behavior.
Furthermore, even when the memory-mapped devices follow the memory abstraction, caching their memory is sometimes undesirable. For example, caching a graphics unit's framebuffer could lead to visual artifacts on the user's display, because of the delay between the time when a write is issued and the time when the corresponding cache lines are evicted and written back to memory.
In order to work around these problems, the Intel architecture implements a few caching behaviors, described below, and provides a method for partitioning the memory address space (§ 2.4) into regions, and for assigning a desired caching behavior to each region.
**Uncacheable (UC)** memory has the same semantics as the I/O address space (§ 2.4). UC memory is useful when a device’s behavior is dependent on the order of memory reads and writes, such as in the case of memory-mapped command and data registers for a PCIe NIC (§ 2.9.1). The out-of-order execution engine (§ 2.10) does not reorder UC memory accesses, and does not issue speculative reads to UC memory.
**Write Combining (WC)** memory addresses the specific needs of framebuffers. WC memory is similar to UC memory, but the out-of-order engine may reorder memory accesses, and may perform speculative reads. The processor stores writes to WC memory in a write combining buffer, and attempts to group multiple writes into a (more efficient) line write bus transaction.
**Write Through (WT)** memory is cached, but write misses do not cause cache fills. This is useful for preventing large memory-mapped device memories that are rarely read, such as framebuffers, from taking up cache memory. WT memory is covered by the cache coherence engine, may receive speculative reads, and is subject to operation reordering.
DRAM is represented as **Write Back (WB)** memory, which is optimized under the assumption that all the devices that need to observe the memory operations implement the cache coherence protocol. WB memory is cached as described in § 2.11, receives speculative reads, and operations targeting it are subject to reordering.
**Write Protected (WP)** memory is similar to WB memory, with the exception that every write is propagated to the system bus. It is intended for memory-mapped buffers, where the order of operations does not matter, but the devices that need to observe the writes do not implement the cache coherence protocol, in order to reduce hardware costs.
On recent Intel processors, the cache’s behavior is mainly configured by the **Memory Type Range Registers (MTRRs)** and by **Page Attribute Table (PAT)** indices in the page tables (§ 2.5). The behavior is also impacted by the Cache Disable (CD) and Not-Write through (NW) bits in Control Register 0 (CR0, § 2.4), as well as by equivalent bits in page table entries, namely Page-level Cache Disable (PCD) and Page-level Write-Through (PWT).
The MTRRs were intended to be configured by the computer’s firmware during the boot sequence. Fixed MTRRs cover pre-determined ranges of memory, such as the memory areas that had special semantics in the computers using 16-bit Intel processors. The ranges covered by **variable MTRRs** can be configured by system software. The representation used to specify the ranges is described below, as it has some interesting properties that have proven useful in other systems.
Each variable memory type range is specified using a **range base** and a **range mask**. A memory address belongs to the range if computing a bitwise AND between the address and the range mask results in the range base. This verification has a low-cost hardware implementation, shown in Figure 28.
![[Screen Shot 2022-02-19 at 12.57.43 AM.png]]
Each variable memory type range must have a size that is an integral power of two, and a starting address that is a multiple of its size, so it can be described using the base / mask representation described above. A range’s starting address is its base, and the range’s size is one plus its mask.
Another advantage of this range representation is that the base and the mask can be easily validated, as shown in Listing 1. The range is aligned with respect to its size if and only if the bitwise AND between the base and the mask is zero. The range’s size is a power of two if and only if the bitwise AND between the mask and one plus the mask is zero. According to the SDM, the MTRRs are not validated, but setting them to invalid values results in undefined behavior.
![[Screen Shot 2022-02-19 at 12.59.03 AM.png]]
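A C transcription (mine, based on the two paragraphs above rather than the paper's screenshot) of the base/mask checks: the range size is one plus the mask, the base must be size-aligned, and the size must be a power of two. Programming real MTRRs requires ring 0 MSR writes; this is only the arithmetic.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Range is aligned to its size iff (base AND mask) == 0, and the size
 * (mask + 1) is a power of two iff (mask AND (mask + 1)) == 0. */
static bool is_valid_range(uint64_t base, uint64_t mask) {
    return ((base & mask) == 0) && ((mask & (mask + 1)) == 0);
}

int main(void) {
    /* A 1 MB range starting at 16 MB: base = 0x1000000, mask = 0xFFFFF. */
    uint64_t base = 0x1000000, mask = 0xFFFFF;
    printf("size = %llu bytes, valid = %d\n",
           (unsigned long long)(mask + 1), is_valid_range(base, mask));

    /* Misaligned base: 16 MB + 4 KB is not a multiple of the 1 MB size. */
    printf("valid = %d\n", is_valid_range(base + 0x1000, mask));
    return 0;
}
```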
No memory type range can partially cover a 4 KB page, which implies that the range base must be a multiple of 4 KB, and the bottom 12 bits of range mask must be set. This simplifies the interactions between memory type ranges and address translation, described in § 2.11.5.
The PAT is intended to allow the operating system or hypervisor to tweak the caching behaviors specified in the MTRRs by the computer's firmware. The PAT has 8 entries that specify caching behaviors, and is stored in its entirety in an MSR. Each page table entry contains a 3-bit index that points to a PAT entry, so the system software that controls the page tables can specify caching behavior at a very fine granularity.
#### 2.11.5 Caches and Address Translation
Modern system software relies on address translation (§ 2.5). This means that all the memory accesses issued by a CPU core use virtual addresses, which must undergo translation. Caches must know the physical address for a memory access, to handle aliasing (multiple virtual addresses pointing to the same physical address) correctly. ==However, address translation requires up to 20 memory accesses== (see Figure 15), so it is impractical to perform a full address translation for every cache access. Instead, address translation results are cached in the **translation look-aside buffer (TLB)**.
Table 9 shows the levels of the TLB hierarchy. Recent processors have separate L1 TLBs for instructions and data, and a shared L2 TLB. Each core has its own TLBs (see Figure 23). When a virtual address is not contained in a core’s TLB, the **Page Miss Handler (PMH)** performs a **page walk** (page table / EPT traversal) to translate the virtual address, and the result is stored in the TLB.
![[Screen Shot 2022-02-19 at 1.01.17 AM.png]]
![[Screen Shot 2022-02-19 at 1.01.38 AM.png]]
In the Intel architecture, the PMH is implemented in hardware, so the TLB is never directly exposed to software and its implementation details are not documented. The SDM does state that each TLB entry contains the physical address associated with a virtual address, and the metadata needed to resolve a memory access. For example, the processor needs to check the writable (W) flag on every write, and issue a General Protection fault (#GP) if the write targets a read-only page. Therefore, the TLB entry for each virtual address caches the logical AND of all the relevant W flags in the page table structures leading up to the page.
The TLB is transparent to application software. However, kernels and hypervisors must make sure that the TLBs do not get out of sync with the page tables and EPTs. When changing a page table or EPT, the system software must use the INVLPG instruction to invalidate any TLB entries for the virtual address whose translation changed. Some instructions **flush the TLBs**, meaning that they invalidate all the TLB entries, as a side-effect.
TLB entries also cache the desired caching behavior (§ 2.11.4) for their pages. This requires system software to flush the corresponding TLB entries when changing MTRRs or page table entries. In return, the processor only needs to compute the desired caching behavior during a TLB miss, as opposed to computing the caching behavior on every memory access.
The TLB is not covered by the cache coherence mechanism described in § 2.11.3. Therefore, when modifying a page table or EPT on a multi-core / multi-processor system, the system software is responsible for performing a TLB shootdown, which consists of stopping all the logical processors that use the page table / EPT about to be changed, performing the changes, executing TLB-invalidating instructions on the stopped logical processors, and then resuming execution on the stopped logical processors.
### 2.12 Interrupts
- Still ridiculously complex...
Peripherals use **interrupts** to signal the occurrence of an event that must be handled by system software. For example, a keyboard triggers interrupts when a key is pressed or released. System software also relies on interrupts to implement preemptive multi-threading.
Interrupts are a kind of hardware exception (§ 2.8.2). Receiving an interrupt causes an execution core to perform a privilege level switch and to start executing the system software's interrupt handling code. Therefore, the security concerns in § 2.8.2 also apply to interrupts, with the added twist that interrupts occur independently of the instructions executed by the interrupted code, whereas most faults are triggered by the actions of the application software that incurs them.
Given the importance of interrupts when assessing a system’s security, this section outlines the interrupt triggering and handling processes described in the SDM.
Peripherals use bus-specific protocols to signal interrupts. For example, PCIe relies on **Message Signaled Interrupts (MSI)**, which are memory writes issued to specially designed memory addresses. The bus-specific interrupt signals are received by the **I/O Advanced Programmable Interrupt Controller (IOAPIC)** in the PCH, shown in Figure 20.
The IOAPIC routes interrupt signals to one or more **Local Advanced Programmable Interrupt Controllers (LAPICs)**. As shown in Figure 22, each logical CPU has a LAPIC that can receive interrupt signals from the IOAPIC. The IOAPIC routing process assigns each interrupt to an 8-bit **interrupt vector** that is used to identify the interrupt sources, and to a 32-bit **APIC ID** that is used to identify the LAPIC that receives the interrupt.
Each LAPIC uses a 256-bit **Interrupt Request Register (IRR)** to track the unserviced interrupts that it has received, based on the interrupt vector number. When the corresponding logical processor is available, the LAPIC copies the highest-priority unserviced interrupt vector to the **In-Service Register (ISR)**, and invokes the logical processor’s interrupt handling process.
At the execution core level, interrupt handling reuses many of the mechanisms of fault handling (§ 2.8.2). The interrupt vector number in the LAPIC’s ISR is used to locate an interrupt handler in the IDT, and the handler is invoked, possibly after a privilege switch is performed. The interrupt handler does the processing that the device requires, and then writes the LAPIC’s **End Of Interrupt (EOI)** register to signal the fact that it has completed handling the interrupt.
Interrupts are treated like faults, so interrupt handlers have full control over the execution environment of the application being interrupted. This is used to implement pre-emptive multi-threading, which relies on a clock device that generates interrupts periodically, and on an interrupt handler that performs context switches.
System software can cause an interrupt on any logical processor by writing the target processor's APIC ID into the **Interrupt Command Register (ICR)** of the LAPIC associated with the logical processor that the software is running on. These interrupts, called **Inter-Processor Interrupts (IPI)**, are needed to implement TLB shootdowns (§ 2.11.5).
### 2.14 CPU Microcode
Intel patents [110, 138] describing Software Guard Extensions (SGX) disclose that ==SGX is entirely implemented in microcode, except for the memory encryption engine.==
#### 3.8.4 Defending against Cache Timing Attacks
==Fortunately, invalidating any of the preconditions for cache timing attacks is sufficient for defending against them.== **The easiest precondition to focus on is that the attacker must have access to memory locations that map to the same sets in a cache as the victim’s memory. This assumption can be invalidated by the judicious use of a cache partitioning scheme.**
Performance concerns aside, the main difficulty associated with cache partitioning schemes is that they must be implemented by a trusted party. When the system software is trusted, it can (for example) use the principles behind page coloring [117, 177] to partition the caches [129] between mutually distrusting parties. This comes down to setting up the page tables in such a way that no two mutually distrusting software modules are stored in physical pages that map to the same sets in any cache memory. However, if the system software is not trusted, the cache partitioning scheme must be implemented in hardware.
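- A sketch of the page-coloring idea (mine, with made-up cache geometry): a physical page’s "color" is the set-index bits that lie above the page offset, so frames with different colors can never collide in the same cache sets, and giving mutually distrusting parties disjoint colors partitions the LLC between them.

```python
# Illustrative parameters, not a real Intel cache geometry.
PAGE_SIZE = 4096
LINE_SIZE = 64
NUM_SETS = 8192                           # e.g. 8 MB, 16-way: 8 MB / (64 B * 16 ways)
SETS_PER_PAGE = PAGE_SIZE // LINE_SIZE    # consecutive sets touched by one page
NUM_COLORS = NUM_SETS // SETS_PER_PAGE    # distinct page "colors"

def page_color(phys_addr):
    """Color = the cache-set-index bits above the page offset."""
    return (phys_addr // PAGE_SIZE) % NUM_COLORS

def allocate_frame(free_frames, allowed_colors):
    """Return a free physical frame whose color belongs to this security domain."""
    for frame in sorted(free_frames):
        if page_color(frame * PAGE_SIZE) in allowed_colors:
            free_frames.remove(frame)
            return frame
    raise MemoryError("no free frame with an allowed color")

# Two mutually distrusting domains get disjoint colors, hence disjoint LLC sets.
free = set(range(65536))
victim_colors   = set(range(0, NUM_COLORS // 2))
attacker_colors = set(range(NUM_COLORS // 2, NUM_COLORS))
f_victim   = allocate_frame(free, victim_colors)
f_attacker = allocate_frame(free, attacker_colors)
assert page_color(f_victim * PAGE_SIZE) != page_color(f_attacker * PAGE_SIZE)
```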
==The other interesting precondition is that the victim must access its memory in a data-dependent fashion that allows the attacker to infer private information from the observed memory access pattern. It becomes tempting to think that cache timing attacks can be prevented by eliminating data-dependent memory accesses from all the code handling sensitive data.==
==However, removing data-dependent memory accesses is difficult to accomplish in practice because instruction fetches must also be taken into consideration. [115] gives an idea of the level of effort required to remove data-dependent accesses from AES, which is a relatively simple data processing algorithm. At the time of this writing, we are not aware of any approach that scales to large pieces of software.==
- Thankfully, the security of the Lightning Network protocol mostly relies on the security of a user's private keys, which is hardened against cache timing attacks due to the usage of [[secp256k1]].
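- To make "eliminate data-dependent memory accesses" concrete, here is the classic countermeasure for table lookups as a sketch (mine): touch every entry and select the wanted one arithmetically, so the sequence of memory accesses is independent of the secret index. Python is only used to show the access pattern; a real constant-time implementation would be written in C or assembly and audited for what the compiler emits.

```python
def secret_dependent_lookup(table, secret_index):
    # Leaks: which cache line gets touched depends on secret_index.
    return table[secret_index]

def oblivious_lookup(table, secret_index):
    """Reads every entry with the same access pattern for every secret_index;
    the selection uses a branch-free mask instead of an indexed load.
    Works for tables of up to 256 one-byte entries."""
    result = 0
    for i, value in enumerate(table):
        diff = i ^ secret_index
        mask = ((diff - 1) >> 8) & 0xFF    # 0xFF when i == secret_index, else 0x00
        result |= value & mask
    return result

sbox = list(range(256))                    # stand-in for an AES S-box
assert oblivious_lookup(sbox, 0x3A) == secret_dependent_lookup(sbox, 0x3A)
```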
## 4 Related Work
**[[Screen Shot 2022-03-01 at 2.56.02 AM.png|Table 12: Security features overview for the trusted hardware projects related to Intel's SGX]]**
![[Screen Shot 2022-03-01 at 2.56.02 AM.png]]
### 4.1 The IBM 4765 Secure Coprocessor
The IBM 4758 [172], and its most current-day successor, the IBM 4765 [2] (shown in Figure 57) are representative examples of secure coprocessors. The 4758 was certified to withstand physical attacks to FIPS 140-1 Level 4 [171], and the 4765 meets the rigors of FIPS 140-2 Level 4 [1].
- This is probably the proprietary system that IBM uses in their cloud [[Confidential Computing]] solutions
### 4.2 ARM TrustZone
ARM’s TrustZone [13] is a collection of hardware modules that can be used to conceptually partition a system’s resources between a **secure world**, which hosts a secure container, and a **normal world**, which runs an untrusted software stack.
- [[ARM TrustZone]] generally seems pretty legit, but doesn't have attestation
- The list of attacks that [[ARM TrustZone|TrustZone]] defends against but [[SGX]] doesn't (based on Table 12):
- Malicious containers (cache timing)
- Malicious OS (page fault recording)
- Physical DRAM reads
- Physical DRAM writes
- Physical DRAM rollback writes
- Physical DRAM address reads
The TrustZone components do not have any countermeasures for physical attacks. However, a system that follows the recommendations in the TrustZone documentation will not be exposed to physical attacks, under a threat model that trusts the processor chip package. The AXI bus is designed to connect components in an SoC design, so it cannot be tapped by an attacker. The TrustZone documentation recommends having all the code and data in the secure world stored in on-chip SRAM, which is not subject to physical attacks. However, this approach places significant limits on the secure container’s functionality, because on-chip SRAM is many orders of magnitude more expensive than a DRAM chip of the same capacity.
- It uses the on-chip SRAM, which is much smaller and more expensive per unit of storage than DRAM
==TrustZone’s documentation does not describe any software attestation implementation.== However, it does outline a method for implementing secure boot, which comes down to having the first-stage bootloader verify a signature in the second-stage bootloader against a public key whose cryptographic hash is burned into on-chip One-Time Programmable (OTP) polysilicon fuses. A hardware measurement root can be built on top of the same components, by storing a per-chip attestation key in the polyfuses, and having the first-stage bootloader measure the second-stage bootloader and store its hash in an on-chip SRAM region allocated to the secure world. The polyfuses would be gated by a TZMA IP block that makes them accessible only to the secure world.
- No attestation
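- A sketch of the boot chain just described (my own reconstruction; the RSA/PKCS#1 choice and the `cryptography` package are illustrative assumptions, not TrustZone’s actual formats): the only root of trust is the hash of the vendor’s public key in the OTP fuses; the first-stage bootloader checks the shipped public key against that hash, verifies the second stage’s signature, and, to build a measurement root, records the second stage’s hash in secure-world SRAM.

```python
# pip install cryptography
import hashlib
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes, serialization

def pubkey_bytes(pub):
    return pub.public_bytes(serialization.Encoding.DER,
                            serialization.PublicFormat.SubjectPublicKeyInfo)

# Done once, at the factory: burn the hash of the vendor key into OTP fuses.
vendor_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
OTP_FUSED_KEY_HASH = hashlib.sha256(pubkey_bytes(vendor_key.public_key())).digest()

# Done when the firmware image is built: sign the second-stage bootloader.
second_stage = b"...second-stage bootloader image..."
signature = vendor_key.sign(second_stage, padding.PKCS1v15(), hashes.SHA256())

# First-stage bootloader (in on-chip ROM), at every boot:
def first_stage_boot(image, sig, shipped_pubkey, secure_world_sram):
    # 1. The shipped public key must match the hash burned into the OTP fuses.
    if hashlib.sha256(pubkey_bytes(shipped_pubkey)).digest() != OTP_FUSED_KEY_HASH:
        raise RuntimeError("untrusted public key")
    # 2. Secure boot: refuse to run an image the vendor did not sign.
    shipped_pubkey.verify(sig, image, padding.PKCS1v15(), hashes.SHA256())
    # 3. Measurement root: record the second stage's hash for later attestation.
    secure_world_sram["second_stage_measurement"] = hashlib.sha256(image).digest()

sram = {}
first_stage_boot(second_stage, signature, vendor_key.public_key(), sram)
```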
### 4.4 The Trusted Platform Module (TPM)
The Trusted Platform Module (TPM) [71] introduced the software attestation model described at the beginning of this section. The TPM design does not require any hardware modifications to the CPU, and instead relies on an auxiliary tamper-resistant chip. The TPM chip is only used to store the attestation key and to perform software attestation. The TPM was widely deployed on commodity computers, because it does not rely on CPU modifications. Unfortunately, the cost of this approach is that the TPM has very weak security guarantees, as explained below.
### 4.5 Intel’s Trusted Execution Technology (TXT)
Intel’s Trusted Execution Technology (TXT) [70] uses the TPM’s software attestation model and auxiliary tamper-resistant chip, but reduces the software inside the secure container to a virtual machine (guest operating system and application) hosted by the CPU’s hardware virtualization features (VMX [181]).
TXT isolates the software inside the container from untrusted software by ensuring that the container has exclusive control over the entire computer while it is active. This is accomplished by a secure initialization authenticated code module (SINIT ACM) that effectively performs a warm system reset before starting the container’s VM.
TXT requires a TPM chip with an extended register set. The registers used by the measured boot process described in § 4.4 are considered to make up the platform’s Static Root of Trust Measurement (SRTM). When a TXT VM is initialized, it updates TPM registers that make up the Dynamic Root of Trust Measurement (DRTM). While the TPM’s SRTM registers only reset at the start of a boot cycle, the DRTM registers are reset by the SINIT ACM, every time a TXT VM is launched.
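- The register updates above are TPM "extend" operations; a sketch of the SRTM/DRTM difference (mine, using SHA-256 rather than any particular TPM’s hash algorithm): SRTM registers reset once per boot cycle and accumulate every boot component, while the DRTM registers are reset by the SINIT ACM on every TXT launch, so the DRTM chain reflects only the launch itself.

```python
import hashlib

def extend(pcr, measurement):
    """TPM-style extend: a register can only be folded forward, never overwritten."""
    return hashlib.sha256(pcr + measurement).digest()

ZERO = b"\x00" * 32

# SRTM: reset at the start of the boot cycle, extended with every boot component.
srtm = ZERO
for component in [b"firmware", b"bootloader", b"OS kernel"]:
    srtm = extend(srtm, hashlib.sha256(component).digest())

# DRTM: reset by the SINIT ACM at every TXT launch, so it only reflects the
# (now measured) SINIT ACM and the VM being started, not the whole boot history.
def txt_launch(sinit_acm, vm_image):
    drtm = ZERO
    drtm = extend(drtm, hashlib.sha256(sinit_acm).digest())
    drtm = extend(drtm, hashlib.sha256(vm_image).digest())
    return drtm

assert txt_launch(b"SINIT ACM", b"measured VM") == txt_launch(b"SINIT ACM", b"measured VM")
```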
TXT does not implement DRAM encryption or HMACs, and therefore is vulnerable to physical DRAM attacks, just like TPM-based designs. Furthermore, early TXT implementations were vulnerable to attacks where a malicious operating system would program a device, such as a network card, to perform DMA transfers to the DRAM region used by a TXT container [188, 191]. In recent Intel CPUs, the memory controller is integrated on the CPU die, so the SINIT ACM can securely set up the memory controller to reject DMA transfers targeting TXT memory. An Intel chipset datasheet [105] documents an “Intel TXT DMA Protected Range” IIO configuration register.
Early TXT implementations did not measure the SINIT ACM. Instead, the microcode implementing the TXT launch instruction verified that the code module contained an RSA signature by a hard-coded Intel key. SINIT ACM signatures cannot be revoked if vulnerabilities are found, so TXT’s software attestation had to be revised when SINIT ACM exploits [190] surfaced. Currently, the SINIT ACM’s cryptographic hash is included in the attestation measurement.
Last, the warm reset performed by the SINIT ACM does not include the software running in System Management Mode (SMM). SMM was designed solely for use by firmware, and is stored in a protected memory area (SMRAM) which should not be accessible to non-SMM software. However, the SMM handler was compromised on multiple occasions [44, 49, 164, 186, 189], and an attacker who obtains SMM execution can access the memory used by TXT’s container.
### 4.8 Intel SGX in Context
- In general, well-designed, but with lots of proprietary / undocumented functionality, which makes it difficult or even impossible to conduct a full security analysis
Intel’s Software Guard Extensions (SGX) [14, 79, 139] implements secure containers for applications without making any modifications to the processor’s critical execution path. SGX does not trust any layer in the computer’s software stack (firmware, hypervisor, OS). Instead, SGX’s TCB consists of the CPU’s microcode and a few privileged containers. SGX introduces an approach to solving some of the issues raised by multi-core processors with a shared, coherent last-level cache.
SGX does not extend caches or TLBs with container identity bits, and does not require any security checks during normal memory accesses. As suggested in the TrustZone documentation, SGX always ensures that a core’s TLBs only contain entries for the container that it is executing, which requires flushing the CPU core’s TLBs when context-switching between containers and untrusted software.
SGX follows Bastion’s approach of having the untrusted OS manage the page tables used by secure containers. The containers’ security is preserved by a TLB miss handler that relies on an inverted page map (the EPCM) to reject address translations for memory that does not belong to the current container.
Like Bastion, SGX allows the untrusted operating system to evict secure container pages, in a controlled fashion. After the OS initiates a container page eviction, it must prove to the SGX implementation that it also switched the container out of all cores that were executing its code, effectively performing a very coarse-grained TLB shootdown.
SGX’s microcode ensures the confidentiality, authen- ticity, and freshness of each container’s evicted pages, like Bastion’s hypervisor. However, SGX relies on a version-based Merkle tree, inspired by Aegis [174], and adds an innovative twist that allows the operating system to dynamically shape the Merkle tree. SGX also shares Bastion’s and Aegis’ vulnerability to memory access pat- tern leaks, namely a malicious OS can directly learn a container’s memory accesses at page granularity, and any piece of software can perform cache timing attacks.
- (which can be prevented by ensuring that memory accesses aren't data-dependent)
SGX’s software attestation is implemented using Intel’s Enhanced Privacy ID (EPID) group signature scheme [26], which is too complex for a microcode implementation. Therefore, SGX relies on an assortment of privileged containers that receive direct access to the SGX processor’s hardware keys. The privileged containers are signed using an Intel private key whose corresponding public key is hard-coded into the SGX microcode, similarly to TXT’s SINIT ACM.
As SGX does not protect against cache timing attacks, the privileged enclaves’ authors cannot use data-dependent memory accesses. For example, cache attacks on the Quoting Enclave, which computes attestation signatures, would provide an attacker with a processor’s EPID signing key and completely compromise SGX.
Intel’s documentation states that SGX guarantees DRAM confidentiality, authentication, and freshness by virtue of a Memory Encryption Engine (MEE). The MEE is informally described in an ISCA 2015 tutorial [103], and appears to lack a formal specification. In the absence of further information, we assume that SGX provides the same protection against physical DRAM attacks that Aegis and Bastion provide.
- Undocumented, "assumed" to provide same guarantees
### 4.9 Sanctum
- Basically, every container is partitioned into its own isolated physical spaces
Sanctum [38] introduced a straightforward software/hardware co-design that yields the same resilience against software attacks as SGX, and adds protection against memory access pattern leaks, such as page fault monitoring attacks and cache timing attacks.
Sanctum uses a conceptually simple cache partitioning scheme, where a computer’s DRAM is split into equally-sized contiguous DRAM regions, and each DRAM region uses distinct sets in the shared last-level cache (LLC). Each DRAM region is allocated to exactly one container, so containers are isolated in both DRAM and the LLC. Containers are isolated in the other caches by flushing on context switches.
Like XOM, Aegis, and Bastion, Sanctum also considers the hypervisor, OS, and the application software to conceptually belong to a separate container. Containers are protected from the untrusted outside software by the same measures that isolate containers from each other.
Sanctum relies on a trusted security monitor, which is the first piece of firmware executed by the processor, and has the same security properties as those of Aegis’ security kernel. The monitor is measured by bootstrap code in the processor’s ROM, and its cryptographic hash is included in the software attestation measurement. The monitor verifies the operating system’s resource allocation decisions. For example, it ensures that no DRAM region is ever accessible to two different containers.
Each Sanctum container manages its own page tables mapping its DRAM regions, and handles its own page faults. It follows that a malicious OS cannot learn the virtual addresses that would cause a page fault in the container. Sanctum’s hardware modifications work in conjunction with the security monitor to make sure that a container’s page tables only reference memory inside the container’s DRAM regions.
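- A sketch (mine, with a made-up region size) of the two invariants the security monitor enforces here: no DRAM region ever has two owners, and a container’s page tables may only point into its own regions.

```python
REGION_SIZE = 64 * 1024 * 1024     # illustrative; Sanctum splits DRAM into equal regions

def region_of(phys_addr):
    return phys_addr // REGION_SIZE

class SecurityMonitor:
    def __init__(self):
        self.owner = {}            # DRAM region index -> container id

    def assign_region(self, region, container):
        # Invariant 1: no DRAM region is ever accessible to two containers.
        if self.owner.get(region, container) != container:
            raise PermissionError("region already owned by another container")
        self.owner[region] = container

    def check_page_table_entry(self, container, phys_addr):
        # Invariant 2: a container's page tables only reference its own regions.
        if self.owner.get(region_of(phys_addr)) != container:
            raise PermissionError("mapping points outside the container's DRAM regions")

monitor = SecurityMonitor()
monitor.assign_region(3, "enclave A")
monitor.check_page_table_entry("enclave A", 3 * REGION_SIZE + 0x1000)  # OK
try:
    monitor.assign_region(3, "enclave B")                              # rejected
except PermissionError:
    pass
```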
==The Sanctum design focuses completely on software attacks, and does not offer protection from any physical attack. The authors expect Sanctum’s hardware modifications to be combined with the physical attack protections in Aegis or Ascend.==
### 4.10 Ascend and Phantom
The Ascend [52] and Phantom [132] secure processors introduced practical implementations of ==Oblivious RAM== [65] techniques in the CPU’s memory controller. These processors are resilient to attackers who can probe the DRAM address bus and attempt to learn a container’s private information from its DRAM memory access pattern.
- Oblivious RAM!
==Implementing an ORAM scheme in a memory controller is largely orthogonal to the other secure architectures described above. It follows, for example, that Ascend’s ORAM implementation can be combined with Aegis’ memory encryption and authentication, and with Sanctum’s hardware extensions and security monitor, yielding a secure processor that can withstand both software attacks and physical DRAM attacks.==
- Interesting combination of technologies, similar to Intel TPM + TXT
## 5 SGX PROGRAMMING MODEL
#### 5.1.3 The SGX Enclave Control Structure (SECS)
SGX stores per-enclave metadata in an SGX Enclave Control Structure (SECS) associated with each enclave. Each SECS is stored in a dedicated EPC page with the page type PT_SECS. These pages are not intended to be mapped into any enclave’s address space, and are exclusively used by the CPU’s SGX implementation.
**An enclave’s identity is almost synonymous with its SECS.**
### 5.2 The Memory Layout of an SGX Enclave
#### 5.2.1 The Enclave Linear Address Range (ELRANGE)
Each enclave designates an area in its virtual address space, called the enclave linear address range (ELRANGE), which is used to map the code and the sensitive data stored in the enclave’s EPC pages. The virtual address space outside ELRANGE is mapped to access non-EPC memory via the same virtual addresses as the enclave’s host process, as shown in Figure 61.
![[Screen Shot 2022-03-01 at 8.47.48 PM.png]]
The SGX design guarantees that the enclave’s memory accesses inside ELRANGE obey the virtual memory abstraction (§ 2.5.1), while memory accesses outside ELRANGE receive no guarantees. Therefore, enclaves must store all their code and private data inside ELRANGE, and must consider the memory outside ELRANGE to be an untrusted interface to the outside world.
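- Because everything outside ELRANGE is an untrusted channel, enclave code has to range-check every pointer the host hands it before dereferencing; a sketch of such checks (mine, not an SDK API), using illustrative base/size values:

```python
ELRANGE_BASE = 0x0000_7000_0000_0000     # illustrative base and size
ELRANGE_SIZE = 0x0000_0000_4000_0000

def is_outside_elrange(addr, size):
    """True iff [addr, addr + size) does not overlap ELRANGE at all.
    Buffers shared with the untrusted host must satisfy this check."""
    if size == 0:
        return True
    end = addr + size          # in C this addition must also be checked for overflow
    return end <= ELRANGE_BASE or addr >= ELRANGE_BASE + ELRANGE_SIZE

def is_inside_elrange(addr, size):
    """True iff [addr, addr + size) lies entirely inside ELRANGE.
    Enclave secrets must only ever live in such ranges."""
    return addr >= ELRANGE_BASE and addr + size <= ELRANGE_BASE + ELRANGE_SIZE

assert is_outside_elrange(0x1000, 0x100)
assert is_inside_elrange(ELRANGE_BASE + 0x1000, 0x100)
assert not is_outside_elrange(ELRANGE_BASE - 0x10, 0x100)    # straddles the boundary
```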
#### 5.2.3 Address Translation for SGX Enclaves
Under SGX, the operating system and hypervisor are still in full control of the page tables and EPTs, and each enclave’s code uses the same address translation process and page tables (§ 2.5) as its host application. This minimizes the amount of changes required to add SGX support to existing system software. At the same time, having the page tables managed by untrusted system software opens SGX up to the address translation attacks described in § 3.7. As future sections will reveal, a good amount of the complexity in SGX’s design can be attributed to the need to prevent these attacks.
**SGX’s defenses against active memory mapping attacks revolve around ensuring that each EPC page can only be mapped at a specific virtual address (§ 2.7). When an EPC page is allocated, its intended virtual address is recorded in the EPCM entry for the page, in the ADDRESS field.**
**When an address translation (§ 2.5) result is the physical address of an EPC page, the CPU ensures that the virtual address given to the address translation process matches the expected virtual address recorded in the page’s EPCM entry.**
SGX also protects against some passive memory mapping attacks and fault injection attacks by ensuring that the access permissions of each EPC page always match the enclave author’s intentions. The access permissions for each EPC page are specified when the page is allocated, and recorded in the readable (R), writable (W), and executable (X) fields in the page’s EPCM entry, shown in Table 15.
![[Screen Shot 2022-03-01 at 8.49.12 PM.png]]
When an address translation (§ 2.5) resolves into an EPC page, the corresponding EPCM entry’s fields override the access permission attributes (§ 2.5.3) specified in the page tables. For example, the W field in the EPCM entry overrides the writable (W) attribute, and the X field overrides the disable execution (XD) attribute.
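- The two checks just described, as a sketch (mine; ADDRESS and R/W/X are the paper’s EPCM field names, everything else is a stand-in): after the page-table walk, and before the result can enter the TLB, the EPCM entry must record the same virtual address that was translated, and its R/W/X bits override whatever the untrusted page tables claim.

```python
from dataclasses import dataclass

PAGE = 4096

@dataclass
class EpcmEntry:
    valid: bool
    enclave_id: int
    address: int          # the only virtual address this EPC page may be mapped at
    r: bool
    w: bool
    x: bool

class SgxFault(Exception):
    pass

def check_translation(epcm, virt_addr, phys_addr, enclave_id, access, pte_perms):
    """Extra checks when a translation resolves into an EPC page.
    `access` is 'r', 'w' or 'x'; `pte_perms` are the page-table permissions."""
    entry = epcm.get(phys_addr // PAGE)
    if entry is None:                             # not an EPC page: normal rules apply
        return pte_perms
    if not entry.valid or entry.enclave_id != enclave_id:
        raise SgxFault("EPC page not owned by the executing enclave")
    if entry.address != (virt_addr & ~(PAGE - 1)):
        raise SgxFault("EPC page mapped at an unexpected virtual address")
    epcm_perms = {"r": entry.r, "w": entry.w, "x": entry.x}
    if not epcm_perms[access]:
        raise SgxFault("access denied by EPCM permissions")
    # EPCM permissions override the page-table attributes (e.g. W and XD).
    return {k: pte_perms[k] and epcm_perms[k] for k in ("r", "w", "x")}

epcm = {0x200: EpcmEntry(True, enclave_id=7, address=0x7000_0000_0000,
                         r=True, w=True, x=False)}
check_translation(epcm, 0x7000_0000_0000, 0x200 * PAGE, 7, "w",
                  {"r": True, "w": True, "x": True})       # allowed
# Mapping the same EPC page at any other virtual address would raise SgxFault.
```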
### 5.3 The Life Cycle of an SGX Enclave
![[Screen Shot 2022-03-01 at 8.53.57 PM.png]]
#### 5.3.2 Loading
![[Screen Shot 2022-03-01 at 9.00.23 PM.png]]
### 5.4 The Life Cycle of an SGX Thread
![[Screen Shot 2022-03-01 at 9.03.46 PM.png]]
#### 5.4.1 Synchronous Enclave Entry
![[Screen Shot 2022-03-01 at 9.09.44 PM.png]]
EENTER switches the logical processor to enclave mode, but does not perform a privilege level switch (§ 2.8.2). Therefore, enclave code always executes at ring 3, with the same privileges as the application code that calls it. This makes it possible for an infrastructure owner to allow user-supplied software to create and use enclaves, while having the assurance that the OS kernel and hypervisor can still protect the infrastructure from buggy or malicious software.
EENTER transitions the logical processor into enclave mode, and sets the instruction pointer (RIP) to the value indicated by the entry point offset (OENTRY) field in the TCS that it receives. EENTER is used by an untrusted caller to execute code in a protected environment, and therefore has the same security considerations as SYSCALL (§ 2.8), which is used to call into system software. Setting RIP to the value indicated by OENTRY guarantees to the enclave author that the enclave code will only be invoked at well-defined points, and prevents a malicious host application from bypassing any security checks that the enclave author may perform.
#### 5.4.2 Synchronous Enclave Exit
It may seem unfortunate that enclave code can induce faults in its caller. For better or for worse, this perfectly matches the case where an application calls into a dynamically loaded module. More specifically, the module’s code is also responsible for preserving stack-related registers, and a buggy module might jump anywhere in the application code of the host process.
#### 5.4.3 Asynchronous Enclave Exit (AEX)
If a hardware exception, like a fault (§ 2.8.2) or an interrupt (§ 2.12), occurs while a logical processor is executing an enclave’s code, the processor performs an Asynchronous Enclave Exit (AEX) before invoking the system software’s exception handler, as shown in Figure 67.
![[Screen Shot 2022-03-01 at 9.20.01 PM.png]]
In the Intel architecture, if a hardware exception occurs, the application code’s execution context can be read and modified by the system software’s exception handler (§ 2.8.2). This is acceptable when the system software is trusted by the application software. However, under SGX’s threat model, the system software is not trusted by enclaves. Therefore, the AEX step erases any secrets that may exist in the execution state by resetting all its registers to predefined values.
#### 5.4.4 Recovering from an Asynchronous Exit
When a hardware exception occurs inside enclave mode, the processor performs an AEX before invoking the exception’s handler set up by the system software. The AEX sets up the execution context in such a way that when the system software finishes processing the exception, it returns into an asynchronous exit handler in the enclave’s host process. The asynchronous exception handler usually executes the ERESUME instruction, which causes the logical processor to go back into enclave mode and continue the computation that was interrupted by the hardware exception.
ERESUME shares much of its functionality with EENTER. This is best illustrated by the similarity between Figures 68 and 67.
![[Screen Shot 2022-03-01 at 9.42.53 PM.png]]
The main difference between ERESUME and EENTER is that the former uses an SSA that was “filled out” by an AEX (§ 5.4.3), whereas the latter uses an empty SSA. Therefore, ERESUME results in a #GP fault if the CSSA field in the provided TCS is 0 (zero), whereas EENTER fails if CSSA is greater than or equal to NSSA.
An interesting edge case that ERESUME handles correctly is that it sets XCR0 to the XFRM enclave attribute before performing an XRSTOR. It follows that ERESUME fails if the requested feature bitmap (RFBM) in the SSA is not a subset of XFRM. This matters because, while an AEX will always use the XFRM value as the RFBM, enclave code executing on another thread is free to modify the SSA contents before ERESUME is called.
The correct sequencing of actions in the ERESUME implementation prevents a malicious application from using an enclave to modify registers associated with extended architectural features that are not declared in XFRM. This would break the system software’s ability to provide thread-level execution context isolation.
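- The SSA-slot and XFRM checks from the last few paragraphs, as a sketch (mine; CSSA, NSSA, RFBM and XFRM are the real field names, the helper functions are not): EENTER needs a free SSA slot to spill into on a future AEX, ERESUME needs a slot an AEX has already filled, and ERESUME only restores extended state whose feature bits are declared in XFRM.

```python
class SgxGpFault(Exception):
    pass

def eenter_check(cssa, nssa):
    # EENTER needs an empty SSA slot to spill state into on a future AEX.
    if cssa >= nssa:
        raise SgxGpFault("no free State Save Area slot (CSSA >= NSSA)")

def eresume_check(cssa, rfbm, xfrm):
    # ERESUME consumes an SSA that an AEX previously filled out.
    if cssa == 0:
        raise SgxGpFault("nothing to resume from (CSSA == 0)")
    # XCR0 is set to XFRM before XRSTOR, so the requested-feature bitmap stored
    # in the SSA must be a subset of XFRM or the restore fails.
    if rfbm & ~xfrm:
        raise SgxGpFault("SSA requests extended state not declared in XFRM")

XFRM = 0b0000_0111                                     # illustrative feature bitmap
eenter_check(cssa=0, nssa=2)                           # fine
eresume_check(cssa=1, rfbm=0b0000_0011, xfrm=XFRM)     # fine
try:
    eresume_check(cssa=1, rfbm=0b0010_0011, xfrm=XFRM) # tampered SSA is rejected
except SgxGpFault:
    pass
```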
### 5.5 EPC Page Eviction
As illustrated in Figure 69, SGX supports evicting EPC pages to DRAM pages outside the PRM range. The system software is expected to use its existing page swapping implementation to evict the contents of these pages out of DRAM and onto a disk.
![[Screen Shot 2022-03-01 at 10.52.17 PM.png]]
SGX’s eviction feature revolves around the EWB instruction, described in detail in § 5.5.4. Essentially, EWB evicts an EPC page into a DRAM page outside the EPC and marks the EPC page as available, by zeroing the VALID field in the page’s EPCM entry.
The SGX design relies on symmetric key cryptography (§ 3.1.1) to guarantee the confidentiality and integrity of the evicted EPC pages, and on nonces (§ 3.1.4) to guarantee the freshness of the pages brought back into the EPC. These nonces are stored in Version Arrays (VAs), covered in § 5.5.2, which are EPC pages dedicated to nonce storage.
As explained in § 5.1.1, SGX leaves the system software in charge of managing the EPC. It naturally follows that the SGX instructions described in this section, which are used to implement EPC paging, are only available to system software, which runs at ring 0 (§ 2.3).
#### 5.5.1 Page Eviction and the TLBs
One of the least promoted accomplishments of SGX is that it does not add any security checks to the memory execution units (§ 2.9.4, § 2.10). Instead, SGX’s access control checks occur after an address translation (§ 2.5) is performed, right before the translation result is written into the TLBs (§ 2.11.5). This aspect is generally downplayed throughout the SDM, but it becomes visible when explaining SGX’s EPC page eviction mechanism.
A full discussion of SGX’s memory access protection checks merits its own section, and is deferred to § 6.2. The EPC page eviction mechanism can be explained using only two requirements from SGX’s security model. First, when a logical processor exits an enclave, either via EEXIT (§ 5.4.2) or via an AEX (§ 5.4.3), its TLBs are flushed. Second, when an EPC page is deallocated from an enclave, all logical processors executing that enclave’s code must be directed to exit the enclave. This is sufficient to guarantee the removal of any TLB entry targeting the deallocated EPC page.
System software can cause a logical processor to exit an enclave by sending it an Inter-Processor Interrupt (IPI, § 2.12), which will trigger an AEX when received. Essentially, this is a very coarse-grained TLB shootdown.
SGX does not trust system software. Therefore, before marking an EPC page’s EPCM entry as free, the SGX implementation must ensure that the OS kernel has flushed all the TLBs that might contain translations for the page. Furthermore, performing IPIs and TLB flushes for each page eviction would add a significant overhead to a paging implementation, so the SGX design allows a batch of pages to be evicted using a single IPI / TLB flush sequence.
The TLB flush verification logic relies on a 1-bit EPCM entry field called BLOCKED. As shown in Figure 70, the VALID and BLOCKED fields yield three possible EPC page states. A page is free when both bits are zero, in use when VALID is one and BLOCKED is zero, and blocked when both bits are one.
![[Screen Shot 2022-03-01 at 11.08.03 PM.png]]
Blocked pages are not considered accessible to enclaves. If an address translation results in a blocked EPC page, the SGX implementation causes the translation to result in a Page Fault (#PF, § 2.8.2). This guarantees that once a page is blocked, the CPU will not create any new TLB entries pointing to it.
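- The two EPCM bits and the three resulting page states, as a tiny sketch (mine):

```python
def epc_page_state(valid, blocked):
    """Decode the VALID/BLOCKED EPCM bits into the three EPC page states."""
    if not valid and not blocked:
        return "free"
    if valid and not blocked:
        return "in use"
    if valid and blocked:
        return "blocked"       # translations to it now raise #PF, so no new TLB entries
    raise ValueError("BLOCKED without VALID does not occur")

assert epc_page_state(valid=False, blocked=False) == "free"
assert epc_page_state(valid=True,  blocked=False) == "in use"
assert epc_page_state(valid=True,  blocked=True)  == "blocked"
```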
Furthermore, every SGX instruction makes sure that the EPC pages on which it operates are not blocked. For example, EENTER ensures that the TCS it is given is not blocked, that its enclave’s SECS is not blocked, and that every page in the current SSA is not blocked.
In order to evict a batch of EPC pages, the OS kernel must first issue EBLOCK instructions targeting them. The OS is also expected to remove the EPC page’s mapping from page tables, but is not trusted to do so.
After all the desired pages have been blocked, the OS kernel must execute an ETRACK instruction, which directs the SGX implementation to keep track of which logical processors have had their TLBs flushed. ETRACK requires the virtual address of an enclave’s SECS (§ 5.1.3). If the OS wishes to evict a batch of EPC pages belonging to multiple enclaves, it must issue an ETRACK for each enclave.
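- Putting the blocking, tracking, and shootdown steps together, the OS-side eviction loop looks roughly like this sketch (mine; EBLOCK/ETRACK/EWB are real instructions, the toy classes around them are not):

```python
from dataclasses import dataclass, field

@dataclass
class Enclave:
    secs: str
    pages: list
    running_on: set = field(default_factory=set)

class ToySgx:
    """Stand-ins for the SGX instructions used below (not their real semantics)."""
    def __init__(self):
        self.blocked, self.tracked = set(), set()
    def eblock(self, page):  self.blocked.add(page)
    def etrack(self, secs):  self.tracked.add(secs)
    def ewb(self, page):
        assert page in self.blocked, "EWB only operates on blocked pages"
        return f"encrypted({page})"

class ToyKernel:
    def __init__(self, enclave): self.enclave = enclave
    def unmap_from_page_tables(self, page): pass
    def cpus_running(self, enclave): return set(enclave.running_on)
    def send_ipi(self, cpu):
        # The IPI forces an AEX, which flushes that core's TLBs.
        self.enclave.running_on.discard(cpu)

def evict_epc_pages(enclave, pages, kernel, sgx):
    for page in pages:                        # 1. block the whole batch
        sgx.eblock(page)
        kernel.unmap_from_page_tables(page)   #    expected of the OS, but not trusted
    sgx.etrack(enclave.secs)                  # 2. start tracking TLB flushes
    for cpu in kernel.cpus_running(enclave):  # 3. coarse-grained TLB shootdown via AEX
        kernel.send_ipi(cpu)
    return [sgx.ewb(page) for page in pages]  # 4. write back encrypted pages + MACs

enclave = Enclave(secs="SECS@A", pages=["page1", "page2"], running_on={0, 3})
print(evict_epc_pages(enclave, enclave.pages, ToyKernel(enclave), ToySgx()))
```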
#### 5.5.4 Evicting an EPC Page
The system software evicts an EPC page using the EWB instruction, which produces all the data needed to restore the evicted page at a later time via the ELDU instruction, as shown in Figure 71.
![[Screen Shot 2022-03-01 at 11.21.56 PM.png]]
EWB’s output consists of an encrypted version of the evicted EPC page’s contents, a subset of the fields in the EPCM entry corresponding to the page, the nonce discussed in § 5.5.2, and a message authentication code (MAC, § 3.1.3) tag. With the exception of the nonce, EWB writes its output in DRAM outside the PRM area, so the system software can choose to further evict it to disk.
The EPC page’s contents are encrypted, to protect the confidentiality of the enclave’s data while the page is stored in the untrusted DRAM outside the PRM range. Without the use of encryption, the system software could learn the contents of an EPC page by evicting it from the EPC.
The page metadata is stored in a Page Information (PAGEINFO) structure, illustrated in Figure 72. This structure is similar to the PAGEINFO structure described in § 5.3.2 and depicted in Figure 64, except that the SECINFO field has been replaced by a PCMD field, which contains the virtual address of a Page Crypto Metadata (PCMD) structure.
![[Screen Shot 2022-03-01 at 11.26.30 PM.png]]
The LINADDR field in the PAGEINFO structure is used to store the ADDRESS field in the EPCM entry, which indicates the virtual address intended for accessing the page. The PCMD structure embeds the Security Information (SECINFO) described in § 5.3.2, which is used to store the page type (PT) and the access permission flags (R, W, X) in the EPCM entry. The PCMD structure also stores the enclave’s ID (EID, § 5.5.3). These fields are later used by ELDU or ELDB to populate the EPCM entry for the EPC page that is reloaded.
The metadata described above is stored unencrypted, so the OS has the option of using the information inside as-is for its own bookkeeping. This has no negative impact on security, because the metadata is not confidential. In fact, with the exception of the enclave ID, all the metadata fields are specified by the system software when ECREATE is called. The enclave ID is only useful for identifying the enclave that the EPC page belongs to, and the system software already has this information as well.
Aside from the metadata described above, the PCMD structure also stores the MAC tag generated by EWB. The MAC tag covers the authenticity of the EPC page contents, the metadata, and the nonce. The MAC tag is checked by ELDU and ELDB, which will only load an evicted page back into the EPC if the MAC verification confirms the authenticity of the page data, metadata, and nonce. This security check protects against the page swapping attacks described in § 3.7.3.
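- A sketch (mine) of what EWB hands to the OS and what ELDU/ELDB re-check, assuming an AEAD cipher (AES-GCM here) so that encryption covers the page contents while the authentication tag additionally covers the unencrypted metadata and the nonce; the exact cipher and field layout are illustrative, not taken from the paper. Requires the `cryptography` package.

```python
# pip install cryptography
import os
from dataclasses import dataclass
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

PAGING_KEY = AESGCM.generate_key(bit_length=128)   # held inside the SGX implementation

@dataclass
class EvictedPage:
    ciphertext: bytes    # encrypted page contents; lives in untrusted DRAM or on disk
    metadata: bytes      # LINADDR, page type, R/W/X, enclave ID: unencrypted but MAC'd
    # The nonce itself stays in a Version Array slot inside the EPC.

def ewb(page_contents, metadata, version_array, slot):
    nonce = os.urandom(12)                 # fresh version number for this eviction
    version_array[slot] = nonce            # kept inside the EPC, never given to the OS
    ct = AESGCM(PAGING_KEY).encrypt(nonce, page_contents, associated_data=metadata)
    return EvictedPage(ciphertext=ct, metadata=metadata)

def eldu(evicted, version_array, slot):
    nonce = version_array.pop(slot)        # consuming the slot prevents replays
    # Raises InvalidTag if the contents, the metadata, or the nonce were tampered with.
    return AESGCM(PAGING_KEY).decrypt(nonce, evicted.ciphertext,
                                      associated_data=evicted.metadata)

va = {}
blob = ewb(b"A" * 4096, b"LINADDR=...;PT=REG;RWX=RW-;EID=7", va, slot=0)
assert eldu(blob, va, slot=0) == b"A" * 4096
```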
## 5.7 SGX Enclave Versioning Support
The software attestation model (§ 3.3) introduced by the Trusted Platform Module (§ 4.4) relies on a measurement (§ 5.6), which is essentially a content hash, to identify the software inside a container. The downside of using content hashes for identity is that there is no relation between the identities of containers that hold different versions of the same software.
In practice, it is highly desirable for systems based on secure containers to handle software updates without having access to the remote party in the initial software attestation process. This entails having the ability to migrate secrets between the container that has the old version of the software and the container that has the updated version. This requirement translates into a need for a separate identity system that can recognize the relationship between two versions of the same software.
SGX supports the migration of secrets between enclaves that represent different versions of the same software, as shown in Figure 75.
![[Screen Shot 2022-03-03 at 2.32.31 PM.png]]
The secret migration feature relies on a one-level certificate hierarchy (§ 3.2.1), where each enclave author is a Certificate Authority, and each enclave receives a certificate from its author. These certificates must be formatted as Signature Structures (SIGSTRUCT), which are described in § 5.7.1. The information in these certificates is the basis for an enclave identity scheme, presented in § 5.7.2, which can recognize the relationship between different versions of the same software.
The EINIT instruction (§ 5.3.3) examines the target enclave’s certificate and uses the information in it to populate the SECS (§ 5.1.3) fields that describe the enclave’s certificate-based identity. This process is summarized in § 5.7.4.
Last, the actual secret migration process is based on the key derivation service implemented by the EGETKEY instruction, which is described in § 5.7.5. The sending enclave uses the EGETKEY instruction to obtain a symmetric key (§ 3.1.1) based on its identity, encrypts its secrets with the key, and hands off the encrypted secrets to the untrusted system software. The receiving enclave passes the sending enclave’s identity to EGETKEY, obtains the same symmetric key as above, and uses the key to decrypt the secrets received from system software.
The symmetric key obtained from EGETKEY can be used in conjunction with cryptographic primitives that protect the confidentiality (§ 3.1.2) and integrity (§ 3.1.3) of an enclave’s secrets while they are migrated to another enclave by the untrusted system software. However, symmetric keys alone cannot be used to provide freshness guarantees (§ 3.1), so secret migration is subject to replay attacks. This is acceptable when the secrets being migrated are immutable, such as when the secrets are encryption keys obtained via software attestation.
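- A sketch of the migration flow (mine; EGETKEY is modeled as an HMAC over the identity fields purely for illustration): the old and the new version of the enclave share the same certificate-based identity fields, so both can derive the same symmetric key, and the blob travels through the untrusted OS under an AEAD. Requires the `cryptography` package.

```python
# pip install cryptography
import hashlib, hmac, os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

PROCESSOR_SECRET = os.urandom(32)          # stand-in for the per-chip root secret

def egetkey(mrsigner, isvprodid, isvsvn):
    """Toy EGETKEY: a deterministic key bound to the certificate-based identity
    (same author, same product, chosen security version number)."""
    ident = b"|".join([mrsigner, str(isvprodid).encode(), str(isvsvn).encode()])
    return hmac.new(PROCESSOR_SECRET, ident, hashlib.sha256).digest()[:16]

AUTHOR = hashlib.sha256(b"enclave author's public key").digest()

# Sending enclave (software version 2): derive a key from its own identity and
# hand the encrypted secrets to the untrusted system software.
key_v2 = egetkey(AUTHOR, isvprodid=1, isvsvn=2)
nonce = os.urandom(12)
blob = AESGCM(key_v2).encrypt(nonce, b"long-lived secrets", None)

# Receiving enclave (software version 3): ask EGETKEY for the *sender's* identity
# (same author and product, security version 2) and recover the same key.
key_for_v2 = egetkey(AUTHOR, isvprodid=1, isvsvn=2)
assert AESGCM(key_for_v2).decrypt(nonce, blob, None) == b"long-lived secrets"
# Nothing here stops the OS from replaying an old blob, which is why the paper
# says migration should only carry immutable secrets.
```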
#### 5.7.1 Enclave Certificates
The SGX design requires each enclave to have a certificate issued by its author. This requirement is enforced by EINIT (§ 5.3.3), which refuses to operate on enclaves without valid certificates.
The SGX implementation consumes certificates formatted as Signature Structures (SIGSTRUCT), which are intended to be generated by an enclave building toolchain, as shown in Figure 76.
A SIGSTRUCT certificate consists of metadata fields, the most interesting of which are presented in Table 19, and an RSA signature that guarantees the authenticity of the metadata, formatted as shown in Table 20. The semantics of the fields will be revealed in the following sections.
![[Screen Shot 2022-03-03 at 2.40.16 PM.png]]
![[Screen Shot 2022-03-03 at 2.40.32 PM.png]]
#### 5.7.2 Certificate-Based Enclave Identity
![[Screen Shot 2022-03-03 at 2.41.05 PM.png]]
#### 5.8.1 Local Attestation
#### 5.8.2 Remote Attestation
The Provisioning Secret is generated at the key generation facility, where it is burned into the processor’s e-fuses and stored in the database used by Intel’s provisioning service. The Seal Secret is generated inside the processor chip, and therefore is not known to Intel. ==This approach has the benefit that an attacker who compromises Intel’s facilities cannot derive most keys produced by EGETKEY, even if the attacker also compromises a victim’s firmware and obtains the OWNEREPOCH (§ 5.7.5) value. These keys include the Seal keys (§ 5.7.5) and Report keys (§ 5.8.1) introduced in previous sections.==
The only documented exception to the reasoning above is the Provisioning key, which is effectively a shared secret between the SGX-enabled processor and Intel’s provisioning service. Intel has to be able to derive this key, so the derivation material does not include the Seal Secret or the OWNEREPOCH value, as shown in Figure 83.
EGETKEY derives the Provisioning key using the current enclave’s certificate-based identity (MRSIGNER, ISVPRODID, ISVSVN) and the SGX implementation’s SVN (CPUSVN). This approach has a few desirable security properties. First, Intel’s provisioning service can be assured that it is authenticating a Provisioning Enclave signed by Intel. Second, the provisioning service can use the CPUSVN value to reject SGX implementations with known security vulnerabilities. Third, this design admits multiple mutually distrusting provisioning services.
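- Contrasting the two derivations in the last few paragraphs as a sketch (mine; HMAC over labeled fields is only a stand-in for the real derivation): the Provisioning key mixes in the Provisioning Secret, the certificate-based identity, and CPUSVN, so Intel can reproduce it, while Seal keys also mix in the Seal Secret and OWNEREPOCH, which Intel never sees.

```python
import hashlib, hmac, os

PROVISIONING_SECRET = os.urandom(32)   # burned into e-fuses AND stored in Intel's database
SEAL_SECRET         = os.urandom(32)   # generated inside the chip; never leaves it
OWNEREPOCH          = os.urandom(16)   # set by the computer owner via firmware

def derive(root, **fields):
    msg = b"|".join(f"{k}={v}".encode() for k, v in sorted(fields.items()))
    return hmac.new(root, msg, hashlib.sha256).digest()[:16]

def provisioning_key(mrsigner, isvprodid, isvsvn, cpusvn):
    # No Seal Secret, no OWNEREPOCH: Intel's provisioning service can derive the
    # same key and thereby authenticate the Provisioning Enclave.
    return derive(PROVISIONING_SECRET, mrsigner=mrsigner, isvprodid=isvprodid,
                  isvsvn=isvsvn, cpusvn=cpusvn)

def seal_key(mrsigner, isvprodid, isvsvn, cpusvn):
    # Also mixes in the per-chip Seal Secret and the owner-controlled OWNEREPOCH,
    # so a compromise of Intel's key-generation facility cannot recover it.
    return derive(SEAL_SECRET, mrsigner=mrsigner, isvprodid=isvprodid,
                  isvsvn=isvsvn, cpusvn=cpusvn, ownerepoch=OWNEREPOCH.hex())

pk = provisioning_key(mrsigner="intel", isvprodid=1, isvsvn=5, cpusvn=9)
sk = seal_key(mrsigner="acme", isvprodid=2, isvsvn=1, cpusvn=9)
```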
EGETKEY only derives Provisioning keys for enclaves whose PROVISIONKEY attribute is set to true. § 5.9.3 argues that this mechanism is sufficient to protect the computer owner from a malicious software provider that attempts to use Provisioning keys to track a CPU chip across OWNEREPOCH changes.
After the Provisioning Enclave obtains a Provisioning key, it uses the key to authenticate itself to Intel’s provisioning service. Once the provisioning service is convinced that it is communicating with a trusted Provisioning Enclave in the secure environment provided by an SGX-enabled processor, the service generates an Attestation Key and sends it to the Provisioning Enclave. The enclave then encrypts the Attestation Key using a Provisioning Seal key, and hands off the encrypted key to the system software for storage.