Memory dependence prediction

Memory dependence prediction is a technique, employed by high-performance out-of-order execution microprocessors that execute memory access operations (loads and stores) out of program order, to predict true dependencies between loads and stores at instruction execution time. With the predicted dependence information, the processor can then decide to speculatively execute certain loads and stores out of order, while preventing other loads and stores from executing out-of-order (keeping them in-order). Later in the pipeline, memory disambiguation techniques are used to determine if the loads and stores were correctly executed and, if not, to recover.

By using the memory dependence predictor to keep most dependent loads and stores in order, the processor gains the benefits of aggressive out-of-order load/store execution but avoids many of the memory dependence violations that occur when loads and stores were incorrectly executed. This increases performance because it reduces the number of pipeline flushes that are required to recover from these memory dependence violations. See the memory disambiguation article for more information on memory dependencies, memory dependence violations, and recovery.

In general, memory dependence prediction predicts whether two memory operations are dependent, that is, if they interact by accessing the same memory location. Besides using store to load (RAW or true) memory dependence prediction for the out-of-order scheduling of loads and stores, other applications of memory dependence prediction have been proposed. See for example.^[1]

Memory dependence prediction is an optimization on top of memory dependence speculation. Sequential execution semantics imply that stores and loads appear to execute in the order specified by the program. However, as with out-of-order execution of other instructions, it may be possible to execute two memory operations in a different from the program implied order. This is possible when the two operations are independent. In memory dependence speculation a load may be allowed to execute before a store that precedes it. Speculation succeeds when the load is independent of the store, that is, when the two instructions access different memory locations. Speculation fails when the load is dependent upon the store, that is, when the two accesses overlap in memory. In the first, modern out-of-order designs, memory speculation was not used as its benefits were limited. Αs the scope of the out-of-order execution increased over few tens of instructions, naive memory dependence speculation was used. In naive memory dependence speculation,^[2] a load is allowed to bypass any preceding store. As with any form of speculation, it is important to weight the benefits of correct speculation against the penalty paid on incorrect speculation. As the scope of out-of-order execution increases further into several tens of instructions, the performance benefits of naive speculation decrease. To retain the benefits of aggressive memory dependence speculation while avoiding the costs of mispeculation several predictors have been proposed.

Selective memory dependence prediction^[2]^[3] stalls specific loads until it is certain that no violation may occur. It does not explicitly predict dependencies. This predictor may delay loads longer than necessary and hence result in suboptimal performance. In fact, in some cases it performs worse than naively speculating all loads as early as possible. This is because often its faster to mispeculate and recover than to wait for all preceding stores to execute. Exact memory dependence prediction was developed at the University of Wisconsin–Madison. Specifically, Dynamic Speculation and Synchronization^[2]^[3] delays loads only as long as it is necessary by predicting the exact store a load should wait for. This predictor predicts exact dependences (store and load pair). The synonym predictor^[1] groups together all dependences that share a common load or store instruction. The store sets^[4] predictor represents multiple potential dependences efficiently by grouping together all possible stores a load may dependent upon. The store barrier^[5] predictor treats certain store instructions as barriers. That is, all subsequent load or store operations are not allowed to bypass the specific store. The store barrier predictor does not explicitly predict dependencies. This predictor may unnecessarily delay subsequent, yet independent loads. Memory dependence prediction has other applications beyond the scheduling of loads and stores. For example, speculative memory cloaking^[1] and speculative memory bypassing^[1] use memory dependence prediction to streamline the communication of values through memory.

Analogy to branch prediction

Memory dependence prediction for loads and stores is analogous to branch prediction for conditional branch instructions. In branch prediction, the branch predictor predicts which way the branch will resolve before it is known. The processor can then speculatively fetch and execute instructions down one of the paths of the branch. Later, when the branch instruction executes, it can be determined if the branch instruction was correctly predicted. If not, this is a branch misprediction, and a pipeline flush is necessary to throw away instructions that were speculatively fetched and executed.

Branch prediction can be thought of as a two step process. First, the predictor determines the direction of the branch (taken or not). This is a binary decision. Then, the predictor determines the actual target address. Similarly, memory dependence prediction can be thought of as a two step process. First, the predictor determines whether there is a dependence. Then it determines which this dependence is.

References

1 2 3 4 Streamlining Inter-Operation Memory Communication via Memory Dependence Prediction, Moshovos and Sohi, in the Proceedings of the Annual ACM/IEEE International Symposium on Microarchitecture, Dec. 1997.
1 2 3 Dynamic Speculation and Synchronization of Memory Dependences, Moshovos, Breach, Vijaykumar and Sohi, in the Proceedings of the 24th Annual ACM/IEEE Conference on Computer Architecture, June 1997. Also as technical report, Computer Sciences Department, University of Wisconsin–Madison, March 1996.
1 2 Memory Dependence Prediction, Moshovos, Ph.D. Thesis, Computer Sciences Department, University of Wisconsin–Madison, Dec. 1998.
↑ Memory Dependence Prediction using Store Sets, Chrysos and Emer, in the Proceedings of the 25th Annual ACM/IEEE Conference on Computer Architecture, June 1998.
↑ Apparatus to dynamically control the out-of-order execution of load-store instructions in a processor capable of dispatching, issuing and executing multiple instructions in a single processor cycle, Hesson, LeBlanc and Ciavaglia, IBM, US Patent 5,615,350, March 1997.

CPU technologies

Architecture	Von Neumann Harvard (Modified) Dataflow TTA

Instruction set	ASIP CISC RISC EDGE (TRIPS) VLIW (EPIC) MISC OISC NISC ZISC Comparison

Word size	1-bit 4-bit 8-bit 9-bit 10-bit 12-bit 15-bit 16-bit 18-bit 22-bit 24-bit 25-bit 26-bit 27-bit 31-bit 32-bit 33-bit 34-bit 36-bit 39-bit 40-bit 48-bit 50-bit 60-bit 64-bit 128-bit 256-bit 512-bit Variable

Execution	Instruction pipelining Bubble Operand forwarding Out-of-order execution Register renaming Speculative execution Branch predictor Memory dependence prediction Hazards

Parallel level	Bit Bit-serial Word Instruction Scalar Superscalar Task Thread Process Data Vector Memory

Multithreading	Temporal Simultaneous Preemptive Cooperative

Flynn's taxonomy	SISD SIMD MISD MIMD SPMD Addressing mode

Core count	Single-core processor Multi-core processor Manycore processor

Types	Digital signal processor (DSP) GPGPU Microcontroller Physics processing unit System on a chip (SoC) Cellular

Components	Address generation unit (AGU) Arithmetic logic unit (ALU) Barrel shifter Floating-point unit (FPU) Back-side bus Multiplexer Demultiplexer Registers Memory management unit (MMU) Translation lookaside buffer (TLB) Cache Register file Microcode Control unit Clock rate

Power management	APM ACPI Dynamic frequency scaling Dynamic voltage scaling Clock gating

Hardware security	Non-executable memory (NX bit) Bounds checking (Intel MPX) Hardware restriction (firmware) Software Guard Extensions (Intel SGX) Trusted Execution Technology Secure cryptoprocessor Hardware security module Hengzhi chip

This article is issued from Wikipedia - version of the 3/16/2015. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.

Memory dependence prediction

Analogy to branch prediction

See also

References