18.1 Introduction to Cache Simulation with Simics
By default, Simics does not model any cache system. It uses its own memory
system to achieve high simulation speed, and modeling a hardware cache
would only slow it down.
Note: Although it is often said that Simics models a processor, such as an
UltraSPARC III or an x86 P4, remember that Simics is an instruction-set
simulator, not a processor simulator. Transactions coming out of a real x86 P4
processor have already gone through the L1 and L2 caches, so they consist
mainly of cache misses to be fetched from memory. In Simics, all transactions
performed by the processor execution core are made visible.
For simplicity and performance, Simics does not model incoherence. In Simics,
the memory is always up to date with the latest CPU and device
transactions. Memory accesses take no time to execute and are always atomic.
The ability to observe and alter memory transactions (both in timing and in
execution) makes Simics well suited to several kinds of cache simulation:
- Cache Profiling
- The goal is to gather information about the cache
behavior of a system or an application. Unless the application runs on a
multiprocessor system, takes many interrupts, or runs a lot of system-level
code, the timing of the memory operations is often irrelevant, so no stalling
is necessary. The timing-model interface is a good place to
observe all transactions sent by the processor.
Note that this type of simulation does not modify the execution of the
target program. It could be done by using Simics as a simple memory transaction
trace generator and computing the evolution of the cache state
afterward. However, doing the cache simulation at the same time as the
execution enables a number of optimizations that Simics models make good use
of.
- Cache Timing
- The goal is to study the timing behavior of the
transactions, in which case a transaction to memory should take much more time
than, for example, a transaction to an L1 cache. This is useful when studying
interactions between several CPUs, or to roughly estimate the CPI of an
application. To be able to stall the processor, a cache timing model should be
connected to the timing-model interface. Simics models can be
used for such a simulation.
This type of simulation modifies the execution, since interrupts and
multiprocessor interaction will be influenced by the timing provided by the
cache model. However, unless the target program is improperly written, the
execution will always be correct, although different from the execution
obtained without any cache model.
- Cache Content Simulation
- It is possible to change the Simics coherency
model by allowing a cache model to contain data that is different from the
contents of the memory. Such a model needs to use both the
timing-model and the snoop-memory interfaces to
properly handle the memory transactions (it must be able to change the values
of loads and stores, or to prevent them from being performed on main memory).
Note that this kind of simulation is difficult to do and requires a
well-written, bug-free cache model, since it can prevent the target program
from executing properly.
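The trace-based approach to cache profiling mentioned above can be sketched in
a few lines of Python. This is only an illustration, not Simics code: the
ProfilingCache class, its parameters, and the sample trace are hypothetical,
and the trace is assumed to be a plain list of addresses that has already been
collected from the simulator. The model is a small set-associative cache with
LRU replacement that counts hits and misses and never stalls anything.

```python
from collections import OrderedDict


class ProfilingCache:
    """Hypothetical set-associative LRU cache that only counts hits and misses."""

    def __init__(self, num_sets=4, ways=2, line_size=32):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One ordered dict per set, mapping tag -> None, oldest entry first.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Record one access and return True on a hit, False on a miss."""
        line = address // self.line_size
        index = line % self.num_sets
        tag = line // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # evict the least recently used line
        lines[tag] = None
        self.misses += 1
        return False


def profile(trace, cache):
    """Replay a list of addresses through the cache after the fact."""
    for address in trace:
        cache.access(address)
    return cache.hits, cache.misses


# Made-up trace standing in for transactions recorded during a simulation.
trace = [0x000, 0x004, 0x020, 0x000, 0x080, 0x024]
cache = ProfilingCache(num_sets=4, ways=2, line_size=32)
hits, misses = profile(trace, cache)
print("hits:", hits, "misses:", misses)
```

Because the model only observes addresses, replaying the same trace always
gives the same counts, which is why this kind of profiling does not disturb the
target program.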
Simics comes with cache models that support the first two types of
simulation:
- g-cache is the standard cache model. It handles one
transaction at a time in a flat way: all needed operations (copy-back, fetch,
etc.) are performed in order and at once. The cache returns the sum of the
stall times reported for each operation.
- g-cache-ooo is a more complex model adapted to Simics
MAI. It handles multiple outstanding transactions and keeps track of their
current status (copy-back or fetch ongoing). g-cache-ooo is the
standard cache model for Simics out-of-order; it is described in detail in
Simics Micro-Architectural Interface.
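The flat handling of a transaction described for g-cache can be illustrated
with a toy model. The sketch below is not the g-cache source and does not use
any Simics interface. It assumes a direct-mapped, write-back, write-allocate
cache, and the FlatCache and Memory classes and their penalty values are
hypothetical names chosen for the example. It shows only the idea that a miss
triggers a copy-back (if the victim line is dirty) and a fetch, one after the
other, and that the cache returns the sum of the stall times reported.

```python
class Memory:
    """Hypothetical last level: every access costs a fixed number of cycles."""

    def __init__(self, penalty):
        self.penalty = penalty

    def access(self, address, is_write):
        return self.penalty


class FlatCache:
    """Hypothetical direct-mapped write-back cache that sums stall times."""

    def __init__(self, num_lines, line_size, hit_penalty, next_level):
        self.num_lines = num_lines
        self.line_size = line_size
        self.hit_penalty = hit_penalty
        self.next_level = next_level
        self.lines = {}  # index -> (tag, dirty)

    def access(self, address, is_write):
        """Handle one transaction completely and return its total stall time."""
        stall = self.hit_penalty
        line = address // self.line_size
        index = line % self.num_lines
        tag = line // self.num_lines
        entry = self.lines.get(index)
        if entry is None or entry[0] != tag:
            if entry is not None and entry[1]:
                # Copy-back: write the dirty victim line to the next level.
                victim = (entry[0] * self.num_lines + index) * self.line_size
                stall += self.next_level.access(victim, True)
            # Fetch: read the requested line from the next level.
            stall += self.next_level.access(line * self.line_size, False)
            dirty = False
        else:
            dirty = entry[1]
        self.lines[index] = (tag, dirty or is_write)
        return stall


memory = Memory(penalty=100)
l1 = FlatCache(num_lines=2, line_size=32, hit_penalty=1, next_level=memory)
write_miss = l1.access(0x00, True)     # fetch only
read_hit = l1.access(0x04, False)      # no next-level traffic
conflict = l1.access(0x40, False)      # copy-back of the dirty line, then fetch
print(write_miss, read_hit, conflict)
```

Since next_level only needs an access method that returns a stall time, a
second FlatCache could be passed in place of Memory to form a two-level
hierarchy in this sketch.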
The source code of these caches is available in