WARNING: DO NOT PROCEED UNTIL YOU COMPLETE ABOVE
Academic Integrity. Adhere to the highest levels of academic integrity. Submit your work individually or as a group. Cite any sources that you have referred to. Also, name anyone with whom you discussed your approach. Use the ‘Whiteboard approach’ discussed in lecture at the beginning of the semester.
The purpose of this assignment is twofold. First, it will expose you to gem5, which we will use more heavily going forward (for both projects and assignments). gem5 is a modular platform for computer-system architecture research, encompassing system-level architecture as well as processor micro-architecture. Second, it will give you experience measuring performance on different systems, and comparing and contrasting those systems.
$REPO below refers to the repository you have cloned onto your machine.
All commands below have been tested and will run on the cs-arch servers. We will not be supporting any other machines.
Install plotting dependencies
pip install --user numpy pandas matplotlib
Use the preinstalled gem5 if your disk quota is too small to build gem5.
This applies only to the labs and assignments. For final projects, request extra quota
and build the gem5 binaries yourself.
# gem5 comes preinstalled at /data/gem5-baseline
export M5_PATH=/data/gem5-baseline
cd $REPO
cd microbenchmark
make
Now, you will run your application in gem5 with the configuration script.
$ export M5_PATH=/data/gem5-baseline
$ $M5_PATH/build/X86/gem5.opt -re --outdir=$PWD/results/X86/run_micro/CCa/Simple/Inf gem5-config/run_micro.py Simple Inf microbenchmark/CCa/bench.X86
$ ls results/X86/run_micro/CCa/Simple/Inf/
Pay attention to the following positional params that the run_micro script supports. You can see these set up here:
# gem5-config/run_micro.py:line 219
parser.add_argument('cpu', choices = valid_cpus.keys())
parser.add_argument('memory_model', choices = valid_memories.keys())
parser.add_argument('binary', type = str, help = "Path to binary to run")
Params | Description |
---|---|
cpu | The type of CPU. The options are Simple, Minor4, DefaultO3, O3_W256, O3_W2K. The corresponding classes are SimpleCPU, Minor4CPU, DefaultO3CPU, O3_W256CPU, O3_W2KCPU, all declared in the same file. Read how the CPUs are set up. |
memory_model | Inf, SingleCycle, Slow. The objects are created in system.py. Inf is a memory model that is infinitely large and has infinite bandwidth. SingleCycle is a memory system that completes memory operations in 1 cycle. Finally, Slow completes DRAM accesses in 100 ns. This exposes the need for L1 and L2 caches. |
binary | program to simulate using gem5 |
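The positional parameters in the table can be tried out in isolation. The following is a minimal sketch, not the actual run_micro.py: the `valid_cpus` and `valid_memories` dictionaries here hold placeholder strings (in the real script they presumably map to CPU and memory classes), and `build_parser` is a hypothetical helper name.

```python
import argparse

# Placeholder registries (assumption): the real script maps these names
# to gem5 CPU and memory classes rather than to strings.
valid_cpus = {name: name + "CPU" for name in
              ["Simple", "Minor4", "DefaultO3", "O3_W256", "O3_W2K"]}
valid_memories = {name: name for name in ["Inf", "SingleCycle", "Slow"]}


def build_parser():
    """Build a parser with the same three positional arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument('cpu', choices=valid_cpus.keys())
    parser.add_argument('memory_model', choices=valid_memories.keys())
    parser.add_argument('binary', type=str, help="Path to binary to run")
    return parser


# Same argument order as on the gem5 command line: cpu, memory model, binary.
args = build_parser().parse_args(
    ["Simple", "Inf", "microbenchmark/CCa/bench.X86"])
print(args.cpu, args.memory_model, args.binary)
```

Because `choices` is set, a misspelled CPU or memory name is rejected by argparse before any simulation starts.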
Here are the important objects in system.py, the baseline system definition. The CPUs are created in run_micro.py. If you do not understand terms such as TimingSimple or Minor, complete gem5-lab first. The CPU classes derive from the base gem5 CPUs and modify a number of parameters and ports.
class SimpleCPU(TimingSimpleCPU):
...
class Minor4CPU(MinorCPU):
...
class O3_W256CPU(DerivO3CPU):
...
class O3_W2KCPU(DerivO3CPU):
...
# A really large 2000 instruction window OOO processor.
class O3_W2KCPU(DerivO3CPU):
    branchPred = BranchPredictor()
    fuPool = Ideal_FUPool()
    fetchWidth = 32
    decodeWidth = 32
    renameWidth = 32
    dispatchWidth = 32
    issueWidth = 32
    wbWidth = 32
    commitWidth = 32
    squashWidth = 32
    fetchQueueSize = 256
    LQEntries = 250
    SQEntries = 250
    numPhysIntRegs = 1024
    numPhysFloatRegs = 1024
    numIQEntries = 2096
    numROBEntries = 2096
We are going to try to understand the impact of the three major modules of a computer system (CPU, caches, and memory) on the end-to-end performance of a benchmark. There are two challenges: i) each module’s impact on end performance varies from application to application, so we have to study the same configuration across multiple benchmarks to understand the overall impact; ii) there are multiple design choices and parameters that need to be set for each module.
In this experiment we are going to be varying both CPU and memory model to try and understand the importance of each for overall benchmark performance.
$M5_PATH/build/X86/gem5.opt gem5-config/run_micro.py --help
Parameter | Options |
---|---|
CPU model | 5 options. Simple,Minor4,DefaultO3,O3_W256,O3_W2K |
Memory model | 3 options. Inf, SingleCycle, Slow. |
Benchmarks | CCa,CCl,DP1f,ED1,EI,MI |
Total | 5 CPU models x 3 memory models x 6 benchmarks = 90 simulations. |
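The sweep in the table is a Cartesian product of the three parameter lists. The sketch below enumerates it and builds an output directory per run, following the results/X86/run_micro/[Benchmark]/[CPU]/[MEM] pattern used by the run command earlier. The `outdir` helper is a hypothetical name, not part of the provided scripts.

```python
import itertools

cpus = ["Simple", "Minor4", "DefaultO3", "O3_W256", "O3_W2K"]
memories = ["Inf", "SingleCycle", "Slow"]
benchmarks = ["CCa", "CCl", "DP1f", "ED1", "EI", "MI"]


def outdir(bench, cpu, mem, root="results/X86/run_micro"):
    """Directory that holds stats.txt for one (benchmark, cpu, memory) run."""
    return "/".join([root, bench, cpu, mem])


# One tuple per simulation: every benchmark on every CPU/memory pairing.
runs = list(itertools.product(benchmarks, cpus, memories))
print(len(runs), "simulations")
print(outdir(*runs[0]))
```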
To help you with these simulations we have provided two scripts, launch.py and scripts.py. launch.py is a script that uses the Python multiprocessing library to launch multiple gem5 simulations. It takes a single parameter, the number of cores to be used for the simulations. You can fork more simulations than the number of cores; the extra ones simply wait their turn. See the Python multiprocessing documentation for details.
# Launch 8 simulations across 8 cores
# You should grab a slurm session and use
# the number of cores you grabbed as a parameter
# Students cannot grab more than 8 cores at-a-time.
# If you run without slurm we may kill your jobs
$ cd $REPO
$ export M5_PATH=/data/gem5-baseline
$ export LAB_PATH=$PWD
$ python3 launch.py 8
# Wait for jobs to complete.
# Check squeue to ensure your job is complete.
We have provided an example configuration that simulates 1 CPU (Simple) x 3 memory models (Inf, SingleCycle, Slow) x the benchmarks. This launches 15 simulations, with the number running concurrently set by the pool size at line 40: mp.Pool(args.N), and runs them to completion. Note that launch.py waits for all simulations to complete. This will create a results/ directory. The organization of results is results/X86/run_micro/[Benchmark]/[CPU]/[MEM] for each of the simulation runs.
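The pool behaviour described above (more jobs than workers, with the extras queued) can be seen in a dry run. This is a sketch under stated assumptions, not the provided launch.py: it uses `multiprocessing.pool.ThreadPool` so it runs anywhere without pickling concerns, whereas launch.py is described as using `mp.Pool`. The `gem5_command` and `dry_run` helpers are hypothetical, and no simulation is actually started.

```python
import os
from multiprocessing.pool import ThreadPool


def gem5_command(bench, cpu, mem):
    """Assemble the gem5 command line for one run (same shape as shown earlier)."""
    m5_path = os.environ.get("M5_PATH", "/data/gem5-baseline")
    out = f"results/X86/run_micro/{bench}/{cpu}/{mem}"
    return [f"{m5_path}/build/X86/gem5.opt", "-re", f"--outdir={out}",
            "gem5-config/run_micro.py", cpu, mem,
            f"microbenchmark/{bench}/bench.X86"]


def dry_run(job):
    # A real worker would call subprocess.run(gem5_command(*job), check=True).
    return " ".join(gem5_command(*job))


# Six jobs on two workers: at most two run at once, the rest wait in the queue.
jobs = [(b, "Simple", m)
        for b in ["CCa", "CCl"]
        for m in ["Inf", "SingleCycle", "Slow"]]

with ThreadPool(2) as pool:
    commands = pool.map(dry_run, jobs)  # blocks until every job has finished

for command in commands:
    print(command)
```

`pool.map` returns only after all jobs finish, which mirrors the note that launch.py waits for all simulations to complete.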
Plotting scripts
We have provided some basic plotting scripts to get started. We are using matplotlib. The function gem5GetStat extracts the user-specified stats from the stats.txt of each [Benchmark]/[CPU]/[MEM] run. We insert this data into a pandas DataFrame (lines 58-60 of plot/scripts.py) and plot it.
$ cd plots
$ python3 scripts.py
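To see what extracting a stat involves, here is a minimal sketch of reading one value from the text of a stats.txt file. It is not the provided gem5GetStat: `get_stat` is a hypothetical helper, and the sample text and its numbers are made up for illustration. Stat names differ between gem5 versions (for example `sim_seconds` versus `simSeconds`), so check your own stats.txt for the exact spelling.

```python
def get_stat(stats_text, name):
    """Return the first value recorded for `name` as a float, or None if absent."""
    for line in stats_text.splitlines():
        fields = line.split()
        # Stat lines look like: <name> <value> # <description>
        if len(fields) >= 2 and fields[0] == name:
            return float(fields[1])
    return None


# Made-up sample in the general layout of a gem5 stats dump.
sample = """
---------- Begin Simulation Statistics ----------
sim_seconds                                  0.000057                       # Number of seconds simulated
sim_insts                                       10000                       # Number of instructions simulated
---------- End Simulation Statistics   ----------
"""

print(get_stat(sample, "sim_seconds"))
print(get_stat(sample, "sim_insts"))
```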
Include a PDF in your repo along with the plots/ folder. This file will contain your observations and conclusions from the experiment.
Plot all the runs (you can use line or bar) and insert them into your markdown report. We have included a REPORT.md for convenience. You can convert markdown to pdf.
Hint: Look at the code of these benchmarks
Why markdown?
Because it is a semi-structured, text-based format that is easy to read and parse.
Markdown files
VScode markdown extension
Online markdown editor (stackedit). WARNING: images have to be re-inserted after copying from stackedit.
The report comes in two formats, REPORT.md and REPORT.pdf. REPORT.md is the file you fill out.
Organize all your plots into the plots folder
In this experiment we will try to understand the importance of caches, locality and relationship with processor model.
WARNING: If you are not familiar with cache geometries, you most likely do not have the prerequisites
Hint: you may want to add a command line parameter to run.py to set the cache configuration
The system.py already provides flags for setting the cache sizes: _L1cachesize and _L2cachesize.
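As a reminder of how a cache size relates to its geometry, the sketch below parses size strings from two command-line flags and computes the number of sets as size / (associativity x line size). Everything here is an assumption for illustration: the `--l1size` and `--l2size` flag names, the default sizes, the associativities, and the 64-byte line are not taken from the provided scripts, and units are treated as binary (1 kB = 1024 bytes).

```python
import argparse

UNITS = {"B": 1, "kB": 1024, "MB": 1024 ** 2}


def size_in_bytes(text):
    """Convert strings such as '32kB' or '1MB' to a byte count."""
    # Check the two-letter units first, since every unit string ends in 'B'.
    for unit in ("kB", "MB", "B"):
        if text.endswith(unit):
            return int(text[:-len(unit)]) * UNITS[unit]
    raise ValueError(f"unrecognised size: {text!r}")


def num_sets(size, assoc, line_bytes=64):
    """Number of sets for a cache of the given size and associativity."""
    return size_in_bytes(size) // (assoc * line_bytes)


# Hypothetical flags; wire them to the cache-size settings in system.py yourself.
parser = argparse.ArgumentParser()
parser.add_argument("--l1size", default="32kB")
parser.add_argument("--l2size", default="1MB")
args = parser.parse_args(["--l1size", "64kB"])

print(args.l1size, num_sets(args.l1size, assoc=8), "sets")
print(args.l2size, num_sets(args.l2size, assoc=16), "sets")
```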
Report the following:
Simulate the following configurations.
Experiment 3.1:
CPU Model | Frequency (GHz) | Memory |
---|---|---|
Simple | 1 | DDR3_1600_8x8 |
Simple | 2 | DDR3_1600_8x8 |
Simple | 4 | DDR3_1600_8x8 |
Minor4 | 1 | DDR3_1600_8x8 |
Minor4 | 2 | DDR3_1600_8x8 |
Minor4 | 4 | DDR3_1600_8x8 |
Experiment 3.2:
CPU Model | Frequency (GHz) | Memory |
---|---|---|
Simple | 4 | DDR3_2133_8x8 |
Simple | 4 | LPDDR2_S4_1066_1x32 |
Simple | 4 | HBM_1000_4H_1x64 |
Minor4 | 4 | DDR3_2133_8x8 |
Minor4 | 4 | LPDDR2_S4_1066_1x32 |
Minor4 | 4 | HBM_1000_4H_1x64 |
You will change the CPU model, frequency, and memory configuration while testing other benchmarks.
DDR3_2133_8x8, which models DDR3 with a faster clock.
LPDDR2_S4_1066_1x32, which models LPDDR2, low-power DRAM often found in mobile devices.
HBM_1000_4H_1x64, which models High Bandwidth Memory, used in GPUs and network devices.
For Experiment 3.1, we vary the frequency and CPU model and keep the memory model fixed. In Experiment 3.2, we vary the memory model and CPU model while keeping the frequency fixed.
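One reason frequency and memory technology interact: a memory latency fixed in nanoseconds costs more CPU cycles as the clock gets faster. The sketch below works this out for the three frequencies of Experiment 3.1, using the 100 ns figure quoted earlier for the Slow model purely as an illustrative latency. The real DRAM models have their own timings, and `cycles_for_latency` is a hypothetical helper.

```python
def cycles_for_latency(latency_ns, freq_ghz):
    """CPU cycles that elapse during a memory access of `latency_ns` nanoseconds."""
    # 1 GHz is one cycle per nanosecond, so cycles = ns * GHz.
    return latency_ns * freq_ghz


for freq_ghz in (1, 2, 4):
    cycles = cycles_for_latency(100, freq_ghz)
    print(f"{freq_ghz} GHz: {cycles} cycles per 100 ns access")
```

Doubling the clock halves the time per instruction only for work that does not wait on memory, which is worth keeping in mind when interpreting the frequency sweep.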
Hint: you may want to add a command line parameter to control the memory configuration. Check which provided memory model (Slow, Inf, SingleCycle) is capable of changing the underlying technology.
gem5 has support for annotating your binary with special “region of interest” (ROI) magic instructions. ROI commands interact with the gem5 simulator and let the underlying config know when the region of interest is reached in the application.
We have annotated your binary with ROI instructions. Remove them and re-run the comparison between MinorCPU at 1 and 2 GHz. To compile your annotated .cpp file, you need to make two changes to your gcc compilation command.
Remove the ROI_BEGIN and ROI_END calls from the benchmarks.
Previously, on workbegin we would continue onto the simulation. Now you will need to modify the script to stop the simulation when the program exits, since you will not hit any ROI. Look for the exit_event checks and modify them to terminate the simulation gracefully.
# If things are working correctly after you remove the ROI instruction.
$M5_PATH/build/X86/gem5.opt -re --outdir=$PWD/results/X86/run_micro/CCa/Simple/Inf gem5-config/run_micro.py Simple Inf microbenchmark/CCa/bench.X86
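The exit_event checks mentioned above can be pictured as a small dispatch on the exit cause. The sketch below is plain Python with no gem5 import. The `next_action` helper and its return values are hypothetical, and the cause strings are assumed to match what the exit event reports in the installed gem5 version, so verify them against the provided config script.

```python
def next_action(cause, use_roi):
    """Decide what the run loop should do after the simulator returns an exit event."""
    if use_roi and cause == "workbegin":
        # Entering the region of interest: reset stats, then keep simulating.
        return "reset-stats-and-continue"
    if use_roi and cause == "workend":
        # Leaving the region of interest: dump stats and stop.
        return "dump-stats-and-stop"
    if cause.startswith("exiting with last active thread context"):
        # The program itself finished; without ROI markers this is the only stop.
        return "stop"
    return "continue"


# With the ROI calls removed, only the program-exit cause ends the run.
print(next_action("exiting with last active thread context", use_roi=False))
print(next_action("workbegin", use_roi=True))
```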
Add answers to the following questions to your report.
Check in your repo, along with REPORT.md and REPORT.pdf. To receive points you have to check in all your plots and answers. You also need to include a README with instructions on which commands to run to generate results and plots. 100 points will be evenly divided amongst your questions.
WARNING: IF YOU DO NOT INCLUDE THE README or REPORT.md or REPORT.pdf; we will zero out your assignment
Do not include the PDF in the archive; submit it as a separate file on Canvas.
python `which scons` build/X86/gem5.opt -jX \
CPU_MODELS=AtomicSimpleCPU,TimingSimpleCPU,O3CPU,MinorCPU
NameError: name 'MinorCPU' is not defined
$ ./build/X86/gem5.opt ./configs/tutorial/simple.py
gem5 Simulator System. http://gem5.org
...
NameError: name 'MinorCPU' is not defined
You did not compile gem5 with the flag mentioned in the compilation instructions.
Recompile gem5 with the flag and try again.
fatal: fatal condition !process occurred: Unknown error creating process object.
Memory Usage: 2209384 KBytes
WARNING: Do not simply include a data dump
This assignment has been modified by Arrvindh Shriraman, Alaa Alameldeen, Mahmoud Abumandour. We thank the creators of gem5-art for providing the environment for script aids.