Assignment 2

In class we studied scalable synchronization, and in particular we learned that how you implement a spinlock can significantly affect performance of the application. In this assignment you will experiment with different lock implementations and build your own. 

Experimental platforms:

You will perform experiments on one or more of these machines.
  • – this is a Solaris/x86 quad-core system from Intel, running Solaris 11. Quad has two physical chips (or CPU packages), each with  two cores. The two cores share an L2 cache.  L2 cache is shared among processors on the same chip: processors 0 and 1, and  processors 2 and 3.  Lock synchronization will be lower between cores located on the same chip than between cores on different chips.

  • - this is a Solaris/x86 system similar to quad, but instead of four cores it has eight. There are four pairs of cores, each pair shares an L2 cache.

  • -this is a Solaris/x86 system from AMD, running Solaris 11. It has two chips. Each chip has four cores. Each core has private L1 instruction and data caches and a private L2 cache. The L3 cache is shared among the cores on the same chip.

  • -this system is just like trocadero. It is a Solaris/x86 system from AMD, running Solaris 11. It has two chips. Each chip has four cores. Each core has private L1 instruction and data caches and a private L2 cache. The L3 cache is shared among the cores on the same chip.

Before you begin:

Make sure that you can log in to one of the servers suggested above. Also make sure that the directory containing Sun Studio compiler is in your path. Otherwise you will not be able to compile the code that you must use for this assignment.

The directory containing the compiler is:

If you do not know how to add it to your path, please find the answer on Google!

Your experimental software:

You will begin experimenting with a simple test program that runs several threads competing for locks. The program uses three lock implementations:
  • pthreads mutex -- the default Solaris implementation
  • lightweight spinlock -- the author's interpretation of the Solaris spinlock
  • load control lock -- the author's implementation based on the Johnson's algorithm we read about in class.

To become familiar with the experimental software, here is what you need to do:

  1. Download the software to one of the servers in the list above.
    The software consists of the dynamically linked library contatining the implementation of the lightweight spinlock and the load-control lock and the test program.
  2. Compile the library and the test program:
    cd LC_IMPL; make
    cd ../; make

Becoming familiar with the code:

You need to read the code, both the test application and the library implementation (in the LC_IMPL directory) to understand how it works. When you read the library code, focus only on load_control.c file. The rest of the files were borrowed and adapted from the OpenSolaris kernel to support load control. Read them if you are interested, but this is not necessary for the assignment.

To make sure you understand the code, answer the following questions about it. Answers to the questions must be submitted in your assignment report, so be sure to write them down. Note that some answers you will be able to find just by reading the code. For others, you might have to do some searching online!

  1. How do you control which lock to use -- what variable or argument?
  2. How do you control the number of threads in the test program?
  3. What are the four different ways to control the amount of lock contention in the test program?
  4. What is the default sleep interval for the load controller thread? How can you change it? 
  5. How can you control the amount of work that each thread performs between critical sections -- which variable or argument? 
  6. How does the test program obtain execution time statistics, which are printed at the end of the run?

Evaluating performance of locks:

Your next step is to run an experiment that will test the performance of the three different locks used by the test program. You need to decide on your own how you are going to design this experiment. Your goal is to understand which lock implementation performs best. You may find that different locks perform better in different conditions -- for instance, depending on what server you use, how many threads you use (fewer than the number of cores or more), and the degree of lock contention. So you will need to think how you will vary these different parameters to understand the performance of locks in different conditions.

Performance of the test program may vary from one run to another, so for each experiment you need to perform several runs and report the mean and standard deviation (or you can use another way to summarize the distribution of data).

Implementing Anderson Queue lock:

You final goal is to add the implementation of the Anderson queue lock that we studied in class. If you'd like more details about how it should be implemented, refer to this book chapter on spinlocks. (This link is password protected. I will tell you the password in class).

Feel free to refer to the implementation of the lightweight spinlock in the library provided to you, in order to implement efficient spinning in the Anderson queue lock.

Once you are done, analyze the performance of the test program under the Anderson queue lock and compare the results with your analysis of the other three locks.

Do your results lead to the same conclusion as those in the Decoupling Contention Management from Scheduling paper we read in class?

What to submit

You must prepare a well-written report of your analysis. Please spell check your report before submitting! Pick your favourite paper that we read so far, and model your report after the experimental section in that paper. Your report must not exceed 5 pages in 10-point Times New Roman font with 1-inch margins (so make your figures small and pretty -- but not too small, so they are still readable). I will deduct points if your report does not comply with the formating specifications. Do not play with line spacing or other formatting tricks to fit more text.

Include the following in the report:

  • Experimental hardware-- what server you used and key characteristics of the server.
  • Experimental design -- explain how you ran your experiment. Give enough details so another person could reproduce it, but be concise (there is no need to provide the scripts). 
  • Justification for the experiment -- explain why you set up your experiment in the way you did. What did you want to learn and why did you decide that the experimental setup that you used was the best way to answer the questions you were looking to investigate?
  • Report the results.
  • Analyize the data: does it support your original prediciton of how the system will behave? If not, explain why. If some behaviour is unexpected, do not leave it unexplained. Investigate why the results do not meet your expectations.
  • Write a well thought-out concise summary of your findings. Compare and contrast your results with those in the Decoupling Contention Management from Scheduling paper we read in class.
  • Include the appendix containing the answers to the code reading questions. The appendix does not count towards 5 pages, must have the same format as the rest of the report and must not exceed two pages.
  • The hard-copy of the final report is due in class.
© Copyright 2007 Simon Fraser University