INTRODUCTION TO CHIP MULTITHREADING (CMT)

WEEK 1 January 8

TU

LECTURE: Course introduction and overview

TH

Theme: History and Overview of Chip Multithreading

1-A. Operating System Scheduling for Chip Multithreaded Processors [10] (Section I.1)

1-B. Computer Architecture Book, [15] (pages 172-181, up to “What limits multiple-issue processors”)

1-C. Computer Architecture Book, [15] (pages 249-257, “Sun T1 multiprocessor”)

WEEK 2 January 15

TU

Theme: Chip Multithreading = Chip Multiprocessing + Hardware Multithreading

2-A. SMT: A Platform for Next-Generation Processors [8]

2-B. Interleaving [21] (Skip Section 6)

2-C. Single-chip multiprocessor [14]

TH

Theme: Modern CMT processors

2-D. Niagara [19]

2-E. Hyper-threaded Pentium [22]

OVERVIEW OF CMT SYSTEMS RESEARCH

WEEK 3 January 22

TU

Theme: CMT performance analysis

3-A. Initial Observations of the SMT Pentium 4 [37]

Theme: Scheduling for CMT systems

3-B. Symbiotic Jobscheduling [32]

TH

Theme: Runtime support

3-C. Adaptive OpenMP Loop Scheduler [38]

Theme: Dynamic Resource Partitioning in Hardware

3-D. Fair Cache Sharing and Partitioning [18]

WEEK 4 January 29

TU

Theme: Performance Modeling

4-A. Methods for Modeling Resource Contention on SMT [25]

Theme: OS-hardware Interaction

4-B. Architectural Support for OS Cache Management [29]

TH

LECTURE: Tools and Techniques for CMT Systems Research

SOFTWARE SCHEDULING ALGORITHMS

WEEK 5 February 5

TU

Theme: Sharing-Based Scheduling

5-A. Sharing-Based Thread Placement [36] (Justin)

5-B. Pool-Based Scheduling on NUMA systems [2] (Fernando)

Project consultations

TH

Theme: Compiler and Runtime Algorithms

5-C. Adaptive Execution Techniques [17] (Mike)

Project consultations

WEEK 6 February 12

TU

Theme: Load-Balancing Schedulers in Commercial OS

6-A. Linux Scheduler #1 [26] (Justin)

6-B. Linux Scheduler #2 [31] (Dan)

6-C. Optimizations in Solaris [23], pp. 795-814 (Dan)

Project consultations

TH

Theme: Performance Predictability and Fairness

6-D. Fair Scheduler [12]

PROJECT ABSTRACTS DUE ON FEBRUARY 15

RESOURCE PARTITIONING ALGORITHMS

WEEK 7 February 19

TU

Theme: Performance Optimization

7-A. Cooperative Caching [5] (Fernando)

7-B. Communist, Utilitarian and Capitalist Cache Policies [16] (Navid)

TH

Quiz #1

WEEK 8 February 26

TU

Theme: Performance Predictability

8-A. Applications of Thread Prioritization [28] (Navid)

8-B. Predictable Performance in SMT Processors [3] (Mike)

TH

Theme: Fairness

8-C. Predicting Inter-Thread Contention [4] (Hossein)

WEEK 9 March 5

TU

Theme: More on performance

9-A. Utility-based Cache Partitioning [27] (Navid)

Project Progress Reports

TH

Theme: Fast Single-Thread Execution

9-B. Transparent Threads [7] (Justin)

ARCHITECTURAL SUPPORT FOR SOFTWARE OPTIMIZATIONS

WEEK 10 March 12

TU

Theme: Architectural Support for OS-level Optimizations

10-A. Architectural Support for Scheduling [30] (Hossein)

10-B. Memory-Monitoring Scheme for Memory-Aware Scheduling [34] (Mike)

TH

10-C. Helper Threads on SMT [13] (Sven)

WEEK 11 March 19

TU

11-A. Evaluating Performance of Hardware Thread Priorities on SMT [24] (Sven)

11-B. Compatible Phase Scheduling [9] (Dan)

TRENDS IN APPLICATION DEVELOPMENT

WEEK 11 March 19

TH

Theme: Trends in Applications

11-C. Multicore Chips Make Application Development Tough [1]

11-D. The Free Lunch is Over [35]

11-E. Hybrid Transactional Memory [6] (Fernando)

THE BIG PICTURE

WEEK 12 March 26

TU

12-A. Performance/Watt Ratio [20] (Hossein)

12-B. Chip Multithreading: Opportunities and Challenges [33] (Sven)

TH

Quiz #2

WEEK 13 April 2

TU

Project presentations

TH

Project presentations

April 10: FINAL PROJECTS DUE

BIBLIOGRAPHY

  [1]   Gary Anthes. Hard Cores: Multicore chips provide power but make app development tough. http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=112303, 2006

  [2]   Timothy Brecht. An Experimental Evaluation of Processor Pool-Based Scheduling for Shared-Memory NUMA Multiprocessors. In Proceedings of the 3rd Workshop on Job Scheduling Strategies for Parallel Processing, 1997

  [3]   F. J. Cazorla, Peter M. W. Knijnenburg, R. Sakellariou, E. Fernandez, A. Ramirez, and M. Valero. Predictable Performance in SMT Processors. In Proceedings of the 1st Conference on Computing Frontiers, 2004

  [4]   D. Chandra, F. Guo, S. Kim, and Y. Solihin. Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture. In Proceedings of the 11th International Symposium on High Performance Computer Architecture, 2005

  [5]   J. Chang and G. S. Sohi. Cooperative Caching for Chip Multiprocessors. In Proceedings of the 33rd Annual International Symposium on Computer Architecture, 2006

  [6]   Peter Damron, Alexandra Fedorova, Yosef Lev, Victor Luchangco, Mark Moir, and Daniel Nussbaum. Hybrid Transactional Memory. In Proceedings of the Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

  [7]   G. Dorai and D. Yeung. Transparent Threads: Resource Sharing in SMT Processors for High Single-Thread Performance. In Proceedings of the 11th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2002

  [8]   Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, Rebecca M. Stamm, and Dean M. Tullsen. Simultaneous Multithreading: A Platform for Next-Generation Processors. IEEE Micro, September 1997

  [9]   A. El-Moursy, R. Garg, David Albonesi, and Sandhya Dwarkadas. Compatible phase co-scheduling on a CMP of multi-threaded processors. In Proceedings of the 20th International Parallel and Distributed Processing Symposium, 2006

[10]   Alexandra Fedorova. Operating System Scheduling for Chip Multithreaded Processors. 2006

[11]   Alexandra Fedorova. System Software Design For Chip Multithreaded Processors. Submitted for review. Not for wide distribution. 2006

[12]   Alexandra Fedorova, Margo Seltzer, and Michael D. Smith. A Cache-Fair Operating System Scheduler for Chip Multiprocessors. In preparation for conference submission. Not for wide distribution. 2007

[13]   I. Ganusov and M. Burtscher. Efficient Emulation of Hardware Prefetchers via Event-Driven Helper Threading. In Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques, 2006

[14]   L. Hammond, B. Nayfeh, and K. Olukotun. A Single-Chip Multiprocessor. Computer, 30(9):79-85, 1997

[15]   J. Hennessy and David A. Patterson. Computer Architecture, Fourth Edition: A Quantitative Approach. Morgan Kaufmann, 2006

[16]   L. Hsu, S. K. Reinhardt, R. Iyer, and S. Makineni. Communist, Utilitarian, and Capitalist Cache Policies on CMPs: Caches as a Shared Resource. In Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques, 2006

[17]   C. Jung, D. Lim, J. Lee, and S. Han. Adaptive Execution Techniques for SMT Multiprocessor Architectures. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

[18]   S. Kim, D. Chandra, and Y. Solihin. Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture. In Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2004

[19]   Poonacha Kongetira. A 32-way Multithreaded SPARC(R) Processor. In Proceedings of the 16th Symposium On High Performance Chips (HOTCHIPS), 2004

[20]   James Laudon. Performance/Watt: the New Server Focus. ACM SIGARCH Computer Architecture News, 33(4):5-13, 2005

[21]   James Laudon, A. Gupta, and Mark Horowitz. Interleaving: A Multithreading Technique Targeting Multiprocessors and Workstations. In Proceedings of the Sixth International Conference On Architectural Support For Programming Languages And Operating Systems (ASPLOS), 1994

[22]   Deborah T. Marr, Frank Binns, David L. Hill, Glenn Hinton, David A. Koufaty, J. Allan Miller, and Michael Upton. Hyper-Threading Technology Architecture and Microarchitecture. Intel Technology Journal, 6(1):4-15, 2002

[23]   Richard McDougall and Jim Mauro. Solaris™ Internals: Solaris 10 and OpenSolaris Kernel Architecture. Second Edition. 2006

[24]   M. Meswani and P. Teller. Evaluating the Performance Impact of Hardware Thread Priorities in Simultaneous Multithreaded Processors using SPEC CPU2000. In Proceedings of the 2nd Workshop on Operating System Interference In High Performance Applications, 2006

[25]   Tipp Moseley, Joshua L. Kihm, Daniel A. Connors, and Dirk Grunwald. Methods for Modeling Resource Contention on Simultaneous Multithreading Processors. In Proceedings of the International Conference on Computer Design, 2005

[26]   Jun Nakajima and Venkatesh Pallipadi. Enhancements for Hyper-Threading Technology in the Operating System - Seeking the Optimal Scheduling. In Proceedings of the Second Workshop on Industrial Experiences with Systems and Software, 2002

[27]   M. K. Qureshi and Yale Patt. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches. In Proceedings of the 39th International Symposium on Microarchitecture, 2006

[28]   S. E. Raasch and S. K. Reinhardt. Applications of Thread Prioritization in SMT Processors. In Proceedings of the Workshop On Multi-Threaded Execution, Architecture and Compilation, 1999

[29]   N. Rafique, W. T. Lim, and M. Thottethodi. Architectural Support for Operating System-driven CMP Cache Management. In Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques, 2006

[30]   Alex Settle, Joshua L. Kihm, Andrew Janiszewski, and Daniel A. Connors. Architectural Support for Enhanced SMT Job Scheduling. In Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2004

[31]   Suresh Siddha and Venkatesh Pallipadi. Chip Multi Processing Aware Linux Kernel Scheduler. In Proceedings of the Linux Symposium, 2005

[32]   Allan Snavely and Dean M. Tullsen. Symbiotic Jobscheduling for a Simultaneous Multithreaded Processor. In Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2000

[33]   Lawrence Spracklen and Santosh G. Abraham. Chip Multithreading: Opportunities and Challenges. In Proceedings of the 11th International Symposium on High-Performance Computer Architecture, 2005

[34]   G. E. Suh, S. Devadas, and L. Rudolph. A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning. In Proceedings of the 8th International Symposium on High Performance Computer Architecture, 2002

[35]   Herb Sutter. The Free Lunch is Over: A Fundamental Turn Towards Concurrency in Software. Dr. Dobb's Journal, 30(3), 2005

[36]   Radhika Thekkath and Susan J. Eggers. Impact of Sharing-Based Thread Placement on Multithreaded Architectures. In Proceedings of the 21st Annual International Symposium On Computer Architecture (ISCA), April 1994

[37]   Nathan Tuck and Dean M. Tullsen. Initial Observations of the Simultaneous Multithreading Pentium 4. In Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques, 2003

[38]   Y. Zhang, M. Burcea, V. Cheng, R. Ho, and M. Voss. An Adaptive OpenMP Loop Scheduler for Hyperthreaded SMPs. In Proceedings of the International Conference on Parallel and Distributed Systems, 2004