Alexandra Fedorova

Research: Multicore Systems and Parallel Computing

Overview

Computer systems benefit society in many areas, such as health, entertainment, work efficiency, communication and the environment. But to make these systems truly ubiquitous, to extend their positive influence on our lives, and to make these benefits available to people around the world, irrespective of their social situation or income, we must overcome many problems. In particular, we must ensure that the expansion of computer technology does not come at the expense of the environment. The focus of my research is to make computer processors, which are notorious for high energy consumption, efficient in how they use the resources, and thus the energy, of the entire system.

Energy Consumption in Computer Processors

Computer systems are major consumers of energy. World-wide data center power demand in 2005 was equivalent to the output of about seventeen 1000-megawatt power plants [Koomey2008]. Data center power consumption doubled between 2005 and 2010, and with the proliferation of smartphones, whose Internet usage requires data center support and whose number is likely to reach 1.1 billion by 2013, energy consumption is likely to keep growing rapidly.

Three quarters of the growth in data center energy consumption can be attributed to the growth in the number of servers [Koomey2008]. If we peek inside a server, we find that roughly 50% of the energy is consumed by the CPU and memory. Everyone wants a faster CPU and memory, but making them faster also makes them consume more energy.

Multicore Processors as the Solution to Microprocessor Energy Crisis

Increasing the speed of processors brought about a phenomenon known as the power wall: in each new generation of processors, power consumption grows faster than performance. To address this problem, the industry transitioned to multicore processors, which place multiple computing cores on a single chip. Spreading a computation among several small, low-power cores is more energy efficient than executing it on a single larger, power-hungry core.
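To see why, consider the classic dynamic-power model, P ≈ C · V² · f: two cores running at half the frequency, and therefore at a lower supply voltage, can match the throughput of one fast core for a fraction of the power. The sketch below works through this arithmetic; the capacitance, voltage and frequency values are made up purely for illustration.

```cpp
#include <cstdio>

// Back-of-the-envelope illustration of why multicore is more energy
// efficient, using the classic dynamic-power model P ~ C * V^2 * f.
// All constants below are illustrative values, not measurements.
int main() {
    const double C = 1.0;          // effective switched capacitance (normalized)

    // One "big" core at 3 GHz typically needs a higher supply voltage.
    double f_big = 3.0, v_big = 1.2;
    double p_big = C * v_big * v_big * f_big;

    // Two "small" cores at 1.5 GHz each can run at a lower voltage and,
    // for a parallelizable workload, deliver the same aggregate throughput.
    double f_small = 1.5, v_small = 0.9;
    double p_two_small = 2 * (C * v_small * v_small * f_small);

    std::printf("one fast core:  P = %.2f (normalized)\n", p_big);
    std::printf("two slow cores: P = %.2f (normalized)\n", p_two_small);
    // With these illustrative numbers the two slower cores consume about
    // 44% less power for the same nominal throughput.
    return 0;
}
```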

Challenges with Computing on Multicore Processors

Although multicore processors can in theory address the power wall, realizing this potential requires overcoming many challenges.

First of all, in multicore processors computing cores share critical resources in the memory hierarchy, such as caches, memory controllers and interconnects. Contention for these resources can severely degrade performance of applications and waste the energy consumed by the system. Second, multithreaded programs running on multicore processors often communicate (share and exchange data). This cross-core communication introduces delays to program execution and puts pressure on interconnects, further contributing to contention and wasting energy.

A third challenge posed by multicore processors is the need for mainstream adoption of parallel programming. To fully utilize the computing resources of a multicore system, a program must be able to run on multiple cores, that is, it must be parallel. Parallel programming is notoriously difficult and hurts programmer productivity: according to industry experts, it takes on average three times as long to write a parallel program as it does to write a sequential one.

My research program addresses these challenges, focusing on: (1) mitigating contention for shared resources, (2) improving the efficiency of asymmetric multicore systems via new scheduling algorithms, and (3) designing new techniques for program parallelization.

Most Significant Research Contributions

Managing Resource Contention in Multicore Processors

Resource contention is a serious problem in multicore processors. When simultaneously running threads compete for shared resources, such as last-level caches, memory controllers and interconnects, the system becomes less efficient: programs may run up to three times slower because of contention. My work produced a new thread scheduling algorithm that mitigates shared-resource contention.

My students and I developed an efficient online model to predict which threads will compete when scheduled to share hardware resources. Previous studies focused on modeling contention for shared caches and validated their results primarily in simulators, but we found that these models fail to predict contention on real systems, where the dominant factor is contention for memory controllers and interconnects, which the earlier cache models did not capture. We found that the rate of off-chip memory requests, which can be easily measured online, predicts both how much an application will suffer from contention and how much it will hurt others. Based on this finding, we built a scheduling algorithm that minimizes contention for shared resources on multicore processors. Our experiments showed that this algorithm improves performance and reduces the energy-delay product by tens of percent relative to conventional scheduling, without requiring any changes to the hardware. A scheduler inspired by this work is now being implemented by Oracle's Solaris OS team, our long-time collaborator.
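As a rough illustration of the underlying heuristic, the sketch below sorts threads by their sampled off-chip request rate and spreads the most memory-intensive ones across memory domains so that they do not end up behind the same memory controller. This is a minimal user-level sketch, not the actual scheduler; the thread names and miss-rate numbers are invented for illustration.

```cpp
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

// Contention-aware placement sketch: threads with a high rate of off-chip
// memory requests (e.g., last-level-cache misses per 1000 instructions,
// obtainable from hardware performance counters) are spread across memory
// domains so that the "aggressive" threads do not share a memory controller.
// A real scheduler would sample counters online and migrate threads
// periodically; here the rates are given as fixed illustrative numbers.

struct ThreadInfo {
    std::string name;
    double miss_rate;   // off-chip requests per 1000 instructions (sampled)
};

// Sort by memory intensity and hand out threads round-robin, so the most
// memory-intensive threads land in different domains first and the light
// threads fill in behind them.
std::vector<std::vector<ThreadInfo>> place(std::vector<ThreadInfo> threads,
                                           int num_domains) {
    std::sort(threads.begin(), threads.end(),
              [](const ThreadInfo& a, const ThreadInfo& b) {
                  return a.miss_rate > b.miss_rate;
              });
    std::vector<std::vector<ThreadInfo>> domains(num_domains);
    for (size_t i = 0; i < threads.size(); ++i)
        domains[i % num_domains].push_back(threads[i]);
    return domains;
}

int main() {
    // Illustrative workload mix: two memory-bound and two CPU-bound threads.
    std::vector<ThreadInfo> threads = {
        {"mcf", 38.0}, {"milc", 29.0}, {"gcc", 4.0}, {"povray", 0.5}};
    auto domains = place(threads, 2);
    for (size_t d = 0; d < domains.size(); ++d) {
        std::printf("domain %zu:", d);
        for (const auto& t : domains[d]) std::printf(" %s", t.name.c_str());
        std::printf("\n");
    }
    return 0;
}
```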

Links to relevant publications can be found here.

Operating System Support for Asymmetric Multicore Systems

Future multicore processors will feature cores that execute the same instruction set but differ in performance, optimization features, and power profiles. The idea behind such asymmetric processors is improved efficiency via specialization: cores with advanced optimization features (which cost more area and power to implement) run the applications that truly benefit from those features, while applications that do not benefit run on simpler, lower-power cores. To fully tap the potential of asymmetric systems, the operating system scheduler must assign threads to cores taking into account both the asymmetry of the hardware and the runtime characteristics of the threads. My group has designed three schedulers that cater to asymmetric hardware. Prior research in this area was based largely on simulations (the majority of the algorithms were never implemented in a real operating system) and addressed limited workload scenarios. In contrast, our algorithms are implemented in Solaris and comprehensively address a wide range of workloads, including multithreaded applications. We are now working with Intel to evaluate our scheduler on their first prototype of a real asymmetric system.
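The sketch below illustrates the basic policy such a scheduler might follow: estimate how much each thread would gain from a fast core and reserve the scarce fast cores for the threads that gain the most. This is a simplified illustration rather than one of our Solaris schedulers; the thread names and speedup numbers are hypothetical.

```cpp
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

// Asymmetry-aware scheduling sketch: each thread is tagged with an estimated
// "speedup factor" (how much faster it runs on a fast core than on a slow
// one; a real scheduler would estimate this online from metrics such as
// instructions per cycle). Threads with the largest speedup factors get the
// scarce fast cores; the rest run on the simpler, low-power cores.

struct Thread {
    std::string name;
    double speedup;   // estimated fast-core / slow-core performance ratio
};

void schedule(std::vector<Thread> threads, int num_fast_cores) {
    std::sort(threads.begin(), threads.end(),
              [](const Thread& a, const Thread& b) { return a.speedup > b.speedup; });
    for (size_t i = 0; i < threads.size(); ++i) {
        const char* core = (static_cast<int>(i) < num_fast_cores) ? "fast" : "slow";
        std::printf("%-10s -> %s core (speedup %.1fx)\n",
                    threads[i].name.c_str(), core, threads[i].speedup);
    }
}

int main() {
    // A CPU-bound thread benefits most from a fast core; a memory-bound
    // thread spends much of its time stalled and gains little.
    schedule({{"cpu_bound", 2.3}, {"mem_bound", 1.2}, {"mixed", 1.7}}, 1);
    return 0;
}
```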

A Parallel Programming Environment for Video Games

Most recently, my research group became involved in the design of new parallel programming methods for video games and other interactive, computationally intensive applications. Our first goal was to address the limitations of the directed acyclic graph (DAG) programming model, which is frequently used to develop these applications. Existing implementations of the DAG model disallow access to shared state by tasks that are not linked by explicit dependencies. Therefore, tasks that are not explicitly ordered according to the logic of the program, but that may access the same shared state, cannot run concurrently for fear of race conditions. As a result, programmers are forced either to use explicit synchronization, which is notoriously difficult, or to serialize these tasks, which limits parallelism. Our new synchronization technique, Synchronization via Scheduling (SvS), addresses this problem in a way that does not require the programmer to think about synchronization or race conditions. SvS is a combination of new language extensions, compiler support, and runtime algorithms. With SvS, the programmer creates a task graph in which logically independent tasks that might access shared state are not linked by explicit dependencies and do not explicitly synchronize. Our system determines whether these tasks might access the same shared state using a combination of static analysis and new online dynamic reachability algorithms, and tasks that might touch the same state are then serialized by the scheduler. We demonstrated that SvS enables correct and scalable parallel programs without the need to "think" about shared-state conflicts.
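To make the scheduling-side idea concrete, the sketch below gives each task a compact footprint of the objects it might touch and lets tasks share a batch only when their footprints are disjoint. This is only a toy illustration of the serialization decision; in SvS the footprints are derived automatically by the compiler and the dynamic reachability machinery rather than written by hand, and the task names and object ids here are invented.

```cpp
#include <bitset>
#include <cstdio>
#include <functional>
#include <initializer_list>
#include <string>
#include <vector>

// Sketch of the serialization decision behind Synchronization via Scheduling:
// each task carries a signature of the shared state it might touch (here a
// fixed-size bitset over object ids). Two tasks may run concurrently only if
// their signatures do not overlap; otherwise the scheduler serializes them,
// so the programmer writes no locks.

struct Task {
    std::string name;
    std::bitset<64> footprint;     // ids of objects the task might access
    std::function<void()> body;
};

// Toy "scheduler": greedily partitions tasks into batches whose footprints
// are pairwise disjoint. Tasks inside a batch could run in parallel; batches
// run one after another. (A real runtime would do this incrementally and
// dispatch the tasks to worker threads.)
void run(const std::vector<Task>& tasks) {
    std::vector<char> done(tasks.size(), 0);
    size_t remaining = tasks.size();
    int batch = 0;
    while (remaining > 0) {
        std::bitset<64> busy;      // union of footprints already in this batch
        std::printf("batch %d:", batch++);
        for (size_t i = 0; i < tasks.size(); ++i) {
            if (done[i] || (tasks[i].footprint & busy).any()) continue;
            busy |= tasks[i].footprint;
            std::printf(" %s", tasks[i].name.c_str());
            tasks[i].body();       // in this sketch, run inline
            done[i] = 1;
            --remaining;
        }
        std::printf("\n");
    }
}

int main() {
    auto touch = [](std::initializer_list<int> ids) {
        std::bitset<64> s;
        for (int id : ids) s.set(id);
        return s;
    };
    // update_a and update_b touch disjoint objects and can share a batch;
    // update_ab overlaps with both and is serialized after them.
    run({{"update_a", touch({1, 2}), [] {}},
         {"update_b", touch({3}), [] {}},
         {"update_ab", touch({2, 3}), [] {}}});
    return 0;
}
```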

Links to relevant publications can be found here.

Funding

  • Oracle Research Grant to address memory management on NUMA systems (2011)
  • NSERC CRD Grant with Research in Motion (2011)
  • NSERC Engage Grant with STMicroelectronics (2011)
  • British Columbia Innovation Council, NRAS Research Team Grant (2010)
  • GRAND Networks of Centers of Excellence, Project Leader: Platform Performance
  • Hardware and in-kind contributions from Intel and Electronic Arts BlackBox
  • Google Research Award (2009)
  • NSERC MITACS grant (2009)
  • NSERC Strategic Project Grant: 2009-2012 (co-PI, with Dr. Lesley Shannon) A Reconfigurable Profiling Core
  • NSERC Strategic Project Grant (2008-2010). Scheduling for Heterogeneous Multicore Systems
  • Sun Microsystems Externship Grant (2009)
  • Sun Microsystems Research Grant (2009)
  • Sun Microsystems Research Grant (2008)
  • Sun Microsystems Equipment Grant (2006)
  • Sun Microsystems Research Grant (2006)
  • NSERC Discovery Grant (2007-2012)



Click here to access the list of my publications.