Computer systems bring many benefits to society, in areas such as health, entertainment, work efficiency, communication and the environment. But to make these systems truly ubiquitous, to extend their positive influence on our lives and to make these benefits available to individuals around the world, irrespective of their social situation or income, we need to overcome many problems. In particular, we must ensure that the expansion of computer technology does not harm the environment. The focus of my research is to make computer processors, which are notorious for their high energy consumption, use the resources of the entire system (and thus its energy) efficiently.
Computer systems are major consumers of energy. In 2005, worldwide data center power demand was equivalent to the output of about seventeen 1000-megawatt power plants [Koomey2008]. Data center power consumption doubled between 2005 and 2010, and with the proliferation of smartphones, whose Internet use requires data center support and whose number is expected to reach 1.1 billion by 2013, energy consumption is likely to keep growing rapidly.
Three quarters of the growth in data center energy consumption can be attributed to the growth in the number of servers [Koomey2008]. Inside a server, roughly 50% of the energy is consumed by the CPU and memory. Everyone wants faster CPUs and memory, but making them faster also makes them consume more energy.
Increasing processor speeds brought about a phenomenon known as the power wall: with each new generation of processors, power consumption grew faster than performance. To address this problem, the industry transitioned to multicore processors, which place multiple computing cores on a single chip. Spreading a computation across several small, low-power cores is more energy efficient than executing it on a single large, power-hungry core.
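A standard first-order argument illustrates why spreading work across slower cores can save energy. It rests on idealized assumptions that are not part of the original text: only dynamic power is counted (α is the switching activity factor, C the switched capacitance, V the supply voltage, f the clock frequency), supply voltage is assumed to scale linearly with frequency, and the workload is assumed to be perfectly parallel.

```latex
\begin{aligned}
P_{\mathrm{dyn}} &= \alpha\, C\, V^{2} f, \qquad V \propto f \;\Rightarrow\; P_{\mathrm{dyn}}(f) \propto f^{3},\\
P_{\text{two cores}} &= 2\, P_{\mathrm{dyn}}\!\left(\tfrac{f}{2}\right) = 2 \left(\tfrac{1}{2}\right)^{3} P_{\mathrm{dyn}}(f) = \tfrac{1}{4}\, P_{\mathrm{dyn}}(f).
\end{aligned}
```

Two cores at half the frequency deliver the same aggregate throughput as one core at full frequency, so under these assumptions the same work finishes in the same time at roughly a quarter of the dynamic power. Real chips fall short of this ideal because of static (leakage) power, minimum-voltage limits and imperfect parallelism.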
Although multicore processors can in theory address the power wall, realizing this potential requires overcoming many challenges.
First, the cores of a multicore processor share critical resources in the memory hierarchy, such as caches, memory controllers and interconnects. Contention for these resources can severely degrade application performance and waste the energy consumed by the system. Second, multithreaded programs running on multicore processors often communicate, sharing and exchanging data. This cross-core communication delays program execution and puts pressure on interconnects, further contributing to contention and wasting energy.
A third challenge posed by multicore processors is the need for mainstream adoption of parallel programming. To fully utilize the computing resources of a multicore system, a program must be able to run on multiple cores; that is, it must be parallel. Parallel programming is notoriously difficult and hampers programmer productivity. According to estimates by industry experts, writing a parallel program takes on average three times as long as writing a sequential one.
My research program aims to address these challenges, with three key foci: (1) mitigating contention for shared resources, (2) improving the efficiency of asymmetric multicore systems through new scheduling algorithms, and (3) designing new techniques for program parallelization.
Resource contention is a serious problem in multicore processors. When threads running simultaneously compete for shared resources, such as last-level caches, memory controllers and interconnects, the system becomes less efficient: programs may run up to three times slower because of contention. My work produced a new thread scheduling algorithm that mitigates shared-resource contention.
My students and I developed an efficient online model that predicts which threads will compete with each other when scheduled to share hardware resources. Previous studies focused on modeling contention for shared caches and validated their results primarily in simulators, but we found that these models fail to predict contention on real systems. On real systems the dominant factor is contention for memory controllers and interconnects, which earlier cache models did not capture. We found that the rate of off-chip memory requests, which can easily be measured online, predicts both how much an application will suffer from contention and how much it will hurt others. Based on this finding, we built a scheduling algorithm that minimizes contention for shared resources on multicore processors. Our experiments showed that this algorithm improves performance and reduces the energy-delay product by tens of percent relative to conventional scheduling methods, without requiring any changes to the hardware. A scheduler inspired by this work is now being implemented by the Solaris OS team at Oracle, our long-time collaborator.
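The idea of using the off-chip memory request rate to keep memory-intensive threads apart can be sketched as follows. This is a minimal illustration under stated assumptions, not the scheduler described above: the class `ThreadInfo`, the function `assign_threads`, the metric `mem_requests_per_kinst` (off-chip requests per 1,000 instructions, assumed to be sampled from hardware performance counters) and the numeric values in the usage example are all hypothetical. The greedy policy visits threads from most to least memory-intensive and places each one in the shared-resource domain (for example, a group of cores sharing a last-level cache and memory controller) whose accumulated intensity is currently lowest.

```python
from dataclasses import dataclass


@dataclass
class ThreadInfo:
    name: str
    # Hypothetical metric: off-chip memory requests per 1,000 instructions,
    # assumed to be sampled online from hardware performance counters.
    mem_requests_per_kinst: float


def assign_threads(threads, num_domains, cores_per_domain):
    """Spread memory-intensive threads across shared-resource domains.

    Greedy sketch: visit threads from most to least memory-intensive and
    place each in the domain with the lowest accumulated intensity that
    still has a free core. Returns one list of threads per domain.
    """
    if len(threads) > num_domains * cores_per_domain:
        raise ValueError("more threads than available cores")
    domains = [[] for _ in range(num_domains)]
    load = [0.0] * num_domains  # summed intensity of threads in each domain
    ordered = sorted(threads, key=lambda t: t.mem_requests_per_kinst, reverse=True)
    for t in ordered:
        free = [d for d in range(num_domains) if len(domains[d]) < cores_per_domain]
        # min() returns the first minimum, so ties go to the lowest domain index.
        target = min(free, key=lambda d: load[d])
        domains[target].append(t)
        load[target] += t.mem_requests_per_kinst
    return domains


if __name__ == "__main__":
    # Illustrative values only: two memory-heavy and two light threads,
    # scheduled on two domains of two cores each.
    threads = [
        ThreadInfo("A", 30.0),
        ThreadInfo("B", 25.0),
        ThreadInfo("C", 2.0),
        ThreadInfo("D", 0.5),
    ]
    placement = assign_threads(threads, num_domains=2, cores_per_domain=2)
    print([[t.name for t in d] for d in placement])  # [['A', 'D'], ['B', 'C']]
```

In the usage example the two memory-heavy threads A and B end up in different domains, each paired with a light thread, so neither domain's memory controller has to serve two heavy streams at once. Balancing accumulated intensity rather than thread count is the design choice that encodes the finding that off-chip request rate, not cache footprint, drives contention on real systems.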
Links to relevant publications can be found here.