Process Cruise Control
Scalability of the core frequency is a common feature of low-power processor
architectures. Many heuristics for frequency scaling were proposed in the past
to find the best trade-off between energy efficiency and computational
performance. With complex applications exhibiting unpredictable behavior these
heuristics cannot reliably adjust the operation point of the hardware because
they do not know where the energy is spent and why the performance is lost.
Embedded hardware monitors in the form of event counters have proven to offer
valuable information in the field of performance analysis. We will demonstrate
that counter values can also reveal the power-specific characteristics
of a thread.
We propose an energy-aware scheduling policy that benefits from
event counters. By exploiting the information from these counters, the
scheduler determines the appropriate clock frequency for each individual
thread running in a time-sharing environment. A recurrent analysis of the
thread-specific energy and performance profile allows an adjustment of the
frequency to the behavioral changes of the application. While the clock
frequency may vary over a wide range, the application performance should
suffer only slightly (e.g. a 10% performance loss compared to execution at
the highest clock speed). Because of the similarity to a car's cruise control,
we call our scheduling policy Process Cruise Control. This adaptive clock
scaling is accomplished by the operating system without any application
support.
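To illustrate how counter values could drive a per-thread frequency choice, here is a minimal Python sketch. It is not the policy implemented in Process Cruise Control. It assumes a simple model in which memory stall time does not shrink with the core clock, and the frequency steps, the `SliceSample` fields and the function names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical frequency steps in MHz; the real set depends on the platform.
FREQUENCIES_MHZ = [333, 400, 466, 533, 600, 666, 733]


@dataclass
class SliceSample:
    """Counter values for one thread over its last time slice (assumed layout)."""
    cycles: int          # total core cycles in the slice
    stall_cycles: int    # cycles spent waiting for memory
    measured_mhz: int    # clock frequency the slice was executed at


def predicted_time_us(sample, freq_mhz):
    """Estimate the slice duration at freq_mhz (cycles / MHz = microseconds)."""
    compute_cycles = sample.cycles - sample.stall_cycles
    # Assumption: memory stalls take the same wall-clock time at any core clock.
    stall_time_us = sample.stall_cycles / sample.measured_mhz
    return compute_cycles / freq_mhz + stall_time_us


def pick_frequency(sample, max_loss=0.10):
    """Return the lowest frequency whose predicted slowdown stays within max_loss."""
    f_max = max(FREQUENCIES_MHZ)
    baseline = predicted_time_us(sample, f_max)
    for freq in sorted(FREQUENCIES_MHZ):
        if predicted_time_us(sample, freq) <= baseline * (1 + max_loss):
            return freq
    return f_max


compute_bound = SliceSample(cycles=1_000_000, stall_cycles=0, measured_mhz=733)
memory_bound = SliceSample(cycles=1_000_000, stall_cycles=900_000, measured_mhz=733)
print(pick_frequency(compute_bound), pick_frequency(memory_bound))
```

Under this model a compute-bound thread stays at the highest clock, while a thread that mostly waits for memory can be slowed down considerably within the same loss budget. Re-running the selection after every time slice corresponds to the recurrent analysis described above.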
Process Cruise Control has been implemented on the Intel XScale architecture,
which offers a variety of frequencies and a set of configurable event counters.
Energy measurements of the target architecture under variable load show the
advantage of the proposed approach.
- Process Cruise Control: Event-Driven Clock Scaling for Dynamic Power Management
- Andreas Weißel, Frank Bellosa
- Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES 2002), Grenoble, France, October 2002
Memory Compression
Optimal Processor Speed for Low Power Memory Compression
Using event-driven clock scaling, we want to compare the speed and energy
consumption of algorithms for memory compression running under different
processor clock frequencies. Depending on the amount of free memory,
compression can be performed at reduced clock speeds to save energy. Under
critical memory conditions and for decompression, the processor is set to run
at maximum clock speed.
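The rule described above could be sketched as follows. This is only an illustration: the frequency steps, the 5% threshold for "critical" memory conditions and the linear mapping from free memory to clock speed are assumptions, and `compression_clock_mhz` is a hypothetical name.

```python
def compression_clock_mhz(free_pages, total_pages, decompressing,
                          freqs=(333, 466, 600, 733), critical_fraction=0.05):
    """Pick a clock frequency (MHz) for one compression or decompression job."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    fastest_first = sorted(freqs, reverse=True)
    # Decompression sits on the critical path of a page access: run at full speed.
    if decompressing:
        return fastest_first[0]
    free_fraction = free_pages / total_pages
    # Critical memory conditions: free pages are needed quickly.
    if free_fraction <= critical_fraction:
        return fastest_first[0]
    # Otherwise, the more memory is free, the slower the compressor may run.
    position = (free_fraction - critical_fraction) / (1.0 - critical_fraction)
    index = min(int(position * len(fastest_first)), len(fastest_first) - 1)
    return fastest_first[index]


print(compression_clock_mhz(free_pages=50, total_pages=100, decompressing=False))
```

A real policy would probably derive the thresholds from measured compression throughput and energy per page at each frequency rather than from a fixed linear mapping.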
Algorithms differing in compression speed and energy characteristics will be
studied, e.g. the well-known Ziv-Lempel compressors or the WK algorithms
introduced in [2], which are optimized for in-memory data pages.
Static Power Consumption of the Memory vs. CPU Power Consumption for Compression
RAM modules contribute to the static power consumption of the whole system.
Would it be possible to save energy if we replaced part of the system's
physical RAM with a virtual memory module of approximately the same size using
memory compression? The additional energy consumed by the processor when
compressing and decompressing memory pages must be taken into account.
In contrast to the previous work presented in [2] and [3], we do not use memory
compression to avoid slow disk paging: our objective is to save energy by
replacing the static power consumption of memory modules with the energy
needed for additional computation.
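The trade-off can be made concrete with a small energy balance. The Python sketch below is an illustration only: the function names are hypothetical, and the power and per-page energy figures are placeholders rather than measurements.

```python
def net_energy_saving_j(module_static_w, interval_s,
                        compressions, decompressions,
                        compress_j_per_page, decompress_j_per_page):
    """Energy saved by removing one RAM module minus the CPU energy spent instead."""
    saved = module_static_w * interval_s
    spent = (compressions * compress_j_per_page
             + decompressions * decompress_j_per_page)
    return saved - spent


def break_even_page_rate(module_static_w, j_per_page):
    """Page operations per second at which the CPU cost equals the static saving."""
    return module_static_w / j_per_page


# Placeholder figures: a 0.1 W module, 10 s interval, 2 mJ per compressed page
# and 1 mJ per decompressed page.
print(net_energy_saving_j(0.1, 10, 100, 50, 0.002, 0.001))
print(break_even_page_rate(0.1, 0.002))
```

Replacing a module pays off only while the rate of page compressions and decompressions stays below the break-even rate. The balance therefore depends on the workload's memory access pattern as much as on the hardware.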
Hibernation of Memory Banks
State-of-the-art RAM devices support various low-power modes. A significant
amount of energy can be saved by putting as many memory chips as possible
into a sleep state (see [4]). If more memory has to be allocated,
the system can choose between two options, both inducing short delays:
either additional banks are activated, or part of the memory in the already
active banks is compressed. The figure below shows a system with 4 memory
banks under increasing memory usage. As a first step, the page allocation
should cluster an application's pages into the active banks (a). As memory
usage increases, part of the active banks' memory is reserved for compressed
storage and less frequently used pages are moved to the compressed area (b).
As memory usage increases further, more and more chips are activated or
used for storing compressed pages (c). Memory banks are ordered by
their latency (with sleeping banks having the highest) to enable optimal
page allocation.
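Steps (a) to (c) can be sketched as a placement routine. This is a toy model rather than the memory manager itself. It assumes a fixed 2:1 compression ratio, a cap on the share of a bank reserved for compressed storage, and a fixed preference for compression over waking a bank, and the `Bank` and `place_page` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bank:
    capacity: int            # pages per bank
    used: int = 0            # uncompressed pages in use
    compressed: int = 0      # pages reserved for compressed storage
    active: bool = False     # False means the bank is in a sleep state

    def free(self):
        return self.capacity - self.used - self.compressed


def place_page(banks, max_compressed_fraction=0.5):
    """Place one new page; banks are ordered by latency, lowest first."""
    # (a) Cluster pages into banks that are already active.
    for i, bank in enumerate(banks):
        if bank.active and bank.free() > 0:
            bank.used += 1
            return ("allocate", i)
    # (b) Grow the compressed area: two less frequently used pages shrink
    #     into one page of compressed storage (assumed 2:1 ratio).
    for i, bank in enumerate(banks):
        limit = max_compressed_fraction * bank.capacity
        if bank.active and bank.used >= 2 and bank.compressed < limit:
            bank.used -= 2
            bank.compressed += 1
            bank.used += 1           # the new page takes the freed slot
            return ("compress", i)
    # (c) Wake the sleeping bank with the lowest latency.
    for i, bank in enumerate(banks):
        if not bank.active:
            bank.active = True
            bank.used += 1
            return ("wake", i)
    raise MemoryError("all banks are active and full")


banks = [Bank(capacity=4, active=True)] + [Bank(capacity=4) for _ in range(3)]
print([place_page(banks) for _ in range(8)])
```

Whether to compress before waking a bank, as done here, or the other way round is exactly the decision the memory management system has to make. A real policy would weigh the expected delay and energy of both options instead of using a fixed order.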
A loss in system performance is unavoidable due to the trade-off between fast
and slow (compressed) memory regions and resynchronization delays when
activating memory banks. The memory management system has to minimize this
loss while maximizing energy savings.
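[1] Andreas Weißel and Frank Bellosa. Process Cruise Control: Event-Driven
Clock Scaling for Dynamic Power Management. In Proceedings of the
International Conference on Compilers, Architecture and Synthesis for
Embedded Systems (CASES 2002), October 2002.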
[2] Paul R. Wilson, Scott F. Kaplan, and Yannis Smaragdakis. The Case for
Compressed Caching in Virtual Memory Systems. In Proceedings of the USENIX
Technical Conference (June 1999), USENIX.
[3] Michael J. Freedman. The Compression Cache: Virtual Memory Compression for
Handheld Computers, March 2000.
[4] Alvin R. Lebeck, Xiaobo Fan, Heng Zeng, and Carla Ellis. Power Aware Page
Allocation. In Proceedings of the 9th International Conference on
Architectural Support for Programming Languages and Operating Systems (ASPLOS
IX), November 2000.