Large caches used in scalable shared-memory architectures can avoid high memory access times only if data is referenced within the address scope of the cache. Consequently, locality is the key issue in multiprocessor performance. While contemporary schedulers still base their decisions on CPU utilization, we propose novel scheduling policies based on locality information derived from cache miss counters. A locality-conscious scheduler can reduce the cost of reloading the cache after each context switch. Thus, the potential benefit of using locality information increases with the frequency of scheduling decisions.
Lightweight threads have become a common abstraction in the field of programming languages and operating systems. User-level schedulers make frequent context switches affordable and therefore benefit most from locality information, provided that the lifetime of cache lines exceeds the scheduling cycle. This paper examines the performance implications of using locality information in thread scheduling algorithms for scalable shared-memory multiprocessors. A prototype implementation shows that a locality-conscious scheduler outperforms approaches that ignore locality information.
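
To make the idea of scheduling on cache miss counts concrete, the following is a minimal sketch and not the prototype described in this paper. It assumes a hypothetical cache of `CACHE_LINES` lines, assumes that every miss loads one line for the running thread, and assumes that each miss evicts a uniformly random line, so the estimated footprint of every other thread decays geometrically. The names `Thread`, `account_run`, and `pick_next` are illustrative only.

```python
from dataclasses import dataclass, field

CACHE_LINES = 1024  # hypothetical cache size in lines (assumption)


@dataclass
class Thread:
    name: str
    # Estimated number of this thread's lines still resident, keyed by CPU id.
    footprint: dict = field(default_factory=dict)


def account_run(ran, threads, cpu, misses):
    """Update footprint estimates after `ran` executed on `cpu`.

    `misses` is the value read from the cache miss counter for that run.
    """
    # Assumption: each miss brings in one line for the running thread.
    ran.footprint[cpu] = min(CACHE_LINES, ran.footprint.get(cpu, 0) + misses)
    # Assumption: each miss evicts a uniformly random line, so the other
    # threads keep this fraction of their resident lines.
    keep = (1 - 1 / CACHE_LINES) ** misses
    for t in threads:
        if t is not ran:
            t.footprint[cpu] = t.footprint.get(cpu, 0) * keep


def pick_next(ready, cpu):
    """Choose the ready thread with the largest estimated footprint on `cpu`."""
    return max(ready, key=lambda t: t.footprint.get(cpu, 0))


if __name__ == "__main__":
    a, b = Thread("a"), Thread("b")
    account_run(a, [a, b], cpu=0, misses=200)
    account_run(b, [a, b], cpu=0, misses=50)
    print(pick_next([a, b], cpu=0).name)
```

Under these assumptions the scheduler prefers the thread that is expected to reload the fewest lines, which is the reload cost the abstract refers to. A policy based on CPU utilization alone would not distinguish between the two threads.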