
i4 Reading Group

The i4LG is a research reading group organized by members of the Distributed Systems and Operating Systems Group. We meet every second Monday at 12:15 in the meeting room.

Next Paper(s)

JULY 31, 2017: The Tail at Scale

Jeffrey Dean and Luiz André Barroso

In: Communications of the ACM, Volume 56 Issue 2, February 2013

ABSTRACT Software techniques that tolerate latency variability are vital to building responsive large-scale Web services.
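
The article's best-known technique is the "hedged request": send the request to one replica and, only if no reply arrives within a small delay such as the 95th-percentile expected latency, send a second copy elsewhere and use whichever answer comes first. A minimal C sketch of the idea; send_request and try_receive are hypothetical helpers, and polling the replicas in turn is a simplification:

    /* Hedged-request sketch. send_request()/try_receive() are
     * hypothetical helpers, not from the paper or a real library. */
    #include <stdbool.h>
    #include <stdint.h>

    void send_request(int replica, const char *req);
    bool try_receive(int replica, int64_t wait_us, char *reply);

    bool hedged_request(const char *req, char *reply, int64_t p95_us)
    {
        send_request(0, req);
        if (try_receive(0, p95_us, reply))
            return true;               /* common case: no hedge needed  */
        send_request(1, req);          /* tail case: duplicate the work */
        /* simplified: poll each in turn instead of a real select()     */
        return try_receive(1, p95_us, reply) || try_receive(0, p95_us, reply);
    }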

Presenting group member: Stefan Reif

Past Papers

JULY 17, 2017: Light-Weight Contexts: An OS Abstraction for Safety and Performance

James Litton, Anjo Vahldiek-Oberwagner, Eslam Elnikety, Deepak Garg, Bobby Bhattacharjee, and Peter Druschel

In: Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2016)

ABSTRACT We introduce a new OS abstraction—light-weight contexts (lwCs)—that provides independent units of protection, privilege, and execution state within a process. A process may include several lwCs, each with possibly different views of memory, file descriptors, and access capabilities. lwCs can be used to efficiently implement roll-back (process can return to a prior recorded state), isolated address spaces (lwCs within the process may have different views of memory, e.g., isolating sensitive data from network-facing components or isolating different user sessions), and privilege separation (in-process reference monitors can arbitrate and control access).

lwCs can be implemented efficiently: the overhead of a lwC is proportional to the amount of memory exclusive to the lwC; switching lwCs is quicker than switching kernel threads within the same process. We describe the lwC abstraction and API, and an implementation of lwCs within the FreeBSD 11.0 kernel. Finally, we present an evaluation of common usage patterns, including fast rollback, session isolation, sensitive data isolation, and in-process reference monitoring, using Apache, nginx, PHP, and OpenSSL.
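
A sketch of the rollback pattern from the abstract, using simplified lwC-style calls; lwc_create and lwc_switch are paraphrased stand-ins, and the paper's actual FreeBSD API differs in signatures and detail:

    /* Rollback via a snapshot lwC; names and signatures are illustrative. */
    struct lwc;
    struct lwc *lwc_create(int flags);    /* snapshot memory, fds, creds */
    void lwc_switch(struct lwc *target);  /* resume execution in target  */
    int process_untrusted_input(void);

    void handle_request(void)
    {
        struct lwc *snap = lwc_create(0); /* record a clean state */
        if (process_untrusted_input() < 0)
            lwc_switch(snap); /* roll back: discard all changes since snap */
    }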

Presenting group member: Christian Dietrich

JULY 3, 2017: Malacology: A Programmable Storage System

Michael A. Sevilla, Noah Watkins, Ivo Jimenez, Peter Alvaro, Shel Finkelstein, Jeff LeFevre, Carlos Maltzahn

In: Proceedings of the 12th European Conference on Computer Systems (EuroSys 2017)

ABSTRACT Storage systems need to support high-performance for special-purpose data processing applications that run on an evolving storage device technology landscape. This puts tremendous pressure on storage systems to support rapid change both in terms of their interfaces and their performance. But adapting storage systems can be difficult because unprincipled changes might jeopardize years of code-hardening and performance optimization efforts that were necessary for users to entrust their data to the storage system. We introduce the programmable storage approach, which exposes internal services and abstractions of the storage stack as building blocks for higher-level services. We also build a prototype to explore how existing abstractions of common storage system services can be leveraged to adapt to the needs of new data processing systems and the increasing variety of storage devices. We illustrate the advantages and challenges of this approach by composing existing internal abstractions into two new higher-level services: a file system metadata load balancer and a high-performance distributed shared-log. The evaluation demonstrates that our services inherit desirable qualities of the back-end storage system, including the ability to balance load, efficiently propagate service metadata, recover from failure, and navigate trade-offs between latency and throughput using leases.

Presenting group member: Michael Eischer

JUNE 19, 2017: A Self-Healing Framework for Building Resilient Cyber-Physical Systems

Denise Ratasich, Oliver Höftberger, Haris Isakovic, Muhammad Shafique, Radu Grosu

In: Proceedings of the 19th IEEE International Symposium on Real-Time Computing (ISORC 2017)

ABSTRACT Self-healing is an increasingly popular approach to ensure resiliency, that is, a proper adaptation to failures and attacks, in cyber-physical systems (CPS). A very promising way of achieving self-healing is through structural adaptation (SHSA): adding and removing components, or even changing their interaction, at runtime. SHSA has to be enabled and supported by the underlying platform, in order to minimize undesired interference during component exchange and to reduce the complexity of the application components. In this paper, we discuss architectural requirements and design decisions which enable SHSA in CPS. We propose a platform that facilitates structural adaptation and demonstrate its capabilities on an example from the automotive domain: a fault-tolerant system that estimates the state-of-charge (SoC) of the battery. The SHSA support of the SoC estimator is enhanced through the existence of an ontology, capturing the interrelations among the components and using this information at runtime for reconfiguration. Finally, we demonstrate the efficiency of our SHSA framework by deploying it in a real-world CPS prototype of a rover under sensor failure.

Presenting group member: Stefan Reif

MAY 22, 2017: SGXBounds: Memory Safety for Shielded Execution

Dmitrii Kuvaiskii, Oleksii Oleksenko, Sergei Arnautov, Bohdan Trach, Pramod Bhatotia, Pascal Felber, and Christof Fetzer

In: Proceedings of the 12th European Conference on Computer Systems (EuroSys 2017)

ABSTRACT Shielded execution based on Intel SGX provides strong security guarantees for legacy applications running on untrusted platforms. However, memory safety attacks such as Heartbleed can render the confidentiality and integrity properties of shielded execution completely ineffective. To prevent these attacks, the state-of-the-art memory-safety approaches can be used in the context of shielded execution. In this work, we first showcase that two prominent software- and hardware-based defenses, AddressSanitizer and Intel MPX respectively, are impractical for shielded execution due to high performance and memory overheads. This motivated our design of SGXBounds, an efficient memory-safety approach for shielded execution exploiting the architectural features of Intel SGX. Our design is based on a simple combination of tagged pointers and compact memory layout. We implemented SGXBounds based on the LLVM compiler framework targeting unmodified multithreaded applications. Our evaluation using Phoenix, PARSEC, and RIPE benchmark suites shows that SGXBounds has performance and memory overheads of 17% and 0.1% respectively, while providing security guarantees similar to AddressSanitizer and Intel MPX. We have obtained similar results with SPEC CPU2006 and four real-world case studies: SQLite, Memcached, Apache, and Nginx.
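
The "tagged pointers" idea exploits the small SGX enclave address space: since addresses fit in 32 bits, the upper half of a 64-bit pointer can carry the object's upper bound at no extra memory cost. A simplified C sketch; the paper's actual layout is more involved (the lower bound is additionally kept in memory next to the object):

    /* Tagged-pointer sketch in the spirit of SGXBounds (simplified). */
    #include <stdint.h>
    #include <stdlib.h>

    static inline uint64_t tag(uint32_t addr, uint32_t upper_bound) {
        return ((uint64_t)upper_bound << 32) | addr;  /* bound rides along */
    }
    static inline uint32_t addr_of(uint64_t p)  { return (uint32_t)p; }
    static inline uint32_t bound_of(uint64_t p) { return (uint32_t)(p >> 32); }

    /* Every dereference checks the bound recovered from the tag. */
    static inline uint8_t load_u8(uint64_t p) {
        if (addr_of(p) >= bound_of(p))
            abort();                       /* out-of-bounds access */
        return *(uint8_t *)(uintptr_t)addr_of(p);
    }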

Presenting group member: Tobias Distler

MAY 8, 2017: Reliable and precise WCET determination for a real-life processor

Marc Langenbach, Florian Martin, Michael Schmidt, Henrik Theiling, Stephan Thesing, and Reinhard Wilhelm.

In: International Workshop on Embedded Software (2001)

ABSTRACT The USES-group at the Universität des Saarlandes follows an approach to compute reliable run-time guarantees which is both well based on theoretical foundations and practical from a software engineering and an efficiency point of view. Several aspects are essential to the USES approach: the resulting system is modular by structuring the task into a sequence of subtasks, which are tackled with appropriate methods. Generic and generative methods are used whenever possible. These principles lead to an understandable, maintainable, efficient, and provably correct system. This paper gives an overview of the methods used in the USES approach to WCET determination. A fully functional prototype system for the Motorola ColdFire MCF 5307 processor is presented, the implications of processor design on the predictability of behavior described, and experiences with analyzing applications running on this processor reported.

Presenting group member: Christian Eichler

MARCH 27, 2017: Mencius: Building Efficient Replicated State Machines for WANs

Yanhua Mao, Flavio P. Junqueira, and Keith Marzullo.

In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation

ABSTRACT We present a protocol for general state machine replication – a method that provides strong consistency – that has high performance in a wide-area network. In particular, our protocol Mencius has high throughput under high client load and low latency under low client load even under changing wide-area network environments and client load. We develop our protocol as a derivation from the well-known protocol Paxos. Such a development can be changed or further refined to take advantage of specific network or application requirements.
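
Mencius' key structural idea is to pre-partition the sequence of consensus instances round-robin, so every replica is the default leader for its own slice of the log and requests avoid a round trip to a single distant leader. A sketch of the partitioning rule; the skip messages and leader revocation that make this practical are omitted:

    /* Round-robin instance ownership, the skeleton of Mencius. */
    int coordinator_of(long instance, int num_servers)
    {
        return (int)(instance % num_servers);
    }
    /* Each server proposes client requests in its own instances and
     * announces "skip" for turns in which it has nothing to propose. */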

Presenting group member: Tobias Distler

MARCH 13, 2017: The Multikernel: A new OS architecture for scalable multicore systems

Andrew Baumann, Paul Barham, Pierre-Evariste Dagand, Tim Harris, Rebecca Isaacs, Simon Peter, Timothy Roscoe, Adrian Schüpbach, and Akhilesh Singhania.

In: Proceedings of the 22nd ACM Symposium on Operating Systems Principles

ABSTRACT Commodity computer systems contain more and more processor cores and exhibit increasingly diverse architectural tradeoffs, including memory hierarchies, interconnects, instruction sets and variants, and IO configurations. Previous high-performance computing systems have scaled in specific cases, but the dynamic nature of modern client and server workloads, coupled with the impossibility of statically optimizing an OS for all workloads and hardware variants pose serious challenges for operating system structures. We argue that the challenge of future multicore hardware is best met by embracing the networked nature of the machine, rethinking OS architecture using ideas from distributed systems. We investigate a new OS structure, the multikernel, that treats the machine as a network of independent cores, assumes no inter-core sharing at the lowest level, and moves traditional OS functionality to a distributed system of processes that communicate via message-passing. We have implemented a multikernel OS to show that the approach is promising, and we describe how traditional scalability problems for operating systems (such as memory management) can be effectively recast using messages and can exploit insights from distributed systems and networking. An evaluation of our prototype on multicore systems shows that, even on present-day machines, the performance of a multikernel is comparable with a conventional OS, and can scale better to support future hardware.
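
A toy illustration of the multikernel structure; the names are illustrative, not Barrelfish's. Each core's kernel owns its state privately, and what would be a locked write to a shared structure becomes a message to every core:

    /* Per-core kernel with a private inbox; no locks, no shared state. */
    struct message { int op; long arg; };

    struct core_kernel {
        int core_id;
        struct message inbox[64];  /* incoming message ring             */
        unsigned head;             /* ... plus replicated OS state ...  */
    };

    /* "Updating" a global resource = messaging each kernel replica. */
    void broadcast_update(struct core_kernel *k, int ncores, struct message m)
    {
        for (int i = 0; i < ncores; i++)
            k[i].inbox[k[i].head++ % 64] = m;  /* simplified: no overflow
                                                  or concurrency handling */
    }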

FEBRUARY 27, 2017: Corey: An Operating System for Many Cores

Silas Boyd-Wickizer, Haibo Chen, Rong Chen, Yandong Mao, Frans Kaashoek, Robert Morris, Aleksey Pesterev, Lex Stein, Ming Wu, Yuehua Dai, Yang Zhang, and Zheng Zhang

In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation

ABSTRACT Multiprocessor application performance can be limited by the operating system when the application uses the operating system frequently and the operating system services use data structures shared and modified by multiple processing cores. If the application does not need the sharing, then the operating system will become an unnecessary bottleneck to the application's performance.

This paper argues that applications should control sharing: the kernel should arrange each data structure so that only a single processor need update it, unless directed otherwise by the application. Guided by this design principle, this paper proposes three operating system abstractions (address ranges, kernel cores, and shares) that allow applications to control inter-core sharing and to take advantage of the likely abundance of cores by dedicating cores to specific operating system functions.

Measurements of microbenchmarks on the Corey prototype operating system, which embodies the new abstractions, show how control over sharing can improve performance. Application benchmarks, using MapReduce and a Web server, show that the improvements can be significant for overall performance: MapReduce on Corey performs 25% faster than on Linux when using 16 cores. Hardware event counters confirm that these improvements are due to avoiding operations that are expensive on multicore machines.
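
A sketch of what "applications control sharing" looks like at the interface; the function names below are hypothetical stand-ins for the paper's address-range and share abstractions, not Corey's actual calls:

    /* Hypothetical Corey-flavoured API: private by default,
     * shared only where the application asks for it. */
    typedef struct range range_t;
    typedef struct share share_t;

    range_t *address_range_create(int shared);  /* per-core vs. all cores  */
    share_t *share_create(void);                /* private fd/object table */

    void worker_setup(void)
    {
        range_t *priv = address_range_create(0); /* no cross-core TLB or
                                                    page-table contention */
        range_t *shrd = address_range_create(1); /* explicitly shared heap */
        share_t *fds  = share_create();          /* fd lookups stay local  */
        (void)priv; (void)shrd; (void)fds;
    }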

Presenting group member: Daniel Lohmann

FEBRUARY 13, 2017: The Implementation of the Cilk-5 Multithreaded Language

Matteo Frigo, Charles E. Leiserson, Keith H. Randall

In: Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation, 1998

ABSTRACT The fifth release of the multithreaded language Cilk uses a provably good "work-stealing" scheduling algorithm similar to the first system, but the language has been completely redesigned and the runtime system completely reengineered. The efficiency of the new implementation was aided by a clear strategy that arose from a theoretical analysis of the scheduling algorithm: concentrate on minimizing overheads that contribute to the work, even at the expense of overheads that contribute to the critical path. Although it may seem counterintuitive to move overheads onto the critical path, this "work-first" principle has led to a portable Cilk-5 implementation in which the typical cost of spawning a parallel thread is only between 2 and 6 times the cost of a C function call on a variety of contemporary machines. Many Cilk programs run on one processor with virtually no degradation compared to equivalent C programs. This paper describes how the work-first principle was exploited in the design of Cilk-5's compiler and its runtime system. In particular, we present Cilk-5's novel "two-clone" compilation strategy and its Dijkstra-like mutual-exclusion protocol for implementing the ready deque in the work-stealing scheduler.
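
The canonical Cilk example, used in the paper itself, is fib in Cilk-5's faithful extension of C: spawn forks a child that an idle worker may steal, sync waits for all children, and the work-first implementation keeps an unstolen spawn close to the cost of a plain function call:

    /* Classic Cilk-5 fib; "cilk", "spawn" and "sync" are Cilk keywords. */
    cilk int fib(int n)
    {
        if (n < 2) return n;
        else {
            int x, y;
            x = spawn fib(n - 1);
            y = spawn fib(n - 2);
            sync;                 /* wait for both children */
            return x + y;
        }
    }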

Presenting group member: Christian Eichler

JANUARY 30, 2017: Practical Byzantine Fault Tolerance

Miguel Castro and Barbara Liskov

In: Proceedings of the Third Symposium on Operating Systems Design and Implementation, 1999

ABSTRACT This paper describes a new replication algorithm that is able to tolerate Byzantine faults. We believe that Byzantine-fault-tolerant algorithms will be increasingly important in the future because malicious attacks and software errors are increasingly common and can cause faulty nodes to exhibit arbitrary behavior. Whereas previous algorithms assumed a synchronous system or were too slow to be used in practice, the algorithm described in this paper is practical: it works in asynchronous environments like the Internet and incorporates several important optimizations that improve the response time of previous algorithms by more than an order of magnitude. We implemented a Byzantine-fault-tolerant NFS service using our algorithm and measured its performance. The results show that our service is only 3% slower than a standard unreplicated NFS.
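
The arithmetic behind "practical": with n = 3f + 1 replicas the protocol survives f arbitrarily faulty ones, and progress needs only matching messages from a quorum of 2f + 1, so a single slow or malicious replica cannot stall or corrupt the common case. A minimal rendering of the sizes:

    /* PBFT sizing: tolerate f Byzantine replicas out of n = 3f + 1. */
    int replicas_needed(int f) { return 3 * f + 1; }
    int quorum_size(int f)     { return 2 * f + 1; }
    /* f = 1: 4 replicas, and any 3 matching PREPARE/COMMIT messages
     * commit a request; the one faulty replica is simply outvoted. */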

Presenting group member: Michael Eischer

JANUARY 16, 2017: Structured Deferral: Synchronization via Procrastination

Paul E. McKenney

In: ACM Queue 11(5), 2013

ABSTRACT We simply do not have a synchronization mechanism that can enforce mutual exclusion.
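
The article's canonical instance of structured deferral is RCU. In the Linux-kernel API, readers run without locks while an updater publishes a new version and then waits out, rather than excludes, all pre-existing readers before freeing the old one:

    /* RCU pattern with the Linux-kernel API (kernel-side sketch;
     * see <linux/rcupdate.h> and <linux/slab.h>). */
    struct cfg { int value; };
    struct cfg *global_cfg;

    int reader(void)
    {
        int v;
        rcu_read_lock();
        v = rcu_dereference(global_cfg)->value; /* old or new, never freed */
        rcu_read_unlock();
        return v;
    }

    void updater(struct cfg *newc)
    {
        struct cfg *old = global_cfg;
        rcu_assign_pointer(global_cfg, newc);
        synchronize_rcu();   /* procrastinate until readers drain */
        kfree(old);          /* now no reader can still hold 'old' */
    }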

Presenting group member: Stefan Reif

DECEMBER 5, 2016: In Search of an Understandable Consensus Algorithm

Diego Ongaro and John Ousterhout.

In: Proceedings of the 2014 USENIX Annual Technical Conference

ABSTRACT: Raft is a consensus algorithm for managing a replicated log. It produces a result equivalent to (multi-)Paxos, and it is as efficient as Paxos, but its structure is different from Paxos; this makes Raft more understandable than Paxos and also provides a better foundation for building practical systems. In order to enhance understandability, Raft separates the key elements of consensus, such as leader election, log replication, and safety, and it enforces a stronger degree of coherency to reduce the number of states that must be considered. Results from a user study demonstrate that Raft is easier for students to learn than Paxos. Raft also includes a new mechanism for changing the cluster membership, which uses overlapping majorities to guarantee safety.
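
Raft's bid for understandability shows even in its state: each server persists just a term, a vote, and a log (names as in the paper's state summary); leader election is a randomized timeout plus a majority vote over that state. A sketch:

    /* Persistent per-server state, named as in the Raft paper. */
    struct log_entry { int term; /* ... client command ... */ };

    struct raft_server {
        int currentTerm;             /* latest term this server has seen */
        int votedFor;                /* candidate voted for, or -1       */
        struct log_entry log[1024];  /* command log (fixed-size sketch)  */
    };
    /* A follower that hears no leader within a randomized election
     * timeout increments currentTerm, votes for itself, and requests
     * votes; overlapping majorities ensure at most one leader per term. */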

Presenting group member: Tobias Distler

NOVEMBER 21, 2016: On the Design and Development of Program Families

David Lorge Parnas.

In: IEEE Transactions on Software Engineering, SE-2:1 (1976)

ABSTRACT: Program families are defined (analogously to hardware families) as sets of programs whose common properties are so extensive that it is advantageous to study the common properties of the programs before analyzing individual members. The assumption that, if one is to develop a set of similar programs over a period of time, one should consider the set as a whole while developing the first member, is discussed. A conventional approach called "sequential development" is compared to "stepwise refinement" and "specification of information hiding modules." A more detailed comparison of the two methods is then made. By means of several examples it is demonstrated that the two methods are based on the same concepts but bring complementary advantages.

Presenting group member: Timo Hönig

NOVEMBER 7, 2016: Beyond the PDP-11: Architectural Support for a Memory-Safe C Abstract Machine

David Chisnall, Colin Rothwell, Robert N. M. Watson, Jonathan Woodruff, Munraj Vadera, Simon W. Moore, Michael Roe, Brooks Davis, and Peter G. Neumann

In: ASPLOS '15 Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems

ABSTRACT: We propose a new memory-safe interpretation of the C abstract machine that provides stronger protection to benefit security and debugging. Despite ambiguities in the specification intended to provide implementation flexibility, contemporary implementations of C have converged on a memory model similar to the PDP-11, the original target for C. This model lacks support for memory safety despite well-documented impacts on security and reliability. Attempts to change this model are often hampered by assumptions embedded in a large body of existing C code, dating back to the memory model exposed by the original C compiler for the PDP-11. Our experience with attempting to implement a memory-safe variant of C on the CHERI experimental microprocessor led us to identify a number of problematic idioms. We describe these as well as their interaction with existing memory safety schemes and the assumptions that they make beyond the requirements of the C specification. Finally, we refine the CHERI ISA and abstract model for C, by combining elements of the CHERI capability model and fat pointers, and present a softcore CPU that implements a C abstract machine that can run legacy C code with strong memory protection guarantees.
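
Two of the problematic idioms in question, as a hedged illustration (the paper catalogues more): round-tripping pointers through integers, which destroys the provenance a capability machine tracks, and forming out-of-bounds intermediate pointers:

    #include <stdint.h>

    void problematic_idioms(int *arr, int n)
    {
        /* Idiom 1: hide a flag in a pointer's low bits via an integer
         * cast; legal-looking C, but hostile to capability provenance. */
        uintptr_t bits = (uintptr_t)arr;
        int *tagged = (int *)(bits | 1);

        /* Idiom 2: an out-of-bounds intermediate pointer; ISO C only
         * permits one-past-the-end, yet real code strays further.     */
        int *before = arr - 1;

        (void)tagged; (void)before; (void)n;
    }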

ADDITIONAL MATERIAL: Slides

Presenting group member: Florian Schmaus

JULY 1, 2016: How not to lie with statistics: the correct way to summarize benchmark results

Philip J. Fleming and John J. Wallace

In: Communications of the ACM 29(3), 1986

ABSTRACT: In the literature, performance results are frequently summarized using the arithmetic mean of performance ratios, leading, in some cases, to wrong conclusions or, at best, inappropriate statistics. We hope to elucidate this inadvertent misuse of statistics in reporting results by pointing out why the arithmetic mean should not be used to summarize normalized performance numbers, and showing why the geometric mean is the more appropriate measure. We do this in the form of some simple rules for improved statistical analysis of performance benchmark results.
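
A two-benchmark example (illustrative numbers) makes the pitfall concrete: machine A takes 10 s and 40 s, machine B takes 20 s and 20 s. Normalized to B, A scores 0.5 and 2.0 with arithmetic mean 1.25, so "A is 25% slower"; normalized to A, B scores 2.0 and 0.5, also with mean 1.25, so "B is 25% slower", a contradiction. The geometric mean, sqrt(0.5 * 2.0) = sqrt(2.0 * 0.5) = 1.0, gives the same verdict under either normalization.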

ADDITIONAL MATERIAL: Benchmarking Crimes by Gernot Heiser; Slides

Presenting group member: Peter Wägemann

JUNE 17, 2016: The camel has two humps

Saeed Dehnadi and Richard Bornat

In: Middlesex University, UK (2006)

ABSTRACT: Learning to program is notoriously difficult. A substantial minority of students fails in every introductory programming course in every UK university. Despite heroic academic effort, the proportion has increased rather than decreased over the years. Despite a great deal of research into teaching methods and student responses, we have no idea of the cause. It has long been suspected that some people have a natural aptitude for programming, but until now there has been no psychological test which could detect it. Programming ability is not known to be correlated with age, with sex, or with educational attainment; nor has it been found to be correlated with any of the aptitudes measured in conventional ‘intelligence’ or ‘problem-solving-ability’ tests. We have found a test for programming aptitude, of which we give details. We can predict success or failure even before students have had any contact with any programming language with very high accuracy, and by testing with the same instrument after a few weeks of exposure, with extreme accuracy. We present experimental evidence to support our claim. We point out that programming teaching is useless for those who are bound to fail and pointless for those who are certain to succeed.

Presenting group member: Stefan Reif

MAY 27, 2016: An Adaptive Framework for Multiprocessor Real-Time Systems

Aaron Block, Björn Brandenburg, James H. Anderson, Stephen Quint

In: 2008 Euromicro Conference on Real-Time Systems

ABSTRACT: In this paper, we develop an adaptive scheduling framework for changing the processor shares of tasks - a process called reweighting - on real-time multiprocessor platforms. Our particular focus is adaptive frameworks that are deployed in environments in which tasks may frequently require significant share changes. Prior work on enabling real-time adaptivity on multiprocessors has focused exclusively on scheduling algorithms that can enact needed adaptations. The algorithm proposed in this paper uses both feedback and optimization techniques to determine at runtime which adaptations are needed.

Presenting group member: Peter Ulbrich

MAY 13, 2016: Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency

Brad Calder, Ju Wang, Aaron Ogus, Niranjan Nilakantan, Arild Skjolsvold, Sam McKelvie, Yikang Xu, Shashwat Srivastav, Jiesheng Wu, Huseyin Simitci, Jaidev Haridas, Chakravarthy Uddaraju, Hemal Khatri, Andrew Edwards, Vaman Bedekar, Shane Mainali, Rafay Abbasi, Arpit Agarwal, Mian Fahim ul Haq, Muhammad Ikram ul Haq, Deepali Bhardwaj, Sowmya Dayanand, Anitha Adusumilli, Marvin McNett, Sriram Sankaran, Kavitha Manivannan, and Leonidas Rigas.

In: Proceedings of the 23rd Symposium on Operating Systems Principles, 2011

ABSTRACT: Windows Azure Storage (WAS) is a cloud storage system that provides customers the ability to store seemingly limitless amounts of data for any duration of time. WAS customers have access to their data from anywhere at any time and only pay for what they use and store. In WAS, data is stored durably using both local and geographic replication to facilitate disaster recovery. Currently, WAS storage comes in the form of Blobs (files), Tables (structured storage), and Queues (message delivery). In this paper, we describe the WAS architecture, global namespace, and data model, as well as its resource provisioning, load balancing, and replication systems.

ADDITIONAL MATERIAL: Slides

Presenting group member: Tobias Distler

APR 29, 2016: Avoiding Pitfalls in Fault-Injection Based Comparison of Program Susceptibility to Soft Errors

Horst Schirmeier, Christoph Borchert, Olaf Spinczyk

In: DSN '15 Proceedings of the 45th IEEE/IFIP International Conference on Dependable Systems and Networks, 2015.

ABSTRACT: Since the first identification of physical causes for soft errors in memory circuits, fault injection (FI) has grown into a standard methodology to assess the fault resilience of computer systems. A variety of FI techniques trying to mimic these physical causes has been developed to measure and compare program susceptibility to soft errors. In this paper, we analyze the process of evaluating programs, which are hardened by software-based hardware fault-tolerance mechanisms, under a uniformly distributed soft-error model. We identify three pitfalls in FI result interpretation widespread in the literature, even published in renowned conference proceedings. Using a simple machine model and transient single-bit faults in memory, we find counterexamples that reveal the unfitness of common practices in the field, and substantiate our findings with real-world examples. In particular, we demonstrate that the fault coverage metric must be abolished for comparing programs. Instead, we propose to use extrapolated absolute failure counts as a valid comparison metric.

Presenting group member: Daniel Lohmann

APR 15, 2016: Thirty Years Later: Lessons from the Multics Security Evaluation

Paul A. Karger and Roger R. Schell

In: ACSAC '02 Proceedings of the 18th Annual Computer Security Applications Conference. 2002.

ABSTRACT: Almost thirty years ago a vulnerability assessment of Multics identified significant vulnerabilities, despite the fact that Multics was more secure than other contemporary (and current) computer systems. Considerably more important than any of the individual design and implementation flaws was the demonstration of subversion of the protection mechanism using malicious software (e.g., trap doors and Trojan horses). A series of enhancements were suggested that enabled Multics to serve in a relatively benign environment. These included addition of "Mandatory Access Controls" and these enhancements were greatly enabled by the fact that Multics was designed from the start for security. However, the bottom-line conclusion was that "restructuring is essential" around a verifiable "security kernel" before using Multics (or any other system) in an open environment (as in today's Internet) with the existence of well-motivated professional attackers employing subversion. The lessons learned from the vulnerability assessment are highly applicable today as governments and industry strive (unsuccessfully) to "secure" today's weaker operating systems through add-ons, "hardening", and intrusion detection schemes.

Presenting group member: Christian Dietrich

APR 1, 2016: The love/hate relationship with the C preprocessor: An interview study

Flávio Medeiros, Christian Kästner, Márcio Ribeiro, Sarah Nadi, and Rohit Gheyi.

In: LIPIcs-Leibniz International Proceedings in Informatics. Vol. 37. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. 2015.

ABSTRACT: The C preprocessor has received strong criticism in academia, among others regarding separation of concerns, error proneness, and code obfuscation, but is widely used in practice. Many (mostly academic) alternatives to the preprocessor exist, but have not been adopted in practice. Since developers continue to use the preprocessor despite all criticism and research, we ask how practitioners perceive the C preprocessor. We performed interviews with 40 developers, used grounded theory to analyze the data, and cross-validated the results with data from a survey among 202 developers, repository mining, and results from previous studies. In particular, we investigated four research questions related to why the preprocessor is still widely used in practice, common problems, alternatives, and the impact of undisciplined annotations. Our study shows that developers are aware of the criticism the C preprocessor receives, but use it nonetheless, mainly for portability and variability. Many developers indicate that they regularly face preprocessor-related problems and preprocessor-related bugs. The majority of our interviewees do not see any current C-native technologies that can entirely replace the C preprocessor. However, developers tend to mitigate problems with guidelines, even though those guidelines are not enforced consistently. We report the key insights gained from our study and discuss implications for practitioners and researchers on how to better use the C preprocessor to minimize its negative impact.

ADDITIONAL MATERIAL: Slides

Presenting group member: Valentin Rothberg

MAR 18, 2016: End-to-end arguments in system design

Jerome H. Saltzer, David P. Reed, and David D. Clark.

In: ACM Transactions on Computer Systems (TOCS) 2.4 (1984).

ABSTRACT: This paper presents a design principle that helps guide placement of functions among the modules of a distributed computer system. The principle, called the end-to-end argument, suggests that functions placed at low levels of a system may be redundant or of little value when compared with the cost of providing them at that low level. Examples discussed in the paper include bit-error recovery, security using encryption, duplicate message suppression, recovery from system crashes, and delivery acknowledgment. Low-level mechanisms to support these functions are justified only as performance enhancements.

ADDITIONAL MATERIAL: Slides

Presenting group member: Timo Hönig