Friedrich-Alexander-Universität Erlangen-Nürnberg  /   Technische Fakultät  /   Department Informatik

dosek: A Dependability Oriented Static Embedded Kernel


dosek - Fault Detection Strategies

Design Decisions (cont'd)

Hardware-based isolation mechanisms are widely used and are a proven dependability measure. Watchdogs and memory protection units (MPUs) play an important role in safety-related systems. To a large extent, they contain transient faults within individual components and reveal a significant share of potential silent data corruptions (SDCs), as shown in previous evaluations [1]. Consequently, dOSEK integrates the underlying architecture’s mechanisms into its system design, using them for coarse-grained fault detection among tasks and the kernel. With our completely generative approach, all necessary MPU configurations can be derived at compile time and placed in robust ROM. Besides the obvious separation of tasks from each other, dOSEK in particular uses the MPU to protect tasks from faults inside the kernel (Figure 1, step b).
Design Rule 5
Separate OS from Applications - regarding both directions.
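
A minimal C sketch of what a generated, read-only MPU configuration could look like is given here. It assumes one small region table per execution context, stored in a `const` array that a typical embedded toolchain can place in ROM. The names (`mpu_region_t`, `regions`, `access_allowed`), the addresses, and the two-task layout are hypothetical and not taken from dOSEK; a real MPU performs this check in hardware, and the function only models the resulting policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor; real MPUs use architecture-specific registers. */
typedef struct {
    uint32_t base;      /* first address of the region */
    uint32_t size;      /* length in bytes */
    bool     writable;  /* false: the context may only read this region */
} mpu_region_t;

enum { TASK_A, TASK_B, KERNEL, NUM_CONTEXTS };
enum { REGIONS_PER_CONTEXT = 2 };

/* const: the table is fixed at compile time and can reside in ROM.
 * All addresses are made up for illustration. */
static const mpu_region_t regions[NUM_CONTEXTS][REGIONS_PER_CONTEXT] = {
    [TASK_A] = { {0x20000000u, 0x400u, true}, {0x08000000u, 0x1000u, false} },
    [TASK_B] = { {0x20000400u, 0x400u, true}, {0x08001000u, 0x1000u, false} },
    [KERNEL] = { {0x20000800u, 0x200u, true}, {0x08002000u, 0x2000u, false} },
};

/* Software model of the MPU decision for one access by context ctx. */
bool access_allowed(int ctx, uint32_t addr, bool write)
{
    assert(ctx >= 0 && ctx < NUM_CONTEXTS);
    for (size_t i = 0; i < REGIONS_PER_CONTEXT; i++) {
        const mpu_region_t *r = &regions[ctx][i];
        if (addr >= r->base && addr - r->base < r->size) {
            return !write || r->writable;
        }
    }
    return false; /* no matching region: the access would trap */
}
```

In this layout the kernel context has no region covering task data, so a faulty kernel write to a task's memory is rejected just like a task's write to kernel or foreign task memory, which is the "both directions" aspect of the rule.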

After minimizing the attack surface by eliminating redundant code and system state, and after containing faults coarsely through hardware-based isolation, the remaining single points of failure are handled by fine-grained software-based measures. Here, dOSEK employs different algorithms to cover both the control flow and the data flow of the operating system. Additionally, the predicted system state, determined by the global control-flow analysis, can be integrated as further fault-detecting redundancy in the form of state assertions [2].
Design Rule 6
Specifically harden the unavoidable, remaining error-prone attack surface.
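
The idea of a state assertion can be sketched in a few lines of C. The sketch assumes that an offline analysis has determined, for one particular system-call site, which tasks must be ready and which must be suspended, and that the kernel keeps the ready state as a bit mask. The types and names (`ready_mask_t`, `state_assertion_t`, `state_matches`) are hypothetical and do not reflect dOSEK's actual data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit i set: task i is ready (hypothetical representation). */
typedef uint8_t ready_mask_t;

/* Assumed output of the offline analysis for one system-call site.
 * Tasks in neither mask may be in either state at this point. */
typedef struct {
    ready_mask_t must_be_ready;
    ready_mask_t must_be_suspended;
} state_assertion_t;

/* Returns true if the observed state is consistent with the prediction. */
bool state_matches(ready_mask_t actual, state_assertion_t expected)
{
    bool ready_ok = (actual & expected.must_be_ready) == expected.must_be_ready;
    bool suspended_ok = (actual & expected.must_be_suspended) == 0u;
    return ready_ok && suspended_ok;
}
```

For example, with `must_be_ready = 0x01` and `must_be_suspended = 0x04`, the observed state `0x03` passes, while `0x05` (as a bit flip in the state of task 2 would produce) is flagged as inconsistent.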

Finally, an important lesson we learned is that the effect of dependability-oriented measures on the actual implementation is difficult to assess in advance. In many cases, additional measures turned out to do more harm than good. For instance, extensive CRC-based consistency checks on the kernel state can help to detect errors early. However, their overhead in time and space also increases the amount of live kernel state (i.e., the “attack surface” for transient faults), so that the number of SDCs can actually increase. Moreover, implementation glitches that manifest only at the compiler or ISA level can easily open loopholes for transient faults in algorithms that we considered robust [3,4].

Overview of the OS data kept in RAM of an example system composed of three tasks and two alarms. Each box represents a 32-bit memory location. All kernel data are hardened using an ANB code. The remaining application- and architecture-specific values are safeguarded by dual modular redundancy (DMR) or parity bits.
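
The three protection schemes named in the caption can be illustrated with a small C sketch. It assumes 16-bit payloads stored in 32-bit words for the ANB code, a plain duplicate for DMR, and even parity in the top bit of a word. The constant `AN_A`, the signatures, and all function names are illustrative choices, not dOSEK's actual parameters or API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constant. A is odd, so a single-bit error (+/- 2^k)
 * is never a multiple of A and therefore always detected. */
#define AN_A 58659u

/* ANB code: enc = A * value + B, with a per-variable signature 0 <= B < A.
 * A 16-bit value keeps the result within 32 bits for this A. */
uint32_t anb_encode(uint16_t value, uint32_t b)
{
    return AN_A * (uint32_t)value + b;
}

bool anb_is_valid(uint32_t enc, uint32_t b)
{
    return enc >= b && (enc - b) % AN_A == 0u;
}

uint16_t anb_decode(uint32_t enc, uint32_t b)
{
    return (uint16_t)((enc - b) / AN_A);
}

/* DMR: keep two copies and compare them on every read. */
typedef struct {
    uint32_t value;
    uint32_t copy;
} dmr_u32_t;

dmr_u32_t dmr_store(uint32_t v)
{
    dmr_u32_t d = { v, v };
    return d;
}

bool dmr_is_valid(dmr_u32_t d)
{
    return d.value == d.copy;
}

/* XOR of all 32 bits of x (0 or 1). */
static uint32_t parity32(uint32_t x)
{
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return x & 1u;
}

/* Even parity: bit 31 holds the parity of the 31 payload bits. */
uint32_t parity_protect(uint32_t payload31)
{
    payload31 &= 0x7FFFFFFFu;
    return payload31 | (parity32(payload31) << 31);
}

bool parity_is_valid(uint32_t word)
{
    return parity32(word) == 0u;
}
```

The signature B is what distinguishes an ANB code from a plain AN code: checking a word against the wrong signature fails, so reading a valid but unintended variable is detected in addition to bit flips. DMR and parity only detect corruption of the word itself, and parity misses any even number of flipped bits.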

References

[1]

Hoffmann, Martin ; Borchert, Christoph ; Dietrich, Christian ; Schirmeier, Horst ; Kapitza, Rüdiger ; Spinczyk, Olaf ; Lohmann, Daniel:
Effectiveness of Fault Detection Mechanisms in Static and Dynamic Operating System Designs.
In: IEEE Computer Society (Ed.) : Proceedings of the 17th IEEE International Symposium on Object/Component/Service-oriented Real-time Distributed Computing (ISORC '14)
(IEEE International Symposium on Object/Component/Service-oriented Real-time Distributed Computing, Reno, NV, USA, June 2014).
2014, pp 230-237.
Keywords: DanceOS, dosek, osek, dependability, static system
[doi>10.1109/ISORC.2014.26]

[2]

Dietrich, Christian ; Hoffmann, Martin ; Lohmann, Daniel:
Cross-Kernel Control-Flow-Graph Analysis for Event-Driven Real-Time Systems.
In: ACM (Ed.) : Proceedings of the 16th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
(The 16th Conference on Languages, Compilers and Tools for Embedded Systems (LCTES 2015), Portland, Oregon, USA, June 2015).
New York, NY, USA : ACM Press, 2015, pp 1-10.
Keywords: Static Analysis; Control-Flow Graph; Cross-Kernel Analysis; Real-Time Systems; Optimization; Compiler
[doi>10.1145/2670529.2754963]

[3]

Hoffmann, Martin ; Ulbrich, Peter ; Dietrich, Christian ; Schirmeier, Horst ; Lohmann, Daniel ; Schröder-Preikschat, Wolfgang:
A Practitioner's Guide to Software-based Soft-Error Mitigation Using AN-Codes.
In: IEEE Computer Society (Ed.) : Proceedings of the 15th IEEE International Symposium on High Assurance Systems Engineering (HASE '14)
(Symposium on High Assurance Systems Engineering, Miami, FL, USA, January 2014).
2014, pp 33-40. - ISBN 978-1-4799-3465-2
Keywords: DanceOS; CoRed; Operating Systems; Embedded Systems; Real-Time Systems; Dependability; Safety; Coded Processing; ARES; ESI
[doi>10.1109/HASE.2014.14]

[4]