dosek: A Dependability Oriented Static Embedded Kernel
dosek - Fault Detection Strategies
Design Decisions (cont'd)
Hardware-based isolation mechanisms are widely used and are a proven dependability measure. Watchdogs and memory protection units (MPUs) play an important role in safety-related systems. To a large extent, they contain transient faults within individual components and thereby reveal a significant share of what would otherwise be silent data corruptions (SDCs), as also shown in previous evaluations. Consequently, dOSEK integrates the underlying architecture’s mechanisms into its system design to obtain coarse-grained fault detection among tasks and the kernel. Because the approach is completely generative, all necessary MPU configurations can be derived at compile time and placed in robust ROM. Besides the obvious separation of tasks from each other, dOSEK in particular uses the MPU to protect tasks from faults inside the kernel (Figure 1, step b).
- Design Rule 5
- Separate OS from Applications - regarding both directions.
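The idea of deriving MPU configurations at compile time can be illustrated with a small sketch. It is not dOSEK's generated code: the region descriptor, the memory map, the context names, and `mpu_allows` are all hypothetical, and the access check is a software model of what real MPU hardware would enforce. The point is only that the table is `const` (so a linker can place it in ROM) and that the kernel context has no writable window onto task data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical region descriptor; a real MPU uses hardware-specific registers. */
typedef struct {
    uint32_t base;      /* first address of the region */
    uint32_t limit;     /* one past the last address of the region */
    bool     writable;  /* false: read-only for this context */
} mpu_region_t;

/* Execution contexts of an assumed two-task system. */
enum { CTX_TASK_A, CTX_TASK_B, CTX_KERNEL, NUM_CONTEXTS };
enum { REGIONS_PER_CONTEXT = 2 };

/* Assumed memory map (illustrative addresses only). */
#define TASK_A_DATA 0x20000000u
#define TASK_B_DATA 0x20000400u
#define KERNEL_DATA 0x20000800u
#define DATA_SIZE   0x00000400u
#define CODE_BASE   0x08000000u
#define CODE_SIZE   0x00010000u

/* 'const' lets the linker place the table in ROM; nothing is computed at run time. */
static const mpu_region_t mpu_config[NUM_CONTEXTS][REGIONS_PER_CONTEXT] = {
    [CTX_TASK_A] = { { TASK_A_DATA, TASK_A_DATA + DATA_SIZE, true  },
                     { CODE_BASE,   CODE_BASE + CODE_SIZE,   false } },
    [CTX_TASK_B] = { { TASK_B_DATA, TASK_B_DATA + DATA_SIZE, true  },
                     { CODE_BASE,   CODE_BASE + CODE_SIZE,   false } },
    /* The kernel context gets no window onto task data at all. */
    [CTX_KERNEL] = { { KERNEL_DATA, KERNEL_DATA + DATA_SIZE, true  },
                     { CODE_BASE,   CODE_BASE + CODE_SIZE,   false } },
};

/* Software model of the hardware check: may context 'ctx' access 'addr'? */
bool mpu_allows(int ctx, uint32_t addr, bool write)
{
    assert(ctx >= 0 && ctx < NUM_CONTEXTS);
    for (int i = 0; i < REGIONS_PER_CONTEXT; i++) {
        const mpu_region_t *r = &mpu_config[ctx][i];
        if (addr >= r->base && addr < r->limit)
            return !write || r->writable;
    }
    return false; /* unmapped address: the MPU would raise a fault */
}
```

In this model, a faulty pointer inside the kernel that targets `TASK_A_DATA` is rejected just like a stray write from task B, which is the "both directions" aspect of the design rule.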
After minimizing the attack surface by eliminating redundant code and system state, and after coarse-grained fault containment through hardware-based isolation, the remaining single points of failure are handled by fine-grained software-based measures. Here, dOSEK employs different algorithms to cover both the control flow and the data flow of the operating system. Additionally, the predicted system state, determined by the global control-flow analysis, can be integrated as further fault-detecting redundancy in the form of state assertions.
- Design Rule 6
- Specific hardening of the unavoidable, remaining error-prone attack surface.
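A state assertion of the kind described above could look roughly as follows. This is a minimal sketch under stated assumptions, not dOSEK's implementation: the state layout, the two system-call sites, and the predicted values in the table are invented for illustration, standing in for whatever a global control-flow analysis would actually produce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical dynamic kernel state. */
typedef struct {
    uint8_t running_task;  /* ID of the task currently executing */
    uint8_t ready_mask;    /* bit i set: task i is ready */
} os_state_t;

/* What the (assumed) offline analysis predicts at one system-call site. */
typedef struct {
    uint8_t expected_running;  /* only this task can reach the site */
    uint8_t possible_ready;    /* tasks that may be ready at the site */
} state_assertion_t;

/* Two invented system-call sites of a two-task application. */
enum { SITE_A_ACTIVATES_B, SITE_B_TERMINATES, NUM_SITES };

/* Predictions are constants, so they can live in ROM next to the code. */
static const state_assertion_t predicted[NUM_SITES] = {
    [SITE_A_ACTIVATES_B] = { .expected_running = 0, .possible_ready = 0x01 },
    [SITE_B_TERMINATES]  = { .expected_running = 1, .possible_ready = 0x03 },
};

/* Returns false if the observed state contradicts the prediction. */
bool state_matches_prediction(int site, const os_state_t *s)
{
    assert(site >= 0 && site < NUM_SITES);
    const state_assertion_t *p = &predicted[site];
    /* Ready bits that the analysis says can never be set at this site. */
    uint8_t unexpected = (uint8_t)(s->ready_mask & (uint8_t)~p->possible_ready);
    return s->running_task == p->expected_running && unexpected == 0;
}
```

Such a check only catches corruptions that leave the predicted state set. A bit flip that produces another state the analysis considers possible at that site passes unnoticed, which is why it complements rather than replaces the other measures.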
Finally, an important lesson we learned is that the effect of dependability-oriented measures on the actual implementation is difficult to assess in advance. In many cases, additional measures turned out to do more harm than good. For instance, extensive CRC-based consistency checks on the kernel state can help to detect errors early. However, their overhead in time and space also increases the amount of live kernel state (i.e., the “attack surface” for transient faults), so that the number of SDCs can actually increase. Moreover, implementation glitches that manifest only at the compiler or ISA level can easily open loopholes for transient faults in algorithms that we considered robust [3,4].
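The trade-off of CRC-based checks can be made concrete with a small sketch. The state layout and the function names are hypothetical, and the CRC-8 polynomial (0x07) is just one common choice, not necessarily what dOSEK evaluated. Note that the check word is one more byte of live state, and every seal and verify step adds run time during which a fault can strike.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-8, polynomial x^8 + x^2 + x + 1 (0x07), initial value 0. */
uint8_t crc8(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80u) ? (uint8_t)((crc << 1) ^ 0x07u)
                                : (uint8_t)(crc << 1);
    }
    return crc;
}

/* Hypothetical kernel state protected by a check word. */
typedef struct {
    uint8_t running_task;
    uint8_t ready_mask;
    uint8_t crc;  /* the check word is itself additional live state */
} kernel_state_t;

/* CRC over the payload fields only (copied out to avoid layout assumptions). */
static uint8_t state_crc(const kernel_state_t *s)
{
    const uint8_t payload[2] = { s->running_task, s->ready_mask };
    return crc8(payload, sizeof payload);
}

/* Call after every legitimate update of the state. */
void state_seal(kernel_state_t *s) { s->crc = state_crc(s); }

/* Call before every use of the state. */
bool state_verify(const kernel_state_t *s) { return s->crc == state_crc(s); }
```

A single flipped bit in either payload field makes `state_verify` fail, but so does a flipped bit in `crc` itself, even though the payload is intact. Between an update and the following `state_seal`, the state is also unprotected, which is one way such a check can leave loopholes.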