dosek: A Dependability Oriented Static Embedded Kernel
dosek - Fault Avoidance Strategies
Design Decisions
The general susceptibility of an operating system to errors and SDCs is rooted to a high degree in its basic design and implementation concepts. For instance, we showed in previous work that, without any dependability-oriented measures, a static OSEK-like RTOS (i.e., all resources are allocated at compile time) already exhibits five times fewer SDCs than a more dynamic POSIX-like RTOS (i.e., all resources are allocated at run time). This inherent robustness of a static system design is the foundation of our dependability-oriented kernel design.
- Design Rule 1
- Use a static (OSEK-like) operating system design.
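As a minimal sketch of what "all resources are allocated at compile time" can look like in C, the fragment below keeps every task object in one statically sized array. The task names, the `task_t` layout, and the functions `activate_task` and `task_state` are hypothetical and chosen for illustration only; they are not taken from dosek or from the OSEK API.

```c
#include <stdint.h>

/* Hypothetical task set, fixed when the system is built. */
enum { TASK_SENSOR, TASK_CONTROL, TASK_LOGGER, TASK_COUNT };

typedef enum { SUSPENDED, READY, RUNNING } task_state_t;

typedef struct {
    uint8_t      priority;  /* fixed priority, known at build time */
    task_state_t state;     /* the only field that changes at run time */
} task_t;

/* All kernel objects live in one statically sized array:
 * no heap, no free list, no allocator metadata that a fault could corrupt. */
static task_t tasks[TASK_COUNT] = {
    [TASK_SENSOR]  = { .priority = 3, .state = SUSPENDED },
    [TASK_CONTROL] = { .priority = 2, .state = SUSPENDED },
    [TASK_LOGGER]  = { .priority = 1, .state = SUSPENDED },
};

/* Mark a task as ready. Returns 0 on success, -1 for an invalid id. */
int activate_task(unsigned id)
{
    if (id >= TASK_COUNT) {
        return -1;          /* local sanity check on the task id */
    }
    tasks[id].state = READY;
    return 0;
}

/* Read a task's state; invalid ids are reported as SUSPENDED. */
task_state_t task_state(unsigned id)
{
    return (id < TASK_COUNT) ? tasks[id].state : SUSPENDED;
}
```

Because the number of tasks is a compile-time constant, an out-of-range id can be rejected with a single comparison, and there is no allocator state that has to stay alive across system calls.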
Essentially, a transient fault can lead to an error inside the kernel only if it affects either the kernel’s control or data flow. For this, it has to hit a memory cell or register that carries currently alive kernel state, such as a global variable (always alive), a return address on the stack (alive during the execution of a system call), or a bit in the status register of the CPU (alive only immediately before a conditional instruction). Intuitively, the more long-living state a kernel maintains, the more prone it is to transient faults.
- Design Rule 2
- Minimize the time spent in system calls and the amount of volatile state, especially of global state that is alive across system calls.
However, no kernel can provide useful services without any run-time state. So, the second point to consider is the containment and, thus, the detectability of data- and control-flow errors by local sanity checks. Intuitively, a bit flip in a pointer variable has a much wider error range than a bit flip in a variable used in arithmetic operations; hence, it is more likely to lead to an SDC. In a nutshell, any kind of indirection at run time (through data or function pointers, index registers, return addresses, and so on) impairs the inherent robustness of the resulting system.
- Design Rule 3
- Avoid indirections in the code and data flow.
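The difference between an indirect and a direct control flow can be sketched as follows. This is an illustrative example under assumed names (`handler_inc`, `handler_reset`, `dispatch_indirect`, `dispatch_direct`), not code from dosek: the first dispatcher jumps through a function-pointer table held in writable memory, while the second names its targets directly and can detect a corrupted event id locally.

```c
/* State touched by the two hypothetical handlers. */
static int counter;

void handler_inc(void)   { counter += 1; }
void handler_reset(void) { counter = 0; }
int  get_counter(void)   { return counter; }

/* Indirect variant: the jump target is data in RAM. A bit flip in a
 * table entry or in 'event' can redirect control flow to an arbitrary
 * address, and nothing in this function can notice it. */
typedef void (*handler_t)(void);
handler_t handler_table[2] = { handler_inc, handler_reset };

void dispatch_indirect(unsigned event)
{
    handler_table[event]();
}

/* Direct variant: every target is named in the code itself. A corrupted
 * 'event' falls into the default branch and is reported to the caller. */
int dispatch_direct(unsigned event)
{
    switch (event) {
    case 0:  handler_inc();   return 0;
    case 1:  handler_reset(); return 0;
    default: return -1;       /* invalid event id detected locally */
    }
}
```

In the direct variant, the set of reachable targets is fixed in the code rather than stored in mutable data, so the error range of a flipped bit in `event` is limited to the two handlers and the error branch.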
Based on detailed static analysis of the global control-flow graph and an enumeration of all predicted system states, a high amount of potentially error-prone redundancy can be eliminated. With fine-grained interaction knowledge at hand, we can tailor the system calls more specifically to the application behavior in order to speed up the kernel execution paths. Instead of calling the generic system service, we insert a specialized service at the call site. This decoupling enables us to use the interaction knowledge for selecting the minimum necessary functionality at that point.
- Design Rule 4
- Exploit static system knowledge to minimize error-prone, redundant control- and data flows.
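To illustrate the idea of replacing a generic system service with a specialized one at the call site, consider the following sketch. It assumes a hypothetical three-task system in which the task index doubles as its priority, and a hypothetical call site where static analysis has established that the caller is `TASK_MID` and the activated task is `TASK_LOW`. All names are invented for this example, and the counter `schedule_calls` exists only to make the saved work observable.

```c
#include <stdbool.h>

/* Hypothetical system: the task index is also its priority. */
enum { TASK_LOW, TASK_MID, TASK_HIGH, TASK_COUNT };

static bool ready[TASK_COUNT] = { [TASK_MID] = true };
static int  current = TASK_MID;   /* TASK_MID is running initially */
static int  schedule_calls;       /* instrumentation for this example only */

/* Generic scheduler: scans all tasks from highest to lowest priority. */
static void schedule(void)
{
    schedule_calls++;
    for (int t = TASK_COUNT - 1; t >= 0; t--) {
        if (ready[t]) {
            current = t;
            return;
        }
    }
}

/* Generic service: validates its argument and always reschedules. */
void generic_activate(int task)
{
    if (task < 0 || task >= TASK_COUNT) {
        return;
    }
    ready[task] = true;
    schedule();
}

/* Specialized service for the assumed call site: the caller (TASK_MID)
 * is running and the target (TASK_LOW) has lower priority, so it can
 * never preempt the caller. No argument and no scheduling decision
 * are needed here. */
void activate_low_from_mid(void)
{
    ready[TASK_LOW] = true;
}

/* Accessors used to observe the example's state. */
bool is_ready(int task)        { return ready[task]; }
int  current_task(void)        { return current; }
int  schedule_call_count(void) { return schedule_calls; }
```

The specialized variant has no task-id parameter that could be corrupted and skips the scheduler scan entirely, which shortens the time spent in the system call and removes control and data flow that the call site does not need.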
Attack surfaces of different scheduling algorithms. Left: Based on sorted linked list. Right: Direct object access, unrolled scheduling operation.
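The two scheduler designs named in the caption can be sketched in C as follows. This is a simplified illustration with hypothetical names and a fixed three-task system, not the code that dosek generates. Variant A keeps the ready tasks in a priority-sorted linked list; variant B stores one flag per task and unrolls the scheduling decision into a fixed comparison chain.

```c
#include <stddef.h>
#include <stdbool.h>

enum { IDLE = -1, T_LOW = 0, T_MID = 1, T_HIGH = 2 };

/* Variant A: ready queue as a linked list sorted by descending priority.
 * Every 'next' pointer is live state; a single flipped bit can drop
 * tasks from the queue or send the traversal to an arbitrary address. */
typedef struct node {
    int          task;
    int          prio;
    struct node *next;
} node_t;

node_t *list_insert(node_t *head, node_t *n)
{
    node_t **pp = &head;
    while (*pp != NULL && (*pp)->prio >= n->prio) {
        pp = &(*pp)->next;
    }
    n->next = *pp;
    *pp = n;
    return head;
}

int list_pick(const node_t *head)
{
    return (head != NULL) ? head->task : IDLE;
}

/* Small demo: insert low, high, mid and return the task at the head. */
int list_demo(void)
{
    node_t low  = { T_LOW,  1, NULL };
    node_t mid  = { T_MID,  2, NULL };
    node_t high = { T_HIGH, 3, NULL };
    node_t *head = NULL;
    head = list_insert(head, &low);
    head = list_insert(head, &high);
    head = list_insert(head, &mid);
    return list_pick(head);
}

/* Variant B: one flag per task, accessed directly by name. There are no
 * pointers and no loop; the decision is a fixed chain of comparisons. */
static bool low_ready, mid_ready, high_ready;

void set_ready(int task, bool value)
{
    switch (task) {
    case T_LOW:  low_ready  = value; break;
    case T_MID:  mid_ready  = value; break;
    case T_HIGH: high_ready = value; break;
    default:     break;               /* ignore invalid task ids */
    }
}

int unrolled_pick(void)
{
    if (high_ready) return T_HIGH;
    if (mid_ready)  return T_MID;
    if (low_ready)  return T_LOW;
    return IDLE;
}
```

In variant B, a bit flip in one flag can at worst cause a wrong scheduling decision among the known tasks, whereas in variant A a corrupted pointer can affect every element reachable behind it.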