Sloth: A Minimal-Effort Kernel for Embedded Systems

Sloth: Threads as Interrupts

The core idea behind the Sloth system is to implement all threads as interrupt handlers in the operating system. The ultimate goal is to let the interrupt subsystem of a given hardware platform take over more of the scheduling and dispatching work than is possible in traditional, software-based scheduler implementations.

This discussion focuses on the OS abstractions as defined by the OSEK specification, which is widely used in the automotive domain for embedded microcontrollers. Where OSEK terminology differs from common OS terminology, the corresponding OSEK terms are given in parentheses.

Implementation of Thread-Related OS Abstractions and Services

When the system is configured according to the application needs, each thread (or task in OSEK terminology) is assigned an IRQ source on the hardware platform, and the priority of that IRQ source is set as specified by the thread configuration.

Setting a thread ready (task activation in OSEK) is implemented by requesting the corresponding interrupt. Depending on the current CPU priority, the current thread will be preempted, or the request will be recorded for later dispatching by the hardware. This way, the hardware practically maintains the ready list in these request bits and the preempted-thread state—as opposed to maintaining a software structure for that in the kernel.

Interrupt service routines (ISRs) simply use IRQ sources that are connected to the corresponding hardware. Since their priorities can be freely interleaved with thread priorities, the real-time problem of rate-monotonic priority inversion (where low-priority ISRs can delay high-priority threads) can be avoided along the way.

Timers (called alarms in OSEK) also fit in nicely. Those threads that are configured to be possibly activated by alarm expiry at run time are assigned an IRQ source that is connected to the hardware timer system. This way, setting an alarm at run time corresponds to simply activating the associated timer unit. Upon timer expiry, the hardware will automatically schedule the activated thread, immediately dispatching it by preempting the current thread if the pending priority is high enough.

Critical sections (named resources in OSEK) are implemented by raising and lowering the current CPU priority. If another thread becomes pending during the critical section, it will not be dispatched until that section is left, which is when the current priority is lowered again. This way, priority inversion can be avoided as proposed by OSEK's stack-based priority ceiling protocol, for instance.
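The interplay of task activation, preemption, and resources described above can be illustrated with a small host-side model. This is only a sketch and not the Sloth source code: the names pending, cpu_prio, arbitrate, activate_task, get_resource, and release_resource are hypothetical, and the loop in arbitrate stands in for what the interrupt hardware does on a real platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_PRIOS 8u  /* hypothetical: priority 0 is idle, tasks use 1..7 */

typedef void (*handler_fn)(void);

static bool       pending[NUM_PRIOS];  /* models the IRQ request bits */
static handler_fn handler[NUM_PRIOS];  /* models the interrupt vector table */
static unsigned   cpu_prio;            /* models the current CPU priority */

/* Models hardware arbitration: dispatch every pending request whose
   priority exceeds the current CPU priority, highest priority first. */
static void arbitrate(void)
{
    for (;;) {
        unsigned p = NUM_PRIOS - 1u;
        while (p > cpu_prio && !pending[p])
            p--;
        if (p <= cpu_prio)
            return;                 /* nothing above the current level */
        unsigned saved = cpu_prio;  /* preempted-thread state */
        pending[p] = false;         /* request is acknowledged */
        cpu_prio = p;
        handler[p]();               /* handler runs, possibly preempted itself */
        cpu_prio = saved;           /* return from interrupt */
    }
}

/* Task activation: request the IRQ that was assigned to the task. */
void activate_task(unsigned prio)
{
    assert(prio > 0u && prio < NUM_PRIOS && handler[prio] != NULL);
    pending[prio] = true;
    arbitrate();
}

/* Resource acquisition: raise the CPU priority to the resource ceiling. */
unsigned get_resource(unsigned ceiling)
{
    unsigned old = cpu_prio;
    if (ceiling > cpu_prio)
        cpu_prio = ceiling;
    return old;
}

/* Resource release: lower the CPU priority; requests recorded in the
   meantime are dispatched now. */
void release_resource(unsigned old)
{
    cpu_prio = old;
    arbitrate();
}

/* Demo: a low-priority task (1) activates a high-priority task (3),
   once outside and once inside a critical section with ceiling 3. */
static char     trace[16];
static unsigned trace_len;

static void log_event(char c)
{
    if (trace_len < sizeof trace - 1u)
        trace[trace_len++] = c;
}

static void high_task(void) { log_event('H'); }

static void low_task(void)
{
    log_event('a');
    activate_task(3u);               /* preempts immediately */
    log_event('b');
    unsigned old = get_resource(3u);
    activate_task(3u);               /* only recorded in the request bit */
    log_event('c');
    release_resource(old);           /* high_task is dispatched here */
    log_event('d');
}

const char *run_demo(void)
{
    handler[1] = low_task;
    handler[3] = high_task;
    trace_len = 0u;
    activate_task(1u);
    trace[trace_len] = '\0';
    return trace;
}
```

Calling run_demo() yields the trace "aHbcHd": the first activation preempts the low-priority task at once, while the second one is deferred until the resource is released.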

Requirements on the Hardware Platform

In order to support the Sloth implementation, the interrupt system of the hardware platform has to fulfill only two requirements:

It has to offer at least as many interrupt priorities as there are threads in the system. Platforms such as the Infineon TriCore or the ARM Cortex-M3 offer up to 256 interrupt priorities.
It has to offer the possibility to record an interrupt request from within the software. The platforms mentioned allow this by mapping the IRQ-source registers into the address space.
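Both requirements can be sketched in a few lines of C. The register layout below (bit positions, the array irq_source_reg, the table task_prio) is purely hypothetical and is modeled as ordinary memory so that the sketch runs on a host; on a real platform the addresses and bit fields come from the hardware manual.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TASKS      5u    /* hypothetical application configuration */
#define NUM_IRQ_PRIOS  255u  /* hypothetical number of usable priorities */

/* Requirement 1: one distinct interrupt priority per task. */
_Static_assert(NUM_TASKS <= NUM_IRQ_PRIOS, "not enough interrupt priorities");

/* Hypothetical layout of a per-source IRQ control register. */
#define SRC_PRIO_MASK    0xFFu        /* priority of this source */
#define SRC_ENABLE       (1u << 12)   /* source enabled */
#define SRC_SET_REQUEST  (1u << 15)   /* writing 1 requests the interrupt */

/* Requirement 2: the IRQ-source registers are visible to software.
   Here they are an ordinary array instead of a fixed hardware address. */
volatile uint32_t irq_source_reg[NUM_TASKS];

/* Static task-to-priority assignment, fixed at configuration time. */
static const uint8_t task_prio[NUM_TASKS] = { 1, 2, 3, 4, 5 };

/* Setting a task ready is one store to its IRQ-source register. */
void set_ready(unsigned task)
{
    assert(task < NUM_TASKS);
    irq_source_reg[task] = SRC_SET_REQUEST | SRC_ENABLE | task_prio[task];
}

/* Helpers for inspecting the modeled register. */
int request_pending(unsigned task)
{
    return (irq_source_reg[task] & SRC_SET_REQUEST) != 0u;
}

unsigned source_prio(unsigned task)
{
    return irq_source_reg[task] & SRC_PRIO_MASK;
}
```

Because the task-to-priority mapping is fixed at configuration time, the value written in set_ready is a constant for each task, which is what allows the call to reduce to a single store.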

Advantages of the Sloth Design

Implementing threads in an unorthodox way, as Sloth does, has a positive impact on several properties of the operating system kernel.

1. Sloth Is Simple

Sloth internally only uses a single control-flow abstraction—namely interrupt handlers—to implement several kinds of OS abstractions as offered to the application programmer:

Threads (named tasks in OSEK), which can set other threads ready and can be preempted as described above.
Interrupt handlers that can call system services (named ISRs of category 2 in OSEK) and therefore need to be synchronized with the kernel in order not to corrupt kernel state. This synchronization is performed by raising and lowering the CPU priority when threads execute system calls.
Interrupt handlers that must not call system services (named ISRs of category 1 in OSEK) are assigned a priority higher than all threads and category-2 ISRs in the system. This way, they have minimal execution latency because they are never delayed by those other types of control flows.
Callbacks (e.g., upon timer expiry) were introduced in systems like OSEK to provide a low-overhead way to respond to events. In Sloth, however, they are treated as high-priority ISRs or threads, since ISRs and threads already have the lowest possible overhead.

Another consequence of only having interrupt handlers as control flows is that there is only a single means of synchronization in the system, which is raising and lowering the CPU priority. Considering that synchronization is one of the hardest tasks to do right in OS engineering, this simplicity is a big advantage.
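The single synchronization mechanism can be sketched as follows. The priority values and the names enter_kernel, leave_kernel, and can_preempt are hypothetical, and choosing the kernel synchronization level as the highest priority among all tasks and category-2 ISRs is an assumption made for this illustration, not a statement about the actual Sloth code.

```c
#include <assert.h>

/* Hypothetical priority assignment: tasks and category-2 ISRs are
   interleaved, category-1 ISRs sit above all of them. */
enum {
    PRIO_TASK_A     = 1,
    PRIO_ISR2_UART  = 2,
    PRIO_TASK_B     = 3,
    PRIO_ISR1_MOTOR = 4
};

/* Assumed: highest priority among all tasks and category-2 ISRs. */
#define KERNEL_SYNC_PRIO ((unsigned)PRIO_TASK_B)

static unsigned cpu_prio;  /* models the current CPU priority */

/* Entering a system service: raise the CPU priority so that no task or
   category-2 ISR can interrupt the kernel. */
unsigned enter_kernel(void)
{
    /* Category-1 ISRs must not call system services. */
    assert(cpu_prio <= KERNEL_SYNC_PRIO);
    unsigned old = cpu_prio;
    cpu_prio = KERNEL_SYNC_PRIO;
    return old;
}

/* Leaving a system service: restore the previous CPU priority. */
void leave_kernel(unsigned old)
{
    cpu_prio = old;
}

/* Models the hardware check on whether a request is dispatched now. */
int can_preempt(unsigned irq_prio)
{
    return irq_prio > cpu_prio;
}
```

While the kernel is entered, can_preempt returns 0 for every task and category-2 ISR but still returns 1 for PRIO_ISR1_MOTOR, which reflects why category-1 ISRs are never delayed by kernel synchronization.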

2. Sloth Is Small

As can be seen from the design description above, the Sloth design is concise and easy to grasp.

The source code for the complete OSEK-BCC1 implementation on the Infineon TriCore platform is fewer than 200 lines of C code, with one instance per OS abstraction (task, resource, alarm); additional instances require more generated code, but do not add to the system complexity. This makes Sloth a good subject for verification.

The compiled memory footprint of the Sloth kernel is also extremely small. The OSEK-BCC1 Infineon TriCore implementation takes about 700 bytes, again with one instance per OS abstraction. Considering that the start-up code alone as provided by the compiler (tricore-gcc by HighTec) takes up more than 1,000 bytes (500 bytes after manual tweaking), this is very competitive. A system call such as setReady(thread) (activate(task) in OSEK) is basically compiled to a single store instruction to the corresponding memory-mapped register.

3. Sloth Is Fast

We have looked into Sloth's performance by measuring the execution time of the thread-related OSEK system calls. All other system calls as well as the application code itself will be similar to or even the same as in traditional, software-based kernels. We have compared these microbenchmarks to the performance of the CiAO kernel, which has itself been shown in published work to perform competitively. The results are very promising.

The table shows the number of clock cycles that the respective microbenchmark (distinguishing between triggering and not triggering a thread dispatch where needed) needs to execute on Sloth and on CiAO. The numbers for Sloth depend on the number of threads in the system; the more threads (i.e., interrupt handlers) the system supports, the longer the interrupt arbitration takes. That is why there is one row for the best case (up to 3 threads) and one for the worst case (up to 255 threads). In every case, Sloth performs at least as well as CiAO and up to seven times better.

4. Sloth Is Cool

At least we like to think so :)...