Sloth: A Minimal-Effort Kernel for Embedded Systems

Sleepy Sloth: Threads as Interrupts as Threads

The main drawback of the original Sloth system is that it does not support blocking tasks, which are called extended tasks in OSEK terminology and which run on stacks of their own. Some applications, however, require blocking semantics for their tasks in order to be decomposed in a manageable way. The goal of Sleepy Sloth is therefore to give the application more flexibility by offering a blocking thread abstraction while still executing efficiently by using the interrupt subsystem of the given hardware platform.

Challenges

There are three main challenges in designing and implementing such a system:

Interrupt controllers do not support suspension and re-activation of interrupt handlers, which we need to implement blocking and unblocking.
In a Sloth-like system, the exact preemption points cannot be located in the system code because the interrupt hardware decides if a preemption takes place (depending on the priority situation). Thus, we cannot switch stacks before doing the task switch as usual.
As part of Sleepy Sloth's goal, we want to maintain the execution efficiency of Sloth as far as possible. Especially for basic run-to-completion tasks, which do not make use of the added blocking flexibility, we have to find a way to leave them unaffected by additional overhead.

The Sleepy Sloth Task Prologue

The central design element in Sleepy Sloth to tackle those challenges is its task prologue. This prologue code is prepended to every task function (which corresponds to an interrupt handler in Sloth) and is thus executed whenever a preemption or other task switch takes place. The task prologue has the following responsibilities:

It first saves the context of the interrupted task to the current stack.
After that, it checks whether it needs to switch stacks and does so if necessary. A stack switch is not necessary when a basic run-to-completion task preempts another basic run-to-completion task; the stack needs to be switched only if at least one extended blocking task is involved.
If the interrupted task has not terminated, its IRQ source is re-triggered to ensure that its execution is continued later.
Then, the prologue checks if the task it belongs to has run before in that job instance or not.
If it has, then the context is restored from a kernel context array and execution of the task is continued by returning using the restored return address.
If it has not, then the context is initialized (including resetting the stack pointer to the top of the allocated task stack) and, after enabling IRQs, the prologue jumps to the user task function.
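
The prologue's decisions can be sketched as a small, hardware-free C model. All names used here (task_t, task_prologue, needs_stack_switch, and so on) are hypothetical and are not taken from the Sleepy Sloth sources. Real context saving, stack switching, and IRQ re-triggering are platform-specific, so the model represents them only with plain fields and flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the prologue's decisions; no real registers,
   stacks, or interrupt hardware are touched. */
typedef enum { BASIC, EXTENDED } task_kind_t;

typedef struct {
    task_kind_t kind;
    bool terminated;    /* the task's current job has finished */
    bool has_run;       /* the task already ran in this job instance */
    bool irq_pending;   /* stand-in for the pending bit of its IRQ source */
    int  saved_context; /* stand-in for the saved register context */
} task_t;

typedef enum { RESUME_SAVED_CONTEXT, START_FRESH } prologue_action_t;

/* A stack switch is needed only if an extended task is involved. */
static bool needs_stack_switch(const task_t *from, const task_t *to)
{
    return from->kind == EXTENDED || to->kind == EXTENDED;
}

/* Runs on behalf of `self` when it is dispatched over `interrupted`. */
static prologue_action_t task_prologue(task_t *interrupted, task_t *self,
                                       int current_context,
                                       bool *switched_stack)
{
    assert(interrupted != NULL && self != NULL && switched_stack != NULL);

    /* 1. Save the context of the interrupted task. */
    interrupted->saved_context = current_context;

    /* 2. Decide whether the stack has to be switched. */
    *switched_stack = needs_stack_switch(interrupted, self);

    /* 3. Re-trigger the interrupted task's IRQ source unless it terminated. */
    if (!interrupted->terminated) {
        interrupted->irq_pending = true;
    }

    /* 4. Resume a task that ran before, or start a new job from scratch. */
    if (self->has_run) {
        return RESUME_SAVED_CONTEXT;
    }
    self->has_run = true;
    return START_FRESH;
}
```

In this model, a basic task preempting another basic task yields START_FRESH without a stack switch, whereas dispatching an extended task that has already run yields RESUME_SAVED_CONTEXT together with a stack switch.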

Interaction with Points of Rescheduling

A task terminates by yielding the CPU: it sets the CPU priority to zero, which makes the IRQ hardware dispatch the pending task with the next-highest priority. The prologue of this next task performs the stack switch.

Tasks are blocked by disabling the corresponding IRQ source. This way, the IRQ arbitration system no longer considers the task when deciding which one to dispatch. After blocking the IRQ source, the CPU is yielded by setting the priority to zero, in the same way as when terminating a task.

Tasks are unblocked by re-enabling the corresponding IRQ source and triggering its pending bit. This will lead to a rescheduling decision in the hardware arbitration unit and dispatch the unblocked task if it has a higher priority in a preemptive system.
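
The three rescheduling points can be illustrated with a minimal software model of an interrupt controller. This is only a sketch under stated assumptions: the types and functions (irq_controller_t, arbitrate, dispatch, block_task, and so on) are hypothetical, the arbitration is simulated in plain C rather than performed by hardware, and the prologue's re-triggering of preempted tasks is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_TASKS 3

/* Hypothetical model: one IRQ source per task. */
typedef struct {
    int  priority; /* task priority, higher value wins */
    bool enabled;  /* disabled sources do not take part in arbitration */
    bool pending;  /* request bit */
} irq_source_t;

typedef struct {
    irq_source_t src[NUM_TASKS];
    int cpu_priority; /* current CPU priority; 0 means "yielded" */
} irq_controller_t;

/* Simulated arbitration: index of the enabled, pending source with the
   highest priority above the CPU priority, or -1 if there is none. */
static int arbitrate(const irq_controller_t *ic)
{
    int winner = -1;
    for (int i = 0; i < NUM_TASKS; i++) {
        const irq_source_t *s = &ic->src[i];
        if (s->enabled && s->pending && s->priority > ic->cpu_priority &&
            (winner < 0 || s->priority > ic->src[winner].priority)) {
            winner = i;
        }
    }
    return winner;
}

/* Simulated dispatch: acknowledge the winner and raise the CPU priority. */
static int dispatch(irq_controller_t *ic)
{
    int winner = arbitrate(ic);
    if (winner >= 0) {
        ic->src[winner].pending = false;
        ic->cpu_priority = ic->src[winner].priority;
    }
    return winner;
}

/* Termination: yield the CPU by dropping the CPU priority to zero. */
static void terminate_task(irq_controller_t *ic)
{
    ic->cpu_priority = 0;
}

/* Blocking: disable the task's IRQ source, then yield like termination. */
static void block_task(irq_controller_t *ic, int task)
{
    assert(task >= 0 && task < NUM_TASKS);
    ic->src[task].enabled = false;
    ic->cpu_priority = 0;
}

/* Unblocking: re-enable the source and set its pending bit. */
static void unblock_task(irq_controller_t *ic, int task)
{
    assert(task >= 0 && task < NUM_TASKS);
    ic->src[task].enabled = true;
    ic->src[task].pending = true;
}
```

In this model, calling block_task for the running task and then dispatch selects the next-highest pending task. A later unblock_task makes the blocked task win arbitration again as soon as its priority exceeds the current CPU priority.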

Sleepy Sloth Performance

Using our reference implementation for the Infineon TriCore microcontroller, we evaluated Sleepy Sloth in three different task scenarios.

In an application with only basic run-to-completion tasks, Sleepy Sloth is as fast as the original Sloth kernel because it is tailored to the application configuration.

In an application with only extended blocking tasks, the system calls carry additional overhead due to the execution of the task prologue, which is not necessary in run-to-completion systems. Nevertheless, the system calls show a speed-up by a factor of 1.6 to 3.5 compared to a commercial OSEK implementation.

In an application with both kinds of tasks, the Sleepy Sloth overhead scales with the demand: task switches between basic run-to-completion tasks are faster than those between extended tasks.

Summary

Sleepy Sloth offers a universal thread abstraction

that can be activated by a hardware event or a software event,
that can have run-to-completion or blocking semantics,
and that is scheduled and dispatched efficiently by interrupt hardware in a single priority space.