Sloth: A Minimal-Effort Kernel for Embedded Systems

Motivation

The main research goal in the Sloth project is to investigate how to make better use of given hardware abstractions in (embedded) operating systems.

When designing an operating system, the operating system engineer always faces a trade-off between an efficient system implementation on a given hardware platform and portability to other platforms. This is because operating system kernels usually make use of a hardware abstraction layer internally, which hides platform specifics to a certain extent, facilitating the porting process.

Whereas sacrificing efficiency and memory footprint for portability might be feasible in desktop and server operating system kernels, it is difficult to argue this way when building embedded operating systems. On the one hand, those systems need to be configurable and tailorable to the needs of the applications that run on top of the kernel. This is what our research in the CiAO and VAMOS projects is focused on. On the other hand, the kernel will always need some kind of hardware abstraction layer, which needs to be ported to the different hardware platforms. The Sloth project investigates how to move that abstraction layer down a little, making better use of the underlying hardware platform to implement the offered system services and abstractions. This trades slightly more porting effort and hardware dependency for better efficiency and a smaller footprint.

The first part of the Sloth project is to investigate how to better implement threads in embedded real-time kernels. To learn more about that, see Threads as Interrupts and the RTSS 2009 paper.

The second part of the Sloth project, named Sleepy Sloth, focuses on how to provide the application with more flexibility by providing a blocking thread abstraction while still executing efficiently using interrupt hardware scheduling and dispatching. To learn more about that, see Threads as Interrupts as Threads and the RTSS 2011 paper.

Publications

RTSS 2011

Hofer, Wanja ; Lohmann, Daniel ; Schröder-Preikschat, Wolfgang:
Sleepy Sloth: Threads as Interrupts as Threads.
In: Almeida, Luis ; Brandt, Scott (Eds.) : Proceedings of the 32nd IEEE Real-Time Systems Symposium (RTSS 2011)
(32nd IEEE Real-Time Systems Symposium (RTSS 2011), Vienna, Austria, November 2011).
Los Alamitos, CA, USA : IEEE Computer Society, 2011, pp. 67-77. - ISBN 978-0-7695-4591-2
Keywords:  Sloth; Operating Systems; Embedded Systems; Real-Time Systems; Thread Management; Interrupt Handling; OSEK; Infineon TriCore; Priority-Driven Scheduling; Blocking Threads
[doi>10.1109/RTSS.2011.14] (BibTeX)

RTSS 2009

Hofer, Wanja ; Lohmann, Daniel ; Scheler, Fabian ; Schröder-Preikschat, Wolfgang:
Sloth: Threads as Interrupts.
In: Baker, Theodore P. (Ed.) : Proceedings of the 30th IEEE Real-Time Systems Symposium (RTSS 2009)
(30th IEEE Real-Time Systems Symposium (RTSS 2009), Washington, D.C., USA, December 2009).
Los Alamitos, CA, USA : IEEE Computer Society, 2009, pp. 204-213. - ISBN 978-0-7695-3875-4
Keywords:  Sloth; Operating Systems; Embedded Systems; Real-Time Systems; Thread Management; Interrupt Handling; OSEK; Infineon TriCore; Priority-Driven Scheduling; CiAO; VAMOS
[doi>10.1109/RTSS.2009.18] (BibTeX)

SOSP-WiP 2009

Hofer, Wanja ; Lohmann, Daniel ; Scheler, Fabian ; Schröder-Preikschat, Wolfgang:
Sloth: Let the Hardware Do the Work!
In: ACM SIGOPS (Ed.) : Proceedings of the Work-in-Progress Session of the 22nd ACM Symposium on Operating Systems Principles (SOSP-WiP 2009)
(Work-in-Progress Session of the 22nd ACM Symposium on Operating Systems Principles (SOSP-WiP 2009), Big Sky, MT, USA, October 2009).
2009, p. 1.
Keywords:  Sloth; Threads; Interrupts; OSEK (BibTeX)

Presentation Slides

The slides used for presentations on Sloth are usually rather sparse, so be sure to read the papers or contact Wanja if you want more details.

People Involved in Sloth

Dipl.-Inf. Wanja Hofer, Rainer Müller, Daniel Danner, Dr.-Ing. Fabian Scheler, Dr.-Ing. Daniel Lohmann, Prof. Dr.-Ing. Wolfgang Schröder-Preikschat

Sloth: Threads as Interrupts

The core idea behind the Sloth system is to implement all threads as interrupt handlers in the operating system. The ultimate goal is to rely on the interrupt subsystem of a given hardware platform to take over more of the scheduling and dispatching work as compared to traditional, software-based scheduler implementations.

This discussion focuses on the OS abstractions as defined by the OSEK specification, which is widely used in the automotive domain for embedded microcontrollers. Where it differs from common OS terminology, the corresponding terms are listed in parentheses.

Implementation of Thread-Related OS Abstractions and Services

When the system is configured according to the application needs, each thread (or task in OSEK terminology) is assigned an IRQ source on the hardware platform, configuring the IRQ priority as specified by the thread configuration:

[Figure: Sloth design]

Setting a thread ready (task activation in OSEK) is implemented by requesting the corresponding interrupt. Depending on the current CPU priority, the current thread will be preempted, or the request will be recorded for later dispatching by the hardware. This way, the hardware practically maintains the ready list in these request bits and the preempted-thread state—as opposed to maintaining a software structure for that in the kernel.
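
The activation mechanism described above can be sketched as a small host-side simulation. This is only an illustrative model under stated assumptions, not the Sloth source code: the names `activate_task`, `arbitrate`, and `run_demo`, the eight priority levels, and the trace buffer are all hypothetical, and the `arbitrate` function stands in for work that the interrupt controller performs in hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_PRIOS 8u                  /* hypothetical: priorities 1..7, 0 = idle */
enum { PRIO_LOW = 1, PRIO_MID = 2, PRIO_HIGH = 3 };

typedef void (*task_fn)(void);

static task_fn  handler[NUM_PRIOS];   /* one IRQ source per task, indexed by priority */
static uint8_t  pending;              /* request bits: the hardware-held ready list */
static unsigned cpu_prio;             /* current CPU priority, 0 = idle */
static char     trace[16];            /* execution order, for the demo only */

static void log_event(char c)
{
    size_t n = strlen(trace);
    if (n + 1 < sizeof trace)
        trace[n] = c;
}

/* Models the hardware arbitration: dispatch the highest pending request
 * for as long as one exists above the current CPU priority. */
static void arbitrate(void)
{
    for (;;) {
        unsigned p = NUM_PRIOS - 1;
        while (p > cpu_prio && !(pending & (1u << p)))
            p--;
        if (p <= cpu_prio)
            return;                       /* nothing pending above the running task */
        pending &= (uint8_t)~(1u << p);   /* request is acknowledged */
        unsigned saved = cpu_prio;        /* preempted state stays on the stack */
        cpu_prio = p;
        handler[p]();
        cpu_prio = saved;                 /* return from interrupt */
    }
}

/* Setting a task ready is nothing more than requesting its interrupt. */
void activate_task(unsigned prio)
{
    assert(prio > 0 && prio < NUM_PRIOS && handler[prio] != 0);
    pending |= (uint8_t)(1u << prio);
    arbitrate();
}

static void task_high(void) { log_event('H'); }
static void task_low(void)  { log_event('L'); }

static void task_mid(void)
{
    log_event('m');
    activate_task(PRIO_HIGH);   /* higher priority: preempts immediately */
    activate_task(PRIO_LOW);    /* lower priority: only recorded as pending */
    log_event('M');
}

const char *run_demo(void)
{
    memset(trace, 0, sizeof trace);
    pending = 0;
    cpu_prio = 0;
    handler[PRIO_LOW]  = task_low;
    handler[PRIO_MID]  = task_mid;
    handler[PRIO_HIGH] = task_high;
    activate_task(PRIO_MID);
    return trace;
}
```

Calling `run_demo()` yields the trace `mHML`: the high-priority task preempts the middle one at once, while the low-priority request waits in its pending bit until the middle task has returned. Note that the model keeps no ready queue of its own; the pending bits and the nested call stack carry all scheduling state.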

Interrupt service routines (ISRs) simply use IRQ sources that are connected to the corresponding hardware. Since their priorities can be mixed between thread priorities, the real-time problem of rate-monotonic priority inversion (where low-priority ISRs can delay high-priority threads) can be avoided along the way.

Timers (called alarms in OSEK) also fit in nicely. Those threads that are configured to be possibly activated by alarm expiry at run time are assigned an IRQ source that is connected to the hardware timer system. This way, setting an alarm at run time corresponds to simply activating the associated timer unit. Upon timer expiry, the hardware will automatically schedule the activated thread, immediately dispatching it by preempting the current thread if the pending priority is high enough.
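
As a rough illustration of the alarm mechanism, the following host-side model wires one simulated timer cell to the IRQ source of a task. It is a sketch under assumptions: `set_rel_alarm`, `hw_timer_tick`, `task_run_count`, and the single timer cell are hypothetical names chosen here, and the functions marked as hardware side stand in for behavior that needs no kernel code on a real platform.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PRIOS 8u                 /* hypothetical: priorities 1..7, 0 = idle */

static uint8_t  pending;             /* IRQ request bits, one per priority */
static unsigned cpu_prio;            /* current CPU priority, 0 = idle */
static unsigned runs[NUM_PRIOS];     /* how often each task body has run */

/* One simulated timer cell, connected to the IRQ source of one task. */
struct timer_cell {
    unsigned remaining;              /* ticks until expiry */
    unsigned irq_prio;               /* priority of the connected IRQ source */
    int      armed;
};
static struct timer_cell timer0;

/* Hardware side: dispatch pending requests above the CPU priority. */
static void arbitrate(void)
{
    for (;;) {
        unsigned p = NUM_PRIOS - 1;
        while (p > cpu_prio && !(pending & (1u << p)))
            p--;
        if (p <= cpu_prio)
            return;
        pending &= (uint8_t)~(1u << p);
        unsigned saved = cpu_prio;
        cpu_prio = p;
        runs[p]++;                   /* stands in for the task body */
        cpu_prio = saved;
    }
}

/* Kernel side: setting an alarm only programs and starts the timer. */
void set_rel_alarm(unsigned ticks, unsigned task_prio)
{
    assert(ticks > 0 && task_prio > 0 && task_prio < NUM_PRIOS);
    timer0.remaining = ticks;
    timer0.irq_prio  = task_prio;
    timer0.armed     = 1;
}

/* Hardware side: one timer tick; expiry raises the task's IRQ directly. */
void hw_timer_tick(void)
{
    if (timer0.armed && --timer0.remaining == 0) {
        timer0.armed = 0;
        pending |= (uint8_t)(1u << timer0.irq_prio);
        arbitrate();
    }
}

unsigned task_run_count(unsigned prio)
{
    assert(prio < NUM_PRIOS);
    return runs[prio];
}
```

In this model, the only kernel-side work is `set_rel_alarm`; the activation of the task on expiry happens entirely in the simulated hardware path.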

Critical sections (named resources in OSEK) are implemented by raising and lowering the current CPU priority. If another thread becomes pending during the critical section, it will not be dispatched until that section is left, which is when the current priority is lowered again. This way, priority inversion can be avoided as proposed by OSEK's stack-based priority ceiling protocol, for instance.
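
A minimal simulation of such a critical section might look as follows. The function names `get_resource` and `release_resource`, their signatures, and the ceiling constant are assumptions made for this sketch; they are not the OSEK API and not taken from the Sloth sources. The `arbitrate` function again models what the interrupt hardware would do.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_PRIOS 8u                 /* hypothetical: priorities 1..7, 0 = idle */
enum { PRIO_LOW = 1, PRIO_MID = 2 };
#define RES_CEILING PRIO_MID         /* highest priority of any task using the resource */

static uint8_t  pending;             /* IRQ request bits */
static unsigned cpu_prio;            /* current CPU priority, 0 = idle */
static char     trace[16];           /* execution order, for the demo only */

static void log_event(char c)
{
    size_t n = strlen(trace);
    if (n + 1 < sizeof trace)
        trace[n] = c;
}

static void task_low(void);
static void task_mid(void) { log_event('M'); }

static void run_task(unsigned p)
{
    if (p == PRIO_MID) task_mid();
    else if (p == PRIO_LOW) task_low();
}

/* Hardware side: dispatch pending requests above the CPU priority. */
static void arbitrate(void)
{
    for (;;) {
        unsigned p = NUM_PRIOS - 1;
        while (p > cpu_prio && !(pending & (1u << p)))
            p--;
        if (p <= cpu_prio)
            return;
        pending &= (uint8_t)~(1u << p);
        unsigned saved = cpu_prio;
        cpu_prio = p;
        run_task(p);
        cpu_prio = saved;
    }
}

static void request_irq(unsigned prio)
{
    assert(prio > 0 && prio < NUM_PRIOS);
    pending |= (uint8_t)(1u << prio);
    arbitrate();
}

/* Enter the critical section: raise the CPU priority to the ceiling. */
unsigned get_resource(unsigned ceiling)
{
    unsigned prev = cpu_prio;
    if (ceiling > cpu_prio)
        cpu_prio = ceiling;
    return prev;
}

/* Leave it: lower the priority again; deferred requests are dispatched now. */
void release_resource(unsigned prev)
{
    assert(prev <= cpu_prio);
    cpu_prio = prev;
    arbitrate();
}

static void task_low(void)
{
    log_event('a');
    unsigned prev = get_resource(RES_CEILING);
    request_irq(PRIO_MID);           /* recorded, but not above the ceiling */
    log_event('b');                  /* still inside the critical section */
    release_resource(prev);          /* task_mid preempts at this point */
    log_event('c');
}

const char *run_demo(void)
{
    memset(trace, 0, sizeof trace);
    pending = 0;
    cpu_prio = 0;
    request_irq(PRIO_LOW);
    return trace;
}
```

`run_demo()` returns `abMc`: the middle-priority task becomes pending inside the critical section but runs only once the low-priority task has released the resource.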

Requirements on the Hardware Platform

In order to support the Sloth implementation, the interrupt system of the hardware platform has to fulfill only two requirements:

  1. It has to offer as many interrupt priorities as there are threads in the system. Platforms such as the Infineon TriCore or the ARM Cortex-M3 offer up to 256 interrupt priorities.
  2. It has to offer the possibility to record an interrupt request from within the software. The platforms mentioned allow this by mapping the IRQ-source registers into the address space.

Advantages of the Sloth Design

An unorthodox way to implement threads such as in Sloth has a positive impact on several properties of the operating system kernel.

1. Sloth Is Simple

Sloth internally only uses a single control-flow abstraction—namely interrupt handlers—to implement several kinds of OS abstractions as offered to the application programmer:

  • Threads (named tasks in OSEK), which can set other threads ready and can be preempted as described above.
  • Interrupt handlers that can call system services (named ISRs of category 2 in OSEK) and therefore need to be synchronized with the kernel in order not to corrupt kernel state. This synchronization is performed by raising and lowering the CPU priority when threads execute system calls.
  • Interrupt handlers that must not call system services (named ISRs of category 1 in OSEK) are assigned a priority higher than all threads and category-2 ISRs in the system. This way, they have minimal execution latency because they are never delayed by those other types of control flows.
  • Callbacks (e.g., upon timer expiry) were introduced in systems like OSEK to provide a low-overhead way to respond to events. In Sloth, however, they are treated as high-priority ISRs or threads, since ISRs and threads already have minimal overhead.

Another consequence of only having interrupt handlers as control flows is that there is only a single means of synchronization in the system, which is raising and lowering the CPU priority. Considering that synchronization is one of the hardest tasks to do right in OS engineering, this simplicity is a big advantage.

2. Sloth Is Small

As can be seen from the design description above, the Sloth design is very concise and easy to grasp.

The source code for the complete OSEK-BCC1 implementation on the Infineon-TriCore platform is less than 200 lines of C code, with one instance per OS abstraction (task, resource, alarm); more instances need more code to be generated, but do not add to the system complexity. This makes Sloth a good subject for verification.

The compiled memory footprint of the Sloth kernel is also extremely small. The OSEK-BCC1 Infineon-TriCore implementation takes about 700 bytes, again with one instance per OS abstraction. Considering that the start-up code alone as provided by the compiler (tricore-gcc by HighTec) takes up more than 1,000 bytes (500 bytes after manual tweaking), this is very competitive. A system call such as setReady(thread) (activate(task) in OSEK) is basically compiled to a single memory-store instruction to the corresponding memory-mapped register.
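
To illustrate why such a system call can shrink to one store, here is a hedged sketch that uses an ordinary array as a stand-in for the memory-mapped IRQ-source registers. The register layout (`SRC_PRIO`, `SRC_ENABLE`, `SRC_SET_REQ`), the array `sim_src`, and the helper names are hypothetical and do not reproduce the actual TriCore register definitions; on real hardware the array would be replaced by the fixed register addresses from the chip manual.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_IRQ_SOURCES 4u

/* Stand-in for the memory-mapped IRQ-source registers (assumption). */
static volatile uint32_t sim_src[NUM_IRQ_SOURCES];

/* Hypothetical register layout, chosen for this sketch only. */
#define SRC_PRIO(p)  ((uint32_t)(p) & 0xFFu)   /* priority in bits 0..7 */
#define SRC_ENABLE   (1u << 12)                /* source takes part in arbitration */
#define SRC_SET_REQ  (1u << 15)                /* writing 1 requests the interrupt */

/* Priority and enable bit are fixed at configuration time, so the whole
 * register value is a constant and no read-modify-write is needed. With
 * constant arguments and assertions disabled (NDEBUG), an optimizing
 * compiler can reduce this function to one store instruction. */
static inline void set_ready(unsigned irq, unsigned prio)
{
    assert(irq < NUM_IRQ_SOURCES);
    sim_src[irq] = SRC_PRIO(prio) | SRC_ENABLE | SRC_SET_REQ;
}

int is_pending(unsigned irq)
{
    assert(irq < NUM_IRQ_SOURCES);
    return (sim_src[irq] & SRC_SET_REQ) != 0;
}

uint32_t src_priority(unsigned irq)
{
    assert(irq < NUM_IRQ_SOURCES);
    return sim_src[irq] & 0xFFu;
}
```

The design choice worth noting is that the written value is fully determined by the static system configuration, which is what allows the call to avoid loading the register first.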

3. Sloth Is Fast

We have looked into Sloth's performance by measuring the execution time of the thread-related OSEK system calls. All other system calls as well as the application code itself will obviously be similar or even the same for traditional, software-based kernels. We have compared these microbenchmarks to the performance of the CiAO kernel, which has itself been shown in publications to have competitive performance. The results are very promising:

[Table: Sloth benchmarks]

The table shows the number of clock cycles the respective microbenchmark (distinguishing between triggering and not triggering a thread dispatch where needed) needs to execute on Sloth and on CiAO. The numbers for Sloth depend on the number of threads in the system; the more threads (i.e., interrupt handlers) the system supports, the longer the interrupt arbitration takes. That is why there is a row for both the best case (up to 3 threads) and the worst case (up to 255 threads). In any case, Sloth performs at least as well as CiAO and up to seven times better.

4. Sloth Is Cool

At least we like to think so :)...

Sleepy Sloth: Threads as Interrupts as Threads

The main drawback of the original Sloth system is that it does not support blocking tasks, which are called extended tasks in OSEK terminology and which run on stacks of their own. Some applications, however, require blocking semantics for their tasks in order to allow for manageable decomposition. Thus, the goal for Sleepy Sloth is to provide the application with more flexibility by offering a blocking thread abstraction while still executing efficiently by utilizing the interrupt subsystem of a given hardware platform.

Challenges

There are three main challenges in designing and implementing such a system:

  1. Interrupt controllers do not support suspension and re-activation of interrupt handlers, which we need to implement blocking and unblocking.
  2. In a Sloth-like system, the exact preemption points cannot be located in the system code because the interrupt hardware decides if a preemption takes place (depending on the priority situation). Thus, we cannot switch stacks before doing the task switch as usual.
  3. As part of Sleepy Sloth's goal, we want to maintain the execution efficiency of Sloth as far as possible. Especially for basic run-to-completion tasks, which do not make use of the added blocking flexibility, we have to find a way to leave them unaffected by additional overhead.

The Sleepy Sloth Task Prologue

The central design element in Sleepy Sloth to tackle those challenges is its task prologue. This prologue code is prepended to every task function (which corresponds to an interrupt handler in Sloth) and is thus executed whenever a preemption or other task switch takes place. The task prologue has the following responsibilities:

  1. It first saves the context of the interrupted task to the current stack.
  2. After that, it checks if it needs to switch stacks and does so if necessary. Stack switches are not necessary when a basic run-to-completion task preempts another basic run-to-completion task; the stack needs to be switched only if at least one extended blocking task is involved.
  3. If the interrupted task has not terminated, its IRQ source is re-triggered to ensure that its execution is continued later.
  4. Then, the prologue checks if the task it belongs to has run before in that job instance or not.
  5. If it has, then the context is restored from a kernel context array and execution of the task is continued by returning using the restored return address.
  6. If it has not, then the context is initialized (including resetting the stack pointer to the top of the allocated task stack) and, after enabling IRQs, the prologue jumps to the user task function.
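
The decision logic of the prologue steps listed above can be sketched as a pure function over simulated task state. This is an assumption-laden model, not the actual prologue, which has to save and restore real register contexts in platform-specific code: the types `struct task` and `struct prologue_result` and the function `task_prologue` are hypothetical names, and the context save, restore, and initialization steps appear only as comments.

```c
#include <assert.h>
#include <stdbool.h>

enum task_kind { BASIC, EXTENDED };

struct task {
    enum task_kind kind;
    bool terminated;     /* the task has finished its current job */
    bool has_run;        /* already started in the current job instance */
    bool irq_pending;    /* simulated request bit of the task's IRQ source */
};

struct prologue_result {
    bool switched_stack; /* step 2: a stack switch was necessary */
    bool resumed;        /* step 5 (true) versus step 6 (false) */
};

/* Runs on entry of `self`, which has just preempted or replaced `interrupted`. */
struct prologue_result task_prologue(struct task *interrupted, struct task *self)
{
    struct prologue_result r = { false, false };
    assert(interrupted && self);

    /* Step 1: save the context of the interrupted task (not modeled). */

    /* Step 2: switch stacks only if an extended task is involved. */
    if (interrupted->kind == EXTENDED || self->kind == EXTENDED)
        r.switched_stack = true;

    /* Step 3: re-trigger the interrupted task so it is continued later. */
    if (!interrupted->terminated)
        interrupted->irq_pending = true;

    /* Steps 4 to 6: resume a saved context or start the task afresh. */
    if (self->has_run) {
        r.resumed = true;            /* restore context, return into the task */
    } else {
        self->has_run = true;        /* initialize context, jump to task function */
        r.resumed = false;
    }
    return r;
}
```

For two basic tasks, the function reports no stack switch and a fresh start, and it marks the interrupted task's IRQ source as pending again; as soon as one of the two tasks is extended, `switched_stack` becomes true.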

Interaction with Points of Rescheduling

A task terminates by yielding the CPU: it sets the CPU priority to zero, so that the IRQ hardware dispatches the pending task with the next-highest priority. The prologue of this next task does the stack switch.

Tasks are blocked by disabling the corresponding IRQ source. This way, the IRQ arbitration system no longer considers it as a candidate for being dispatched. After blocking the IRQ source, the CPU is yielded by setting the priority to zero, in the same way as when terminating a task.

Tasks are unblocked by re-enabling the corresponding IRQ source and triggering its pending bit. This will lead to a rescheduling decision in the hardware arbitration unit and dispatch the unblocked task if it has a higher priority in a preemptive system.
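
The effect of blocking and unblocking on the arbitration can be modeled with one enable bit and one pending bit per IRQ source. The sketch below is a simplified host-side model with hypothetical names (`block_task`, `unblock_task`, `next_to_dispatch`, `dispatch`); it only computes which task the hardware would pick next and deliberately leaves out the context handling done by the task prologue.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PRIOS 8u               /* hypothetical: priorities 1..7, 0 = idle */

static uint8_t  pending;           /* request bits */
static uint8_t  enabled = 0xFE;    /* enable bits; source 0 is unused */
static unsigned cpu_prio;          /* current CPU priority, 0 = idle */

static uint8_t bit(unsigned prio)
{
    assert(prio > 0 && prio < NUM_PRIOS);
    return (uint8_t)(1u << prio);
}

/* Hardware side: highest enabled, pending source above the CPU priority,
 * or 0 if nothing would be dispatched right now. */
unsigned next_to_dispatch(void)
{
    for (unsigned p = NUM_PRIOS - 1; p > cpu_prio; p--)
        if (pending & enabled & bit(p))
            return p;
    return 0;
}

void request_irq(unsigned prio) { pending |= bit(prio); }

/* Hardware side: entering the handler acknowledges the request. */
void dispatch(unsigned prio)
{
    pending &= (uint8_t)~bit(prio);
    cpu_prio = prio;
}

/* Blocking: disable the task's IRQ source, then yield the CPU. */
void block_task(unsigned prio)
{
    enabled &= (uint8_t)~bit(prio);
    cpu_prio = 0;
}

/* Unblocking: re-enable the source and trigger its pending bit. */
void unblock_task(unsigned prio)
{
    enabled |= bit(prio);
    pending |= bit(prio);
}
```

A short scenario: with tasks 1 and 3 requested, task 3 is dispatched first; once it blocks, task 1 is next; and when task 3 is unblocked while task 1 runs, `next_to_dispatch()` returns 3 again, which corresponds to the preemption described above.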

Sleepy Sloth Performance

Using our reference implementation for the Infineon TriCore microcontroller, we evaluated Sleepy Sloth in three different task scenarios.

In an application with only basic run-to-completion tasks, Sleepy Sloth is as fast as the original Sloth kernel because it is tailored to the application configuration.

In an application with only extended blocking tasks, the system calls are burdened with an additional overhead due to the execution of the task prologue, which is not necessary in run-to-completion systems. Nevertheless, the system calls show a speed-up by a factor of 1.6 to 3.5 compared to a commercial OSEK implementation.

In an application with both kinds of tasks, the Sleepy Sloth overhead scales with the demand: task switches between basic run-to-completion tasks are faster than those between extended tasks.

Summary

Sleepy Sloth offers a universal thread abstraction

  • that can be activated by a hardware event or a software event,
  • that can have run-to-completion or blocking semantics,
  • and that is scheduled and dispatched efficiently by interrupt hardware in a single priority space.

Last modified: 2012-03-22 14:29