Konfigurierbare Systemsoftware (KSS)

VL 6 – Generative Programming: The SLOTH Approach

Daniel Lohmann

Lehrstuhl für Informatik 4
Verteilte Systeme und Betriebssysteme
Friedrich-Alexander-Universität
Erlangen-Nürnberg

SS 14 – 2014-05-20

http://www4.informatik.uni-erlangen.de/Lehre/SS14/V_KSS

Implementation Techniques: Classification

Decompositional Approaches
- Text-based filtering (untyped)
- Preprocessors

Compositional Approaches
- Language-based composition mechanisms (typed)
- OOP, AOP, Templates

Generative Approaches
- Metamodel-based generation of components (typed)
- MDD, C++ TMP, generators

Implementation Techniques: Classification

Decompositional Approaches
- Text-based filtering (untyped)
- Preprocessors

Compositional Approaches
- Language-based composition mechanisms (typed)
- OOP, AOP, Templates

Generative Approaches
- Metamodel-based generation of components (typed)
- MDD, C++ TMP, generators

“I’d rather write programs to write programs than write programs.”

Dick Sites (DEC)
The OSEK Family of Automotive OS Standards

- **1995** OSEK OS (OSEK/VDX) [9]
- **2001** OSEKtime (OSEK/VDX) [11]
- **2005** AUTOSAR OS (AUTOSAR) [1]

OSEK OS

- statically configured, event-triggered real-time OS

OSEKtime

- statically configured, time-triggered real-time OS
- can optionally be extended with OSEK OS (to run in slack time)

AUTOSAR OS

- statically configured, event-triggered real-time OS
- real superset of OSEK OS \(\leadsto\) backwards compatible
- additional time-triggered abstractions (schedule tables, timing protection)
- intended as a successor for both OSEK OS and OSEKtime

OSEK OS: Abstractions [9]

- **Control flows**
  - **Task**: software-triggered control flow (strictly priority-based scheduling)
    - Basic Task (BT) run-to-completion task with strictly stack-based activation and termination
    - Extended Task (ET) may suspend and resume execution (\(\mapsto\) coroutine)
  - **ISR**: hardware-triggered control flow (hardware-defined scheduling)
    - Cat 1 ISR (ISR1) runs below the kernel, may not invoke system services (\(\mapsto\) prologue without epilogue)
    - Cat 2 ISR (ISR2) synchronized with kernel, may invoke system services (\(\mapsto\) epilogue without prologue)
  - **Hook**: OS-triggered signal/exception handler
    - **ErrorHook** invoked in case of a syscall error
    - **StartupHook** invoked at system boot time
    - ...
OSEK OS: Abstractions [9]

- Coordination and synchronization
  - Resource: mutual exclusion between well-defined set of tasks
    - stack-based priority ceiling protocol ([12]):
      - GetResource() \( \rightarrow \) priority is raised to that of highest participating task
    - pre-defined RES_SCHED has highest priority (\( \rightarrow \) blocks preemption)
    - implementation-optinal: task set may also include cat 2 ISRs
  - Event: condition variable on which ETs may block
    - part of a task’s context
  - Alarm: asynchronous trigger by HW/SW counter
    - may execute a callback, activate a task, or set an event on expiry

OSEK OS: Conformance Classes [9]

- OSEK offers predefined tailorability by four conformance classes
  - BCC1: only basic tasks, limited to one activation request per task and one task per priority, while all tasks have different priorities
  - BCC2: like BCC1, plus more than one task per priority possible and multiple requesting of task activation allowed
  - ECC1: like BCC1, plus extended tasks
  - ECC2: like ECC1, plus more than one task per priority possible and multiple requesting of task activation allowed for basic tasks

The OSEK feature diagram

OSEK OS: System Services (Excerpt)

- Task-related services
  - ActivateTask(task) \( \sim \) task is active (\( \rightarrow \) ready), counted
  - TerminateTask() \( \sim \) running task is terminated
  - Schedule() \( \sim \) active task with highest priority is running
  - ChainTask(task) \( \rightarrow \) atomic

- Resource-related services
  - GetResource(res) \( \sim \) current task has res ceiling priority
  - ReleaseResource(res) \( \sim \) current task has previous priority

- Event-related services (extended tasks only!)
  - SetEvent(task, mask) \( \sim \) events in mask for task are set
  - ClearEvent(mask) \( \sim \) events in mask for current task are unset
  - WaitEvent(mask) \( \sim \) current task blocks until event from mask has been set

- Alarm-related services
  - SetAbsAlarm(alarm, ...) \( \sim \) arms alarm with absolute offset
  - SetRelAlarm(alarm, ...) \( \sim \) arms alarm with relative offset

OSEK OS: System Specification with OIL [10]

- An OSEK OS instance is configured completely statically
  - all general OS features (hooks, ...)
  - all instances of OS abstractions (tasks, ...)
  - all relationships between OS abstractions
  - described in a domain-specific language (DSL)

- OIL: The OSEK Implementation Language
  - standard types and attributes (TASK, ISR, ...)
  - vendor/plattform-specific attributes (ISR source, priority, triggering)
  - task types and conformance class is deduced

OSEK OS: System Specification with OIL [10]

- OIL File for Example System (BCC1)
  - Three basic tasks: Task1, Task2, Task4
  - Category 2 ISR: ISR2 (platform-spec. source/priority)
  - Task1 and Task3 use resource Res1 - ceiling pri = 3
  - Alarm Alarm1 triggers Task4 on expiry
OSEK OS: Example Control Flow

- Basic tasks behave much like IRQ handlers
  - on a system with support for IRQ priority levels
  - priority-based dispatching with run-to-completion
- LIFO, all control flows can be executed on a single shared stack
- So why not dispatch tasks as ISRs?
  ~ Let the hardware do all scheduling!
  ~ Let’s be a SLOTH!

"SLOTH: Threads as Interrupts"

- Idea: threads are interrupt handlers,
  synchronous thread activation is IRQ
- Let interrupt subsystem do the scheduling and dispatching work
- Applicable to priority-based real-time systems
- Advantage: small, fast kernel with unified control-flow abstraction
**SLOTH Design**

- IRQ system must support priorities and software triggering

**SLOTH: Example Control-Flow**

**SLOTH: Qualitative Results**

- Concise kernel design and implementation
  - < 200 LoC, < 700 bytes code memory, very little RAM
- Single control-flow abstraction for tasks, ISRs (1/2), callbacks
  - Handling oblivious to how it was triggered (by hardware or software)
- Unified priority space for tasks and ISRs
  - No rate-monotonic priority inversion [3, 4]
- Straight-forward synchronization by altering CPU priority
  - Resources with ceiling priority (also for ISRs!)
  - Non-preemptive sections with RES_SCHEDULER (highest task priority)
  - Kernel synchronization with highest task/cat.-2-ISR priority

**Performance Evaluation: Methodology**

- Reference implementation for Infineon TriCore
  - 32-bit load/store architecture
  - Interrupt controller: 256 priority levels, about 200 IRQ sources with memory-mapped registers
  - Meanwhile also implementations for ARM Cortex-M3 (SAM3U) and x86
- Evaluation of task-related system calls:
  - Task activation
  - Task termination
  - Task acquiring/releasing resource
- Comparison with commercial OSEK implementation and CiAO

Two numbers for SLOTH: best case, worst case

- Depending on number of tasks and system frequency
### Control Flows in Embedded Systems

<table>
<thead>
<tr>
<th>ISRs</th>
<th>Activation Event</th>
<th>Sched./Disp.</th>
<th>Semantics</th>
</tr>
</thead>
<tbody>
<tr>
<td>Hardware (HW)</td>
<td>HW</td>
<td>by HW</td>
<td>RTC</td>
</tr>
<tr>
<td>Software (SW)</td>
<td>SW</td>
<td>by OS</td>
<td>Blocking</td>
</tr>
<tr>
<td>Hardware or Software</td>
<td>HW or SW</td>
<td>by HW</td>
<td>RTC</td>
</tr>
<tr>
<td>Hardware or Software</td>
<td>HW or SW</td>
<td>by HW</td>
<td>RTC or Blocking</td>
</tr>
</tbody>
</table>

(RTC: Run-to-Completion)

### Agenda

- **6.1** Motivation: OSEK and Co
- **6.2** SLOTH: Threads as Interrupts
- **6.3** SLEEPY SLOTH: Threads as IRQs asThreads
  - Motivation
  - Design
  - Results
- **6.4** SAFER SLOTH: Hardware-Tailored Isolation
- **6.5** SLOTH ON TIME: Time-Triggered Laziness
- **6.6** SLOTH+ Generation
- **6.7** Summary and Conclusions
- **6.8** References

### Limitations of the SLOTH Approach

- No multiple tasks per priority (OSEK BCC2 / ECC2) → execution order has to be the same as activation order
- No extended tasks (that is, events, OSEK ECC1 / ECC2) → impossible with stack-based IRQ execution model
- No safety (that is, AUTOSAR-OS memory protection) → impossible if everything runs as IRQ handler

### Performance Evaluation: Results

![Graph showing performance evaluation results](image)

- Speed-Up w/ dispatch:
  - Activate() ≈ 2x
  - Activate() ≈ 4x
  - Terminate() ≈ 20x
  - Chain() ≈ 5x
  - GetRes() ≈ 3x
  - ReleaseRes() ≈ 8x

- Speed-Up w/o dispatch:
  - Activate() ≈ 2x
  - Activate() ≈ 4x
  - Terminate() ≈ 20x
  - Chain() ≈ 5x
  - GetRes() ≈ 3x
  - ReleaseRes() ≈ 8x
**SLEEPY SLOTH: Main Goal and Challenge**

**Main Goal**
Support extended blocking tasks (with stacks of their own), while preserving SLOTH’s latency benefits by having threads run as ISRs.

**Main Challenge**
IRQ controllers do not support suspension and re-activation of ISRs.

---

**SLEEPY SLOTH: Dispatching and Rescheduling**

- Task prologue: switch stacks if necessary
  - Switch basic task → basic task omits stack switch
  - On job start: initialize stack
  - On job resume: restore stack

- Task termination: task with next-highest priority needs to run
  - Yield CPU by setting priority to zero
  - (Prologue of next task performs the stack switch)

- Task blocking: take task out of “ready list”
  - Disable task’s IRQ source
  - Yield CPU by setting priority to zero

- Task unblocking: put task back into “ready list”
  - Re-enable task’s IRQ source
  - Re-trigger task’s IRQ source by setting its pending bit

---

**SLEEPY SLOTH: Example Control Flow**

---

**SLEEPY SLOTH Design: Task Prologues and Stacks**
Evaluation: Extended and Basic Tasks

- Basic switches in a mixed system only slightly slower than in purely basic system

Limitations of the SLOTH Approach

- No multiple tasks per priority (→ OSEK BCC2 / ECC2)
  - execution order has to be the same as activation order
- No extended tasks (that is, events, → OSEK ECC1 / ECC2)
  - impossible with stack based IRQ execution model
- No safety (that is, AUTOSAR-OS memory protection)
  - impossible if everything runs as IRQ handler

Agenda

6.1 Motivation: OSEK and Co
6.2 SLOTH: Threads as Interrupts
6.3 SLEEPY SLOTH: Threads as IRQs as Threads
6.4 SAFER SLOTH: Hardware-Tailored Isolation
  - Main Goal
  - Main Challenge

SAFER SLOTH: Main Goal and Challenge

Main Goal
Support isolation of state and privileges, while preserving SLOTH’s latency benefits by having threads run as ISRs.

Main Challenge
ISRs run in supervisor mode with full privileges. So how to
- Effectively isolate kernel and application
- Maintain design principles of Sloth
Memory Protection in Embedded Systems

- Safety, but not security
- Protect the data, but not the code
- Safety model based on AUTOSAR OS
- MPU-based isolation

Vertically: Protect kernel state and MPU configuration
Horizontally: Isolate applications or even tasks from each other

Protection Modes in SAFER SLOTH

Unsafe
- The original Sloth OS, without isolation

MPU
- MPU active, but tasks execute with supervisor privileges
- Vertical isolation ensured constructively in post-validation

MPU+traps
- Vertical isolation ensured by hardware privilege levels
- System services acquire kernel privileges via syscall mechanism

Maintaining the SLOTH Principles for SAFER SLOTH

- Exploit as much knowledge about target hardware as possible
- Tailor kernel to fit both the platform and the application
- Taking into account:
  - Extent and layout of MPU configuration
  - Method for re-programming the MPU
  - Available hardware privilege levels
  - Is MPU active in all levels?
  - Degree of safety required by the application

SAFER SLOTH: Architecture
### MPU mode in SAFER SLOTH

#### unsafe Mode:

```c
// Example code

void task1()
{
    ... // inline call to
    // GetResource(Res1):
    // mpucr d15,psw
    // insert d15,d15,0,12,1
    // if (Res1 > prio) {
    //     setCurPrio(Res1);
    // }
    // enable MPU
    // mpucr d15,psw
    // insert d15,d15,15,12,1
    // ... // inlined call to
    // GetResource(Res1):
    // <...>
    // returns to
    // <non-inlined implementation>
    // ... // syscall dispatcher
    ji %a11
}```

#### MPU Mode:

```c
// Example code

void task1()
{
    ... // inline call to
    // GetResource(Res1):
    // mpucr d15,psw
    // insert d15,d15,0,12,1
    // if (Res1 > prio) {
    //     setCurPrio(Res1);
    // }
    // enable MPU
    // mpucr d15,psw
    // insert d15,d15,15,12,1
    // ... // inlined call to
    // GetResource(Res1):
    // <...>
    // returns to
    // <non-inlined implementation>
    // ... // syscall dispatcher
    ji %a11
}
```

### MPU+traps mode in SAFER SLOTH

#### MPU+traps Mode:

```c
// Example code

void task1()
{
    ... // inline call to
    // GetResource(Res1):
    // mpucr d15,psw
    // insert d15,d15,0,12,1
    // if (Res1 > prio) {
    //     setCurPrio(Res1);
    // }
    // enable MPU
    // mpucr d15,psw
    // insert d15,d15,15,12,1
    // ... // inlined call to
    // GetResource(Res1):
    // <...>
    // returns to
    // <non-inlined implementation>
    // ... // syscall dispatcher
    ji %a11
}
```

### The Problem with Traps

- SLOTH gains a lot of its benefits through compiler optimizations
  - Inlining of system service calls
  - Removal of dead code
  - Constant propagation
- Traditional traps **prohibit** such optimizations
  - System services must be standalone functions
  - Jumped to via a syscall dispatcher

### Solution Idea

**Combine MPU and MPU+traps mode**

~~ Inline traps as 4th protection mode (MPU+itraps) ~
Evaluation Results: Total Overheads

Safer Sloth vs. commercial AUTOSAR OS

Safer Sloth (unsafe mode)

\[ \Delta \text{MPU} / \text{traps} / \text{itraps} \]

AUTOSAR OS (unsafe)

Cycles

Activate w/o disp.
Activate w/ disp.
Terminate w/ disp.
Chain Task w/ disp.
Get Resource w/o disp.
Get Resource w/ disp.
Release Res w/o disp.
Release Res w/ disp.

Evaluation Results: Additional Overheads

Safer Sloth vs. commercial AUTOSAR OS

Safer Sloth (unsafe mode)

\[ \Delta \text{MPU} + \text{traps} \]

AUTOSAR OS (\( \Delta \text{MPU} + \text{traps} \))

Cycles

Activate w/o disp.
Activate w/ disp.
Terminate w/ disp.
Chain Task w/ disp.
Get Resource w/o disp.
Get Resource w/ disp.
Release Res w/o disp.
Release Res w/ disp.

Limitations of the SLOTH Approach

- No multiple tasks per priority (\( \mapsto \) OSEK BCC2 / ECC2)
  \( \mapsto \) execution order has to be the same as activation order
- No extended tasks (that is, events, \( \mapsto \) OSEK ECC1 / ECC2)
  \( \mapsto \) impossible with stack-based IRQ execution model
- No safety (that is, AUTOSAR-OS memory protection)
  \( \mapsto \) impossible if everything runs as IRQ handler

Agenda

6.1 Motivation: OSEK and Co
6.2 SLOTH: Threads as Interrupts
6.3 SLEEPY SLOTH: Threads as IRQs as Threads
6.4 SAFER SLOTH: Hardware-Tailored Isolation
6.5 SLOTH ON TIME: Time-Triggered Laziness
6.6 SLOTH+ Generation
6.7 Summary and Conclusions
6.8 References
**SLOTH ON TIME: Time-Triggered Laziness** [5]

- Idea: use hardware timer arrays to implement schedule tables
- TC1796 GPTA: 256 timer cells, routable to 96 interrupt sources
  - use for task activation, deadline monitoring, execution time budgeting, time synchronization, and schedule table control
- SLOTH ON TIME implements OSEKtime [11] and AUTOSAR OS schedule tables [1]
  - combinable with SLOTH or SLEEPY SLOTH for mixed-mode systems
  - up to 170x lower latencies compared to commercial implementations

**Qualitative Evaluation: AUTOSAR**

Commercial AUTOSAR: **Priority inversion** with time-triggered activation (2,075 cycles each)

SLOTH ON TIME: avoids this by design!

“Interrupts are perhaps the biggest cause of priority inversion in real-time systems, causing the system to not meet all of its timing requirements.”


---

**Agenda**

6.1 Motivation: OSEK and Co
6.2 SLOTH: Threads as Interrupts
6.3 SLEEPY SLOTH: Threads as IRQs as Threads
6.4 SAFER SLOTH: Hardware-Tailored Isolation
6.5 SLOTH ON TIME: Time-Triggered Laziness
6.6 SLOTH* Generation
6.7 Summary and Conclusions
6.8 References

---

**SLOTH* Generation**

- Two generation dimensions
  - Architecture
  - Application

Generator is implemented in Perl
- Templates
- Configuration
Summary: The SLOTH\* Approach

- Exploit standard interrupt/timer/mpu hardware to delegate core OS functionality to hardware
- scheduling and dispatching of control flows
- OS needs to be tailored to application and hardware platform
  ~ generative approach is necessary

Benefits

- tremendous latency reductions, very low memory footprints
- unified control flow abstraction
  - hardware/software-triggered, blocking/run-to-completion
  - no need to distinguish between tasks and ISRs
  - no rate-monotonic priority inversion
  - reduces complexity
- less work for the OS developer :-)


