## Concurrent Systems

Exercise 03 - Memory, Atomicity, Consistency

Stefan Reif

November 28, 2016



### Agenda

Memory Models

Sequential Consistency

C11/C++11 Atomic Types

C/C++ Memory Consistency Models

Thread Fences

Assignment 3



### What is a memory model?

- Formal definition of memory behavior
  - More or less trivial for sequential programs
  - Complex for parallel programs
- Crucial for application correctness
  - Particularly for parallel programs
- Memory models are created by ...
  - Language designers
  - Hardware developers
  - Service providers



### Sequential Consistency

#### Def. Sequential Consistency

Lamport, 1979

"[...] the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program."

#### Benefits of SC

- Result is equivalent to a sequential execution
  - → Pseudo-parallelism?
- Global order of actions
- Analyzability

#### Problems of SC

Even simple compiler optimizations violate SC



### Why don't we always want SC?

- Modern Platforms are not WYSIWYG¹
  - Optimizing Compiler, Assembler, Linker, Hardware
  - Every link in the tool-chain can reorder operations
    - We nearly always want that ...
    - ... but it can break parallel code
    - It does not matter which link breaks our code
- SC makes optimizations hard

| More R                  | eordering                    |
|-------------------------|------------------------------|
| SC                      | ?                            |
| ightarrow Analyzability | $\rightarrow$ Bugs?          |
| ightarrow Correctness   | $\rightarrow \ Performance?$ |



#### SC-DRF

- Distinguish synchronizing actions from non-synchronizing actions
  - Guarantee SC for programs without data races
  - Undefined behavior in case of data race
  - Most code parts allow optimization
- Synchronizing actions provide SC
  - e.g. mutex\_lock, atomic\_fetch\_add, ...
- Non-synchronizing actions can cause data races

#### Def. Data Race

C11 Standard (Draft)

"Two expression evaluations **conflict** if one of them modifies a memory location and the other one reads or modifies the same memory location."

"The execution of a program contains a **data race** if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other."





















### C11 / C++11 Atomic Types

- Modern C/C++ offers atomic types
  - lacktriangle C  $\Rightarrow$  \_Atomic qualifier
  - C++ ⇒ std::atomic<type> template
  - Intentionally compatible:

```
#ifdef __cplusplus
#define _Atomic(type) std::atomic<type>
#endif
```

- Well-defined semantics
  - Compiler guarantees atomicity
  - unlike volatile ...
- Portable
  - Actual run-time properties still depend on the hardware
  - Lock-freedom is not guaranteed



#### Atomic operations

```
#include <stdatomic.h>
   atomic_store(AT*, T);
   atomic_load(AT*);
3
   atomic_exchange(AT*, T);
   atomic_compare_exchange_strong(AT*, T*, T);
   atomic_compare_exchange_weak(AT*, T*, T);
6
   atomic_fetch_add(AT*, T);
   atomic_fetch_sub(AT*, T);
   atomic_fetch_or(AT*, T);
   atomic_fetch_xor(AT*, T);
10
   atomic_fetch_and(AT*, T);
11
```



Atomic operations with explicit memory order parameter

```
#include <stdatomic.h>
   atomic_store_explicit(AT*,T, MO);
   atomic_load_explicit(AT*,MO);
3
   atomic_exchange_explicit(AT*, T, MO);
   atomic_compare_exchange_strong_explicit(AT*,
       T*,T,MO,MO):
6
   atomic_compare_exchange_weak_explicit(AT*,
       T*,T,MO,MO);
   atomic_fetch_add_explicit(AT*, T, MO);
9
   atomic_fetch_sub_explicit(AT*, T, MO);
10
   atomic_fetch_or_explicit(AT*, T, MO);
11
   atomic_fetch_xor_explicit(AT*, T, MO);
12
   atomic_fetch_and_explicit(AT*, T, MO);
13
```



### C/C++ Memory Order Parameters

- 1. sequential-consistent
  - memory\_order\_seq\_cst
- 2. acquire-release
  - memory\_order\_acquire
  - memory\_order\_release
  - memory\_order\_acq\_rel
- 3. consume-release
  - memory\_order\_consume
  - memory\_order\_release
- 4. relaxed
  - memory\_order\_relaxed



### C/C++ Memory Order Parameters

#### 1. sequential-consistent

memory\_order\_seq\_cst

#### 2. acquire-release

- memory\_order\_acquire
- memory\_order\_release
- memory\_order\_acq\_rel

#### consume-release

- memory\_order\_consume
- memory\_order\_release

#### 4. relaxed

memory\_order\_relaxed



Performance?



#### memory\_order\_seq\_cst

- Strongest consistency model
- Implicit consistency model
  - All operations without \_explicit
- Semantics: SC-DRF
  - All operations with memory\_order\_seq\_cst are sequentially consistent
  - Non-atomic variables can cause data races



#### Happens-before relation

- Intra-thread: trivial
- Inter-thread: If  $S_A$  release-stores a value v in A and  $L_A$  acquire-loads that value, then  $S_A$  happens before  $L_A$ .
- $S_A$  also happens before  $L_A$  if  $L_A$  reads a **later** value v' of A
  - → modification order
- Transitive relation
- Acquire/Release consistency
  - Each operation sees side effects of all operations that happened before
- Partial Ordering
  - No global sequence of operations
  - Synchronization per variable
  - Concurrent operations are possible



- lacksquare acquire o load
  - fetch-all-data
  - Side effects after load-acquire remain afterwards
  - Implicit in thrd\_join, mtx\_lock
- $\blacksquare$  release  $\rightarrow$  store
  - push-all-data
  - Side effects before store-release remain before
  - Implicit in thrd\_exit, mtx\_unlock
- $lack acq\_rel o load/store$ 
  - e.g. atomic\_fetch\_and\_\*, atomic\_compare\_exchange\_\*



























#### Example:

```
int data; _Atomic(bool) avail;
   thread1() {
2
     data = createdata(); // not atomic
3
     atomic_store_explicit(&avail, 1,
         memory_order_release);
5
   thread2() {
     if (atomic_load_explicit(&avail,
         memory_order_acquire)) {
       int mycopy = data; use(mycopy);
10
11
12
```



#### Example:

```
_Atomic(foo *) data;
   thread1() {
2
     foo *d = createdata(); // not atomic
3
     atomic_store_explicit(&data, d,
       memory_order_release);
5
   thread2() {
     foo *d = atomic_load_explicit(&data,
       memory_order_acquire);
     if (d)
10
       use(d);
11
   }
12
```



Difference between seq\_cst and acquire/release

```
seq_cst acquire/release
```



Difference between seq\_cst and acquire/release

```
seq_cst acquire/release
```



Difference between seq\_cst and acquire/release

Difference between seq\_cst and acquire/release



| Α | В | $\mathtt{seq}_{\mathtt{-}}\mathtt{cst}$ | acquire/release |
|---|---|-----------------------------------------|-----------------|
| 1 | 1 | ✓                                       | ✓               |
| 1 | 0 | ✓                                       | ✓               |
| 0 | 1 | ✓                                       | ✓               |
| 0 | 0 | X                                       | ✓               |



#### memory\_order\_relaxed

- Weakest memory order parameter
  - Meaningful when additional mechanisms synchronize
- Ensures atomicity
  - No synchronization
  - Lock-freedom not guaranteed
- No reordering constraints
  - Except for thread fences ...
  - Dangerous to use
- Performance improvement?



```
_Atomic(unsigned int) counter;
   search() {
     while (...) {
       if (...)
         atomic_fetch_add_explicit(&counter, 1,
              memory_order_relaxed);
     }
     thrd exit(...):
   }
   main() {
10
     for (...) thrd_create(..., search, ...);
11
     for (...) thrd_join(...);
12
     unsigned int total = atomic_load_explicit(&counter,
13
         memory_order_relaxed);
14
   }
15
```



```
_Atomic(unsigned int) counter;
   search() {
     while (...) {
       if (...)
         atomic_fetch_add_explicit(&counter, 1,
              memory_order_relaxed);
     }
     thrd_exit(...);
   }
   main() {
10
     for (...) thrd_create(..., search, ...);
11
     for (...) thrd_join(...);___
12
     unsigned int total = atomic_load_explicit(&counter,
13
         memory_order_relaxed);
14
   }
15
```



- Enforces additional memory ordering guarantees
  - Often better: use stronger memory order parameter at critical operation
- Parameter: memory order
  - memory\_order\_acquire
  - memory\_order\_release
  - memory\_order\_acq\_rel
  - memory\_order\_seq\_cst
- Improves consistency of relaxed-atomic operations
  - Useful for library functions























```
int data; atomic_bool avail;
   thread1() {
     data = createdata(); // not atomic
3
4
     atomic_store_explicit(&avail, 1,
5
          memory_order_relaxed);
6
   thread2() {
     if (atomic_load_explicit(&avail,
9
            memory_order_relaxed)) {
10
11
        int mydata = data; use(mydata);
12
13
14
```



```
int data; atomic_bool avail;
   thread1() {
     data = createdata(); // not atomic
3
     atomic_thread_fence(memory_order_release);
4
     atomic_store_explicit(&avail, 1,
5
         memory_order_relaxed);
6
   thread2() {
     if (atomic_load_explicit(&avail,
9
            memory_order_relaxed)) {
10
       atomic_thread_fence(memory_order_acquire);
11
       int mydata = data; use(mydata);
12
13
14
```



### Assignment 3

- Implement lock algorithms
  - Lock algorithms are discussed in the lecture
- Test all lock algorithms
- Evaluate all lock algorithms
  - Vary the degree of contention
  - Check multiple back-off algorithms
- Use a model checker for concurrent data structures
  - CDSChecker → plrg.eecs.uci.edu/software\_page/42-2/
  - Write a unit test for at least one lock algorithm



Assignment 3 (2)

#### ■ Implement a simple actor library

- Actors send requests asynchronously
- Each actor owns a worker thread
- Requests are sequentialized implicitly

#### Shortcommings

- No actor states and mutation
- No actor migration
- ...

#### Test your actor library

- Write test cases for your actor
- Can you re-use the lock test cases?

#### ■ Evaluate your actor library

- Compare your actor implementation against lock-based synchronization
- Under which conditions are actors a better choice?

