For a colleagues project, we encountered the problem to check vectors of values for compatibility. The values are either set or undefined. An undefined value is compatible to everything; a set value is compatible to the same value. An example instance of this problem might look like this, when the possible values are 'a', 'b', and 'c'; undefined is indicated by 'U':

Vector 1 | Vector 2 | |
---|---|---|

a | a | | |

b | U | | |

U | U | | |

a | c | |

These two vectors are compatible in their first three lines, since undefined is compatible to everything. How can be test these vectors for compatibility in a fast fashion? The first simple idea is to use an char array and compare it character by character and to encode undefined as 0:

```
char A[] = {'a', 'b', 0, 'a'};
char B[] = {'a', 0, 0, 'c'};
bool compatible = true;
for (unsigned i = 0; i < 4; i++) {
if (A[i] != 0 && B[i] != 0 && A[i] != B[i]) {
compatible = false; break;
}
}
```

This would do the job and is already quite fast. Nevertheless, we can do it faster. We can avoid checking three different conditions (&&) by using multiplication and the zero element property of the number zero:

```
if (A[i] * B[i] * (A[i] - B[i]) != 0) {
compatible = false; break;
}
```

This expression is exactly then non-zero, if both elements are non-zero and their difference is non-zero, a trait that is also known as inequality. But, as we learned from our processor design lecture, multiplications are expensive. So let's search for a way to do the same without multiplication. Our advantage is, that we are not interested in the result of the multiplication, but only in its property of not-being-zero. Perhaps we can do something with bit shifts. First, we encode our three elements in a more dense way, using two bits at most:

```
char A[] = {1, 2, 0, 1};
char B[] = {1, 0, 0, 3};
```

After fiddling around at the whiteboard I can up with the following solution for 3 possible values plus the undefined vector, which works quite well for our use case:

```
if (((A[i] << 1) & B[i]) ^ ((B[i] << 1) & A[i]) != 0) {
compatible = false; break;
}
```

This expression implements, although it is not easily visible, the
required behavior. We can see this easily by looking at the truth
table of the function `f(a, b)=(((a << 1) & b) ^ ((b << 1) & a)) == 0`

```
| a | b | f(a, b)| | a | b | f(a, b)|
|:-:|:-:|:------:| |:-:|:-:|:------:|
| 0 | 0 | 1 | | 2 | 0 | 1 |
| 0 | 1 | 1 | | 2 | 1 | 0 |
| 0 | 2 | 1 | | 2 | 2 | 1 |
| 0 | 3 | 1 | | 2 | 3 | 0 |
| 1 | 0 | 1 | | 3 | 0 | 1 |
| 1 | 1 | 1 | | 3 | 1 | 0 |
| 1 | 2 | 0 | | 3 | 2 | 0 |
| 1 | 3 | 0 | | 3 | 3 | 1 |
```

So, this is a very limited Boolean function, since it works only for 2 bit wide A/B's. But, it is fast. And the best part is that it consists only of bit operations. This means, we can put many vector values into a single machine word and compare many of them in one step. Unfortunately we have to insert padding bits between the value bits to have zeroes that can be shifted in and out. So when we encode our example from above, we get the following bit vectors:

```
[0] [1] [2] [3] | int
A 001 010 000 001 | 641
B 001 000 000 011 | 515
---------------------------------
f(A, B) 000 000 000 010 | 2
```

As you see, the difference occurs in the `[3]`

columns, where our
values are both set, but different. With this neat trick, we can put
21(!) values in a single 64 bit word and compare them all at
once. With this optimization, I could improve the runtime of our
problem from 5 minutes to 1 minute; just to give you a qualitative
idea of the improvement for our (unspecified) problem. This stems as
well from the more densed coding (transferring less memory) and the
faster operations (bit operations are cheap).