I'm currently attending the FCRC multi-conference in Portland, Oregon. I want to write a few paragraphs about the contributions I found especially interesting. This post is more a journal for myself than something written for a wider audience, but perhaps it is interesting to others as well.
Panchekha et al. introduced Herbie, a heuristic optimizer that increases the precision of floating-point expressions. When computing a floating-point expression, the ordering and selection of instructions is essential for the precision of the result. Herbie takes a real-valued R^n -> R function and emits a piecewise-defined function that minimizes the imprecision introduced by the selected operations.
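A minimal sketch of the kind of rewrite this is about. The expression sqrt(x + 1) - sqrt(x) is a textbook cancellation example and not actual Herbie output; the function names `naive` and `rewritten` are made up for illustration.

```python
import math

def naive(x):
    # Direct translation of sqrt(x + 1) - sqrt(x).
    # For large x, both roots are nearly equal and the subtraction cancels.
    return math.sqrt(x + 1) - math.sqrt(x)

def rewritten(x):
    # Algebraically equivalent form (multiply by the conjugate).
    # It avoids subtracting two nearly equal numbers.
    return 1.0 / (math.sqrt(x + 1) + math.sqrt(x))

x = 1e16
print(naive(x))      # 0.0, because x + 1 rounds back to x in double precision
print(rewritten(x))  # 5e-09, close to the exact real-valued result
```

Both functions compute the same real-valued function, but only the second one keeps its precision for large inputs. A tool in the spirit of Herbie would search for such rewrites automatically and pick different forms for different input ranges.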
Lopes et al. presented Alive, a verifier for peephole optimizations in compilers. A peephole optimization looks at the intermediate representation or the machine code and replaces code patterns with faster ones. Alive uses the Z3 theorem prover to prove the correctness of such optimizations in LLVM, and it found 8 bugs.
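To get a feeling for what "proving a peephole optimization correct" means, here is a rough sketch that swaps the SMT solver for brute force: it simply enumerates all 8-bit inputs. This is not how Alive works internally, and `check_rewrite`, `ok`, and `bad` are hypothetical names.

```python
BITS = 8
MASK = (1 << BITS) - 1

def check_rewrite(before, after):
    """Return a counterexample input, or None if both sides agree
    on every unsigned 8-bit value (results are truncated to 8 bits)."""
    for x in range(1 << BITS):
        if before(x) & MASK != after(x) & MASK:
            return x
    return None

# Valid rewrite: x * 2  ==>  x << 1
ok = check_rewrite(lambda x: x * 2, lambda x: x << 1)

# Invalid rewrite: (x + 1) > x  ==>  true
# With wrapping unsigned arithmetic this breaks at the largest value.
bad = check_rewrite(lambda x: int(((x + 1) & MASK) > x), lambda x: 1)

print(ok)   # None
print(bad)  # 255
```

Exhaustive enumeration only works for tiny bit widths; a solver-based approach can reason about arbitrary widths and about undefined behavior, which is where the real bugs tend to hide.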
Furthermore, I learned about the existence of Vickrey auctions, a form of auction where the highest bid wins, but the winner pays the price of the second-highest bid. In contrast to a normal auction, this auction type maximizes social welfare instead of revenue. Social welfare is defined in this setting as: the bidder with the highest need for the item wins.
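The rule is simple enough to write down directly. A minimal sketch, assuming sealed bids given as a dictionary; the function name `vickrey_auction` and the bidder names are made up.

```python
def vickrey_auction(bids):
    """bids maps bidder -> bid. Returns (winner, price_to_pay).

    The highest bidder wins but pays the second-highest bid.
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bids")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner, _ = ranked[0]
    _, second_price = ranked[1]
    return winner, second_price

print(vickrey_auction({"alice": 120, "bob": 100, "carol": 80}))  # ('alice', 100)
```

Since the price does not depend on the winner's own bid, there is no gain in bidding less than what the item is actually worth to you.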
Kanev et al. presented a hardware-level profile of whole datacenters, and the results are rather amazing. They profiled a bunch of Google servers for a few weeks and examined the results. It is surprising that about 30 percent of all cycles are spent on the "datacenter tax" (allocation, memmove, RPC, protobuf, hashing, compression). This is really a huge number. Furthermore, they could show that pipeline stalls due to instruction cache misses contribute largely to the waiting time in those large datacenter applications. The instruction working sets are often larger than the L2 cache, and within the L2 the instructions have to compete with data cache lines. Perhaps we will see computers with a split L2 cache in the future.
In the DCC keynote, John Wilkes talked about cluster management in the Google datacenters, and their approach is fascinating. The basic assumption is: a machine that is not running has a speed of 0. Therefore, they optimize for availability and assume failure to be the normal mode of operation. In a EuroSys'15 paper, Verma et al. describe Borg, the cluster management software that Google uses internally.
During the LCTES conference, Bardizbanyan et al. presented a processor modification that adapts the memory-fetch stage so it takes the needs of the current memory operation into account. Not all memory operations need all the features the addressing mode provides. For example, mov 4(%eax), %ebx doesn't need an offset from a register with scaling (in contrast to mov 4(%eax, %ecx, 4)). Therefore, they proposed to gate these addressing features within the memory-fetch stage and to do speculative address calculation to improve the energy consumption and latency of the stage.
Baird et al. presented a method to optimize programs for statically pipelined processors. A static pipeline is similar to a CPU with horizontal microinstructions. For a statically pipelined CPU, the compiler doesn't emit a stream of instructions where each token is one instruction; instead, it splits the effects of an instruction across several commands. Each command describes what all stages of the pipeline should do in the current cycle. Statically pipelined processors are hard to program but achieve high energy efficiency. Baird et al. proposed methods to optimize transfer-of-control instructions for these command packets.
From Ghosh et al., I learned that processors that do dynamic binary translation (e.g., Transmeta Crusoe) can do speculative alias analysis. For this, the processor has some alias registers, and every instruction is marked to either update or check a specific alias register. If two instructions then turn out to access aliased pointers, the CPU faults, and the program is retranslated without the optimization that led to that fault.
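How I picture the mechanism, as a heavily simplified simulation. The trace format, the `AliasFault` exception, and the fallback function are all assumptions for illustration; real hardware tracks address ranges and more state than this.

```python
class AliasFault(Exception):
    """Raised when a checked access hits an address recorded in an alias register."""

def run_optimized(trace, num_alias_regs=4):
    """trace is a list of (op, alias_reg, address) tuples.

    'set'   records the address of a speculatively reordered access,
    'check' compares a later access against that recorded address.
    """
    alias_regs = [None] * num_alias_regs
    for op, reg, addr in trace:
        if op == "set":
            alias_regs[reg] = addr
        elif op == "check":
            if alias_regs[reg] == addr:
                raise AliasFault(f"alias on register {reg} at {addr:#x}")
        else:
            raise ValueError(f"unknown op {op!r}")

def execute(trace):
    """Try the speculatively optimized code; fall back if the speculation fails."""
    try:
        run_optimized(trace)
        return "optimized"
    except AliasFault:
        # Here the translator would re-emit the code without the reordering.
        return "retranslated"

# A load was hoisted above a store: the load sets register 0, the store checks it.
print(execute([("set", 0, 0x1000), ("check", 0, 0x2000)]))  # optimized
print(execute([("set", 0, 0x1000), ("check", 0, 0x1000)]))  # retranslated
```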
With Clover, Liu et al. presented a hybrid approach to mitigate soft errors. As a hardware platform, they used a processor with sonic micro-detectors that can detect the impact of a cosmic particle. In software, they implemented checkpointing for code regions. Since the detector has a delay due to the physical limitations of a sonic detector, they proposed a compiler-based approach that executes the last N instructions of each code region twice in order to cover the worst-case detection delay. Although they claimed to be free of SDCs (silent data corruptions), they make strong assumptions about their fault model (faults occur on chip and the memory is ECC protected) and about control-flow errors (there is a working software-based control-flow error detection).