A thorough examination of the performance effects of using undefined behaviour in compiler optimizations.
Method:
- Modifying clang to not use UB where this is possible
- Run a large suite of benchmarks on different architectures, compare results for modified and unmodified clang
- Do statistics on the results
- Examine performance deviations
- Discuss factors which could bias results.
Very good science!
Result in short:
Only on ARM and if no link-time optimization is used, a systematic small positive performance effect can be seen. For Intel and AMD CPUs, there are no systematic improvements.
Average effects are typically below 2%, which is the typical effect of system and measurement noise. Often, effects are even negative. In some cases, benchmarks show large differences, and many of these can be fixed by simple modifications to the compiler or program.
For me, this result is also not too surprising:
If allowing / using Undefined Behavior (UB) would allow for systematically better optimizations, Rust programs would be systematically slower than C or C++ programs, since Rust does not allow UB. But this is not the case. Rather, sometimes Rust programs are faster, and sometimes C/CC++ programs. A good example is the Debian Benchmark Game Comparison.
Now, one could argue that in the cases where C/C++ programs turned out to be faster than Rust programs, that at least im these cases exploiting UB gave an advantage. But, if you examine these programs im the Debian benchmark game (or in other places), this is not the case either. They do not rely on clever compiler optimizations based on assumptioms around UB. Instead, they make heavy use of compiler and SIMD intrinsics, manual vectorization, inline assembly, manual loop unrolling, and in general micro-managing what the CPU does. In fact, these implementations are long and complex and not at all idiomatic C/C++ code.
Thirdly, one could say that while these benchmark examples are not idiomatic C code, one at times needs that specific capability to fall back to things like inline assembly, and that this is a characteristic capability of C snd C++.
Well, Rust supports inline assembly as well, though it is rarely used.