ALP Rust is faster than C++
Mar 17, 2025 · by Robert Kruszewski · 4 min read
A recent HackerNews post titled
Zlib-rs is faster than C
was discussed in the usual way that HackerNews posts are, with one highly
upvoted comment asking:
Presumably with inline assembly both languages can emit what is effectively the same machine code. Is the Rust compiler a better optimizing compiler than C compilers?
This reminded us of our
recent work on Adaptive Lossless Floating Point compression,
and how our port from C++ to Rust is actually 20-50% faster.
ALP Compression
ALP is a cutting-edge floating-point compression algorithm designed for
efficiency and precision. The core algorithm, as outlined in the
original paper,
can be summarized as finding two exponents
e and f such that a float n can be
encoded and decoded with the respective functions:

ALPEncode(n) = round(n × 10^e × 10^(-f))

ALPDecode(d) = d × 10^f × 10^(-e)
Essentially, moving the decimal point of a floating point number far enough to
the right that it losslessly round-trips conversion into integer space. Many
floating point datasets actually hold pseudo-decimal values (e.g. floats rounded
to a fixed number of significant digits), and so in practice this technique works very well.
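The encode/decode pair above can be sketched in a few lines of Rust. This is a minimal scalar illustration, not the actual implementation: the constants `E` and `F` are hypothetical fixed exponents, whereas the real algorithm searches for the best (e, f) pair per vector of values.

```rust
// Hypothetical fixed exponents; the real algorithm picks (e, f) per vector.
const E: i32 = 2;
const F: i32 = 0;

// ALPEncode(n) = round(n * 10^e * 10^(-f)); dividing by 10^f is the
// mathematically equivalent form.
fn alp_encode(n: f64) -> i64 {
    (n * 10f64.powi(E) / 10f64.powi(F)).round() as i64
}

// ALPDecode(d) = d * 10^f * 10^(-e)
fn alp_decode(d: i64) -> f64 {
    d as f64 * 10f64.powi(F) / 10f64.powi(E)
}

fn main() {
    // 1.23 holds two decimal digits, so shifting by 10^2 makes it an integer.
    let n = 1.23_f64;
    let d = alp_encode(n);
    assert_eq!(d, 123);
    assert_eq!(alp_decode(d), n); // lossless round-trip
}
```

With e = 2 and f = 0, the pseudo-decimal value 1.23 becomes the integer 123 and decodes back to exactly the same double.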
For all values, we perform a test to check for a successful round-trip, that
is, whether ALPDecode(ALPEncode(n)) == n. For any value that fails, an explicit
"patch" is stored, overriding the value during decompression.

Handling Edge Cases in ALP
As always with floating point numbers, we must consider the special values
-0.0, inf, -inf, and NaN.

Naively, negative zero actually passes our round-trip test
ALPDecode(ALPEncode(v)) == v, since casting -0.0 to an integer results in
T::zero() and 0.0 == -0.0 in floating point. This can be fixed by instead
performing a bit-wise comparison.

The others, -inf, inf, and NaN, aren't quite so easy. In the C++
standard,
casting these values to integers is undefined behavior.
Interestingly, the paper omits these details, but the published implementation
on GitHub includes a
function to handle them:

C++
static inline bool is_impossible_to_encode(const double n) {
  return (!std::isfinite(n) || std::isnan(n) ||
          n > ENCODING_UPPER_LIMIT || n < ENCODING_LOWER_LIMIT ||
          (n == 0.0 && std::signbit(n))); //! Verification for -0.0
}
Feature-flagged
with a template variable ominously named "SAFE":
C++
template <bool SAFE = true>
...
if constexpr (SAFE) {
if (is_impossible_to_encode(tmp_encoded_value)) {
return ENCODING_UPPER_LIMIT;
}
}
With this check enabled, the C++ implementation explicitly handles problematic
values, ensuring a correct round-trip test.
Leveraging Rust's Semantics
In Rust, the semantics for
float-to-int casting
are well-defined.
-0.0 and NaN cast to 0, inf casts to T::MAX, and
-inf casts to T::MIN.

By combining this with the earlier trick of performing a bit-wise comparison, our
Rust implementation can avoid the undefined-behavior checks required in C++,
while still correctly testing whether a value round-trips the encode and decode
functions.
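To make this concrete, here is a minimal sketch (our own illustration, not the actual implementation) of such a round-trip check. The `round_trips` function and its fixed scalar exponents are hypothetical; it relies only on Rust's saturating `as` casts plus a bit-wise comparison, with no special-case branches.

```rust
// Rust's float-to-int `as` casts are well-defined and saturating:
// NaN and -0.0 cast to 0, inf casts to i64::MAX, -inf casts to i64::MIN.
fn round_trips(n: f64, e: i32, f: i32) -> bool {
    // encode: round(n * 10^e * 10^(-f)), using division for the negative power
    let encoded = (n * 10f64.powi(e) / 10f64.powi(f)).round() as i64;
    // decode: encoded * 10^f * 10^(-e)
    let decoded = encoded as f64 * 10f64.powi(f) / 10f64.powi(e);
    // Bit-wise comparison catches -0.0 (which decodes to +0.0) without
    // any explicit special-value checks.
    decoded.to_bits() == n.to_bits()
}

fn main() {
    assert!(round_trips(1.23, 2, 0)); // pseudo-decimal value: round-trips
    assert!(!round_trips(-0.0, 2, 0)); // decodes to +0.0: stored as a patch
    assert!(!round_trips(f64::NAN, 2, 0)); // casts to 0: patched
    assert!(!round_trips(f64::INFINITY, 2, 0)); // saturates to i64::MAX: patched
}
```

Every special value simply fails the bit-wise test and gets patched, so no pre-validation branch is needed on the hot path.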
Combined, this results in a 20% to 50% increase in compression throughput
(depending on the frequency of patched values).