Don't forget std::tuple not being required to be trivial when it could be (and, in fact, never being trivial with GCC's libstdc++), std::print performance issues (see P3107R5, P3235R3), etc.
Heck, even std::atomic was designed with only x64 in mind (it clearly shows), and is unusable outside it. One is incentivized to write their own "atomic" class until P3330R0, which targets RMW-centric ISAs like AArch32 and AArch64, is approved.
The idiomatic way to do an arbitrary RMW (beyond simple operations like fetch-increment) with std::atomic is a compare_exchange loop, which maps 1:1 onto x64 assembly (lock cmpxchg). Since fetch_update isn't provided, it's the only way to do it. That's way too close for comfort. See [1] for a comparison.
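For readers who haven't written one, here is a minimal sketch of that compare_exchange loop idiom, using an atomic multiply as the example because std::atomic<int> has no fetch_mul member. The helper name `atomic_fetch_mul` is hypothetical. On x64 each iteration corresponds to a lock cmpxchg; on an LL/SC ISA the compiler has to emulate the CAS with a load-exclusive/store-exclusive sequence (or, with outlined atomics, a helper call) instead of computing the product directly between the exclusive load and store.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: atomically multiplies `a` by `factor` and returns the
// previous value. There is no std::atomic<int>::fetch_mul, so a CAS loop is
// the only standard way to express this RMW.
int atomic_fetch_mul(std::atomic<int>& a, int factor) {
    int expected = a.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak stores the current value into
    // `expected`, so the loop simply recomputes the product and retries.
    // The weak form may also fail spuriously; the loop absorbs that too.
    while (!a.compare_exchange_weak(expected, expected * factor,
                                    std::memory_order_acq_rel,
                                    std::memory_order_relaxed)) {
    }
    return expected;  // the value observed just before our successful update
}
```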
> Total hyperbole. It's perfectly usable on ARM and other platforms.
It's not hyperbole. std::atomic is portable, but that's all it is.
std::atomic is about 30% to 40% (with outlined atomics on, which is the default) slower than handrolled asm (or custom reimplementations that provide fetch_update -- same thing). See [2] for a benchmark.
Yes, the design of std::atomic probably favors x64 in certain areas. However, you initially claimed that std::atomic was designed with only x64 in mind. That is simply not true, as is easily shown by the fact that it explicitly supports weak memory models.
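To make the weak-memory-model point concrete, here is a minimal release/acquire message-passing sketch (the function name `run_message_passing` is illustrative). On x64, which is TSO, a release store and an acquire load compile to plain moves, so these orderings mostly pay off on weaker architectures such as ARM, which is exactly the argument that std::atomic was not designed for x64 alone.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Message passing with release/acquire: once the consumer observes
// `ready == true`, it is guaranteed to see the producer's write to `payload`.
int run_message_passing() {
    int payload = 0;                  // plain, non-atomic data
    std::atomic<bool> ready{false};

    std::thread producer([&] {
        payload = 42;                                  // (1) write the data
        ready.store(true, std::memory_order_release);  // (2) publish it
    });

    int seen = 0;
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            // spin until the release store becomes visible
        }
        seen = payload;  // synchronized with (1), so this reads 42
    });

    producer.join();
    consumer.join();
    return seen;
}
```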
> std::atomic is about 30% to 40% (with outlined atomics on, which is the default) slower than handrolled asm
Only for certain CAS operations. A 30% or 40% performance penalty doesn't sound too dramatic and certainly makes it "usable" in my book.
I appreciate your insight, but it could have been delivered with less hyperbole.
Sure, but memory ordering is orthogonal to LL/SC vs CAS.
To me, the absence of fetch_update since std::atomic's inception is a major design oversight, since CAS can be emulated with LL/SC but not the other way round.
Furthermore, fetch_update code is easier to read and less awkward to write than CAS loops (which are currently the only option std::atomic offers, and that is what I'm complaining about).
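As a rough illustration of the readability argument, here is a hypothetical `fetch_update` free function, loosely modeled on Rust's semantics (the closure returns the desired value, or std::nullopt to abort). This is an assumption for illustration, not the API proposed in P3330R0. Note that, like Rust's version, it is still a CAS loop under the hood, so it improves readability only; the performance benefit on LL/SC hardware would require the library or compiler to implement it natively.

```cpp
#include <atomic>
#include <cassert>
#include <optional>

// Hypothetical fetch_update (NOT the P3330R0 API). `f` maps the current value
// to the desired one, or returns std::nullopt to give up. Returns the previous
// value on success, std::nullopt if `f` declined.
template <class T, class F>
std::optional<T> fetch_update(std::atomic<T>& a, F f,
                              std::memory_order success = std::memory_order_seq_cst,
                              std::memory_order failure = std::memory_order_seq_cst) {
    T current = a.load(failure);
    while (std::optional<T> desired = f(current)) {
        if (a.compare_exchange_weak(current, *desired, success, failure)) {
            return current;  // value observed just before our update
        }
        // On failure, `current` holds the freshly observed value; retry with f.
    }
    return std::nullopt;
}

// Usage: increment, but never past `limit` (a saturating counter).
// The update logic lives in one lambda instead of an explicit CAS loop.
inline bool try_increment_up_to(std::atomic<unsigned>& counter, unsigned limit) {
    return fetch_update(counter, [limit](unsigned v) -> std::optional<unsigned> {
               if (v >= limit) return std::nullopt;
               return v + 1;
           }).has_value();
}
```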
> Only for certain CAS operations. A 30% or 40% performance penalty doesn't sound too dramatic and certainly makes it "usable" in my book.
I disagree. Atomic variables (atomic instructions) are usually used to implement synchronization primitives, and are thus often used in very hot paths. In that context, a 30% performance drop is actually quite bad.
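A small example of why atomics tend to sit on hot paths: the minimal test-and-test-and-set spinlock sketched below (the class name `SpinLock` is hypothetical, and it deliberately omits backoff and fairness) executes an atomic RMW on every acquisition attempt, so any overhead in that RMW is paid on every lock.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Minimal test-and-test-and-set spinlock. Illustrative only: no backoff,
// no fairness, no OS-level waiting.
class SpinLock {
public:
    void lock() noexcept {
        for (;;) {
            // The atomic RMW that every acquisition attempt pays for.
            if (!locked_.exchange(true, std::memory_order_acquire)) return;
            // While held, spin on a plain load to avoid hammering the
            // cache line with RMWs.
            while (locked_.load(std::memory_order_relaxed)) {
            }
        }
    }
    void unlock() noexcept { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Usage: several threads increment a shared non-atomic counter under the lock.
inline long count_with_spinlock(int threads, int iters) {
    SpinLock spin;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                spin.lock();
                ++counter;
                spin.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```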
Of course, if one restricts oneself to the member functions (fetch_add, fetch_or, etc.), all is fine, because those are optimized.
All in all, C++'s stdlib (the parts that aren't just __builtin wrappers, to be precise) is actually quite fine for most use cases, like PC applications. It is when one has latency constraints and/or severe memory constraints (e.g. < 512 KiB) that the stdlib starts to feel like a hindrance.
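For contrast with the CAS-loop case, here is a sketch of the member-function path: a flag word updated with fetch_or and a reference count updated with fetch_add/fetch_sub. Each is a single call with no retry loop, and compilers can lower these to a single instruction where the ISA has one (lock xadd/lock or on x64, LDADD/LDSET with AArch64's LSE extension). The flag names and helpers are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical flag bits, for illustration only.
enum : std::uint32_t { kDirty = 1u << 0, kClosed = 1u << 1 };

std::atomic<std::uint32_t> flags{0};
std::atomic<int> refcount{1};

// Sets kClosed and reports whether it was already set: one member call, no loop.
bool mark_closed() {
    return (flags.fetch_or(kClosed, std::memory_order_acq_rel) & kClosed) != 0;
}

// Taking an extra reference needs no ordering beyond atomicity.
void retain() { refcount.fetch_add(1, std::memory_order_relaxed); }

// Returns true when the caller dropped the last reference.
bool release() { return refcount.fetch_sub(1, std::memory_order_acq_rel) == 1; }
```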
> Sure, but memory ordering is orthogonal to LL/SC vs CAS.
Sure, but your original claim was that std::atomic has been designed with only x64 in mind. That's what I meant to argue against.
I agree that the omission of something like fetch_update() was an oversight, and I hope it makes it into the C++ standard!
As a side note, here's what the Rust docs say about fetch_update():
> This method is not magic; it is not provided by the hardware. It is implemented in terms of AtomicUsize::compare_exchange_weak, and suffers from the same drawbacks.
It looks like Rust's main motivation was readability, whereas P3330R0 has both that and performance on non-CAS hardware in mind. In any case, Rust's function could be optimized in the future if they decide to do so.
> Heck, even std::atomic was designed with only x64 in mind (it clearly shows), and is unusable outside it. One is incentivized to write their own "atomic" class until P3330R0, which targets RMW-centric ISAs like AArch32 and AArch64, is approved.
And of course, Rust already has "fetch_update"...