Don't forget <tuple> not mandated to be trivial when it can be (and in fact neve...

spacechild1 · 2025-06-02T11:02:46

> Heck, even std::atomic was designed with only x64 in mind (it clearly shows),

It certainly wasn't.

> and is unusable outside it

Total hyperbole. It's perfectly usable on ARM and other platforms.

P3330R0 looks like a nice addition, though.

TuxSH · 2025-06-02T12:21:37

> It certainly wasn't.

The idiomatic way to do a read-modify-write (RMW) with std::atomic, outside of simple cases like fetch-increment, is a compare_exchange (CAS) loop. That maps 1:1 onto x64 assembly (lock cmpxchg), and since fetch_update isn't provided, it's the only way to do it. That's way too close for comfort. See [1] for a comparison.
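For readers unfamiliar with the idiom, here is a minimal sketch of such a CAS loop for an update that has no dedicated member function. The helper name saturating_increment and the choice of operation are hypothetical, picked for illustration; they are not taken from [1]:

```cpp
#include <atomic>

// Hypothetical helper: increment `counter` unless it has already reached
// `limit`. std::atomic has no member function for this, so it is written as
// the usual compare_exchange loop. Returns the value observed before the
// (possible) increment.
unsigned saturating_increment(std::atomic<unsigned>& counter, unsigned limit) {
    unsigned expected = counter.load(std::memory_order_relaxed);
    while (expected < limit &&
           !counter.compare_exchange_weak(expected, expected + 1,
                                          std::memory_order_acq_rel,
                                          std::memory_order_relaxed)) {
        // On failure (contention, or a spurious failure on LL/SC hardware),
        // `expected` now holds the freshly observed value; loop and retry.
    }
    return expected;
}
```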

> Total hyperbole. It's perfectly usable on ARM and other platforms.

It's not hyperbole. std::atomic is portable, but that's all it is.

std::atomic is about 30% to 40% slower (with outlined atomics on, which is the default) than hand-rolled asm or custom reimplementations that provide fetch_update, which amount to the same thing. See [2] for a benchmark.

[1] https://godbolt.org/z/EasxahTMP

[2] https://godbolt.org/z/Y9jvWbbWf

spacechild1 · 2025-06-02T13:39:08

Yes, the design of std::atomic probably favors x64 in certain areas. However, you initially claimed that std::atomic was designed with only x64 in mind. That is simply not true, as is easily shown by the fact that it explicitly supports weak memory models.
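As a rough illustration of that support, here is a sketch of release/acquire message passing (the function names are hypothetical). The memory_order arguments are what let the same source compile to plain loads and stores on a strongly ordered x64 CPU while still being correct on weakly ordered hardware such as ARM:

```cpp
#include <atomic>
#include <thread>

// Hypothetical message-passing example. On x86-64 the release store and the
// acquire load both compile to plain MOVs; on weakly ordered CPUs the compiler
// must emit ordering instructions instead (e.g. STLR/LDAR on AArch64).
int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready{false};  // publication flag

void producer() {
    payload = 42;                                  // write the data first
    ready.store(true, std::memory_order_release);  // then publish it
}

int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        // spin until the flag is observed
    }
    return payload;  // release/acquire pairing guarantees this sees 42
}

int run_message_passing() {
    int result = 0;
    std::thread t1(producer);
    std::thread t2([&result] { result = consumer(); });
    t1.join();
    t2.join();
    return result;
}
```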

> std::atomic is about 30% to 40% (with outlined atomics on, which is the default) slower than handrolled asm

Only for certain CAS operations. A 30% or 40% performance penalty doesn't sound too dramatic and certainly makes it "usable" in my book.

I appreciate your insight, but it could have been delivered with less hyperbole.

TuxSH · 2025-06-02T14:36:55

Apologies for the style of my previous messages.

> they explicitly support weak memory models.

Sure, but memory ordering is orthogonal to LL/SC (load-linked/store-conditional) vs. CAS (compare-and-swap).

To me, the absence of fetch_update since std::atomic's inception is a major design oversight, as CAS can be emulated via LL/SC but not the other way round.

Furthermore, fetch_update code is easier to read and less awkward to write than CAS loops, which are currently the only option std::atomic offers; that is what I'm complaining about.
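To make the readability point concrete, here is a minimal sketch of what a fetch_update-style helper could look like on top of today's std::atomic. The name cas_fetch_update and its signature are assumptions for illustration only, not the API proposed in P3330R0. Because it is built on compare_exchange_weak, it only fixes the ergonomics, not the performance issue:

```cpp
#include <atomic>

// Hypothetical helper, not part of the C++ standard library: atomically
// replaces the value with f(current), retrying until no other thread
// interfered. Underneath it is still a CAS loop; call sites just state
// *what* the update is.
template <typename T, typename F>
T cas_fetch_update(std::atomic<T>& obj, F f,
                   std::memory_order order = std::memory_order_seq_cst) {
    T old = obj.load(std::memory_order_relaxed);
    while (!obj.compare_exchange_weak(old, f(old), order,
                                      std::memory_order_relaxed)) {
        // `old` was refreshed with the current value; recompute and retry.
    }
    return old;  // value before the update, like fetch_add
}
```

A call site then reads as a single expression, e.g. cas_fetch_update(x, [](unsigned v) { return v * 3 + 1; }), instead of an explicit retry loop.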

> Only for certain CAS operations. A 30% or 40% performance penalty doesn't sound too dramatic and certainly makes it "usable" in my book.

I disagree. Atomic variables (atomic instructions) are usually used to implement synchronization primitives, so they often sit on very hot paths. In that regard, a 30% performance drop is actually quite bad.
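For context, a sketch of the kind of primitive meant here: a minimal test-and-test-and-set spinlock. The SpinLock class is hypothetical and illustrative only (no backoff, no fairness; real code would normally use std::mutex), but it shows how every lock/unlock goes straight through atomic instructions:

```cpp
#include <atomic>

// Hypothetical minimal spinlock built directly on std::atomic.
class SpinLock {
public:
    void lock() noexcept {
        for (;;) {
            // One RMW; acquire keeps the critical section after it.
            if (!locked_.exchange(true, std::memory_order_acquire)) {
                return;
            }
            // Wait with plain loads to avoid hammering the cache line with RMWs.
            while (locked_.load(std::memory_order_relaxed)) {
            }
        }
    }

    bool try_lock() noexcept {
        return !locked_.load(std::memory_order_relaxed) &&
               !locked_.exchange(true, std::memory_order_acquire);
    }

    // Release makes the critical section's writes visible to the next owner.
    void unlock() noexcept { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};
```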

Of course, if one restricts oneself to the member functions (fetch_add, fetch_or, etc.), all is fine, because those are already optimized.
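A small sketch of that case, with hypothetical Counters/record_hit/set_flag names. The update is a single member call with no CAS loop at the call site, and the compiler picks the instruction sequence for the target (on AArch64 with LSE, for instance, fetch_add and fetch_or can map to single LDADD and LDSET instructions):

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical counters updated only through std::atomic member functions.
struct Counters {
    std::atomic<std::uint32_t> hits{0};
    std::atomic<std::uint32_t> flags{0};
};

// Returns the hit count before this increment.
std::uint32_t record_hit(Counters& c) {
    return c.hits.fetch_add(1, std::memory_order_relaxed);
}

// Sets the bits in `mask`; returns the flags before the update.
std::uint32_t set_flag(Counters& c, std::uint32_t mask) {
    return c.flags.fetch_or(mask, std::memory_order_acq_rel);
}
```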

All in all, C++'s stdlib (to be precise, the parts that aren't just __builtin wrappers) is actually quite fine for most use cases, such as PC applications. It is when one has latency constraints and/or severe memory constraints (e.g. less than 512 KiB) that the stdlib starts to feel like a hindrance.

spacechild1 · 2025-06-02T15:35:15

Thanks for the level-headed response!

> Sure, but memory ordering is orthogonal to LL/SC vs CAS.

Sure, but your original claim was that std::atomic was designed with only x64 in mind. That's what I meant to argue against.

I agree that the omission of something like fetch_update() has been an oversight and I hope that it will make it into the C++ standard!

As a side note, here's what the Rust docs say about fetch_update():

> This method is not magic; it is not provided by the hardware. It is implemented in terms of AtomicUsize::compare_exchange_weak, and suffers from the same drawbacks.

https://doc.rust-lang.org/std/sync/atomic/struct.AtomicUsize...

So Rust's std::sync::atomic is equally "useless"? :)
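For comparison, a rough C++ analogue of the shape of Rust's fetch_update. The name try_fetch_update is hypothetical and this is not a standard C++ API; fixed memory orders stand in for Rust's separate set/fetch orderings. As the Rust docs say, it is nothing more than a loop around compare_exchange_weak, with the closure able to decline the update by returning std::nullopt:

```cpp
#include <atomic>
#include <optional>
#include <utility>

// Hypothetical analogue of Rust's fetch_update: returns {true, previous} if
// the update was applied, or {false, current} if `f` declined (mirroring
// Rust's Ok(prev) / Err(prev)).
template <typename T, typename F>
std::pair<bool, T> try_fetch_update(std::atomic<T>& obj, F f) {
    T prev = obj.load(std::memory_order_relaxed);
    while (std::optional<T> next = f(prev)) {
        if (obj.compare_exchange_weak(prev, *next, std::memory_order_acq_rel,
                                      std::memory_order_relaxed)) {
            return {true, prev};
        }
        // CAS failed (contention or spurious failure): `prev` was refreshed,
        // so `f` is called again with the new value.
    }
    return {false, prev};
}
```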

TuxSH · 2025-06-02T16:32:07

Heh, you're right, good catch: https://godbolt.org/z/3cEfbqM51

Looks like the main motivation on the Rust side was readability, whereas P3330R0 has that plus performance on non-CAS (LL/SC) hardware in mind. In any case, Rust's function could be optimized in the future, if they decide to.