jstimpfle · 2026-06-15T23:36:31 1781566591

A bug is a bug even when it doesn't clearly manifest itself 100% of the time, and furthermore it is pretty much guaranteed that NULL dereference crashes with segfault in practice, only not for the people playing theoretic games whose essence of life is finding gotchas where it maybe isn't so and then feeling smarter than everyone else.

But it's >> 99.9% true that this will just crash even though it's acshually UB, nasal demons and so forth. Now raise this << 0.1% likelihood that it isn't true on some system with some compiler and build flags, to the power of the number of distinct deployed configurations out there, and you get the result which is the correct engineering decision of just moving on instead of spending your life filling straightforward code with pointless boilerplate assertions.

NB it can make sense to assert nonnull when the condition won't be tested on all code paths or the intention is otherwise not super obvious.

lmm · 2026-06-16T01:00:12 1781571612

> it's >> 99.9% true that this will just crash even though it's acshually UB, nasal demons and so forth.

Is it though? Linux saw enough bugs from that kind of issue that they now build with -fno-delete-null-pointer-checks and accept the (supposed) performance penalty.

uecker · 2026-06-16T04:56:03 1781585763

The kernel is perhaps a bit special. In the past they had bugs such as first dereferencing and then checking for null, and weird possibilities to map the zero page. But today I am not convinced this is really needed.

In general on a system where you trap when accessing the zero page, this optimization should be safe and a null pointer dereference should (safely) trap.
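
For anyone who hasn't seen the pattern, here is a minimal sketch of "dereference first, check later". All names (`struct conn`, `conn_port_buggy`, `conn_port_checked`, `conn_port_asserted`) are made up for illustration; this is not kernel code. Because the load in the first function already implies `c != NULL`, an optimizer may drop the later check unless something like -fno-delete-null-pointer-checks is in effect.

```c
#include <assert.h>
#include <stddef.h>

struct conn { int port; };

/* Buggy ordering: c is dereferenced before it is tested. Calling this with
   NULL is undefined behavior, and the compiler may delete the check. */
static int conn_port_buggy(const struct conn *c) {
    int port = c->port;
    if (c == NULL)
        return -1;
    return port;
}

/* Fixed ordering: test first, dereference afterwards. */
static int conn_port_checked(const struct conn *c) {
    if (c == NULL)
        return -1;
    return c->port;
}

/* Contract-style variant: a NULL argument is a caller bug, so trap on it
   (only in builds where NDEBUG is not defined). */
static int conn_port_asserted(const struct conn *c) {
    assert(c != NULL);
    return c->port;
}
```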

lmm · 2026-06-16T05:12:33 1781586753

> In general on a system where you trap when accessing the zero page, this optimization should be safe and a null pointer dereference should (safely) trap.

If you mean that C compiler writers "should" prioritise sanity over high scores on microbenchmarks, then I agree. However in practice they do not and this optimization is not remotely safe.

uecker · 2026-06-16T05:14:13 1781586853

Do you have any evidence for this? On GCC it should be safe.

(EDIT: what is not safe is indexing into a null pointer. To be safe there you need -fsanitize=null)

lmm · 2026-06-16T06:42:09 1781592129

I don't understand your comment - dereferencing a null pointer is unsafe, in the sense that it does not reliably crash but may do other things, as we saw in the kernel case we're talking about. Yes that particular case was only exploitable if you mapped the zero page, but given how all-bets-are-off a situation it created (where extremely experienced programmers thought they knew what the code did, thought it was safe, and were wrong), I would not want to count on all cases not being exploitable without mapping the zero page.

jstimpfle · 2026-06-16T06:59:04 1781593144

May. If. If. If. In case.

We are talking about an extremely simple straightforward API with an obvious contract. It's good enough for this function to reliably surface almost all wrong uses with a segfault immediately. Wrong use will result in segfaults and otherwise bugs and crashes. The goal is not to work when used wrong but to work when used right. You cannot save the world from scratch in every little function. You still have a job to get done, and you have to move on.

lmm · 2026-06-16T07:17:36 1781594256

> You cannot save the world from scratch in every little function. You still have a job to get done, and you have to move on.

Or you can take all of 10 minutes to put sanity-check assertions at the start of all your public-facing API functions, eliminating a source of security bugs, get on with your life, and worry about the performance implications as and when it becomes a problem (hint: it's never going to become a problem).
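
A minimal sketch of what that could look like for one public function. `REQUIRE`, `struct bytebuf`, and `bytebuf_put` are hypothetical names. Unlike plain `assert`, this macro stays active when NDEBUG is defined; that is my assumption about what a check meant to stop security bugs would need, not something stated above.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical always-on check: prints the failed condition and aborts. */
#define REQUIRE(cond)                                              \
    do {                                                           \
        if (!(cond)) {                                             \
            fprintf(stderr, "%s:%d: requirement failed: %s\n",     \
                    __FILE__, __LINE__, #cond);                    \
            abort();                                               \
        }                                                          \
    } while (0)

struct bytebuf {
    unsigned char *data;
    size_t cap;
    size_t len;
};

/* Public entry point: validate the contract once, up front.
   Returns 0 on success, -1 if the buffer is full. */
static int bytebuf_put(struct bytebuf *b, unsigned char byte) {
    REQUIRE(b != NULL);
    REQUIRE(b->data != NULL);
    REQUIRE(b->len <= b->cap);
    if (b->len == b->cap)
        return -1;
    b->data[b->len++] = byte;
    return 0;
}
```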

jstimpfle · 2026-06-16T09:24:41 1781601881

You can try and do this if it's a relatively narrow public facing API, but otherwise this is a theoretic ideal. In practice, if you add an assertion for every pointer argument to every little function, you'll go insane, and it is completely pointless, and the code will not be readable anymore.

There are so many other interesting and relevant invariants that are usually in an API contract that are much harder or impossible to check upfront (let alone express formally in a type system), and even violations may be impossible to diagnose when they happen.

People focus on NULL because that's the only way they can apply their silly limited type systems. But NULL checks give very little return for investment. In practice, you'll see templated Option<T> types and whatnot, and when I have to look at or even work with such code I want to kill myself because it's so painful.

dwattttt · 2026-06-16T08:14:15 1781597655

It takes a lot longer to figure out if it'll be a problem than to just add the check. And you don't have to ponder whether it's possible for a null to get there, because now it's fine if it does.

jstimpfle · 2026-06-16T09:35:02 1781602502

Are you talking about extending the API contract to allow for NULL? That is often the path to madness, especially if it requires complicating the signature (return value etc). Better to just assert/crash.

dwattttt · 2026-06-16T11:13:52 1781608432

No. I'm talking about adding the check to reject NULL. Then you don't have to spend time justifying or figuring out why a NULL can't turn up here.

jstimpfle · 2026-06-16T11:44:23 1781610263

So reject as in assert? But how does that go together with what you said, "because now it's fine if it does"?

Dylan16807 · 2026-06-15T23:41:31 1781566891

I don't want to nitpick people often but your use of division sign to mean percent is really throwing me off.

jstimpfle · 2026-06-15T23:45:59 1781567159

Thanks for letting me know, nitpick appreciated. Typing on my phone.

jstimpfle · 2026-06-15T18:37:22 1781548642

I told him the same thing multiple times, just open a random code location in GCC, or a recently committed change, and see that it's basically C. But he keeps repeating that ridiculous argument like a broken record.

jstimpfle · 2026-06-15T13:36:17 1781530577

It's totally true, using sizeof like a function is one of my pet peeves. Even the kernel people do it but it's WRONG and you are right.

But ACSHUALLY, how you write allocation is like this

    #define sane_alloc(type, count) ((type *) malloc(sizeof (type) * (count)))

    game->boardPieces = sane_alloc(BoardPiece, row * column);

The kernel people seem to finally have figured out this one in 2026.

unwind · 2026-06-16T09:10:17 1781601017

Nah I'm still against repeating the type name all over the place, and the cast adds nothing good imnsho.

jstimpfle · 2026-06-16T09:38:24 1781602704

The cast is at least 50% of why this is useful! You'll now get compile errors in case you did anything wrong.
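
To make that concrete, a small sketch built around the macro from upthread. `make_counts` is a hypothetical helper. The commented-out line shows the kind of mistake the cast exposes: with a bare `malloc` the `void *` converts silently, while the macro's result is a `double *`, and assigning that to an `int *` gets diagnosed (as a warning or an error, depending on compiler and flags). Note that the macro as written does not guard the multiplication against overflow.

```c
#include <stddef.h>
#include <stdlib.h>

#define sane_alloc(type, count) ((type *) malloc(sizeof (type) * (count)))

/* Allocate n ints and zero them; returns NULL if allocation fails. */
static int *make_counts(size_t n) {
    int *counts = sane_alloc(int, n);
    /* int *oops = sane_alloc(double, n);   <- incompatible pointer types */
    if (counts == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        counts[i] = 0;
    return counts;
}
```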

DonHopkins · 2026-06-15T17:23:01 1781544181

Nothing is sane in a language that lets you say 4["Foo!"]

Array indexing in C is just pointer arithmetic wearing Groucho Marx Glasses.

C combines the flexibility and power of assembly language with the user-friendliness of assembly language.
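
For anyone wondering why `4["Foo!"]` compiles: `a[i]` is defined as `*((a) + (i))`, and the addition commutes, so `i[a]` means the same thing. A small sketch (the function name `index_three_ways` is made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Read s[i] three equivalent ways and return the value. */
static char index_three_ways(const char *s, size_t i) {
    char subscript = s[i];
    char pointer   = *(s + i);
    char reversed  = i[s];
    assert(subscript == pointer && pointer == reversed);
    return reversed;
}
```

So `index_three_ways("Foo!", 3)` yields `'!'`, and `4["Foo!"]` is simply the terminating `'\0'`.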

jstimpfle · 2026-06-15T18:44:09 1781549049

> Nothing is sane in a language that lets you say 4["Foo!"]

I just had a look at your HN profile page and was struck by the irony of seeing your Forth vs Lisp vs Postscript code examples there. Now consider that I've never written code like 4["Foo!"], even though I know it's possible, but in other languages you constantly have to do mental gymnastics to get any real work done, and those are allegedly so much saner !???

DonHopkins · 2026-06-15T19:38:07 1781552287

When they were handing out brains, I thought they said suᴉɐɹq, and I said "180 rotate".

jstimpfle · 2026-06-15T13:31:56 1781530316

struct layout is well specified, it should be possible to avoid any padding issues by just aligning and by padding (with dummy members) correctly. The problem in practice is mostly integer representation (big-endian vs little-endian).

leni536 · 2026-06-15T13:36:31 1781530591

Specified by whom? Not the C standard for sure. It is indeed specified by individual ABIs, and ABIs don't tend to do anything too weird, but that's another question.

jstimpfle · 2026-06-15T14:46:17 1781534777

looks like I was wrong, but here is the de-facto standard I was relying on over the years ;-). Not that I've memcpied many structs to file directly btw. http://www.catb.org/esr/structure-packing/

jcranmer · 2026-06-15T16:01:23 1781539283

The general struct layout algorithm is that you lay out the first member at the address of the struct (this is guaranteed by C), and then subsequent fields in order (also guaranteed by C). What isn't guaranteed is how fields get their alignment, in particular shenanigans you can do with allocating fields in the padding of their prior field, and bitfields in general are horribly underspecified.

In practice, C doesn't do any padding shenanigans, but C++ does (but only for non-POD structs, and then you discover there's several slightly different definitions that mean basically "POD", so have fun predicting which one is the one that actually matters for your use case).

RossBencina · 2026-06-15T21:24:30 1781558670

If you sort your fields by size or manually pad them with natural alignment, and use #pragma pack or equivalent non-standard directives that gets you most of the way there. But yes, avoid bitfields.
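
A hedged sketch of the "sort by size and pad manually" approach, with the assumed layout pinned down by compile-time checks so that a surprising ABI fails the build instead of silently producing a different file format. `struct record` and its fields are hypothetical, and this does nothing about byte order.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Fields ordered largest first, with the last byte padded explicitly. */
struct record {
    uint64_t id;
    uint32_t flags;
    uint16_t kind;
    uint8_t  level;
    uint8_t  pad0;    /* explicit padding instead of compiler-inserted padding */
};

/* These are the layout assumptions; the build breaks if an ABI disagrees. */
static_assert(offsetof(struct record, id) == 0, "id must come first");
static_assert(offsetof(struct record, flags) == 8, "unexpected gap after id");
static_assert(offsetof(struct record, kind) == 12, "unexpected gap after flags");
static_assert(offsetof(struct record, level) == 14, "unexpected gap after kind");
static_assert(sizeof(struct record) == 16, "unexpected tail padding");
```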

C++ "standard layout type" is the modern equivalent of "POD" I think.

flohofwoe · 2026-06-15T19:15:13 1781550913

> struct layout is well specified

Technically that's not true at least for booleans and enums, the C standard doesn't define specific sizes for those (bools are commonly 1 byte though, but for enums at least MSVC likes to disagree with Clang and GCC).

Using a direct struct memory layout for persistency and then expecting it to work across compilers, CPUs and ABIs is almost guaranteed to cause problems.

DanielHB · 2026-06-15T14:40:53 1781534453

If you modify or even just move fields around the struct that also changes the way they are serialized...

You really need a serializer for this sort of thing because it can also include forwards compatibility of your data structures.
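
A minimal sketch of what field-by-field serialization can look like, assuming a little-endian on-disk format with a leading version byte. `struct piece`, `piece_write`, `piece_read`, and the 7-byte layout are all made up for illustration; the point is only that the code defines the file format, not the compiler's struct layout.

```c
#include <stddef.h>
#include <stdint.h>

enum { PIECE_FORMAT_VERSION = 1, PIECE_ENCODED_SIZE = 7 };

struct piece { uint32_t id; uint8_t row; uint8_t col; };

/* Store v as four bytes, least significant first, on any host. */
static void put_u32le(unsigned char *out, uint32_t v) {
    out[0] = (unsigned char)(v & 0xff);
    out[1] = (unsigned char)((v >> 8) & 0xff);
    out[2] = (unsigned char)((v >> 16) & 0xff);
    out[3] = (unsigned char)((v >> 24) & 0xff);
}

static uint32_t get_u32le(const unsigned char *in) {
    return (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}

/* Encode p into out[0 .. PIECE_ENCODED_SIZE). */
static void piece_write(const struct piece *p, unsigned char *out) {
    out[0] = PIECE_FORMAT_VERSION;
    put_u32le(out + 1, p->id);
    out[5] = p->row;
    out[6] = p->col;
}

/* Decode from in[0 .. PIECE_ENCODED_SIZE); returns -1 on an unknown version. */
static int piece_read(struct piece *p, const unsigned char *in) {
    if (in[0] != PIECE_FORMAT_VERSION)
        return -1;
    p->id = get_u32le(in + 1);
    p->row = in[5];
    p->col = in[6];
    return 0;
}
```

Reordering the members of `struct piece` leaves the encoded bytes unchanged, and a reader can reject or migrate data based on the version byte.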

jstimpfle · 2026-06-15T14:45:04 1781534704

sure, if you change the struct, it will now be different.

edflsafoiewq · 2026-06-15T19:17:41 1781551061

It's typical to only append fields when you do this.

jstimpfle · 2026-06-13T16:54:21 1781369661

Trust me, I know more C++ than most or all of my peers (working two jobs simultaneously), and I know a million ways that C++ features suck. Also standard library and containers. If you want I'll point out the ways in which std::deque, and even std::map, std::unordered_map, even std::vector (!) suck. IMO, just don't do it.

rfgplk · 2026-06-13T16:57:53 1781369873

The standard library implementations really do suck (in some cases), but this should be separated from C++ (the language). Even the standard splits the language grammar from the standard library cleanly.

wavemode · 2026-06-13T17:10:48 1781370648

You can't really separate the two, firstly because some parts of the standard library interact directly with the language's syntax (e.g. <initializer_list>), and secondly because the language standard dictates things about the behavior of the standard library that limit implementation options.

For example, the standard says that adding elements to an <unordered_map> is not allowed to invalidate references to keys or elements within the map. That makes it impossible for any standards-compliant C++ implementation to use a high-performance implementation in which keys and elements are stored contiguously in a flat array.

nly · 2026-06-13T18:41:13 1781376073

Your map example only concerns the standard library, not the language.

wavemode · 2026-06-13T19:58:51 1781380731

Its behavior is dictated by the language.

The context of this thread is that someone stated that the C++ standard library sucks, and someone replied to them saying that it's just some implementations that suck, but that's separate from the language. The point I'm trying to make, in response, is that it is about the language. It's not just "some" implementations - there is no implementation of the C++ standard library that doesn't have these inefficiencies, because the language's own standard requires them.

(This is tangential but - this is why I often say that C++ is not actually the most complex language in the world, it's just over-specified. If you took almost any popular programming language and wrote a document dictating the behavior of every single feature and library to the same level of detail, you would end up with a document similar in length or even longer than the C++ standard.)

jstimpfle · 2026-06-13T21:44:45 1781387085

In my reading, they didn't say it's due to bad implementations, though. They were trying to separate the standard into two parts, the one about the language syntax and semantics, and the one about the standard library. And I think this is a fair separation actually. But that doesn't make the core language any better ;-)

wavemode · 2026-06-14T01:57:19 1781402239

Hm. You're probably right.

gpderetta · 2026-06-13T17:43:25 1781372605

Which sucks... unless you really need reference stability.

jstimpfle · 2026-06-13T19:27:52 1781378872

std::vector has left the chat.

stinos · 2026-06-13T19:26:40 1781378800

> and even std::map, std::unordered_map, even std::vector (!) suck

It's really hard to take your comment seriously because of generalizations like this. Maybe they're not usable for your particular use case, but that doesn't mean they suck. Just like there's a 'million' ways that C++ sucks in your book, there's a reason there are millions of lines of code out there where these containers are valid use cases and hence work without any issues or any need to replace them with something else.

jcranmer · 2026-06-13T20:17:16 1781381836

std::map and std::unordered_map are just unbelievably shitty implementations. The former is a red-black tree, which in my entire programming career I have needed to reach for like... twice? It's just not the right container for almost any problem you have, yet it's the one that gets the short, sweet name. The latter is a bucket-based hashmap, which is about the worst kind of hashmap that can be built. On top of that, their APIs are also really annoying to use compared to, say, Python or Rust's implementation. At least C++20 finally added a simple contains method, but something like setdefault is just a chore to get implemented.

tptacek · 2026-06-16T13:45:21 1781617521

Not really apropos any of this but a vivid career memory for me is leaving a startup that was all in C++ and going to Arbor Networks, which was all C, and porting the STLport map to rbtree.c and making it our default container. No, I have no idea why I did that. Different times!

WalterBright · 2026-06-13T23:38:10 1781393890

Pretty much the only collections I use are:

1. arrays

2. linked lists

3. hash tables

4. simple binary trees

jstimpfle · 2026-06-13T19:39:23 1781379563

They're not useable for anything serious, i.e. high throughput, low frequency, massively concurrent work. In other words, most of the things for which you shouldn't better have chosen a different language in the first place.

They're also unusable by the way because of ergonomic and software architecture factors, such as bad modularity, terrible compile times, unreadable error messages, unreadable symbol names...

Yes that is overgeneralizing a little bit but it's largely true.

The problem is typically not the containers themselves but all the other bad decisions that they push you to make in order to work around their "small issues".

The huge problem is that these containers can get you started quickly, i.e. leetcode type stuff and single threaded stuff, but at some point you'll realize your architecture ended up completely in the wrong place because of that.

If you haven't been thinking deeply about memory management and concurrency, you won't be able to understand, no offense meant. I've just fixed another subsystem that was completely overwhelmed, seeing 8x bandwidth gains already on a small testsystem, but the factor is basically unbounded when moving to bigger systems, when it's about contended vs uncontended.

delta_p_delta_x · 2026-06-13T20:20:39 1781382039

> anything serious, i.e. high throughput, low frequency, massively concurrent work

Why is only 'high throughput, low frequency, massively concurrent work' considered 'serious'?

Pannoniae · 2026-06-13T22:33:08 1781389988

They're just clearly inferior in pretty much any situation.

The map stuff the other posters summed up well but even std::vector is dogshit with pretty much all implementations having inlined grow code in push_back, a not too great API and missed optimisations e.g. no trivial relocation when growing the vector / moving it and no useful APIs such as "grow but don't initialise"...

jstimpfle · 2026-06-13T22:45:29 1781390729

To be fair grow-but-don't-initialize is a pretty fundamental part of the API, the reserve() method.

But already the basic premise that you should push back without thinking is wrong. You will suffer reallocations and invalidations when you least expect them, and frankly you have to architect around that fact, which is a terrible restriction. You can work around it by pre-reserving, but at that point it's just a basic fixed heap-allocated array but worse because the type gives you a weird look all the time, "I'll realloc as soon as you don't pay attention, harhar"!

tialaramex · 2026-06-13T23:43:39 1781394219

std::vector lacks what I call the "bifurcated reservation API". It has Rust's Vec::reserve_exact but not Vec::reserve -- these APIs serve subtly different purposes, the former (which C++ calls just "reserve") says "Here's a hint for exactly how big this container will ever grow" while the latter says "Here's a hint for how big this container will get in my immediate future, but it might grow further later".

The implementation always tries to grow (if necessary) to the exact size chosen for Vec::reserve_exact, but for plain Vec::reserve if growth is needed it always grows exponentially, not to the exact size, preserving the O(1) push cost.

For a typical "doubling" growable array type, if we're pushing groups of ten items, reserve_exact or C++ grows like 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 ... which is much worse than O(1) whereas the correct reserve grows 10, 20, 40, 80, 160 preserving O(1)
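
A toy model of that difference, written in C so it stays neutral between the two libraries. It tracks only length and capacity and counts how many existing elements a reallocation would have to move; `copies_during_growth` is a hypothetical name, and "at least double" is an assumption about the growth policy rather than a description of any particular implementation.

```c
#include <stddef.h>

/* Push `groups` batches of `batch` items, reserving room before each batch.
   exact != 0: capacity grows to exactly what the batch needs.
   exact == 0: capacity grows to at least double the old capacity.
   Returns the total number of elements moved by reallocations. */
static size_t copies_during_growth(size_t groups, size_t batch, int exact) {
    size_t len = 0, cap = 0, copies = 0;
    for (size_t g = 0; g < groups; g++) {
        size_t need = len + batch;
        if (need > cap) {
            size_t new_cap = need;
            if (!exact && cap * 2 > new_cap)
                new_cap = cap * 2;
            copies += len;   /* a reallocation moves every existing element */
            cap = new_cap;
        }
        len = need;
    }
    return copies;
}
```

With ten batches of ten, this model counts 450 moved elements for exact growth (capacities 10, 20, 30, ... 100) and 150 for doubling (capacities 10, 20, 40, 80, 160).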

For trivial types you can work around this in C++ with a little work, and for the non-trivial types you can work around it with a bunch more extra code, but you probably won't.

Bjarne among other people teaching C++ recommend just not using the reservation API as a reservation API because of this problem†, and the resulting teaching definitely leaks into CS graduates and even into languages which have the correct API and so you have to un-teach the bad lesson.

In applications where you actually can't afford to pay for growth (or at least in some cases can't afford this) I also like Vec::push_within_capacity which I believe comes from Rust for Linux where the kernel legitimately needs this "If there's room push, otherwise I have a plan B" approach.

† To Bjarne this API is instead conceived of as a way to preserve reference validity. Since it won't grow, our references will still work.

jstimpfle · 2026-06-14T06:53:08 1781419988

Vec::reserve() is the behaviour you get from C++ std::vector push_back() (implicit just-in-time reserve), so right now I can't see a situation where I'd want the explicit Rust version even if I didn't think the whole push realloc thing is mostly a bad idea.

Yes, Rust version allows you to maybe skip a reallocation step or two by doing explicit up front reallocation. But remember most allocation work is always from the last grow anyway. The Rust version seems like a microoptimization, giving a little bit more explicit control in a situation where you've already pretty much given up control and gone like, throw hands in the air, we're doing push_back()!

tialaramex · 2026-06-14T10:45:00 1781433900

As with the likely/ unlikely branch hint the problem is that programmers are wrong much more often than they expect on both sides. They're too often wrong to think they know the final size - hence Bjarne's caution - but they're also too often wrong that they've got no idea how much capacity they need at all. So hence this API.

You're correct that this isn't a huge optimization. But it more than pulls its weight directly because it's a small boon when you're right and it doesn't have the terrible penalty that Vec::reserve_exact has when inevitably the programmer is sometimes wrong. It's very much about saving pennies, but the growable array type is so widely used that counting pennies makes sense.

I have a lot more thoughts about reservation, but these suffice for specifically the growable array type.

jstimpfle · 2026-06-14T19:36:53 1781465813

There are many many situations where I know exactly what is the common path and what is the uncommon path. And when I don't know, it's much harder to optimize!

If you're counting pennies, Vec::reserve() (inexact) is hardly what you want, because in the worst case you're wasting a factor of 1.5x or 2x of elements due to up-front overallocation. Maybe chunk lists could be better, overhead is bounded by chunk size and all operations are constant-time. No pointer invalidation either. And you can pool those chunks, preventing memory fragmentation and improving memory utilization, since there aren't a million different sized allocations in your process.

tialaramex · 2026-06-15T07:42:10 1781509330

> If you're counting pennies, Vec::reserve() (inexact) is hardly what you want, because in the worst case you're wasting a factor of 1.5x or 2x of elements due up-front overallocation.

I think this is "Trump math" but assuming we're actually looking at the same over-allocation this isn't new, what you're calling "wasting a factor of 2x" was already what you were paying by using the amortized O(1) growable array type, that's its central trade off, so yeah, the type with that property does still have that property.

> Maybe chunk lists could be better, overhead is bounded by chunk size and all operations are constant-time.

Basically std::deque. Surely no more need be said ?

jstimpfle · 2026-06-15T08:14:12 1781511252

> was already what you were paying by using the amortized O(1) growable array type

As I said, I don't think growable array data structure is a good idea.

> Basically std::deque. Surely no more need be said ?

Ok now you're really out of your depth here. Pretty sure you haven't actually used it, it's a somewhat obscure data structure even though the more serious C++ programmers have heard of it, most have probably never used it.

From a web search, std::deque seems to be pretty universally thought of as a weird and very bad data structure. I've had to learn myself because I did actually try to use it myself recently. Beyond being unintuitive to use due to being "abstract" (had 2-3 serious bugs, e.g. unexpected iterator invalidations that happened only under load), apparently std::deque is not even specified as a chunk list. O(1) random access must be guaranteed. So it must be much more complicated than that, I don't know the details though.

And while actual implementations do use chunks in some way (just not in a simple chunk list), apparently "chunks" is not a part of the std::deque spec, and hence there isn't anything standardized about the size of these chunks either.

- On MSVC, chunk size is 16 elements, so the chunk size of std::deque<char> is 16 BYTES !!!!! I was thinking they can't be serious, but apparently it is the case

- On GCC, chunk size is fixed at 512 bytes.

- On Clang, chunk size is fixed at 4096 bytes.

It _is_ a shithole, surely no more need be said?

tialaramex · 2026-06-15T13:14:02 1781529242

> As I said, I don't think growable array data structure is a good idea.

It is still quite hard for me to take this seriously as a belief.

> Ok now you're really out of your depth here.

I don't think so. As you found, the consensus is that this sort of thing is a terrible idea. The O(1) complexity looks good superficially but your actual performance is miserable.

> I don't know the details though.

Raymond Chen has a pretty good explanation of the three implementations of std::deque, like he did for their std::string but Raymond's goal is to help you do forensics, he's not here to tell us which data structures are a good idea. https://devblogs.microsoft.com/oldnewthing/20230810-00/?p=10... might work, or, given Microsoft, it might already be dead by the time you click it.

jstimpfle · 2026-06-15T14:47:58 1781534878

> The O(1) complexity looks good superficially

Without knowing more about the details of the spec and of real world implementations [0], I'll boldly claim that the O(1) is the precise problem why this is a weird data structure (I never needed or expected that sort of constraint). It is not a chunk list, like many (including me) seem to have assumed.

I don't even have performance problems measured with this, also given that my std::deque is not currently in production, it is used as a simple streaming queue that only has a couple megabytes/second of bandwidth requirements, and it is not being built with MSVC (so chunks are going to be 512 or 4096 bytes). But it is a latent problem.

[0] Btw. I don't want to know the messy details of this right now, I need simple not complex. I've learned not to add more complexity in a futile attempt to fix things that are fundamentally broken. What use case is this going to be solved by std::deque, name me one such use case and tell me with a straight face that it's not a completely made up case or a super niche case that would likely also be better served by a different approach.

Or take it from someone else if you would, is Chandler Carruth high-profile enough and free enough of conflict of interest to be credible for saying that std::deque is dumb? https://qqrl.tk/item?id=22962980

tialaramex · 2026-06-15T15:52:35 1781538755

> It is not a chunk list, like many (including me) seem to have assumed.

The thing C++ programmers tend to expect (though perhaps not you) is Rust's std::collections::VecDeque - the growable array type again but used as backing for a ring buffer.

This type has amortized O(1) push and pop at both ends, I assume since you didn't like the amortized growth of Vec / std::vector you'll feel the same way but that's what most programmers actually want from such a type - in use as a queue it won't repeatedly cycle allocations so long as the size is constrained, because it's a ring buffer. If you actually know the needed size up front the overhead from this structure, which can grow, versus a structure which can't grow, is a single integer for the capacity.

But as you saw, std::deque is... not that. STL wasn't able to explain what it's for when I asked him, so, if anybody knows it apparently doesn't include the people maintaining the C++ standard library implementations.

> Or take it from someone else if you would

I think the trouble here is that you've misconstrued this entire part of the conversation. I was gesturing at std::deque because it's terrible, and your response, that it's terrible, isn't a rebuttal as I think you're expecting. We agree on that, std::deque is terrible, our disagreement seems to be that you think a slightly different terrible data structure would be a good idea somehow and I do not.

jstimpfle · 2026-06-15T18:17:43 1781547463

I don't know the size up front because it goes up and down depending on load. I want to be able to buffer jitters/spikes but not hog memory when there is no data in the queue. I was assuming it's natural to want a simple linked list of chunks to represent this, maybe chunks can be partially full here. So chunk -> chunk -> chunk, and each chunk is actually a node with a buffer pointer + size + fill fields, or maybe the data comes directly after the size + fill fields so no buffer pointer needed.

That easily gives O(1) access to enqueue/dequeue, and the queue naturally grows and shrinks (even within configurable bounds) as part of reading and writing. Blocks can be taken and returned from/to some fixed size block pool. This design would be completely satisfactory. It's also very easy to code from scratch, so why should I settle for anything less, for example a growable vector where a reallocation would mean blocking other threads for extended time? But I was told to "not reinvent the wheel" and to "not overengineer it" so I tried to change my mind and make use of an STL container data structure.
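
A rough single-threaded sketch of that design in C, under a few assumptions: chunks come from malloc/free rather than a fixed-size block pool, the payload is raw bytes stored directly after the bookkeeping fields, and `CHUNK_CAP`, `struct chunk`, `cq_enqueue`, and `cq_dequeue` are illustrative names. No locking is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum { CHUNK_CAP = 4096 };   /* assumed payload bytes per chunk */

struct chunk {
    struct chunk *next;
    size_t rd;               /* read offset into data */
    size_t wr;               /* fill level of data */
    unsigned char data[CHUNK_CAP];
};

struct chunk_queue {
    struct chunk *head;      /* oldest chunk, dequeue side */
    struct chunk *tail;      /* newest chunk, enqueue side */
};

/* Append n bytes. Returns the number stored (short only if malloc fails). */
static size_t cq_enqueue(struct chunk_queue *q, const void *src, size_t n) {
    assert(q != NULL);
    const unsigned char *p = src;
    size_t done = 0;
    while (done < n) {
        struct chunk *t = q->tail;
        if (t == NULL || t->wr == CHUNK_CAP) {   /* need a fresh chunk */
            struct chunk *c = malloc(sizeof *c);
            if (c == NULL)
                break;
            c->next = NULL;
            c->rd = 0;
            c->wr = 0;
            if (t != NULL) t->next = c; else q->head = c;
            q->tail = c;
            t = c;
        }
        size_t room = CHUNK_CAP - t->wr;
        size_t take = (n - done < room) ? n - done : room;
        memcpy(t->data + t->wr, p + done, take);
        t->wr += take;
        done += take;
    }
    return done;
}

/* Remove up to n bytes. Drained chunks are freed immediately, so an idle
   queue holds no memory. Returns the number of bytes copied to dst. */
static size_t cq_dequeue(struct chunk_queue *q, void *dst, size_t n) {
    assert(q != NULL);
    unsigned char *p = dst;
    size_t done = 0;
    while (done < n && q->head != NULL) {
        struct chunk *h = q->head;
        size_t avail = h->wr - h->rd;
        size_t take = (n - done < avail) ? n - done : avail;
        memcpy(p + done, h->data + h->rd, take);
        h->rd += take;
        done += take;
        if (h->rd == h->wr) {                    /* chunk fully consumed */
            q->head = h->next;
            if (q->head == NULL)
                q->tail = NULL;
            free(h);
        }
    }
    return done;
}
```

Each enqueue or dequeue touches only the chunk at one end, and existing data never moves, which is the property being contrasted with a reallocating vector.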

> our disagreement seems to be that you think a slightly different terrible data structure would be a good idea

I don't know why you think that I would be thinking that, and I'm confused that you say you were gesturing at how std::deque is terrible, and not sure why I should be the one misconstruing something, when my premise from the beginning was "STL containers are terrible in many ways". And I don't get at all why you responded with "basically std::deque" to my "maybe chunk lists could be better". It doesn't make the least bit of sense to me.

But let's settle this here, this is a pointless fight...

tialaramex · 2026-06-16T11:13:29 1781608409

> I was assuming it's natural to want a simple linked list of chunks

In the 1970s or 1980s that does feel entirely natural, but in the 1990s the 68040 and i486 both introduce L1 data cache and so now all list chasing hurts very badly, your structure is a list chase any time we index into the collection.

I think I can see a way to have what you're describing hit amortized O(1) push/ pop with the spikes which are amortized being more frequent (linear with capacity) but fixed size and smaller (allocate or free one block then do some housekeeping), it costs more RAM because of the block overhead and it is no longer contiguous, but I see that for your intended application you probably don't care about either problem.

Now that I think I understand your data structure better it's much less similar to std::deque than I had originally thought, it does seem very niche to me, but more power to you if you write such a type.

jstimpfle · 2026-06-16T11:55:11 1781610911

Yes, I am talking about a buffer for an outbound queue that should buffer on the order of MB/sec. That's decidedly not a niche thing. It's very basic systems engineering (and also how socket buffers are represented in a kernel for example).

You would chunk this at a reasonable size, at least multiple kilobytes per chunk. That is not slow, it is basically the only way to do it. Yes, that is amortized, bounded overhead. Sure it costs a bit of extra RAM, like one extra pointer and maybe 1-2 integers per chunk? For chunks of 4 KB, this overhead is predictably less than 1 % plus less than 1 chunk worth of fragmentation in the last chunk. That is NOT unpredictable spikes of > 100 percent during reallocation or 100000 percent due to low utilisation...

I believe you can even instruct the CPU to preload the next buffer while still streaming the last, but personally have never had the need to dive _that_ deep. It would probably be very hard to measure any performance benefit from doing so. I like to write very basic straightforward C code, just solve the data structure problem first, not hand-wave it.

jstimpfle · 2026-06-14T06:31:53 1781418713

Feels like typical improvement/perfection tunnel vision syndrome to me, though. That syndrome is engrained in C++ community and I think also Rust community, although it looks like the latter took the chance to do many things better from the start learning from C++'s mistakes.

Realloc is pretty much never the right way in whatever form, and I've never seen any need to include realloc in any of my own allocators (mostly block allocators and linear allocators and pools/free lists, sometimes using malloc/free).

jstimpfle · 2026-06-13T21:26:33 1781385993

You are free to make your own definition, what are your suggestions?

(Obviously I meant to say low latency, not low frequency)

delta_p_delta_x · 2026-06-14T10:05:49 1781431549

That is a cop-out. You made the assertion, you define it.

What about these particular workloads (and the environments they're used in) make them 'serious' and why are other workloads 'lesser' and therefore the standard library 'suffices'? Why not use better containers for everything? Google, for instance, universally recommends Abseil.

jstimpfle · 2026-06-14T19:06:38 1781463998

I stand by my statement. You are welcome to contend it, but then please actually suggest alternative workloads that could qualify as serious?

Google also recommends using Golang in many cases, which was explicitly designed for not-so-experienced people. It is more geared towards creating services quickly and robustly, not towards squeezing out the last 10x of performance.

I can't say anything about abseil, having never used it. "By Google" is not an immediate seal of quality, and it doesn't mean that one size fits all. I've recently seen a list of .so objects required in order to link grpc (also by Google), there were like dozens of abseil .so's in the list IIRC. I don't like it! In my experience, validated countless times in my own practice, complexity => slow and hard to maintain and simplicity => fast and easy to change.

Templatized C++ containers are generic recipes, they can't make use of local context and hence don't lead to the simplest solutions. They produce tons of boilerplate. They give you a decent single-threaded baseline performance in many cases, sure, but now start hammering these data structures using 16 CPUs in parallel... you might find that you should have designed your application completely differently.

Like WalterBright says, I too use very simple data structures almost exclusively (arrays, linked lists, linear allocation). Most of these coded ad-hoc everywhere, very straightforward, very adaptable. Even hash maps I only use infrequently -- if I can, I just make keys such that I can index directly. I get performance from knowing exactly what happens, and minimize the work that needs to happen. I create immutable data objects (write-once) where possible. I've created my own pooled block-allocator for power of 2 sized allocations with book-keeping in shadow-memory (preventing fragmentation), massively reducing syscalls that modify virtual memory mappings and closely controlling memory used by various subsystems.
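
For readers unfamiliar with "linear allocation", here is a minimal sketch in C of a bump allocator over a caller-provided buffer. This is not the pooled block allocator described above, and the names (`Arena`, `arena_alloc`, `arena_reset`) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned char *base; /* start of the backing memory */
    size_t cap;          /* size of the backing memory in bytes */
    size_t used;         /* bytes handed out so far, including padding */
} Arena;

static void arena_init(Arena *a, void *mem, size_t cap)
{
    a->base = mem;
    a->cap = cap;
    a->used = 0;
}

/* Hand out n bytes aligned to `align` (must be a power of two).
   Returns NULL when the arena is exhausted. */
static void *arena_alloc(Arena *a, size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t cur = (uintptr_t)a->base + a->used;
    size_t pad = (size_t)(-cur & (align - 1)); /* bytes up to next boundary */
    size_t left = a->cap - a->used;
    if (pad > left || n > left - pad)
        return NULL;
    void *p = a->base + a->used + pad;
    a->used += pad + n;
    return p;
}

/* Release everything at once; individual frees do not exist. */
static void arena_reset(Arena *a)
{
    a->used = 0;
}
```

Allocation is a bounds check plus an addition, and freeing is a single store, which is where the "knowing exactly what happens" part comes from. The trade-off is that all objects in one arena share a lifetime.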

I don't expect a library like abseil to solve many of my problems, it probably makes some things "easy" by taking away the very control from me that I need to solve the problem I have, resulting in unsolved problems and complexity.

miroljub · 2026-06-13T22:13:42 1781388822

Because if you don't need any of these, any slop implementation will do.

stinos · 2026-06-14T13:59:50 1781445590

> If you haven't been thinking deeply about memory management and concurrency, you won't be able to understand, no offense meant

I do think about that, when needed. My point is that these containers can be 'good enough' in places where it doesn't really matter, not that they're always the go-to thing. E.g. I really don't see any issue using a map as part of a configuration type of object which gets read from args or json and which only gets used once at application start.

jstimpfle · 2026-06-15T07:48:35 1781509715

Yes. When you don't have any specific requirements, any mediocre data structure will do.

jstimpfle · 2026-06-13T16:24:29 1781367869

Orthodox C++, to me, is C plus the one good feature of C++: you don't have to type struct all the time.

maxvu · 2026-06-13T16:47:33 1781369253

How about one of the C unorthodoxies that use typedef everywhere? (Namespaces seem suitable, too.)

jstimpfle · 2026-06-13T16:52:14 1781369534

typedef is a little bit of a hassle but you can do it, even in a very strict mechanical way if writing plain C. But it's a hassle.

And namespaces suck too, so much noise for little gain. You know what, a big part of programming is naming. You just have to come up with good names. Namespaces don't magically make names better, if anything, they make them worse. And they add a lot of syntax noise.

PaulDavisThe1st · 2026-06-13T20:00:52 1781380852

tell me you never use platform-provided or third party libraries without actually telling me.

jstimpfle · 2026-06-13T21:36:06 1781386566

I won't, because obviously I have. Coincidentally, I've just spent more than a day to find a sane way to set up basic windowing and rendering on Windows for the 1000th time -- this time based on Direct3D compute shader and DirectComposition. Going to integrate the Direct2D/DirectWrite backend of my UI library with those technologies in a bit.

And I'm working on this project because the material removal simulation library that my company has been paying insane amounts of money for in the last 10 years just doesn't "cut" it for our grinding purposes (independently of the cruelfully bad C++ API, and the ~10x worse performance in their debug build because of all those dumb C++ objects inside), and it requires an insane amount of BS architecture around on our part to even use it, so I've concluded (after my own short periods of serious well-meaning suffering with this library on and off during the last 2 years), it should actually be way easier to make our own grinding simulation in a short time after running the numbers and concluding a realistic magnitude.

Guess what, I still see no reason why my little project shouldn't succeed, I've seriously come closer to realizing it, but COM Objects by Microsoft are once again not the reason why that is so. On the contrary, they are annoying to use (I'm not a newcomer to them), even though there are a lot of nice things to say about Microsoft APIs as well (and OpenGL should just die), it seems like a layer of bullshit to wade through for no obvious reason.

The nicest libraries in terms of usability have always been those with a simple & straightforward plain C interface.

PaulDavisThe1st · 2026-06-13T22:50:09 1781391009

We're talking about names. Names like "Rectangle" and "Point" (thanks, Apple).

It's a bit like concrete - there's two kinds (of programmers): the ones who have experienced namespace issues (i.e. name collisions) and the ones who haven't, yet.

jstimpfle · 2026-06-14T06:19:14 1781417954

Like most, I have indeed not encountered collisions yet, at least nothing serious that I can remember (I remember a macro called sth like WIN32_NO_MINMAX though).

A name like "Rectangle" will only have collisions if there is a C file that simultaneously includes headers from two distinct libraries that define that name, which frankly is very unlikely and very easy to work around. As long as you don't define a silly constructor that creates a symbol Rectangle::Rectangle(float, float, float, float) or whatever, there will not even be linker collisions when distinct parts of a software include distinct definitions of the Rectangle name.

Nevertheless, I simply name my types like "VxRect" or whatever in case they are supposed to be consumed by other components. Much better than Vx::Rect or even Company::Math::Types::Rect, isn't it?

jstimpfle · 2026-06-11T20:03:03 1781208183

This book is one of the few books I own and I couldn't agree more with these reviews. I read maybe 20 to 50 pages of it, and didn't end up with a lot of practically relevant insights. I couldn't say it's a bad book, maybe it's even a good one.

But theory about computer science is always waaaay too removed from practical reality. Only a tiny bit of basic theory is applicable in reality, and from then on, in practice we're just busy fighting with practical problems, we're hardly getting to the theoretic ones.

jstimpfle · 2026-06-11T19:40:20 1781206820

I'd like to note that the only latency that gets increased is the latency of the backpressure "signal". I don't think shorter queues can ever result in items being processed earlier. They will only increase the time items have to wait until they can "enter the system". Or more of those items will be rejected ("lost event").
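
A toy illustration in C of what the queue bound does, assuming a single-threaded fixed-capacity ring buffer. `BoundedQueue`, `q_try_push`, `q_try_pop` and the capacity of 4 are all hypothetical. When the queue is full, the producer is told immediately and must either wait or drop the item; items already queued are served in the same order and no sooner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QCAP 4 /* assumed capacity; a shorter queue just rejects sooner */

typedef struct {
    int items[QCAP];
    size_t head;  /* index of the oldest item */
    size_t count; /* number of queued items */
} BoundedQueue;

/* Returns false when full: this is the backpressure signal. */
static bool q_try_push(BoundedQueue *q, int v)
{
    if (q->count == QCAP)
        return false;
    q->items[(q->head + q->count) % QCAP] = v;
    q->count++;
    return true;
}

/* Returns false when empty; otherwise stores the oldest item in *out. */
static bool q_try_pop(BoundedQueue *q, int *out)
{
    if (q->count == 0)
        return false;
    *out = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return true;
}
```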

jstimpfle · 2026-06-02T10:13:15 1780395195

Rebasing all the time will most likely result in intermediate commits that don't build anymore (because who builds all the intermediate commits after a rebase), and which don't make sense anymore as a "history" of changes.

So my simple resolution has become: don't rebase anything beyond trivial. I use rebase -i to merge, delete, or reorder individual commits on an isolated feature branch. Anything else? Not worth the effort in my opinion. Just merge, in any direction. I even merge between feature branches sometimes, though one must be aware that this essentially glues the two feature branches, meaning either both or none can get merged to master.

Something worth stressing: The most important commit is always the one HEAD points to. Older commits don't matter so much, to the degree that cleaning them up later is wasted time. Most of the value of the git commit graph comes from its role as a support structure for git merge. In most cases, nobody gives a shit that all intermediate commits are perfect for some definition of "perfect". Most commits will never be read again.

jstimpfle · 2026-05-31T22:55:45 1780268145

I think it wasn't explained in a very accessible way. If I got the gist right, this essentially brings "per-CPU" synchronization to userland. It's typical in the kernel to have per-cpu data, while per-thread data is rare and typically impractical. There is a high number of threads managed by the kernel, most of which probably belong to a userland process, most of which do not participate in any given synchronisation scheme. Also threads are often too much of an abstraction for parallel programming needs, given that they are hiding for example cache effects. So it's natural to want to use per-cpu data instead of thread_local data in a userland process, I know I've been wishing for that many times.

With rseq, we can allocate in any userland process one instance of a given synchronisation data structure per CPU. It's important to understand that userland code accessing per-cpu data structures cannot prevent being scheduled away from a CPU and being replaced by another thread (kernel code can block the scheduler for short critical sections). Such a replacement thread may subsequently corrupt the same data while the first thread is still in the middle of its transaction. But we can make a subset of transactions safe at least: If a transaction gets committed in a single (final) atomic instruction, and we get kernel support for this transaction to be restarted in case the thread has been scheduled away mid-way, this is a guarantee that at the time of commit, the entire transaction hasn't been interrupted by the scheduler. I.e. a kind of "mutual exclusion" guarantee.

Did I get that right?

saagarjha · 2026-06-08T08:22:50 1780906970

You don't have to use it for this. For example, you can use it for your own transactional memory or hazard pointer scheme.