Hacker News | mabster's comments

Low-level CPU-related optimisation is absolutely still a thing. The GPU is always filled to the brim trying to get as much quality as possible out of a graphics frame, so a lot gets offloaded to the CPU. When I was doing this I was doing a lot of low-level CPU optimisation. GPU optimisation was usually more about transform process topology but there was plenty of low-level work to do there too.

Games are both high-throughput AND low-latency, and C++ is still king there.


C++ is no doubt king in games (for reasons that aren't necessarily primarily performance [1]), but not only are there plenty of high-throughput low-latency applications in Java, I believe there are more than in C++.

BTW, "low latency" is relative, and in most games the relevant latency is the frame, which is usually between 5-15 ms. I worked at a place that did large low-latency software, some soft realtime and some safety-critical hard realtime, where the cutoff between Java and low-level was whether the required latency was under 10us (that's microseconds!). That's a few orders of magnitude below what's in games. We did use specialised versions of Java (and specialised kernels), but these days, on normal OSes and plain Java, the cutoff is usually around 1-3ms (although at that point you often need special kernels anyway).

Something that C++ people often don't know is that there's nothing in Java that makes it any harder to compile and run with optimisations at least as good as those offered by C++, but the opposite isn't the case: there are fundamental problems that make it hard to perform some optimisations in C++. Of course, the tradeoff is predictability. Some aggressive optimisations require speculation, which means a fallback to deoptimised (even interpreted) code and then recompilation. In pure compilation and memory management terms, Java has the advantage, but it aims to make the average case faster than C++ at the expense of the worst case.

[1]: E.g. AAA games are extremely conservative when it comes to technology choices; more conservative than even the military. AAA games often need to target limited consoles where there are few alternatives to C++ available.


I'm a Java developer now, amongst other languages. The advantage of Java is that it takes A LOT less time to develop something, so there is the whole bang for buck for sure. I have had a few problems where I would love shared direct memory access and some atomics (because it would be a lot easier). But for the most part developing in Java is a lot quicker.

I don't think game developers are more conservative than any other developers. We do have large C++ codebases and so it's hard to change.

All modern engines have a few scripting languages tacked on too.

Something like Lua usually is the sweet spot: most of the people developing scripts are not developers. We even had a Java interpreter for scripting once, but it lost favor for this reason.

There were exceptions, but I found that developers generally preferred C# over Java anyway. Our assets pipelines are generally in C# already.

Any speculative optimisation we were doing by hand. There is the whole deferring allocations / moving allocations, both of which we were already doing (e.g. copying every frame).

A lot of our C++ code is intrinsics (including memory primitives like _mm_stream_ps and barriers) and you HAVE to have good control over how memory is laid out (e.g. knowing that data is split between cache lines so that you don't get contention). Lots of spin locks too. I just don't see how you can do this kind of low-level work in Java.


> A lot of our C++ code is intrinsics (including memory primitives like _mm_stream_ps and barriers)

Java has such intrinsics, too: https://docs.oracle.com/en/java/javase/25/docs/api/java.base.... They may not look like intrinsics that compile to a single machine instruction, but they are. (I don't think we offer stream access, simply because there hasn't been demand for it; if there is, we can add it. I actually added a streaming array copy to the JVM because I thought I could use it for something, but the results weren't what I expected, so I took it out.)

BTW, here's a list of our intrinsics:

https://github.com/openjdk/jdk/blob/master/src/hotspot/share...

As you might notice, they include SIMD intrinsics offered through https://docs.oracle.com/en/java/javase/25/docs/api/jdk.incub...

> and you HAVE to have good control over how memory is laid out (e.g. knowing that data is split between cache lines so that you don't get contention)

We have the `@Contended` annotation precisely for that: https://github.com/openjdk/jdk/blob/master/src/java.base/sha... You have to use a flag to tell the JVM to respect this annotation, but the people who write high performance code know this: https://www.baeldung.com/java-false-sharing-contended

> Lots of spin locks too.

We have an intrinsic for spin locks: Thread.onSpinWait() https://docs.oracle.com/en/java/javase/25/docs/api/java.base...()
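To make the spin-wait hint concrete, here is a minimal sketch of a test-and-set spin lock built on `Thread.onSpinWait()` and `AtomicBoolean`. The `SpinLock` class and its `demo` helper are hypothetical names invented for illustration, not something from the JDK or from this thread, and a production lock would likely add backoff or fall back to parking.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration class; not part of the JDK.
final class SpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    void lock() {
        while (true) {
            // Read first, then attempt the CAS, so waiters mostly spin on a cached value.
            if (!held.get() && held.compareAndSet(false, true)) {
                return;
            }
            // Hint to the runtime/CPU that this thread is busy-waiting.
            Thread.onSpinWait();
        }
    }

    void unlock() {
        held.set(false);
    }

    // Starts `threads` workers that each increment a shared counter `perThread` times under the lock.
    static int demo(int threads, int perThread) {
        SpinLock lock = new SpinLock();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 10_000));
    }
}
```

If the lock is correct, no increments are lost, so the counter ends at `threads * perThread`.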

> I just don't see how you can do this kind of low level work in Java.

There's no reason you should if you're not writing high performance code in Java, but the people who write such code in Java know how to do these things in Java.

To be clear, Java certainly doesn't offer as much precise control as a low-level language, but it does offer everything you need for high performance (except array-of-struct, but that will arrive soon). The reason for that is that there's high demand for these constructs because so much of the world's performance-sensitive software is written in Java. Traditionally, not games (which often have to run on platforms for which we don't offer Java) but manufacturing automation, defence, and trading.

> There is the whole deferring allocations / moving allocations, both of which we were already doing (e.g. copying every frame).

Yes, you can certainly do some memory management optimisations in C++, although with some effort (it's especially hard to use some standard library stuff, but when I write high performance code in C++ I don't use std at all). The low-level language that makes it easier is Zig.

> Any speculative optimisation we were doing by hand.

It's hard to do speculative optimisation by hand, unless you're generating code on the fly. The way speculative optimisations work is that we observe that something has been true so far (e.g. think about a specific branch that's always taken or a dynamic dispatch that only hits a certain target at a certain callsite) but the compiler can't prove that it's necessarily true. So we emit machine code that assumes it's true with special traps that would trigger some fault signal if the assumption is invalidated. If the trap is hit, we capture the signal, deoptimise the subroutine and then recompile it differently (without the assumption).
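As a rough picture of the guard-plus-fallback shape described above, here is a hand-written analogue in plain Java. It is only an illustration under stated assumptions: the `SpeculationSketch` class, its shapes and the `guardFailures` counter are made up for this example, and a real JIT emits machine code with traps and recompiles on failure rather than branching to a slow path like this.

```java
// Illustrative only: a hand-written analogue of a speculative guard, not what HotSpot emits.
final class SpeculationSketch {
    interface Shape { double area(); }

    static final class Circle implements Shape {
        final double r;
        Circle(double r) { this.r = r; }
        @Override public double area() { return Math.PI * r * r; }
    }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        @Override public double area() { return side * side; }
    }

    // Counts how often the assumption "this shape is a Circle" turned out to be false.
    static int guardFailures = 0;

    static double totalArea(Shape[] shapes) {
        double sum = 0.0;
        for (Shape s : shapes) {
            if (s.getClass() == Circle.class) {
                // Fast path: the call target is assumed known, so its body is inlined by hand.
                Circle c = (Circle) s;
                sum += Math.PI * c.r * c.r;
            } else {
                // Slow path: a real JIT would trap, deoptimise and recompile here;
                // this sketch simply falls back to ordinary virtual dispatch.
                guardFailures++;
                sum += s.area();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Circle(1.0), new Square(2.0), new Circle(1.0) };
        System.out.println(totalArea(shapes) + " (guard failures: " + guardFailures + ")");
    }
}
```

The difference from real speculation is that this guard is fixed at compile time: the check stays in the code even if the assumption never holds, whereas the JIT can drop the assumption and regenerate the method.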

In C++ what I do is achieve some of the same optimisation results by hand (typically using templates), but of course, they're not speculative and I need to be careful. There are also code size and I-cache implications, but while we try to keep an eye on the I-cache, Java doesn't always get this balance right, either.


Ok fair enough. I don't write performance stuff in Java so I haven't even needed to look at this stuff to be honest. Most of the intrinsics I would want are there, except for any memory related stuff. I'm still not sure how structs are laid out in memory but I guess there's something for that too. My favourite thing in C++ is just loading a big binary blob and being able to point directly into it.

The only thing that was different was that we had a number of platform-specific intrinsics to really shake fast code out. E.g. shuffles on x86 on older SSE editions were terrible and we would have custom x86 code for shuffles or lay memory out differently.

The only thing we use from C++ stdlib is unique_ptr. For everything else we had our own much more tailored, much faster, stuff. We had something like 10 different array containers for example.

Yeah, what you described with templates is what we are doing re speculative optimisation. We have tuned versions for different workloads. We would inspect before we decide which one to run (only if that wasn't slower than just having one implementation, which was often the case because of instruction cache).

Something to be aware of is that on consoles mmapping a page to be executable was forbidden. So no JIT. And you aim for your slowest target so PC just follows that.


> My favourite thing in C++ is just loading a big binary blob and being able to point directly into it.

That's what the Foreign Function & Memory API (FFM) is for: https://docs.oracle.com/en/java/javase/25/docs/api/java.base... (before FFM, this was done through something called Unsafe, which is now in the process of being removed).
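For a feel of what "pointing into a blob" can look like with FFM, here is a small sketch. It assumes a JDK where `java.lang.foreign` is final (22 or later) and a made-up blob format (a little-endian `int` count followed by that many `float`s); the `BlobView` class and its helpers are hypothetical. In practice the segment would more likely come from a memory-mapped file than from a byte array.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration class; the blob format here is invented for the example.
final class BlobView {
    // Unaligned layouts, because a heap byte[] segment only guarantees 1-byte alignment.
    static final ValueLayout.OfInt INT_LE =
            ValueLayout.JAVA_INT_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);
    static final ValueLayout.OfFloat FLOAT_LE =
            ValueLayout.JAVA_FLOAT_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);

    // Builds a sample blob: [count = 3][1.5f][2.5f][4.0f].
    static byte[] sampleBlob() {
        return ByteBuffer.allocate(Integer.BYTES + 3 * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(3).putFloat(1.5f).putFloat(2.5f).putFloat(4.0f)
                .array();
    }

    // Reads the header, then views the payload in place without copying it.
    static float sumPayload(MemorySegment blob) {
        int count = blob.get(INT_LE, 0);
        MemorySegment payload = blob.asSlice(Integer.BYTES, (long) count * Float.BYTES);
        float sum = 0f;
        for (int i = 0; i < count; i++) {
            sum += payload.getAtIndex(FLOAT_LE, i);
        }
        return sum;
    }

    public static void main(String[] args) {
        MemorySegment blob = MemorySegment.ofArray(sampleBlob());
        System.out.println(sumPayload(blob));
    }
}
```

`asSlice` returns a view over the same memory, which is the closest analogue here to taking a pointer into the loaded data.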

> Something to be aware of is that on consoles mmapping a page to be executable was forbidden. So no JIT. And you aim for your slowest target so PC just follows that.

Certainly. Games have very good reasons to prefer C++ over Java. But these reasons have much more to do with platform support and other hardware constraints than sheer performance.


I went to the page expecting to rant about how it's not actually credit card size because of the thickness and was for once pleasantly surprised! Kudos to the author! It looks great!

I was thinking all that too and considering commenting about being sick of those credit card size claims, but after seeing the footage I am genuinely impressed. Great work there.

Love how the point of this entire thing comes across!

I'll be that guy I guess then... they stated on their page that credit cards are 0.8mm while the muxcard is 1mm and yet they still claim it is "literally the size of a credit card"... not to mention that they carved out an NFC card, not a credit card.

Yes it's still impressive either way, I'm not debating that.


It said that the specification states 0.8mm, but that many real world cards are thicker. Are credit cards actually 0.8mm?

Did we read the same text? He wrote 0.76mm.

I did notice that difference too. But previous "credit card size" projects have all been several mm (as in couldn't fit in a wallet designed for credit cards). So 1 mm is... pretty sweet!

That's pretty close to "be like Keanu Reeves"!


It's always the kitchen for me across food places (in Australia). Ending up with pickles when I removed them. Ending up with coke zero instead of coke. But the worst is ending up with anything mock meat!


McDonalds once forgot to actually put the patty on my burger. No idea how they managed to do that.


Do you think that kitchen was using a robot to build the burgers?


It's been mixed moving to normal code: I haven't had to low-level optimise for ages now (man I miss that). But performance in the O() sense has been the same.

Game engine development is very much about processing of data. The pipeline is long and the tree is wide. Being able to reason about complicated data processing topologies mapped very easily across.


I haven't watched his videos on his language for ages, but this was a big thing he wanted in his language: being able to swap between array-of-structs and struct-of-array quickly and easily.
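For readers unfamiliar with the two layouts, here is a minimal sketch of the difference, written in Java to match the rest of the thread. The `LayoutSketch`, `Particle` and `Particles` names are hypothetical, and note that in today's Java the "array of structs" side is really an array of references to separate heap objects, so the conversion has to be written by hand rather than switched with a language feature.

```java
// Hypothetical illustration of array-of-structs vs struct-of-arrays.
final class LayoutSketch {
    // Array-of-structs element: one object per particle.
    static final class Particle {
        final float x, y;
        Particle(float x, float y) { this.x = x; this.y = y; }
    }

    // Struct-of-arrays: one contiguous primitive array per field.
    static final class Particles {
        final float[] x;
        final float[] y;
        Particles(int n) { x = new float[n]; y = new float[n]; }
    }

    // Manual conversion from the AoS form to the SoA form.
    static Particles toSoA(Particle[] aos) {
        Particles soa = new Particles(aos.length);
        for (int i = 0; i < aos.length; i++) {
            soa.x[i] = aos[i].x;
            soa.y[i] = aos[i].y;
        }
        return soa;
    }

    // A pass that only touches x reads one dense float[] in the SoA layout.
    static float sumX(Particles p) {
        float sum = 0f;
        for (float v : p.x) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        Particle[] aos = { new Particle(1f, 10f), new Particle(2f, 20f), new Particle(3f, 30f) };
        System.out.println(sumX(toSoA(aos)));
    }
}
```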


Unless LA stood for Latin America haha.


I guessed American when it was compared to Hockey, Baseball and Basketball.

In Melbourne, Australia, Football is yet another sport (but it not being called Footy gives it away).


I flew Scoot airlines recently and my 13” MacBook Air was too big to have on my lap even though the seat in front was not reclined.

There's also something about those seats where you get back pain when you try to sleep with your own seat reclined.


Comfort is an up charge.


I'm guessing you're talking about interlacing?

I've never really experienced it because I've always watched PAL which doesn't have that.

But I would have thought it would be perceived as flashing at 60 Hz with a darker image?


PAL had interlacing


For anyone this deep on the thread, check out this video (great presenter!) explaining TV spectrum allocation, NTSC, PAL, and the origin of 29.97 fps.

https://youtu.be/3GJUM6pCpew


TIL NTSC: He explained that NTSC stands for Not The Smartest Choice, but I always assumed it meant Never The Same Color.


Memories shattered. Yeah, you're right and I would have watched interlaced broadcast content.

I saw interlaced NTSC video in the digital days where the combing was much more obvious and always assumed it was only an NTSC thing!

