I’ve always wondered about linear scan vs SSA based spilling & regalloc, like libfirm & qbe use:
‘Register spilling and live-range splitting for SSA-form programs’ and ‘Preference-Guided Register Assignment’
It’s much cleaner to code, since you don’t need to generate live ranges explicitly, and can break it up in two passes. Linear scan creates new inactive /active intervals, while the ssa form allocators just reuse the ssa graph. I’ve never benchmarked them back to back though.
It's quite impressive they're able to take nearly arbitrary C and do this! Very similar to what pypy is doing here, but for C, and not a python subset.
However not without downsides. It sounds like average code is only 2x faster than Lua, vs. LuaJit which is often 5-10x faster.
Tail calling (with musttail)+ preserve_none are definitely the future of interpreters. Gets you ~95% of the performance of writing the vm in assembly, while keeping it high level. Unfortunately only clang and llvm support it so far, but hopefully we get some other llvm backend langs in soon!
Marc feeley wrote “using closures for code generation” way back in 1987. Everything old is new again!
It’s quite nice and only a small bit more code than an ast interpreter - but not as fast as a luajit- style tail calling bytecode vm, nor a baseline jit like copy and patch. It works wonders in languages that support closures though.
Can you go in to more detail on 'wacky register allocation tricks' or instruction selection needed to support nun-tagging? Or pointers to code somewhere? Would be nice to compare some of them to the paper.
Deegen is my research meta-compiler to make high-performance VMs easier to write. Deegen takes in a semantic description of the VM bytecodes in C++, and use it as the single source of truth to automatically generate a high-performance VM at build time
> and that relies on having sufficient test and fuzz coverage
At the faang I worked at, some small portion of servers ran the sanitizers in prod, so you’re not reliant on test coverage nearly so much for catching rare issues.
‘Register spilling and live-range splitting for SSA-form programs’ and ‘Preference-Guided Register Assignment’
It’s much cleaner to code, since you don’t need to generate live ranges explicitly, and can break it up in two passes. Linear scan creates new inactive /active intervals, while the ssa form allocators just reuse the ssa graph. I’ve never benchmarked them back to back though.