sjm217's comments

sjm217 · 2026-06-07T14:51:17

Though I take your point that it’s not big data in the conventional sense (i.e. requiring distributed computing to process). The phrasing in the original article was better: “To make iterative analysis practical, we wrote a Julia pipeline: NetCDF source files are converted to Apache Arrow, then thread-parallel bit extraction is performed into a DuckDB database.”
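The shape of that last stage (split the decoded records into chunks, extract bit fields on worker threads, load the results into an indexed database) can be sketched in a few lines. This is a hypothetical illustration, not the pipeline from the repository: it uses Python's stdlib `sqlite3` as a stand-in for DuckDB, skips the NetCDF and Arrow steps, and assumes a made-up record layout of `(satellite_id, raw_word)` with an 8-bit field at bit offset 6. Note that CPython's GIL limits the speedup for pure-Python bit twiddling, so this shows the structure rather than the performance Julia's threads give.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def extract_bits(word: int, start: int, length: int) -> int:
    """Return `length` bits of `word`, starting `start` bits above the least significant bit."""
    return (word >> start) & ((1 << length) - 1)


def process_chunk(chunk):
    # Hypothetical layout: each record is (satellite_id, raw_word);
    # pull out an 8-bit field starting at bit 6.
    return [(sat, word, extract_bits(word, 6, 8)) for sat, word in chunk]


def build_database(records, n_threads=4, chunk_size=1000):
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    conn = sqlite3.connect(":memory:")  # stand-in for a DuckDB file
    conn.execute("CREATE TABLE fields (satellite INTEGER, word INTEGER, field INTEGER)")
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Extraction runs on worker threads; inserts happen only on this thread,
        # so the database connection is never shared across threads.
        for rows in pool.map(process_chunk, chunks):
            conn.executemany("INSERT INTO fields VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_satellite ON fields (satellite)")
    conn.commit()
    return conn
```

Building the index once after the bulk load, rather than maintaining it during inserts, is the usual choice for this kind of write-once, query-many database.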

sjm217 · 2026-06-06T23:46:14

The dataset was 136GB (about 7GB per annum), and the Python implementation took 45 hours per run. The Julia code that processed the whole dataset and built the database took 5 hours, which made iterative development much more pleasant. Of course, later stages in the pipeline had much less data to process and so were much faster. With metadata and indices, the database was about 3GB. That's larger than your estimate because there are multiple observations of the same satellite.

sjm217 · 2026-06-05T18:25:32

All the code is available, and every claim is traceable back to the statistical analysis. Results are reproducible from the original data, which is archived on Zenodo. Further analysis would be very welcome. https://github.com/sjmurdoch/gps-special-messages

sjm217 · on Nov 3, 2023

Some press coverage from Computer Weekly https://www.computerweekly.com/news/366557952/EU-eIDAS-refor... and The Record https://therecord.media/eu-urged-to-drop-law-website-authent...

sjm217 · on March 23, 2023

I feel this is a bit different. At least O_TRUNC is an option documented right next to the open() function, so the programmer has a chance to spot it. With FileSavePicker, there is no such option, and the programmer has to add a line to truncate the stream manually. Also, open() is a low-level call, whereas FileSavePicker is the supposedly easy-to-use high-level feature. I would say it is closer to fopen(), which does truncate when the file is opened for writing with "w".
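To make the difference concrete, here is a small sketch of what goes wrong when a shorter file is written over a longer one without truncation. It uses Python's `os.open`, which passes the POSIX-style `O_WRONLY`, `O_CREAT` and `O_TRUNC` flags through to the OS; FileSavePicker itself is not shown, and the helper names `overwrite` and `demo` are hypothetical.

```python
import os
import tempfile


def overwrite(path: str, data: bytes, truncate: bool) -> None:
    """Write `data` at the start of `path`, truncating first only if asked (O_TRUNC)."""
    flags = os.O_WRONLY | os.O_CREAT
    if truncate:
        flags |= os.O_TRUNC
    fd = os.open(path, flags, 0o644)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def demo():
    results = {}
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "save.txt")
        for truncate in (False, True):
            # Mode "wb" truncates, like fopen(path, "w"), so each run starts fresh.
            with open(path, "wb") as f:
                f.write(b"a much longer original file")
            overwrite(path, b"short", truncate)
            with open(path, "rb") as f:
                results[truncate] = f.read()
    return results
```

Without `O_TRUNC`, `demo()[False]` is `b"shorth longer original file"`: the new bytes replace only the first five bytes and the stale tail of the old file survives. With it, `demo()[True]` is just `b"short"`.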

eviks · on March 23, 2023

These are very minor points. The main one is the same across platforms: bad defaults are common.