Artisanal handcrafted Git repositories

bradfitz · 2025-07-17T02:18:10 1752718690

My recent horror from some git work was discovering how git sorts its tree objects.

The docs just say to sort by C locale (byte-order sorting). Easy. Except git was sometimes rejecting my packfiles as being bogus per its fsck code, saying my trees were misordered.

TURNS OUT THERE'S AN UNDOCUMENTED RULE: you need to append an implicit forward slash to directory tree entry names before you sort them.

That forward slash is not encoded in the tree object, nor is the type of the entry. You just put the 20-byte SHA-1 hash, which points to either a blob or a tree (or a commit for submodules).

So you can have one directory with directory "testing" and file "testing.md" and it'll sort differently than a directory with two files "testing" and "testing.md".
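A minimal sketch of that ordering rule in Python, in case it helps. The names `tree_sort_key` and `sort_tree_entries` are made up for illustration, and `is_tree` is assumed to be something the caller already knows about each entry; this is not Git's own code, just byte-order sorting with the implicit slash added for subtrees.

```python
def tree_sort_key(name: bytes, is_tree: bool) -> bytes:
    """Sort key for one tree entry (hypothetical helper).

    Subtree names are compared as if they ended in "/";
    everything else is compared by its raw bytes.
    """
    return name + b"/" if is_tree else name


def sort_tree_entries(entries):
    """Sort (name, is_tree) pairs in the order described above."""
    return sorted(entries, key=lambda e: tree_sort_key(e[0], e[1]))


# Directory "testing" next to file "testing.md": the file comes first,
# because "testing/" > "testing.md" bytewise ("/" is 0x2f, "." is 0x2e).
print(sort_tree_entries([(b"testing", True), (b"testing.md", False)]))

# Two plain files: "testing" comes first, as a shorter prefix.
print(sort_tree_entries([(b"testing.md", False), (b"testing", False)]))
```

The slash only exists in the sort key; the name written into the tree object stays `testing`.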

You can see a repro at https://gist.github.com/bradfitz/4751c58b07b57ff303cbfec3e39...

(So to verify whether a tree object is formatted correctly, you need to have the blobs of all the entries in the tree, at least one level)

xqb64 · 2025-07-17T08:33:11 1752741191

I've had this exact bug happen to me when I implemented my git clone.

The way I found out was that GitHub kept rejecting my push because, as I later discovered, my git history was invalid: entries were sorted improperly due to the forward slash requirement. I could have solved this with the real git, but the point was to use my tool exclusively for version control from inception, so I just deleted the .git folder. So, my git history appears to begin near the end of the whole cycle. But I did manage to learn a lot, both about git and about the language I implemented it in.

Elucalidavah · 2025-07-17T07:56:50 1752739010

> directory tree entry names

But... git doesn't really store directories, does it?

kaoD · 2025-07-17T07:58:59 1752739139

I wrote a longer comment saying this (deleted now since I was wrong).

Turns out that Git does somewhat store dirs (in form of trees). See https://git-scm.com/book/en/v2/Git-Internals-Git-Objects (section "Tree Objects").

To understand op's repro look at the last two lines (objects in the tree) in each of their command outputs, not the files shown in the first few lines.

What I think OP means is that the `testing` tree pointed to in their first example is sorted after `testing.md`, even though it's only called `testing`, because it's being sorted as `testing/` and `/` is > `.` bytewise.

I'm not at a computer right now but it would be nice to test it with files named `testing.` and `testing0` since they are adjacent bytewise and would show the implicit forward slash more clearly with the tree object sitting between them.
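A quick way to see the expected result without a repo is to sort the three names in Python. This is only a sketch of the comparison, not a run of Git itself; `entries` and `sort_key` are names invented for the example.

```python
# (name, is_tree) pairs: two files and one subtree called "testing".
entries = [("testing0", False), ("testing", True), ("testing.", False)]


def sort_key(entry):
    """Byte-order key with an implicit "/" appended to subtree names."""
    name, is_tree = entry
    return (name + "/" if is_tree else name).encode()


# The three characters that decide the order are adjacent bytes.
print([hex(ord(c)) for c in "./0"])  # ['0x2e', '0x2f', '0x30']

# With the implicit slash, the tree sits between the two files.
ordered = [name for name, _ in sorted(entries, key=sort_key)]
print(ordered)  # ['testing.', 'testing', 'testing0']

# A plain byte-order sort of the names would put the tree first instead.
naive = sorted(name for name, _ in entries)
print(naive)  # ['testing', 'testing.', 'testing0']
```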

This makes me wonder why Git can't just store an empty tree for empty dirs.

EDIT: did the Gist https://gist.github.com/alvaro-cuesta/bd0234e3e1a66819c7e9e9...

Notice the `git cat-file -p HEAD^{tree}` outputs.

lucasoshiro · 2025-07-17T15:56:59 1752767819

> This makes me wonder why Git can't just store an empty tree for empty dirs.

tl;dr: it can (see my other comment) and the empty tree is hardcoded. But since the index works with file paths and blobs, having no file means that there's no entry in the index

remram · 2025-07-17T08:50:46 1752742246

Yes it does, it just doesn't store empty directories.

lucasoshiro · 2025-07-17T15:52:02 1752767522

It can store empty directories (actually, trees). It normally doesn't, because the index maps paths to blobs: an empty directory has no file to map to a blob, so `git add` has no effect. Given that we normally write commits from the index content, we normally won't find an empty tree.

You can run `git commit --allow-empty` with an empty index and the root tree will be the empty tree:

   $ git init
   $ git commit --allow-empty -m foo
   $ git rev-parse @^{tree}
   4b825dc642cb6eb9a060e54bf8d69288fbee4904

4b825dc is the empty tree. And a funny thing about it is that it is hardcoded in Git, and you can use it without having this object:

   $ git init
   $ git commit-tree -m foo 4b825dc642cb6eb9a060e54bf8d69288fbee4904
   $ tree .git/objects # you'll see that there's no file for the empty tree

This is a good reading about that weird object: https://matheustavares.dev/posts/empty-tree
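The hash itself isn't arbitrary: an object ID is the SHA-1 of a small header plus the object body, and the empty tree has an empty body. A minimal sketch in Python (the helper name `git_object_id` is made up for the example, and this assumes a SHA-1 repository):

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 object ID: hash of "<type> <size>\\0" followed by the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# The empty tree is a tree object with a zero-length body.
print(git_object_id("tree", b""))
# 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```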

juped · 2025-07-17T15:28:52 1752766132

You can perfectly easily put the empty tree object as a tree object's child, this just isn't supported and some parts of Git will break.

lucasoshiro · 2025-07-16T21:27:07 1752701227

Something that I really like in Git is how its data structures are easy to understand and how transparent it is. It's possible to write your own "Git" compatible with existing Git directories only by reading how it works under the hood

shivasaxena · 2025-07-16T22:32:02 1752705122

I agree, but only in theory.

Projects like gitoxide have been in development for years now.

fiddlerwoaroof · 2025-07-16T23:28:27 1752708507

I wrote a nearly complete implementation of git file format parsers in Common Lisp over like a month of evenings and weekends. I’m sure there are hard parts between where I am and a full git implementation but you can get quite a bit of utility out of a relatively small amount of effort.
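To give a sense of scale, here is roughly what reading a loose object looks like in Python. It is a sketch, not a full parser: the function names are made up, it only covers loose objects (zlib-compressed, with a `<type> <size>` header terminated by a NUL byte), and packfiles are a separate story.

```python
import zlib


def encode_loose_object(obj_type: str, body: bytes) -> bytes:
    """Build the on-disk bytes of a loose object (header + body, zlib-compressed)."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return zlib.compress(header + body)


def parse_loose_object(raw: bytes) -> tuple[str, bytes]:
    """Decompress a loose object and split it into (type, body)."""
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\0")  # header ends at the first NUL
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type.decode(), body


raw = encode_loose_object("blob", b"hello\n")
print(parse_loose_object(raw))  # ('blob', b'hello\n')
```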

MrJohz · 2025-07-17T04:12:58 1752725578

It's a case of Pareto. Parsing the git file format is relatively simple, but handling all the weird states a Git repo can be in and doing the correct things to those files in each state is a lot harder. And then adding the network protocol on top of that makes directly reproducing Git quite difficult.

I know JJ used to use Git2 for a lot of network operations like pushing and pulling, but ran into too many issues with SSH handling that they've since switched to directly invoking the Git binary for those operations.

fiddlerwoaroof · 2025-07-17T04:33:40 1752726820

There aren’t that many weird states a git repository can be in: the on-disk format of the repository is too simple for that. The hard part has to do with the various protocols for transferring objects around.

deathanatos · 2025-07-17T06:46:52 1752734812

I think there's more corners out there than most people would give credit to? Just off the top of my head: files in the index (but maybe this isn't "weird enough"), rebasing but paused, rebasing with conflicts, merge with conflicts, cherry-picking but conflicts, middle of a bisect with all the state that implies, alternate objects dirs, alternate working dirs, submodules and all of their weirdness, and a "bare" repo.

Heck, had my PS1 return an error this week after I created a separate working dir for a repo and cd'd into it. Did you know .git can be a normal file? I didn't when I wrote my PS1.
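For anyone writing their own prompt or tooling, a hedged sketch of handling both cases in Python. It assumes the file form contains a single `gitdir: <path>` line, with relative paths taken relative to the directory holding the `.git` file; `resolve_git_dir` is a name made up for the example.

```python
from pathlib import Path


def resolve_git_dir(worktree: Path) -> Path:
    """Return the repository directory for a working tree.

    Handles both a regular `.git` directory and a `.git` file
    that points elsewhere with a "gitdir: <path>" line.
    """
    dot_git = worktree / ".git"
    if dot_git.is_dir():
        return dot_git
    text = dot_git.read_text().strip()
    prefix = "gitdir: "
    if not text.startswith(prefix):
        raise ValueError(f"unrecognized .git file: {text!r}")
    target = Path(text[len(prefix):])
    # Relative targets are interpreted relative to the .git file's directory.
    return target if target.is_absolute() else worktree / target
```

Calling `resolve_git_dir(Path.cwd())` in a separate working dir would then return the pointed-to directory instead of failing on the assumption that `.git` is a directory.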

lucasoshiro · 2025-07-17T00:58:04 1752713884

Yeah, I wrote mine in Haskell. It's a good exercise for understanding how Git works

chubot · 2025-07-17T00:56:26 1752713786

Not sure what gitoxide is, but libgit2 already exists, and it seems to be an independent implementation - https://github.com/libgit2/libgit2

I think GitHub and most big Git hosts use it

steveklabnik · 2025-07-17T03:22:58 1752722578

libgit2 has a ton of compatibility issues, especially around authentication, that make it only useful in some circumstances.

(gitoxide is a similar project but in Rust, it's not ready for the big time either, though it keeps on getting better!)

3eb7988a1663 · 2025-07-17T03:31:09 1752723069

Jujutsu threw in the towel and is shelling out to the git CLI because of minor variations in libgit2 vs the binary.

Failing to find a write-up, but there was this Lobsters thread[0] where someone from GitLab reported they had to do the same owing to some discrepancies vs the binary, where all of the real development happens.

[0] https://lobste.rs/s/vmdggh/jujutsu_v0_30_0_released

Dylan16807 · 2025-07-17T05:17:26 1752729446

But nothing in that description of problems is tied to the repository format.

veganjay · 2025-07-16T23:16:34 1752707794

Neat to see this done by hand! It helps demystify the magic behind git commands.

If you like this, I also recommend "Write Yourself a Git", where you build a minimal git implementation using python: https://wyag.thb.lt/

xqb64 · 2025-07-17T07:36:53 1752737813

There is also James Coglan's "Building git" book that I just went through and can vouch for its quality.

bhasi · 2025-07-17T01:36:53 1752716213

A similar project is CodeCrafters' Build Your Own Git: https://app.codecrafters.io/courses/git/overview

wonderwonder · 2025-07-17T00:18:04 1752711484

How cool, thank you

sc68cal · 2025-07-16T21:11:53 1752700313

To the site author: I'm on a MBP M1 Mac and honestly I can't really read the text. Far too small, and increasing the zoom just makes the text large but the margins less wide. Firefox reader mode also renders really badly.

Please, consider making the layout better for us old coders whose eyes are going, or for hi res displays

retsibsi · 2025-07-17T16:26:13 1752769573

For me, the text size would be fine if the contrast were better. The background colour is similar to the colour of the non-central pixels of the text, and even the central pixels are grey rather than black.

derefr · 2025-07-16T23:17:17 1752707837

FYI: the pinch-to-zoom gesture from mobile browsers (from before websites were mobile-responsive) has also long been implemented for all modern desktop browsers. It's viewport zoom, which is far better than the font-scaling zoom you get by pressing Cmd-+, and makes this site easily readable.

(The much-less-well-known mobile double-tap-on-text gesture [it zooms-to-fit whatever element you tapped on to the width of the viewport] was also ported to desktop browsers. Though, on desktop with a touchpad, it's a two-finger double-tap — which I don't think anyone would ever even think to try.)

LocalPCGuy · 2025-07-17T15:15:17 1752765317

FWIW, most browsers by default now do a viewport zoom with Ctrl/Cmd-+ rather than a font-scaling zoom. I think browsers generally have the option to change that, so if you prefer the former but it's doing the latter, you may want to check the browser settings.

BobaFloutist · 2025-07-17T00:33:15 1752712395

Double tap on text highlights it for me. Is that an iPhone/Android thing or what?

derefr · 2025-07-17T00:58:01 1752713881

As I said, it's a two finger double-tap.

But also, under further investigation — and unlike with pinch-to-zoom — desktop support for the two-finger double-tap gesture seems to be specific to macOS. (Which is weird, because Chrome has support for arbitrary multitouch gesture processing to enable the JS multitouch API. So you'd think Chrome's support for "the multitouch gestures the OS expects" would be built on top of that generic multitouch recognizer [and therefore working everywhere that recognizer works], instead of expecting the OS to pre-recognize specific gestures and translate them to specific OS input events.)

BobaFloutist · 2025-07-17T02:15:30 1752718530

I was trying on my phone, but my laptop seems to interpret it as a right click. Which, frankly, makes sense.

antonvs · 2025-07-17T06:44:05 1752734645

On my iPad in Safari and Pixel Android phone in Firefox, one-finger double tap on text does the fit to viewport.

On my Ubuntu laptop in Chrome, I couldn’t find a way to make it work - even tapping the touchscreen didn’t work. But I’m not using the stock Ubuntu GUI, so it could be that (LXqt+XMonad).

sam_lowry_ · 2025-07-16T21:19:23 1752700763

Works great on Firefox for Android though )

lucasoshiro · 2025-07-16T21:34:45 1752701685

Also works great on Safari on a M1 MacBook Air, here

jllyhill · 2025-07-17T10:37:29 1752748649

Am I the only one having troubles with the site on mobile? I'm using Firefox on a decent Android phone but the scroll is extremely stuttery and it distracts from the article unfortunately.

styanax · 2025-07-17T11:32:19 1752751939

The site is built with a content creation tool which has used a lot of JS and CSS, but the CSS is atrocious in its automated output so it's triggering the browser to have to interpret the mess of directives in every code block. The tool is generating HTML trash like (brackets replaced for comment to not parse):

    [span style="--0:#E1E4E8;--1:#24292E"] [/span]

...over and over, essentially giving style directives for every blank space in the code block. A less capable mobile CPU may well have issues rendering this site due to the presence of so much trash CSS inside its guts. $0.02 hth

lemming · 2025-07-17T06:19:48 1752733188

Git refers to the user-friendly commands as “porcelain”

Ahhhhahahaha… “user friendly”. When compared to coding the repo by hand, I guess.

aGHz · 2025-07-17T11:02:31 1752750151

When compared to the "plumbing" commands. If you want to know more about git's plumbing vs porcelain metaphor, this is a good quick overview: https://stackoverflow.com/a/39848551

antonvs · 2025-07-17T06:49:38 1752734978

This is what happens when you let an OS kernel guy write a cli.

HexDecOctBin · 2025-07-17T01:08:36 1752714516

Okay, there's something I have been thinking about recently. Is it possible to somehow make Git use the Content Defined Chunking algorithm from rsync? Maybe somehow using clean/smudge? If not git, then maybe Mercurial, Fossil or any other DVCS?

This would help with large binary assets without having to deal with the mess that is LFS, as long as the assets were uncompressed.
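For anyone unfamiliar with the term, here is a toy sketch of content-defined chunking in Python. It is a generic gear-style rolling hash, not what rsync or Git actually implement; `GEAR`, `cdc_chunks` and the `mask` parameter are all made up for the illustration. The point is that cut points depend only on the last few bytes seen, so an insertion near the start shifts later chunks without changing their content.

```python
import random

# Fixed-seed table of random 32-bit values, one per byte value (illustrative).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data: bytes, mask: int = 0xFF) -> list[bytes]:
    """Split data wherever the rolling hash has all `mask` bits clear."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Older bytes are shifted out of the 32-bit hash over time,
        # so the hash only reflects a short trailing window.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h & mask == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


# Deterministic pseudo-random sample data for the demonstration.
sample_rng = random.Random(1)
data = bytes(sample_rng.getrandbits(8) for _ in range(20000))

before = cdc_chunks(data)
after = cdc_chunks(b"X" + data)  # one byte inserted at the front
shared = set(before) & set(after)
print(len(before), len(after), len(shared))
```

With fixed-size blocks, the same one-byte insertion would shift every block boundary and leave nothing to deduplicate.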

hanwenn · 2025-07-17T12:41:17 1752756077

IIRC it already uses content defined chunking for finding object deltas.

BobbyTables2 · 2025-07-16T23:47:46 1752709666

I realize the concept is very similar but would love to see a writeup on how Docker stores images using OverlayFS. (Has quite a bit of metadata!)

kassah · 2025-07-16T21:52:44 1752702764

The simplicity of Git is awesome. Great article! I had looked at what it would take to find a single file in a remote git repo. I decided against talking the git protocol directly and instead just checked out the entire repo to get a single file. Reading through this makes me think I may have given up too easily.

I asked a few git hosting providers, and they all said they had private APIs developed internally for the purpose.

mitchitized · 2025-07-17T13:10:51 1752757851

I closed the tab as soon as I saw `ignorecase = true`.

Absolutely NOT going there again.

* points at numerous scars and trauma

aeblyve · 2025-07-16T23:20:33 1752708033

I thought this was going to be a sardonic article about doing programming without LLMs.

lioeters · 2025-07-17T00:12:46 1752711166

I'm starting to see this kind of wording as a unique selling point, that some software (or article, visual art, etc.) is handcrafted and artisanal, as opposed to AI-generated. "Every word was written by me, a human being!" At this point in the emerging technology I can usually tell the difference intuitively, but it's possible that one day it will be indistinguishable - and the quality of "handmade" will be simply a matter of branding for niche enthusiasts, like vinyl records.

lan321 · 2025-07-17T11:54:24 1752753264

Homegrown bugs from sustainably raised Bio-certified devs vs industrial bugs.

iJohnDoe · 2025-07-17T05:28:37 1752730117

What is this web site theme or CMS?

deadbabe · 2025-07-16T21:23:56 1752701036

[flagged]

ChrisMarshallNY · 2025-07-16T21:35:51 1752701751

My understanding is that Mercurial is sort of Beta to Git's VHS. There are some definite advantages, but it's losing support.

GuB-42 · 2025-07-16T23:50:44 1752709844

I am sure that it is because the porn industry settled on Git :)

Anyways, I started on Mercurial, and I think it has a better UX, but technically I now prefer Git. The success of Git over Mercurial surprised me a little because of that: Git is not an easy version control system to get into, at least compared to Mercurial, which shouldn't help adoption. I guess it is just because some big names decided on Git.

Mercurial and Git use the same fundamental principles, and one is not really better than the other, just details.

zanecodes · 2025-07-16T21:54:55 1752702895

I thought all the cool kids were on Pijul, or was it Darcs? Maybe it was Fossil? No wait, it was definitely Jujutsu.

jact · 2025-07-17T00:32:07 1752712327

Can confirm that cool kids are definitely using Fossil

gerdesj · 2025-07-16T22:20:01 1752704401

This is all very well but how does Linus Torvalds use git? Given he invented the bloody thing, it might be nice to see how the Boss uses it!

git was created to scratch an itch (actually a bit of a roiling boil, that needed a serious amount of soothing ointment and as it turns out: a compiler, some source code and quite a lot of effort). ... anyway the history of it is well documented.

FFS: git was called git because a Finnish bloke with English as a second, but well used, tongue had learned what a "git" is and it seemed appropriate. Bear in mind that Mr T was deeply in his shouty phase at that point in time.

Artisanal git sounds all kinds of wrong 8) It's just a tool to do a job and I suggest you use it in the same way as the XKCD comic mandates (that is the official manual, despite what you might think).

The Conclusion is spot on - great article.

lysace · 2025-07-16T21:27:09 1752701229

I would have called this: "Futzing around with internal git data structures".

DrBazza · 2025-07-17T09:46:30 1752745590

I'm glad I clicked through to the actual article rather than dismissing it via its slightly silly title. I learnt a few things about git, and I didn't realize that the tool `pigz` existed. Today I learnt...