My recent horror from some git work was discovering how git sorts its tree objects.
The docs just say to sort by C locale (byte-order sorting). Easy. Except git was sometimes rejecting my packfiles as being bogus per its fsck code, saying my trees were misordered.
TURNS OUT THERE'S AN UNDOCUMENTED RULE: you need to append an implicit forward slash to directory tree entry names before you sort them.
That forward slash is not encoded in the tree object, nor is the type of the entry. You just put the 20 byte SHA1 hash, which is to either a blob or a hash (or a commit for submodules).
So you can have one directory with directory "testing" and file "testing.md" and it'll sort differently than a directory with two files "testing" and "testing.md".
I've had this exact bug happen to me when I implemented my git clone.
The way I found out was that Github kept rejecting my push, because as I later discovered, my git history was invalid precisely due to entries being sorted improperly due to the forward slash requirement. I could have solved this with the real git, but the point was to use my tool exclusively for version control from inception, so I just deleted the .git folder. So, my git history appears to begin near the end of the whole cycle. But I did manage to learn a lot, both about git and about the language I implemented it in.
To understand op's repro look at the last two lines (objects in the tree) in each of their command outputs, not the files shown in the first few lines.
What I think op means is that the `testing` tree pointed in their first example is sorted after `testing.md` even though it's only called `testing` because it's being sorted as `testing/` and `/` is > `.` bytewise.
I'm not at a computer right now but it would be nice to test it with files named `testing.` and `testing0` since they are adjacent bytewise and would show the implicit forward slash more clearly with the tree object sitting between them.
This makes me wonder why Git can't just store an empty tree for empty dirs.
> This makes me wonder why Git can't just store an empty tree for empty dirs.
tl;dr: it can (see my other comment) and the empty tree is hardcoded. But since the index works with file paths and blobs, having no file means that there's no entry in the index
It can store empty directories (actually, trees). It can't do normally because the index maps paths to blobs, an empty directory doesn't have a file to map to a blob and then `git add` will have no effect. Given that normally we write commits from the index content, then normally we won't find an empty tree.
You can run `git commit --allow-empty` with an empty index and the root tree will be the empty tree:
Something that I really like in Git is how its data structures are easy to understand and how transparent it is. It's possible to write your own "Git" compatible with existing Git directories only by reading how it works under the hood
I wrote a nearly complete implementation of git file format parsers in Common Lisp over like a month of evenings and weekends. I’m sure there are hard parts between where I am and a full git implementation but you can get quite a bit of utility out of a relatively small amount of effort.
It's a case of Pareto. Parsing the git file format is relatively simple, but handling all the weird states a Git repo can be in and doing the correct things to those files in each state is a lot harder. And then adding the network protocol on top of that makes directly reproducing Git quite difficult.
I know JJ used to use Git2 for a lot of network operations like pushing and pulling, but ran into too many issues with SSH handling that they've since switched to directly invoking the Git binary for those operations.
There aren’t that many weird states a git repository can be in: the on-disk format of the repository is too simple for that. The hard part has to do with the various protocols for transferring objects around.
I think there's more corners out there than most people would give credit to? Just off the top of my head: files in the index (but maybe this isn't "weird enough"), rebasing but paused, rebasing with conflicts, merge with conflicts, cherry-picking but conflicts, middle of a bisect with all the state that implies, alternate objects dirs, alternate working dirs, submodules and all of their weirdness, and a "bare" repo.
Heck, had my PS1 return an error this week after I created a separate working dir for a repo and cd'd into it. Did you know .git can be a normal file? I didn't when I wrote my PS1.
Jujitsu threw in the towel and is shelling out to the git CLI because of minor variations in libgit vs the binary.
Failing to find a write-up, but there was this lobster thread[0] where someone from GitLab reported they had to do the same owing to some discrepancies vs the binary -where all of the real development happens.
To the site author: I'm on a MBP M1 Mac and honestly I can't really read the text. Far too small, and increasing the zoom just makes the text large but the margins less wide. Firefox reader mode also renders really badly.
Please, consider making the layout better for us old coders whose eyes are going, or for hi res displays
For me, the text size would be fine if the contrast were better. The background colour is similar to the colour of the non-central pixels of the text, and even the central pixels are grey rather than black.
FYI: the pinch-to-zoom gesture from mobile browsers (from before websites were mobile-responsive) has also long been implemented for all modern desktop browsers. It's viewport zoom, which is far better than the font-scaling zoom you get by pressing Cmd-+, and makes this site easily readable.
(The much-less-well-known mobile double-tap-on-text gesture [it zooms-to-fit whatever element you tapped on to the width of the viewport] was also ported to desktop browsers. Though, on desktop with a touchpad, it's a two-finger double-tap — which I don't think anyone would ever even think to try.)
FWIW, most browsers by default now do a viewport zoom with Ctrl/Cmd-+ rather than a font-scaling zoom. I think browsers generally have the option to change that, so if you prefer the former but it's doing the latter, may check the browser settings.
But also, under further investigation — and unlike with pinch-to-zoom — desktop support for the two-finger double-tap gesture seems to be specific to macOS. (Which is weird, because Chrome has support for arbitrary multitouch gesture processing to enable the JS multitouch API. So you'd think Chrome's support for "the multitouch gestures the OS expects" would be built on top of that generic multitouch recognizer [and therefore working everywhere that recognizer works], instead of expecting the OS to pre-recognize specific gestures and translate them to specific OS input events.)
On my iPad in Safari and Pixel Android phone in Firefox, one-finger double tap on text does the fit to viewport.
On my Ubuntu laptop in Chrome, I couldn’t find a way to make it work - even tapping the touchscreen didn’t work. But I’m not using the stock Ubuntu GUI, so it could be that (LXqt+XMonad).
Am I the only one having troubles with the site on mobile? I'm using Firefox on a decent Android phone but the scroll is extremely stuttery and it distracts from the article unfortunately.
The site is built with a content creation tool which has used a lot of JS and CSS, but the CSS is atrocious in it's automated output so it's triggering the browser to have to interpret the mess of directives in every code block. The tool is generating HTML trash like (brackets replaced for comment to not parse):
[span style="--0:#E1E4E8;--1:#24292E"] [/span]
...over and over, essentially giving style directives for every blank space in the code block. A less capable mobile CPU may well have issues rendering this site due to the presence of so much trash CSS inside it guts. $0.02 hth
When compared to the "plumbing" commands. If you want to know more about git's plumbing vs porcelain metaphor, this is a good quick overview: https://stackoverflow.com/a/39848551
Okay, there's something I have been thinking about recently. Is it possible to somehow make Git use the Content Defined Chunking algorithm from rsync? Maybe somehow using clean/smudge? If not git, then maybe Mercurial, Fossil or any other DVCS?
This would help with large binary assets without having to deal with the mess that is LFS, as long as the assets were uncompressed.
The simplicity of Git is awesome. Great article! I had looked at what it would take to find a single file in a remote git repo. I decided against talking the git protocol directly and just checking out the entire repo to get a single file. Reading through this makes me think I may have given up too easily.
I asked a few git hosting providers, and they all said they had private APIs developed internally for the purpose.
I'm starting to see this kind of wording as a unique selling point, that some software (or article, visual art, etc.) is handcrafted and artisanal, as opposed to AI-generated. "Every word was written by me, a human being!" At this point in the emerging technology I can usually tell the difference intuitively, but it's possible that one day it will be indistinguishable - and the quality of "handmade" will be simply a matter of branding for niche enthusiasts, like vinyl records.
I am sure that it is because the porn industry settled on Git :)
Anyways, I started on Mercurial, and I think it has a better UX, but technically I now prefer Git. The success of Mercurial over Git surprised me a little because of that, Git is not an easy version control system to get into, at least when compared to Mercurial, it shouldn't help adoption, but I guess it is just because some big names decided on Git.
Mercurial and Git use the same fundamental principles, and one is not really better than the other, just details.
This is all very well but how does Linus Thorvalds use git? Given he invented the bloody thing, it might be nice to see how the Boss uses it!
git was created to scratch an itch (actually a bit of a roiling boil, that needed a serious amount of soothing ointment and as it turns out: a compiler, some source code and quite a lot of effort). ... anyway the history of it is well documented.
FFS: git was called git because a Finnish bloke with English as a second, but well used, tongue had learned what a "git" is and it seemed appropriate. Bear in mind that Mr T was deeply in his shouty phase at that point in time.
Artisanal git sounds all kinds of wrong 8) Its just a tool to do a job and I suggest you use it in the same way as the XKCD comic mandates (that is the official manual, despite what you might think)
I'm glad I clicked through to the actual article rather than dismissing it via its slightly silly title. I learnt a few things about git, and I didn't realize that the tool `pigz` existed. Today I learnt...
The docs just say to sort by C locale (byte-order sorting). Easy. Except git was sometimes rejecting my packfiles as being bogus per its fsck code, saying my trees were misordered.
TURNS OUT THERE'S AN UNDOCUMENTED RULE: you need to append an implicit forward slash to directory tree entry names before you sort them.
That forward slash is not encoded in the tree object, nor is the type of the entry. You just put the 20 byte SHA1 hash, which is to either a blob or a hash (or a commit for submodules).
So you can have one directory with directory "testing" and file "testing.md" and it'll sort differently than a directory with two files "testing" and "testing.md".
You can see a repro at https://gist.github.com/bradfitz/4751c58b07b57ff303cbfec3e39...
(So to verify whether a tree object is formatted correctly, you need to have the blobs of all the entries in the tree, at least one level)
reply