Hacker News | tamnd's comments

This could be a nice code golf project. It only needs a webview, a ZIM reader, and a way to append data to an existing binary and read it back.
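For the "append data to an existing binary and read it back" part, here is a minimal sketch in Python. It assumes a made-up layout: the payload goes at the end of the file, followed by a fixed-size trailer holding the payload length and a marker. The marker `ZIMPAYLD` and both function names are hypothetical, not taken from Kage or any other tool.

```python
import struct

# Hypothetical 8-byte marker; not taken from any real tool.
MAGIC = b"ZIMPAYLD"
# Trailer layout: payload length (uint64, little-endian) + marker.
TRAILER = struct.Struct("<Q8s")


def append_payload(exe_path, payload):
    """Append payload bytes plus a fixed-size trailer to the end of a file."""
    with open(exe_path, "ab") as f:
        f.write(payload)
        f.write(TRAILER.pack(len(payload), MAGIC))


def read_payload(exe_path):
    """Read the payload back by parsing the trailer at the end of the file."""
    with open(exe_path, "rb") as f:
        f.seek(0, 2)  # 2 = seek relative to end of file
        size = f.tell()
        if size < TRAILER.size:
            raise ValueError("file too small to hold a trailer")
        f.seek(size - TRAILER.size)
        length, magic = TRAILER.unpack(f.read(TRAILER.size))
        if magic != MAGIC or length > size - TRAILER.size:
            raise ValueError("no appended payload found")
        f.seek(size - TRAILER.size - length)
        return f.read(length)
```

The running program would call `read_payload` on its own executable path at startup. Putting the trailer last means the reader never needs to know where the original binary ends.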

I did something like that a very long time ago (of course, I have forgotten the details).


For sharing, it is better to use the HTML folder or the ZIM format. Kage supports both.

I have a project for creating and archiving RSS feeds, keeping the full history from the time the crawler starts. I need to clean it up a bit, then I will open source it soon.

Exactly. For downloading, Kage requires Chrome or Chromium. Running it inside Docker makes setup easier and keeps cleanup simple:

https://github.com/tamnd/kage/blob/main/Dockerfile

By the way, let me think about a way to enable this only when running inside Docker.


Docker is designed to be undetectable by default. The best way I have found is to set the env var IN_DOCKER=True manually in your Dockerfile, check that there is no $DISPLAY configured, and check that you're on Linux. Usually, if all or most of those are true, you can safely add --no-sandbox, --disable-setuid-sandbox, --disable-dev-shm-usage, and the other Docker-specific flags. That's what we do in https://github.com/ArchiveBox/ArchiveBox/blob/dev/Dockerfile...
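A rough sketch of that heuristic in Python, for illustration only. This is not ArchiveBox's or Kage's actual code: the function names are hypothetical, and this version requires all three signals rather than "most" of them.

```python
import os
import sys

# The Docker-specific Chrome flags named in the comment above.
DOCKER_CHROME_FLAGS = [
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--disable-dev-shm-usage",
]


def looks_like_docker(env=None, platform=None):
    """Return True when all three signals point at a headless Linux container."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    # IN_DOCKER must be set by hand in the Dockerfile; Docker does not set it.
    in_docker = env.get("IN_DOCKER", "").strip().lower() in ("1", "true", "yes")
    no_display = not env.get("DISPLAY")
    on_linux = platform.startswith("linux")
    return in_docker and no_display and on_linux


def chrome_flags(env=None, platform=None):
    """Extra Chrome flags to add, or an empty list outside Docker."""
    return list(DOCKER_CHROME_FLAGS) if looks_like_docker(env, platform) else []
```

Taking `env` and `platform` as parameters keeps the check easy to test without touching the real environment.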

Making docs available offline was one of my main motivations for building this tool. I will try Apple Docs too.

I previously downloaded the Snowflake docs, and it was something like tens or even hundreds of thousands of pages, I do not remember exactly. The output ended up being very large.

By the way, I forgot to add zstd compression support to my ZIM reader/writer. I will implement that in the next version.


Kiwix has readers for almost every platform: Android, desktop, iPhone. That's why I made Kage produce a ZIM file.

The executable file is mostly for people who don't have Kiwix installed yet, or just want to run the archive directly.


This brings back memories. Around twenty years ago, the internet was still expensive dial-up, so I used to go to an internet cafe, run HTTrack to download websites and manga, copy everything onto my tiny 128MB USB stick (it felt very large at that time), then bring it home and read offline ;))

You could use python -m http.server instead. I haven't tried it yet, but it should work.
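From the command line, `python -m http.server --directory <folder> 8000` serves a folder on port 8000. A small sketch of the same thing as a script, using only the standard library (the `make_server` name is made up, and like the suggestion above, it has not been tried against Kage's output):

```python
import functools
import http.server


def make_server(directory, port=8000):
    """Build a local-only HTTP server that serves static files from directory."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to 127.0.0.1 so the archive is not exposed to the network.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

To use it, call `make_server("path/to/html-folder").serve_forever()` and open http://127.0.0.1:8000/ in a browser.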

Actually, Kage has two parts: a crawler that crawls pages and converts them to clean HTML by capturing the DOM after rendering in Chrome/Chromium, and a pack/serve component that packages the result as either a ZIM file for Kiwix or an executable file.
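To give an idea of what "capturing the DOM after rendering" can look like, here is a sketch that uses headless Chrome's `--dump-dom` flag from Python. This is one simple way to do it and not necessarily what Kage does internally. The binary name `chromium` and both function names are assumptions.

```python
import subprocess


def dump_dom_command(url, chrome="chromium", extra_flags=()):
    """Build the argv for a headless Chrome run that prints the rendered DOM."""
    return [chrome, "--headless", "--disable-gpu", *extra_flags, "--dump-dom", url]


def fetch_rendered_html(url, chrome="chromium", timeout=60):
    """Run Chrome and return the serialized DOM after the page has loaded."""
    result = subprocess.run(
        dump_dom_command(url, chrome),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if Chrome exits with an error
    )
    return result.stdout
```

Inside Docker, the sandbox-related flags would go in `extra_flags`.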


I have a bunch of opinionated/personal-use binaries like this in my $HOME/bin/, like delete-all-npm, clean-rust-cache, download-youtube-playlist, and get-markdown <url>. It feels good, and I don't need to remember any commands. Sometimes my coding agent can figure out how to call some of those tools too ;))

Submitting this to Hacker News was the right move! Thanks for your idea. I will consider implementing that :)

Also, in my mind, I already have a script/program to convert HTML to Markdown, so it could actually store everything on disk as a folder of Markdown files, and then commit them to a Git repo.
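For the folder layout, one possible mapping from page URL to Markdown file path, as a sketch. The `archive` root and the `markdown_path` name are hypothetical. Query strings are ignored, and paths are not sanitized against things like `..`, so a real version would need more care.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def markdown_path(url, root="archive"):
    """Map a page URL to a relative .md path under root/<host>/."""
    parts = urlsplit(url)
    path = parts.path
    # Directory-style URLs get an explicit index file.
    if path == "" or path.endswith("/"):
        path += "index"
    page = PurePosixPath(path.lstrip("/"))
    # Drop a trailing .html/.htm so intro.html becomes intro.md.
    if page.suffix in (".html", ".htm"):
        page = page.with_suffix("")
    return str(PurePosixPath(root) / parts.netloc / (str(page) + ".md"))
```

Since the output is plain text files in a stable layout, `git add` and `git commit` after each crawl would give a readable diff of what changed on the site.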


I think the ZIM flow is perfect for offline use. I know I will be making use of it as soon as I can figure out how to pass cookies to Chrome so I can be signed into the site. I didn't see it on the page, but I haven't looked closely yet.

Cookies are not supported yet, since I created this tool for shadowing public websites first. I will add options to pass cookies later. Kage will pass them to the underlying Chrome/Chromium process, so it should not be hard to do.

Not to load you up with too many ideas, but a Markdown folder sounds a lot like Obsidian, which has a plugin system now.

EPUB would also be a great target.


I would use the shit out of this. I'm a heavy user of Logseq (OG, the md file-based version). Would LOVE to save my favorite web resources this way.
