
DeepSeek’s official API gets a cache hit rate of over 99% if you use it continuously on the same codebase in long sessions, so it ends up much cheaper than frontier models. I have an example of a 200M-token session in Claude Code.


Might be a dumb question, but do you have to read the files in the same order in new sessions so that the prefix still matches the cache?


Also curious. With tool calls reading and searching different files, and possibly compaction when reading a large codebase or long threads, I can't imagine how you hit a 99% cache rate.


Yes, you have to use the same session. I guess you could load up a bunch of context and then fork the session into a few different tasks, although I haven't tried it.
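
To make the prefix point concrete, here is a toy sketch in Python. It assumes the cache only reuses the longest shared prefix between an earlier request and the new one, and it treats each list item as one token. The names (`shared_prefix_len`, `file_a`, `file_b`) are made up for illustration, and the provider's real matching granularity is not modeled.

```python
def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Count how many leading items two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical "tokens": one system prompt item plus two files.
system = ["sys"]
file_a = ["a1", "a2", "a3"]
file_b = ["b1", "b2"]

first_request = system + file_a + file_b
same_order = system + file_a + file_b + ["new question"]
swapped = system + file_b + file_a + ["new question"]

print(shared_prefix_len(first_request, same_order))  # 6: whole earlier request is reusable
print(shared_prefix_len(first_request, swapped))     # 1: only the system prompt matches
```

Under that assumption, reading the same files in a different order leaves almost nothing to reuse, which is why staying in one session (or forking from a shared prefix) matters.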


Sorry, I was wrong here. I meant a single long session. And there’s no compression; only about half of the 1M context is used.


Then where did 200M come from? 200,000 tokens?


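The 200M isn't the context size. Most of those tokens are cache-hit reads: every request re-reads the context that is already there, so the same tokens get counted again each time. I hit the cache many times, so the total grew to 200M. The number came from the API platform.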
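
A back-of-the-envelope sketch of how the total can reach that size, in Python. The assumptions are mine, not the poster's: the context grows by a fixed step each turn up to about 500K tokens, the whole context is resent on every request, and everything sent on the previous turn is a cache hit. The turn count of 800 is hypothetical, picked only so the total lands near 200M.

```python
def simulate_session(turns: int, final_context: int) -> tuple[int, int]:
    """Return (total_input_tokens, cached_input_tokens) for one long session.

    Assumes the context grows by a fixed step per turn and is fully
    resent as input on every request.
    """
    step = final_context // turns
    total = 0
    cached = 0
    context = 0
    for _ in range(turns):
        new_context = context + step
        total += new_context   # the whole context is billed as input
        cached += context      # the part already sent last turn hits the cache
        context = new_context
    return total, cached


total, cached = simulate_session(turns=800, final_context=500_000)
hit_rate = cached / total

print(total)               # 200250000
print(cached)              # 199750000
print(f"{hit_rate:.2%}")   # 99.75%
```

So a context that never exceeds 500K can still add up to roughly 200M billed input tokens, with nearly all of them served from cache, as long as the prefix stays stable from turn to turn.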



