> They are binary vectors with 768 dimensions, which take up 96 bytes each (768 / 8 = 96).
I guess I’m confused. This is honestly the problem that most vector storage faces (“curse of dimensionality”) let alone the indexing.
I assume that you meant 768 dimensions * 8 bytes (for an f64), which is 6144 bytes. Usually, these get shrunk with some (hopefully minor) loss, so to something like an f32 or f16 (or smaller!).
If you can post how you fit 768 dimensions in 96 bytes, even with compression or trie-equivalent amortization, or whatever… I’d love to hear more about that for another post.
Ninja edit: Unless you’re treating each dimension as one bit? But then I still have questions around retrieval quality.
Author here - ya, "binary vectors" means quantizing to one bit per dimension. Normally it would be 4 * dimensions bytes of space per vector (where 4 = sizeof(float)). Some embedding models, like nomic v1.5[0] and mixedbread's new model[1], are specifically trained to retain quality after binary quantization. Not all models are tho, so results may vary. I think in general for really large vectors, like OpenAI's large embeddings model with 3072 dimensions, it kind of works, even if they didn't specifically train for it.
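For anyone following along, here is a minimal sketch of what one-bit-per-dimension quantization can look like with NumPy. The sign threshold (bit = 1 when the value is above zero), the random stand-in data, and the helper names `binarize` and `hamming` are all assumptions made for illustration; this is not necessarily how the author's implementation works.

```python
import numpy as np

def binarize(vectors: np.ndarray) -> np.ndarray:
    """Quantize floats to one bit per dimension (1 if > 0), packed 8 bits per byte."""
    bits = (vectors > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)

def hamming(query: np.ndarray, db: np.ndarray) -> np.ndarray:
    """Hamming distance between one packed query and every packed row of db."""
    xor = np.bitwise_xor(db, query)
    return np.unpackbits(xor, axis=-1).sum(axis=-1)

# Random stand-in data (assumption): 1000 float32 "embeddings" with 768 dimensions.
rng = np.random.default_rng(0)
floats = rng.standard_normal((1000, 768)).astype(np.float32)
packed = binarize(floats)

print(floats[0].nbytes)  # 3072 bytes as float32 (4 * 768)
print(packed[0].nbytes)  # 96 bytes as packed bits (768 / 8)

# Searching with a stored vector as the query should return that same row.
nearest = int(np.argmin(hamming(packed[42], packed)))
print(nearest)  # 42
```

Random data says nothing about retrieval quality; that depends on whether the embedding model tolerates binary quantization, as noted above.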
Thank you! As you keep posting your progress, and I hope you do, adding these references would probably help ward off crusty fuddy-duddies like me (or at least give them more to research either way) ;)
You can try out binary vectors, compare them against quantizing every pair of dimensions to one of four values, and a lot more, by building a FAISS index on your data with Product Quantization (like PQ768x1 for binary features in this case): https://github.com/facebookresearch/faiss/wiki/The-index-fac...
It depends on your data and your embedding model. For example, I was able to quantize embeddings of English Wikipedia from 384 dimensions down to 48 7-bit dimensions, and the search works great: https://www.leebutterman.com/2023/06/01/offline-realtime-emb...
BTW, the "curse of dimensionality" technically refers to the relative sparsity of high-dimensional space and the need for geometrically increasing data to fill it. It has nothing to do with storage. And typically in vector databases the data are compressed/projected into a lower dimensionality space before storage, which actually improves the situation.
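To make the sparsity point concrete, here is a small sketch that drops a fixed number of random points into the unit cube, splits each axis into a few bins, and measures what fraction of the resulting grid cells contain at least one point. The function name `occupied_fraction`, the bin count, and the uniform random data are assumptions chosen for illustration.

```python
import numpy as np

def occupied_fraction(dim: int, n_points: int = 1000, bins: int = 4, seed: int = 0) -> float:
    """Fraction of the bins**dim grid cells hit by n_points uniform random points."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))              # uniform in [0, 1)
    cells = np.floor(points * bins).astype(np.int64)  # per-axis cell index, 0..bins-1
    n_occupied = len(np.unique(cells, axis=0))        # distinct cells actually hit
    return n_occupied / bins**dim

for dim in (1, 2, 5, 10):
    print(dim, occupied_fraction(dim))
```

With the sample size held fixed, the occupied fraction collapses as the dimension grows, because the cell count grows as `bins**dim`. Keeping the same coverage would need a sample that grows at the same geometric rate, which is a statement about data requirements and not about bytes on disk.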