This thread collects work around compact embedding storage, retrieval quality tradeoffs, and the operational shape of RAG systems at scale.

The first public note is the RAG memory-efficiency post. Future updates may include benchmarks, implementation details, and paper notes.

Links