CacheBlend is a technique that grafts together the KV caches of multiple different queries, avoiding the cost of recomputing the prefill for mostly identical prefixes.
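To make the idea concrete, here is a toy sketch of chunk-level KV cache reuse, the mechanism CacheBlend builds on. It is not the actual CacheBlend algorithm (which additionally recomputes a small fraction of tokens whose attention deviates most after grafting); the `fake_kv` function and all names are hypothetical stand-ins for a real model's attention computation.

```python
import hashlib
import numpy as np

D = 8  # toy hidden dimension (assumption, stands in for head_dim)

def fake_kv(chunk: str) -> np.ndarray:
    # Hypothetical stand-in for computing a chunk's key/value tensors.
    seed = int.from_bytes(hashlib.sha256(chunk.encode()).digest()[:4], "big")
    rng = np.random.default_rng(seed)
    return rng.standard_normal((len(chunk.split()), D))

cache: dict[str, np.ndarray] = {}

def kv_for_prompt(chunks: list[str]) -> np.ndarray:
    """Reuse stored per-chunk KV entries; compute only cache misses."""
    parts = []
    for chunk in chunks:
        key = hashlib.sha256(chunk.encode()).hexdigest()
        if key not in cache:
            cache[key] = fake_kv(chunk)  # miss: run prefill for this chunk
        parts.append(cache[key])         # hit: graft the stored KV as-is
    # CacheBlend would additionally re-run attention for the few tokens
    # with the largest KV deviation, so cross-chunk attention stays accurate.
    return np.concatenate(parts, axis=0)
```

Two prompts sharing a chunk then only pay for the prefill of the differing chunks, which is the saving the techniques below all chase in different ways.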
Several other techniques have followed it:
- DroidSpeak: Efficient Context Sharing for Multiple-LLM Inference (Microsoft Research)
- ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
- SmartCache: Context-aware Semantic Cache for Efficient Multi-turn LLM Inference (NeurIPS poster)
- SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching (arXiv:2509.24832)