Glenn's Digital Garden

CacheBlend

Apr 13, 2026

  • artificial-intelligence/inference
  • seedling

CacheBlend is a technique that grafts together precomputed KV caches of text chunks shared across multiple queries, avoiding a full prefill over mostly identical prefixes. To preserve output quality, it selectively recomputes the KV entries of a small fraction of tokens whose cached values deviate most from what a full prefill would produce.1
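The idea can be sketched in miniature. This toy (my own illustration, not CacheBlend's actual implementation) treats KV entries as plain vectors: per-chunk caches are concatenated, then the tokens whose cached vectors deviate most from a full-prefill reference are recomputed. In the real system, deviation is estimated layer by layer inside the model rather than against a precomputed full prefill; the `fake_kv` helper and all names here are hypothetical.

```python
import random

def fake_kv(n_tokens, d=4, seed=0):
    # Toy stand-in for one layer's KV vectors for a chunk of n_tokens tokens.
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_tokens)]

def blend_kv_caches(chunk_caches, full_prefill, recompute_frac=0.15):
    """Graft per-chunk KV caches together, then selectively recompute the
    tokens whose cached entries deviate most from the full-prefill values.
    (`full_prefill` stands in for the layer-wise recomputation a real
    CacheBlend implementation performs inside the model.)"""
    # Naive graft: concatenate the independently computed chunk caches.
    blended = [row[:] for cache in chunk_caches for row in cache]
    # L2 deviation of each cached token from the full-prefill reference.
    dev = [sum((a - b) ** 2 for a, b in zip(c, f)) ** 0.5
           for c, f in zip(blended, full_prefill)]
    # Recompute only the highest-deviation fraction of tokens.
    k = max(1, int(len(dev) * recompute_frac))
    worst = sorted(range(len(dev)), key=dev.__getitem__)[-k:]
    for i in worst:
        blended[i] = full_prefill[i][:]
    return blended, sorted(worst)
```

The payoff is that only `recompute_frac` of the prefix pays prefill cost; the rest is reused verbatim from the per-chunk caches.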

Several related techniques followed it:

  • DroidSpeak: Efficient Context Sharing for Multiple-LLM Inference - Microsoft Research
  • ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference - Astrophysics Data System
  • NeurIPS Poster SmartCache: Context-aware Semantic Cache for Efficient Multi-turn LLM Inference
  • [2509.24832] SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching

Footnotes

  1. [2405.16444] CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion ↩

