SGLang is an inference and serving framework for large language models. It is a separate project from vLLM rather than its successor, though the two share roots in the same Berkeley research community. SGLang is a collaboration between Berkeley, Stanford, UCSD, CMU, and MBZUAI.
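SGLang implements RadixAttention for KV cache reuse, not offload: the KV cache from earlier requests is kept in a radix tree keyed by token sequence, so a later request that shares a prefix (for example, the same system prompt) can reuse the cached entries instead of recomputing them.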
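The prefix-reuse idea can be illustrated with a toy sketch. This is not SGLang's implementation: it uses a plain per-token trie rather than a compressed radix tree, omits eviction, and the names `Node`, `PrefixCache`, `kv_slot`, `match_prefix`, and `insert` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    # Maps the next token id to the child node for that token.
    children: Dict[int, "Node"] = field(default_factory=dict)
    # Hypothetical handle standing in for a cached KV entry.
    kv_slot: Optional[int] = None


class PrefixCache:
    """Toy token-level trie; a real radix tree would merge single-child chains."""

    def __init__(self) -> None:
        self.root = Node()
        self.next_slot = 0

    def match_prefix(self, tokens: List[int]) -> int:
        """Return how many leading tokens already have cached entries."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens: List[int]) -> int:
        """Cache entries for tokens; return how many had to be newly created."""
        node, computed = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                child = Node(kv_slot=self.next_slot)
                self.next_slot += 1
                node.children[tok] = child
                computed += 1
            node = child
        return computed


cache = PrefixCache()
system_prompt = [1, 2, 3, 4]  # token ids shared by both requests

first = cache.insert(system_prompt + [10, 11])         # nothing cached yet
reused = cache.match_prefix(system_prompt + [20, 21])  # shared prefix is found
second = cache.insert(system_prompt + [20, 21])        # only the new suffix is added
```

Here the second request only needs new entries for its two-token suffix, since the four system-prompt tokens are already in the tree. A production cache would also need an eviction policy and reference counting for in-flight requests, which this sketch leaves out.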
Users
- Microsoft AI used SGLang to fine-tune MAI-Thinking-1. See MAI-1 > Fine-tuning MAI-Thinking-1.