
LMSYS Blog
lmsys.org/
GLM-4.5 Meets SGLang: Reasoning, Coding, and Agentic Abilities
Today, we are excited to introduce our latest flagship models GLM-4.5 (https://huggingface.co/zai-org/GLM-4.5) and …

SpecForge: Accelerating Speculative Decoding Training for SGLang
Speculative decoding is a powerful technique for accelerating Large Language Model (LLM) inference. In this blog post, we are excited to announce the open…

Deploying Kimi K2 with PD Disaggregation and Large-Scale Expert Parallelism on 128 H200 GPUs
Introduction: Deploying the Most Advanced Open-Source MoE Model …

Accelerating SGLang with Multiple Token Prediction
TL;DR …

How to support new VLMs into SGLang: A Case Study with NVILA
The world of LLMs is evolving at a remarkable pace, with Visual Language Models (VLMs) at the forefront of this revolution. These models power application…

Cost Effective Deployment of DeepSeek R1 with Intel® Xeon® 6 CPU on SGLang
The impressive performance of DeepSeek R1 marked the rise of giant Mixture-of-Experts (MoE) models among Large Language Models (LLMs). However, its massive mode…