
LMSYS Blog
Together with SGLang: Best Practices for Serving DeepSeek-R1 on H20-96G

Deploying DeepSeek on GB200 NVL72 with PD and Large Scale EP (Part II): 3.8x Prefill, 4.8x Decode Throughput
The GB200 NVL72 is one of the most powerful hardware platforms for deep learning. In this blog post, we share our progress to optimize the inference performance of ...

Enabling Deterministic Inference for SGLang
This post highlights our initial efforts to achieve deterministic inference in SGLang. By integrating batch invariant kernels released by Thinking Machine...

Optimizing FP4 Mixed-Precision Inference on AMD GPUs
Haohui Mai (CausalFlow.ai), Lei Zhang (AMD)

SGLang HiCache: Fast Hierarchical KV Caching with Your Favorite Storage Backends

LongCat-Flash: Deploying Meituan's Agentic Model with SGLang
1. Introduction: Deploying Meituan's Agentic Open-Source MoE Model ...

Finetune and deploy GPT-OSS in MXFP4: ModelOpt+SGLang
GPT-OSS, the first open-source model family from OpenAI's lab since GPT-2, demonstrates strong math, coding, and general capabilities even when compared w...

SGLang for gpt-oss: From Day 0 Support to Enhanced Performance
We are excited to announce a major update for SGLang, focusing on deep performance optimizations and new features for the recently released openai/gpt-oss...

GLM-4.5 Meets SGLang: Reasoning, Coding, and Agentic Abilities
Today, we are excited to introduce our latest flagship models GLM-4.5 (https://huggingface.co/zai-org/GLM-4.5) and ...

SpecForge: Accelerating Speculative Decoding Training for SGLang
Speculative decoding is a powerful technique for accelerating Large Language Model (LLM) inference. In this blog post, we are excited to announce the open...

Deploying Kimi K2 with PD Disaggregation and Large-Scale Expert Parallelism on 128 H200 GPUs
1️⃣ Introduction: Deploying the Most Advanced Open-Source MoE Model ...

Accelerating SGLang with Multiple Token Prediction

How to support new VLMs into SGLang: A Case Study with NVILA
The world of LLMs is evolving at a remarkable pace, with Visual Language Models (VLMs) at the forefront of this revolution. These models power application...

Cost Effective Deployment of DeepSeek R1 with Intel® Xeon® 6 CPU on SGLang
The impressive performance of DeepSeek R1 marked the rise of giant Mixture of Experts (MoE) models in Large Language Models (LLMs). However, its massive mode...

slime: An SGLang-Native Post-Training Framework for RL Scaling
Vision That Drives slime ...

OME: Revolutionizing LLM Infrastructure with Model-Driven Architecture
The Tale of Two Teams: Why Model Serving Is Broken ...

Deploying DeepSeek on GB200 NVL72 with PD and Large Scale EP (Part I): 2.7x Higher Decoding Throughput
The GB200 NVL72 is the world's most advanced hardware for AI training and inference. In this blog post, we're excited to share early results from running ...

Deploying DeepSeek with PD Disaggregation and Large-Scale Expert Parallelism on 96 H100 GPUs
DeepSeek is a popular open-source large language model (LLM) praised for its strong performance. However, its large size and unique architecture, which us...