LMSYS Blog

lmsys.org/
18 articles
Last updated: September 26, 19:02
Together with SGLang: Best Practices for Serving DeepSeek-R1 on H20-96G

framework tool
Deploying DeepSeek on GB200 NVL72 with PD and Large Scale EP (Part II): 3.8x Prefill, 4.8x Decode Throughput

The GB200 NVL72 is one of the most powerful hardware platforms for deep learning. In this blog post, we share our progress in optimizing the inference performance of ...

library tool
Enabling Deterministic Inference for SGLang

This post highlights our initial efforts to achieve deterministic inference in SGLang. By integrating batch-invariant kernels released by Thinking Machine...

api tool
Optimizing FP4 Mixed-Precision Inference on AMD GPUs

Haohui Mai (CausalFlow.ai), Lei Zhang (AMD)

library tool
SGLang HiCache: Fast Hierarchical KV Caching with Your Favorite Storage Backends

library tool
LongCat-Flash: Deploying Meituan's Agentic Model with SGLang

library tool