LMSYS Blog

Articles

Last updated: September 26, 19:02

Together with SGLang: Best Practices for Serving DeepSeek-R1 on H20-96G

LMSYS Blog 2025/09/26

framework tool

Deploying DeepSeek on GB200 NVL72 with PD and Large Scale EP (Part II): 3.8x Prefill, 4.8x Decode Throughput

The GB200 NVL72 is one of the most powerful hardware platforms for deep learning. In this blog post, we share our progress in optimizing the inference performance of ...

LMSYS Blog 2025/09/25

library tool

Enabling Deterministic Inference for SGLang

This post highlights our initial efforts to achieve deterministic inference in SGLang. By integrating batch-invariant kernels released by Thinking Machine...

LMSYS Blog 2025/09/22

api tool

Optimizing FP4 Mixed-Precision Inference on AMD GPUs

Haohui Mai (CausalFlow.ai), Lei Zhang (AMD)

LMSYS Blog 2025/09/21

library tool

SGLang HiCache: Fast Hierarchical KV Caching with Your Favorite Storage Backends

LMSYS Blog 2025/09/10

library tool

LongCat-Flash: Deploying Meituan's Agentic Model with SGLang

LMSYS Blog 2025/09/01

library tool
