Last updated: 2026/03/17 07:01
Big new release from Mistral today (despite the name) - a new Apache 2 licensed 119B parameter (Mixture-of-Experts, 6B active) model which they describe like this: Mistral Small 4 is …
Subagents were announced in general availability today for OpenAI Codex, after several weeks of preview behind a feature flag. They're very similar to the Claude Code implementation, with default subagents …
The point of the blackmail exercise was to have something to describe to policymakers—results that are visceral enough to land with people, and make misalignment risk actually salient in practice …

Here's the handout I prepared for my NICAR 2026 workshop "Coding agents for data analysis" - a three hour session aimed at data journalists demonstrating ways that tools like Claude …
How coding agents work - Agentic Engineering Patterns
What is agentic engineering? - Agentic Engineering Patterns

The Top AI Papers of the Week (March 9 - March 15)
GitHub’s slopocalypse – the flood of AI-generated spam PRs and issues – has made Jazzband’s model of open membership and shared push access untenable. Jazzband was designed for a world …

I was a speaker last month at the Pragmatic Summit in San Francisco, where I participated in a fireside chat session about agentic engineering hosted by Eric Lui from Statsig. …

Claude Code Review, AutoHarness, Perplexity Personal Computer, Cloudflare /crawl, Context7 CLI, and More
Here's what surprised me: Standard pricing now applies across the full 1M window for both models, with no long-context premium. OpenAI and Gemini both charge more for prompts where the …
Simply put: It’s a big mess, and no off-the-shelf accounting software does what I need. So after years of pain, I finally sat down last week and started to build …
PR from Shopify CEO Tobias Lütke against Liquid, Shopify's open source Ruby template engine that was somewhat inspired by Django when Tobi first created it back in 2006. Tobi found …
Brutal satire on the whole vibe-porting license washing thing (previously): Finally, liberation from open source license obligations. Our proprietary AI robots independently recreate any open source project from scratch. The …
Epic piece on AI-assisted development by Clive Thompson for the New York Times Magazine, who spoke to more than 70 software developers from companies like Google, Amazon, Microsoft, Apple, plus …
Here's what I think is happening: AI-assisted coding is exposing a divide among developers that was always there but maybe less visible. Before AI, both camps were doing the same …

Today in animated explanations built using Claude: I've always been a fan of animated demonstrations of sorting algorithms so I decided to spin some up on my phone using Claude …
We are excited to announce that SGLang supports NVIDIA Nemotron 3 Super on Day 0. …
AI should help us produce better code - Agentic Engineering Patterns
AI assistants are getting better at helping people inside the browser, but they still need too much babysitting.
A recurring concern I’ve seen regarding LLMs for programming is that they will push our technology choices towards the tools that are best represented in their training data, making it …

The Top AI Papers of the Week (March 1 - March 8)
What I had not realized is that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people.
Anthropic announced six months of free Claude Max for maintainers of popular open source projects (5,000+ stars or 1M+ NPM downloads) on 27th February. Now OpenAI have launched their comparable …

AI Labor Market Impacts, Google Workspace CLI, GPT-5.4, Exa Deep, and More

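Skills that follow the open Agent Skills standard and give Claude Code domain expertise and organizational knowledge have been attracting attention recently, but creating a skill involves several hurdles. Anthropic provides a tool called skill-creator that supports the process of creating and improving skills and measuring their performance. This article walks through creating and improving a skill with skill-creator.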
This piece by Bruce Schneier and Nathan E. Sanders is the most thoughtful and grounded coverage I've seen of the recent and ongoing Pentagon/OpenAI/Anthropic contract situation. AI models are increasingly …
Adnan Khan describes a devious attack chain against the Cline GitHub repository, which started with a prompt injection attack in the title of an issue opened against the repo. Cline …

Two new API models: gpt-5.4 and gpt-5.4-pro, also available in ChatGPT and Codex CLI. August 31st 2025 knowledge cutoff, 1 million token context window. Priced slightly higher than the GPT-5.2 …

This is the AI Shift TECH BLOG. We share information about AI technology and how to put it to use.
Anti-patterns: things to avoid - Agentic Engineering Patterns
I’m behind on writing about Qwen 3.5, a truly remarkable family of open weight models released by Alibaba’s Qwen team over the past few weeks. I’m hoping that the 3.5 …
Set up Claude Code MCP servers with step-by-step commands, configuration scopes, and Tool Search for 85% less context usage. Plus troubleshooting fixes.
We tested Perplexity Computer. Discover why its cloud-based, multi-model AI agent excels at generalist workflows but falls short for web development.
Shock! Shock! I learned yesterday that an open problem I'd been working on for several weeks had just been solved by Claude Opus 4.6 - Anthropic's hybrid reasoning model that …
Google's latest model is an update to their inexpensive Flash-Lite family. At $0.25/million tokens of input and $1.5/million output this is 1/8th the price of Gemini 3.1 Pro. It supports …
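The 1/8th figure can be checked against the Gemini 3.1 Pro prices quoted elsewhere in this feed ($2/million input, $12/million output under 200,000 tokens). A minimal sketch of that arithmetic; the `cost_usd` helper and the sample token counts are illustrative assumptions, not anything from the post:

```python
# Per-million-token prices in USD, as quoted in this feed.
FLASH_LITE = {"input": 0.25, "output": 1.5}
PRO = {"input": 2.0, "output": 12.0}  # under-200,000-token tier


def cost_usd(input_tokens: int, output_tokens: int, prices: dict) -> float:
    """Cost of one request given per-million-token prices."""
    return (
        input_tokens * prices["input"] + output_tokens * prices["output"]
    ) / 1_000_000


# The ratio is the same for input and output prices.
input_ratio = PRO["input"] / FLASH_LITE["input"]
output_ratio = PRO["output"] / FLASH_LITE["output"]

# Hypothetical request: 100,000 tokens in, 10,000 tokens out.
lite_cost = cost_usd(100_000, 10_000, FLASH_LITE)
pro_cost = cost_usd(100_000, 10_000, PRO)
print(input_ratio, output_ratio, lite_cost, pro_cost)
```

Because both ratios are equal, any mix of input and output tokens comes out at the same multiple, as long as the prompt stays in the under-200,000-token tier.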

I just sent the February edition of my sponsors-only monthly newsletter. If you are a sponsor (or if you start a sponsorship now) you can access it here. In this …

OpenVSX releases of Aqua Trivy 1.8.12 and 1.8.13 contained injected natural-language prompts that abuse local AI coding agents for system inspection a...
Learn what Claude Code is, how the agentic loop works, and what it costs. MCP, agent teams, pricing, installation, and Cursor comparison.

The Top AI Papers of the Week (February 23 - March 1)
I'm moving to another service and need to export my data. List every memory you have stored about me, as well as any context you've learned about me from past …

Evaluating AGENTS.md, Perplexity Computer, Nano Banana 2, Doc-to-LoRA, Hermes Agent, Mercury 2, and More
Because users lose their passkeys all the time, and may not understand that their data has been irreversibly encrypted using them and can no longer be recovered. Tim Cappalli: To …
Another in the genre of "OK, coding agents got good in November" posts, this one is by Max Woolf and is very much worth your time. He describes a sequence …
Anthropic are now offering their $200/month Claude Max 20x plan for free to open source maintainers... for six months... and you have to meet the following criteria: Maintainers: You're a …

Here's a little prototype I built this morning from my phone as an experiment in HTTP range requests, and a general example of using LLMs to satisfy curiosity. I've been …
It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the "progress as usual" way, but …

A New Study Has Answers

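Paper is a design tool that connects code and canvas in both directions through AI agents, letting you hand design creation and conversion to code over to AI. By combining the tools in Paper's MCP server, you can retrieve information about nodes on the canvas and insert or edit designs. This article tries Paper out and shows how the two-way workflow, from code to design and from design to code, is achieved.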
Yikes! It turns out Gemini and Google Maps (and other services) share the same API keys... but Google Maps API keys are designed to be public, since they are embedded …
If people are only using this a couple of times a week at most, and can’t think of anything to do with it on the average day, it hasn’t changed …
New Claude Code feature dropped yesterday: you can now run a "remote control" session on your computer and then use the Claude Code for web interfaces (on web, iOS and …
It’s also reasonable for people who entered technology in the last couple of decades because it was a good job, or because they enjoyed coding to look at this moment with …
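This article explains how to use a coding agent to produce a structured walkthrough of a codebase. After building a SwiftUI slide presentation app, the author used Claude Code to understand the details of the code. Specifically, the author pointed it at the repository and prompted it to read the code and plan a detailed walkthrough. With a tool called Showboat, the document was built in Markdown and code snippets were added automatically, which avoided errors. Through this process the author learned about the structure of the SwiftUI app and details of the Swift language. The article recommends this pattern to people who worry that LLMs may slow the rate at which they learn new skills. • A coding agent can produce a structured walkthrough of a codebase. • The Showboat tool builds the document in Markdown and adds code snippets automatically. • Claude Code reads the repository's code and plans a detailed walkthrough. • The process is a way to learn the structure of a SwiftUI app and details of the Swift language. • LLMs may slow the rate at which new skills are learned, but adopting patterns like this one can create more opportunities to learn.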

AI agents are writing more code than ever, and that's creating new supply chain risks. Feross joins the Risky Business Podcast to break down what that...

Learn how Google & Shopify’s UCP enables AI agents to discover products, manage carts, and complete secure online transactions.
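This article stresses that automated tests are essential when using coding agents. The traditional excuses for not writing tests no longer hold, because an agent can knock tests into shape in minutes, and tests matter for ensuring that AI-generated code can be trusted. In particular, having the agent run the tests against an existing codebase confirms whether the code really works. The author suggests giving the agent the prompt "first run the tests", which makes it aware that a test suite exists and encourages it to run the tests against new changes in the future. As a result the agent recognizes the importance of tests, and extending them becomes natural. • Automated tests are essential for coding agents. • The traditional excuses for not writing tests no longer hold. • Tests matter for ensuring that AI-generated code can be trusted. • By running tests against an existing codebase, the agent can confirm that the code works. • The prompt "first run the tests" makes the agent aware of the importance of tests.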
Really interesting case-study from Andreas Kling on advanced, sophisticated use of coding agents for ambitious coding projects with critical code. After a few years hoping Swift's platform support outside of …

I’ve started a new project to collect and document Agentic Engineering Patterns—coding practices and patterns to help get the best results out of this new era of coding agent development …
The paper asked me to explain vibe coding, and I did so, because I think something big is coming there, and I'm deep in, and I worry that normal people …
The latest scourge of Twitter is AI bots that reply to your tweets with generic, banal commentary slop, often accompanied by a question to "drive engagement" and waste as much …
Nothing humbles you like telling your OpenClaw “confirm before acting” and watching it speedrun deleting your inbox. I couldn’t stop it from my phone. I had to RUN to my …
On February 5th Anthropic's Nicholas Carlini wrote about a project to use parallel Claudes to build a C compiler on top of the brand new Opus 4.6. Chris Lattner (Swift, …

Striking graph illustrating stock in the UK Raspberry Pi holding company spiking on Tuesday: The Telegraph credited excitement around OpenClaw: Raspberry Pi's stock price has surged 30pc in two days, …
Gabriel Chua (Developer Experience Engineer for APAC at OpenAI) provides his take on the confusing terminology behind the term "Codex", which can refer to a bunch of different things …

The Top AI Papers of the Week (February 16-22)

Claude Sonnet 4.6, Gemini 3.1 Pro, Stripe Minions, Cloudflare Code Mode, Qwen 3.5
We’ve made GPT-5.3-Codex-Spark about 30% faster. It is now serving at over 1200 tokens per second.
Andrej Karpathy tweeted a mini-essay about buying a Mac Mini ("The apple store person told me they are selling like hotcakes and everyone is confused") to tinker with Claws: I'm …
This new Canadian hardware startup just announced their first product - a custom hardware implementation of the Llama 3.1 8B model (from July 2024) that can run at a staggering …
I don't normally cover acquisition news like this, but I have some thoughts. It's hard to overstate the impact Georgi Gerganov has had on the local model space. Back in …

An emerging npm supply chain attack that infects repos, steals CI secrets, and targets developer AI toolchains for further compromise.

Maryam Ashoori, VP of Product and Engineering at IBM’s Watsonx platform, talks about the messy reality of enterprise AI deployment.
Long running agentic products like Claude Code are made feasible by prompt caching which allows us to reuse computation from previous roundtrips and significantly decrease latency and cost. [...] At …

The SGLang team has worked closely with NVIDIA across multiple GPU generations (https://lmsys.org/blog/2025-05-05-large-scale-ep/) to unlock s...

The first in the Gemini 3.1 series, priced the same as Gemini 3 Pro ($2/million input, $12/million output under 200,000 tokens, $4/$18 for 200,000 to 1,000,000). They boast about its …
Multi-agent coding workflows promise speed, but shared state causes collisions. Learn practical fixes and why isolation plus orchestration matters.

SWE-bench is one of the benchmarks that the labs love to list in their model releases. The official leaderboard is infrequently updated but they just did a full run of …

25+ years into my career as a programmer I think I may finally be coming around to preferring type hints or even strong typing. I resisted those in the past …

Discover what's new in The Replay, LogRocket's newsletter for dev and engineering leaders, in the February 18th issue.
New opinion piece from Paul Ford in the New York Times. Unsurprisingly for a piece by Paul it's packed with quoteworthy snippets, but a few stood out for me in …
LLMs are eating specialty skills. There will be less use of specialist front-end and back-end developers as the LLM-driving skills become more important than the details of platform usage. Will …

Learn how to recreate Claude Skills–style workflows in GitHub Copilot using custom instruction files and smarter context management.

Sonnet 4.6 is out today, and Anthropic claim it offers similar performance to November's Opus 4.5 while maintaining the Sonnet pricing of $3/million input and $15/million output tokens (the Opus …

Socket is now scanning AI agent skills across multiple languages and ecosystems, detecting malicious behavior before developers install, starting with...
First chick of the 2026 breeding season! Kākāpō Yasmine hatched an egg fostered from kākāpō Tīwhiri on Valentine's Day, bringing the total number of kākāpō to 237 – though it …
But the intellectually interesting part for me is something else. I now have something close to a magic box where I throw in a question and a first answer comes …

Claude Code is deceptively capable. Point it at a codebase, describe what you need, and it’ll autonomously navigate files, write […]
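You have probably seen the name OpenClaw a lot over the past few days. With the news that its developer has joined OpenAI, it has been a hot topic on timelines. OpenClaw is an open source autonomous AI agent that hands an LLM strong permissions on your own PC and runs on autopilot through the Agent Skills mechanism. In effect, it lets individuals cheaply self-host an autonomous assistant along the lines of Devin. It sits one layer above coding agents such as Claude Code (not OSS) and Codex CLI. The same thing can be achieved with Claude Code, but you would have to build the harness yourself (always-on operation, chat integration, skill management); OpenClaw takes all of that on, running inference periodically and carrying through to tool execution. The security side is still …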

Given the threat of cognitive debt brought on by AI-accelerated software development leading to more projects and less deep understanding of how they work and what they actually do, it's …

Alibaba's Qwen just released the first two models in the Qwen 3.5 series - one open weights, one proprietary. Both are multi-modal for vision input. The open weight one is …

I'm a very heavy user of Claude Code on the web, Anthropic's excellent but poorly named cloud version of Claude Code where everything runs in a container environment managed by …

Following our two-month progress update (https://lmsys.org/blog/2026-01-16-sglang-diffusion/), we're excited to share a deeper dive into the a...
Steve Yegge's take on agent fatigue, and its relationship to burnout. Let's pretend you're the only person at your company using AI. In Scenario A, you decide you're going to …
I'm occasionally accused of using LLMs to write the content on my blog. I don't do that, and I don't think my writing has much of an LLM smell to …

We coined a new term on the Oxide and Friends podcast last month (primary credit to Adam Leventhal) covering the sense of psychological ennui leading into existential dread that many …
It's wild that the first commit to OpenClaw was on November 25th 2025, and less than three months later it's hit 10,000 commits from 600 contributors, attracted 196,000 GitHub stars …

The Top AI Papers of the Week (February 9-15)
This piece by Margaret-Anne Storey is the best explanation of the term cognitive debt I've seen so far. Cognitive debt, a term gaining traction recently, instead communicates the notion that …
Someone has to prompt the Claudes, talk to customers, coordinate with other teams, decide what to build next. Engineering is changing and great engineers are more important than ever.

GPT-5.3-Codex-Spark, GLM-5, MiniMax M2.5, Recursive Language Models, Harness Engineering, Agentica, and More
The retreat challenged the narrative that AI eliminates the need for junior developers. Juniors are more profitable than they have ever been. AI tools get them past the awkward initial …

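Entire CLI is a tool for saving AI agent sessions as a Git-compatible database. Once Entire is enabled in a Git repository, AI agent sessions can be saved as checkpoints. A checkpoint lets you review the user's prompts, the AI agent's responses, the tool usage history, the proportion of code written by AI, and more.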
Someone asked if there was an Anthropic equivalent to OpenAI's IRS mission statements over time. Anthropic are a "public benefit corporation" but not a non-profit, so they don't have the …

As a USA 501(c)(3) the OpenAI non-profit has to file a tax return each year with the IRS. One of the required fields on that tax return is to “Briefly …
OpenAI announced a partnership with Cerebras on January 14th. Four weeks later they're already launching the first integration, "an ultra-fast model for real-time coding in Codex". Despite being named GPT-5.3-Codex-Spark …
Claude Code was made available to the general public in May 2025. Today, Claude Code’s run-rate revenue has grown to over $2.5 billion; this figure has more than doubled since …

The next big thing might be recursive language models (RLMs).
One of the sub-threads of the AI energy usage discourse has been the impact new data centers have on the cost of electricity to nearby residents. Here's detailed analysis from …

New from Google. They say it's "built to push the frontier of intelligence and solve modern challenges across science, research, and engineering". It drew me a really good SVG of …
Scott Shambaugh helps maintain the excellent and venerable matplotlib Python charting library, including taking on the thankless task of triaging and reviewing incoming pull requests. A GitHub account called @crabby-rathbun …
In my post about my Showboat project I used the term "overseer" to refer to the person who manages a coding agent. It turns out that's a term tied to …
An AI-generated report, delivered directly to the email inboxes of journalists, was an essential tool in the Times’ coverage. It was also one of the first signals that conservative media …
OpenAI's adoption of Skills continues to gain ground. You can now use Skills directly in the OpenAI API with their shell tool. You can zip skills up and upload them …
This is a huge new MIT-licensed model: 754B parameters and 1.51TB on Hugging Face, twice the size of GLM-4.7, which was 368B and 717GB (4.5 and 4.6 were around that …

Charles Leifer has been maintaining pysqlite3 - a fork of the Python standard library's sqlite3 module that makes it much easier to run upgraded SQLite versions - since 2018. He's …

WebMCP is a JavaScript interface that lets web developers expose the functionality of their web applications as tools. This allows AI agents to call and operate a web application's functionality directly.

MCP Apps is MCP's official UI extension. Learn the mental model for MCP servers and hosts, where it fits, the gotchas, and the security tradeoffs in practice.
New paper by Damon McMillan exploring challenging LLM context tasks involving large SQL schemas (up to 10,000 tables) across different models and file formats: Using SQL generation as a proxy …
Aruna Ranganathan and Xingqi Maggie Ye from Berkeley Haas School of Business report initial findings in the HBR from their April to December 2025 study of 200 employees at a …

Build an A2UI mini-app end to end: run Google’s agent + client, then generate React UIs from Gemini using a2ui-bridge and Mantine.
Agent Teams is an experimental feature released alongside Opus 4.6 on February 5, 2026. It turns Claude Code's Subagents into independent processes and lets them message each other in both directions (see "Orchestrate teams of Claude Code sessions" in the Claude Code Docs). In a word, it extends Subagents to make them stateful. Each agent polls its own inbox (JSON files under ~/.claude/teams/), so inter-agent communication works on a mailbox analogy. Exclusive access is handled with file locks, and the interesting part is that a messaging system works with nothing more than inference plus the filesystem
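The mailbox pattern described here can be sketched in a few lines of Python. This is not Claude Code's implementation: the file layout, the message shape, and the `send` and `drain` helper names are all hypothetical, and the locking relies on `fcntl.flock`, which is only available on Unix-like systems.

```python
import fcntl
import json
import os
from pathlib import Path


def send(inbox: Path, message: dict) -> None:
    """Append a message to an inbox file while holding an exclusive lock."""
    inbox.parent.mkdir(parents=True, exist_ok=True)
    # Open for read/write, creating the file if it does not exist yet.
    fd = os.open(inbox, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until other holders release
        raw = f.read()
        messages = json.loads(raw) if raw else []
        messages.append(message)
        f.seek(0)
        f.truncate()
        json.dump(messages, f)
    # Closing the file releases the lock.


def drain(inbox: Path) -> list[dict]:
    """Return all pending messages and leave the inbox empty.

    A polling agent would call this on a timer (hypothetical usage).
    """
    if not inbox.exists():
        return []
    with open(inbox, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        raw = f.read()
        messages = json.loads(raw) if raw else []
        f.seek(0)
        f.truncate()
        f.write("[]")
    return messages
```

Each agent would own one inbox file and call `drain` in its polling loop, while peers call `send` on it; the lock is what keeps a reader from observing a half-written JSON document.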

The Top AI Papers of the Week (February 2-8)
People on the orange site are laughing at this, assuming it's just an ad and that there's nothing to it. Vulnerability researchers I talk to do not think this is …
Mitchell Hashimoto's new system to help address the deluge of worthless AI-generated PRs faced by open source projects now that the friction involved in contributing has dropped so low. He …
New "research preview" from Anthropic today: you can now access a faster version of their frontier model Claude Opus 4.6 by typing /fast in Claude Code... but at a cost …
I am having more fun programming than I ever have, because so many more of the programs I wish I could find the time to write actually exist. I wish …

Last week I hinted at a demo I had seen from a team implementing what Dan Shapiro called the Dark Factory level of AI adoption, where no human even looks …

Claude Opus 4.6, GPT-5.3-Codex, Agent Primitives, METR Long Tasks, Codex App, OpenAI Frontier, C Compiler with Parallel Agents

With Claude Code's agent teams, multiple Claude Code instances work together in coordination. This article tries out the agent teams feature in Claude Code and explores how it behaves.
I don't know why this week became the tipping point, but nearly every software engineer I've talked to is experiencing some degree of mental health crisis. [...] Many people assuming …

Claude Opus 4.6 has uncovered more than 500 open source vulnerabilities, raising new considerations for disclosure, triage, and patching at scale.

When I want to quickly implement a one-off experiment in a part of the codebase I am unfamiliar with, I get codex to do extensive due diligence. Codex explores relevant …
Some really good and unconventional tips in here for getting to a place with coding agents where they demonstrably improve your workflow and productivity. I particularly liked: Reproduce your own …

Two major new model releases today, within about 15 minutes of each other. Anthropic released Opus 4.6. Here's its pelican: OpenAI release GPT-5.3-Codex, albeit only via their Codex app, not …

Mistral just released Voxtral Transcribe 2 - a family of two new models, one open weights, for transcribing audio to text. This is the latest in their Whisper-like model family, …

Leveraging coding agents for improving image generation.

Discover what's new in The Replay, LogRocket's newsletter for dev and engineering leaders, in the February 4th issue.

Ken Pickering, CTO at Scripta Insights, discusses what it really means to be AI-first in engineering, and how leaders can adapt for long-term success.
Subagents let you delegate to specialists with clean contexts. Learn when to use them in agentic IDEs, how patterns differ, and what guardrails prevent chaos
I just sent the January edition of my sponsors-only monthly newsletter. If you are a sponsor (or if you start a sponsorship now) you can access it here. In the …
This is the difference between Data and a large language model, at least the ones operating right now. Data created art because he wanted to grow. He wanted to become …

OpenAI just released a new macOS app for their Codex coding agent. I've had a few days of preview access - it's a solid app that provides a nice UI …
I talked to Cade Metz for this New York Times piece on OpenClaw and Moltbook. Cade reached out after seeing my blog post about that from the other day. In …
I've been running OpenClaw using Docker on my Mac. Here are the first in my ongoing notes on how I set that up and the commands I'm using to administer …

The Top AI Papers of the Week (January 26-February 1)
Originally in 2019, GPT-2 was trained by OpenAI on 32 TPU v3 chips for 168 hours (7 days), with $8/hour/TPUv3 back then, for a total cost of approx. $43K. It …
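The quoted total follows directly from the three figures in the excerpt. A quick check of the arithmetic, with illustrative variable names:

```python
chips = 32              # TPU v3 chips
hours = 168             # 7 days * 24 hours
usd_per_chip_hour = 8   # $8/hour/TPUv3, as quoted

chip_hours = chips * hours
total_usd = chip_hours * usd_per_chip_hour

print(chip_hours)  # 5376
print(total_usd)   # 43008
```

That is 5,376 chip-hours and $43,008, which rounds to the "approx. $43K" in the excerpt.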

Project Genie, Kimi K2.5, Interactive Tools in Claude, Qwen3-Max-Thinking, Mistral Vibe 2.0, Agentic Vision
Getting agents using Beads requires much less prompting, because Beads now has 4 months of “Desire Paths” design, which I’ve talked about before. Beads has evolved a very complex command-line …

The hottest project in AI right now is Clawdbot, renamed to Moltbot, renamed to OpenClaw. It’s an open source implementation of the digital personal assistant pattern, built by Peter Steinberger …
Chris Ashworth is the creator and CEO of QLab, a macOS software package for “cue-based, multimedia playback” which is designed to automate lighting and audio for live theater productions. I recently …
New Datasette alpha this morning. Key new features: Datasette's Request object can now handle multipart/form-data file uploads via the new await request.form(files=True) method. I plan to use this for a …

A practical guide to building privacy-first, local-first AI agents using small language models, with a real-world HR triage system and deployable architecture.
Dan Shapiro proposes a five level model of AI-assisted programming, inspired by the five (or rather six, it's zero-indexed) levels of driving automation. Spicy autocomplete, aka original GitHub Copilot or …
The best LLMs for coding in 2026: model roles, pricing, and the runtime + product stack that actually ships code

💡 TL;DR: Inspired by the Kimi K2 team, the SGLang RL team successfully landed an INT4 Quantization-Aware Tra...

embedding-shapes was so infuriated by the hype around Cursor's FastRender browser project - thousands of parallel agents producing ~1.6 million lines of Rust - that they were inspired to take …

Kimi K2 landed in July as a 1 trillion parameter open weight LLM. It was joined by Kimi K2 Thinking in November which added reasoning capabilities. Now they've made it …

MCP Apps is an extension that standardizes how MCP returns interactive UI components. This article tries out MCP Apps to see how an agent can return interactive UI components.
Someone asked on Hacker News if I had any tips for getting coding agents to write decent quality tests. Here's what I said: I work in Python which helps a …

One of my favourite features of ChatGPT is its ability to write and execute code in a container. This feature launched as ChatGPT Code Interpreter nearly three years ago, was …

Compare mem0 and Supermemory to learn how modern AI apps manage long-term memory beyond RAG and stateless LLM chats.

Paul Kinlan is a web platform developer advocate at Google and recently turned his attention to coding agents. He quickly identified the importance of a robust sandbox for agents to …

The Top AI Papers of the Week (January 19-25)

Jenny Wen, Design Lead at Anthropic (and previously Director of Design at Figma) gave a provocative keynote at Hatch Conference in Berlin last September. Jenny argues that the Design Process …
If you tell a friend they can now instantly create any app, they’ll probably say “Cool! Now I need to think of an idea.” Then they will forget about it, …

Agentic Reasoning Survey, Claude's New Constitution, Devin Review, Codex Agent Loop, The Assistant Axis, D4RT, Skills.sh
[...] i was too busy with work to read anything, so i asked chatgpt to summarize some books on state formation, and it suggested circumscription theory. there was already the …

I haven't been paying much attention to the state-of-the-art in speech generation models other than noting that they've got really good, so I can't speak for how notable this new …
Zed is fast, clean, and increasingly MCP-first. Here’s how it stacks up for AI power users in 2026, including agents, memory, and team workflows.
Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine". For each frame our pipeline constructs …

Learn why AI agents need task queues and how to build one to handle retries, rate limits, context, and multi-step LLM workflows reliably.
Late last year Richard Weiss found something interesting while poking around with the just-released Claude Opus 4.5: he was able to talk the model into regurgitating a document which was …
Agent skills, rules, and commands offer different, strategic context for AI agents. Here’s when to use each and how to optimize them for production.

AI accuracy problems are often chunking problems. Learn how chunk size and structure impact cost, retrieval quality, and UX.

Previous work estimating the energy and water cost of LLMs has generally focused on the cost per prompt using a consumer-level system such as ChatGPT. Simon P. Couch notes that …

AI makes writing code faster, but review slower. A hands-on test shows why AI-generated code shifts review from correctness to necessity.
Detailed and thoughtful description of an open-book and open-chatbot exam run by Ploum at École Polytechnique de Louvain for an "Open Source Strategies" class. Students were told they could use …
Plenty of people have mused about what a new programming language specifically designed to be used by LLMs might look like. Jordan Hubbard (co-founder of FreeBSD, with serious stints at …

When security policies block cloud AI tools entirely, OpenCode with local models offers a compliant alternative.

Wilson Lin at Cursor has been doing some experiments to see how far you can push a large fleet of "autonomous" coding agents: This post describes what we've learned from …
On 15th January Black Forest Labs, a lab formed by the creators of the original Stable Diffusion, released black-forest-labs/FLUX.2-klein-4B - an Apache 2.0 licensed 4 billion parameter version of their …

The Top AI Papers of the Week (January 12-18)
[On agents using CLI tools in place of REST APIs] To save on context window, yes, but moreso to improve accuracy and success rate when multiple tool calls are involved, …

Anthropic launches Cowork, Scaling Agents, Dr. Zero, MCP Tool Search for Claude Code, Google Antigravity Agent Skills

The long-rumored introduction of ads to ChatGPT just became a whole lot more concrete: In the coming weeks, we’re also planning to start testing ads in the U.S. for the …

April Dunford, one of the most trusted voices in product positioning, explains how to expose weak AI claims and win deals today.

Since its release in early Nov. 2025, SGLang-Diffusion has gained significant attention and widespread adoption within the community. We ...
This is the standardization effort I've most wanted in the world of LLMs: a vendor-neutral specification for the JSON API that clients can use to talk to hosted LLMs. Open …

Learn how to design AI-ready frontend architecture with clear boundaries and predictable patterns that scale safely with AI.
When we optimize responses using a reward model as a proxy for “goodness” in reinforcement learning, models sometimes learn to “hack” this proxy and output an answer that only “looks …

Claude Cowork defaults to allowing outbound HTTP traffic to only a specific list of domains, to help protect the user against prompt injection attacks that exfiltrate their data. Prompt Armor …

Discover what's new in The Replay, LogRocket's newsletter for dev and engineering leaders, in the January 14th issue.