Last updated: 2025/12/07 21:01
Anthropic Sandbox Runtime (srt) and the architecture of next-generation AI agents
Anthropic Sandbox Runtime (srt) is a lightweight sandbox PoC (proof of concept) that Anthropic developed for cloud environments such as Claude Code on the web. Linked article: "Making Claude Code more secure and autonomous with sandboxing" (Learn how Claude Code’s new sandboxing feature protects developers with filesystem and network isolation, reducing permission prompts and increasing user safety.) Quite a few Claude Code users …
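Trying out TanStack AI, an AI framework for TypeScript
TanStack AI is a lightweight AI framework for TypeScript developed by the TanStack team. It abstracts the interfaces of LLM providers and provides tool calling and chat features. This article gives an overview of TanStack AI and introduces its basic usage.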
Programmatic tool calling in Claude
MCP tool calls suffer from issues such as context pollution and inference overhead. This article explains how Claude's programmatic tool calling feature can be used to solve them.
The Unexpected Effectiveness of One-Shot Decompilation with Claude
Chris Lewis decompiles N64 games. He wrote about this previously in Using Coding Agents to Decompile Nintendo 64 Games, describing his efforts to decompile Snowboard Kids 2 (released in 1999) …
AI Agents Weekly: OpenRouter State of AI, Mistral 3, DeepSeek-V3.2, Google Workspace Studio, Puppeteer Multi-Agent RL, and more
OpenRouter State of AI, Mistral 3, DeepSeek-V3.2, Google Workspace Studio, Puppeteer Multi-Agent RL, and more
Quoting Daniel Lemire
If you work slowly, you will be more likely to stick with your slightly obsolete work. You know that professor who spent seven years preparing lecture notes twenty years ago? …
I built an enterprise slide-generation AI agent with Python and GPT5
This is the AI Shift TECH BLOG. We share information about AI technology and how to put it to use.
Codex CLI adds support for Skills
The latest version of Codex CLI, v0.65.0, introduces support for Skills, although it is still experimental [1]. Linked page: codex/docs/skills.md at main · openai/codex (GitHub). [1]: https://github.com/openai/codex/pull/7412 A directory in the same format as Claude Skills is loaded simply by putting it in place, so setup takes almost no effort. To enable the feature, add the following to config.toml: [features] skills = true. Skill packages: ~/.codex/ …
Decoding Ouro's intermediate steps
This is the AI Shift TECH BLOG. We share information about AI technology and how to put it to use.
The Resonant Computing Manifesto
Launched today at WIRED’s The Big Interview event, this manifesto (of which I'm a founding signatory) pushes for a positive framework for thinking about building hyper-personalized AI-powered software. This part …
A developer’s guide to Antigravity and Gemini 3
Check out Google's latest AI releases, Gemini and the Antigravity AI IDE. Understand what's new, how they work, and how they can reshape your development workflow.
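[Internal practice] How did "AI Crazy Shift" change the organization? Behind the scenes of a 30% reduction in PM work, and the culture change
This is the AI Shift TECH BLOG. We share information about AI technology and how to put it to use.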
Support FSDP2 as A Training Backend for Miles
TL;DR: We have added FSDP to Miles (https://github.com/radixark/miles) as a more flexible trainin...
Anthropic acquires Bun
Anthropic just acquired the company behind the Bun JavaScript runtime, which they adopted for Claude Code just in July. Their announcement includes an impressive revenue update on Claude Code: In …
Introducing Mistral 3
Four new models from Mistral today: three in their "Ministral" smaller model series (14B, 8B, and 3B) and a new Mistral Large 3 MoE model with 675B parameters, 41B active. …
Three mindsets every generative AI champion should have
This is the AI Shift TECH BLOG. We share information about AI technology and how to put it to use.
Claude 4.5 Opus' Soul Document
Richard Weiss managed to get Claude 4.5 Opus to spit out this 14,000 token document which Claude called the "Soul overview". Richard says: While extracting Claude 4.5 Opus' system message …
Boost SGLang Inference: Native NVIDIA Model Optimizer Integration for Seamless Quantization and Deployment
(Updated on Dec 2) We are thrilled to announce a major new feature in SGLang: native support for https://github.com/NVIDIA/TensorRT-Model-...
DeepSeek-V3.2
Two new open weight (MIT licensed) models from DeepSeek today: DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, both 690GB, 685B parameters. Here's the PDF tech report. DeepSeek-V3.2 is DeepSeek's new flagship model, now running …
I sent out my November sponsor newsletter
I just sent out the November edition of my sponsors-only monthly newsletter. If you are a sponsor (or if you start a sponsorship now) you can access a copy here. …
From research to production: Accelerate OSS LLM with EAGLE-3 on Vertex
TL;DR: Speculative decoding boosts LLM inference, but traditional methods require a separate, inefficient draft model. Vertex AI utilizes...
Quoting Felix Nolan
I am increasingly worried about AI in the video game space in general. [...] I'm not sure that the CEOs and the people making the decisions at these sorts of …
ChatGPT is three years old today
It's ChatGPT's third birthday today. It's fun looking back at Sam Altman's low key announcement thread from November 30th 2022: today we launched ChatGPT. try talking with it here: chat.openai.com …
🥇Top AI Papers of the Week
The Top AI Papers of the Week (November 24 - 30)
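Trying out Claude's tool search tool
With MCP, large numbers of tool definitions crowd the LLM's context. Claude's tool search tool supplies only the relevant tools to the LLM as they are needed, which eases that pressure. This article walks through a working example that uses the tool search tool from Claude's TypeScript client.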
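To make the idea concrete, here is a toy sketch of on-demand tool lookup: a catalog is searched, and only the matching definitions would be handed to the model. This is not Claude's tool search tool or its API. The catalog, the ToolDef type, and the keyword-overlap scoring are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolDef:
    name: str
    description: str

# Hypothetical catalog; real MCP servers can expose far more tools than this.
CATALOG = [
    ToolDef("get_weather", "Look up the current weather for a city"),
    ToolDef("create_issue", "Create an issue in a bug tracker"),
    ToolDef("search_issues", "Search existing issues in a bug tracker"),
    ToolDef("send_email", "Send an email message to a recipient"),
]

def search_tools(query: str, catalog: list[ToolDef], limit: int = 2) -> list[ToolDef]:
    """Rank tools by how many query words appear in their name or description."""
    words = set(query.lower().split())
    scored = []
    for tool in catalog:
        text = f"{tool.name} {tool.description}".lower().replace("_", " ")
        score = len(words & set(text.split()))
        if score > 0:
            scored.append((score, tool))
    # Highest score first; the sort is stable, so catalog order breaks ties.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:limit]]

if __name__ == "__main__":
    for tool in search_tools("search issues", CATALOG):
        print(tool.name)
```

Only the definitions returned by search_tools would need to enter the context, rather than the whole catalog. A production system would likely use something stronger than word overlap.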
The space of minds
On the space of minds and the optimizations that give rise to them.
🤖AI Agents Weekly: Claude Opus 4.5, OmniScientist, FLUX.2, General Agentic Memory
Claude Opus 4.5, OmniScientist, FLUX.2, General Agentic Memory
Context plumbing
Matt Webb coins the term context plumbing to describe the kind of engineering needed to feed agents the right context at the right time: Context appears at disparate sources, by …
Quoting Wikipedia content guideline
Large language models (LLMs) can be useful tools, but they are not good at creating entirely new Wikipedia articles. Large language models should not be used to generate new Wikipedia …
A ChatGPT prompt equals about 5.1 seconds of Netflix
In June 2025 Sam Altman claimed about ChatGPT that "the average query uses about 0.34 watt-hours". In March 2020 George Kamiya of the International Energy Agency estimated that "streaming a …
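The headline comparison is a unit conversion, sketched below. The streaming estimate itself is cut off in the excerpt above, so this sketch does not quote it. Instead it back-solves the hourly rate implied by the two numbers that are present: 0.34 watt-hours per query and 5.1 seconds of streaming. The function names are illustrative.

```python
QUERY_WH = 0.34        # per-query figure quoted above
NETFLIX_SECONDS = 5.1  # equivalence stated in the headline

def implied_streaming_wh_per_hour(query_wh: float, seconds: float) -> float:
    """Hourly streaming energy at which `query_wh` equals `seconds` of video."""
    return query_wh / seconds * 3600

def seconds_of_streaming(query_wh: float, streaming_wh_per_hour: float) -> float:
    """Seconds of streaming that use the same energy as one query."""
    return query_wh / streaming_wh_per_hour * 3600

if __name__ == "__main__":
    rate = implied_streaming_wh_per_hour(QUERY_WH, NETFLIX_SECONDS)
    print(f"implied streaming rate: {rate:.0f} Wh per hour")
    print(f"round trip: {seconds_of_streaming(QUERY_WH, rate):.1f} seconds")
```

The implied rate works out to 240 Wh per streaming hour. A different streaming estimate would change the headline number proportionally.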
Bluesky Thread Viewer thread by @simonwillison.net
I've been having a lot of fun hacking on my Bluesky Thread Viewer JavaScript tool with Claude Code recently. Here it renders a thread (complete with demo video) talking about …
Quoting Qwen3-VL Technical Report
To evaluate the model’s capability in processing long-context inputs, we construct a video “Needle-in-a-Haystack” evaluation on Qwen3-VL-235B-A22B-Instruct. In this task, a semantically salient “needle” frame, containing critical visual evidence, is inserted …
deepseek-ai/DeepSeek-Math-V2
New on Hugging Face, a specialist mathematical reasoning LLM from DeepSeek. This is their entry in the space previously dominated by proprietary models from OpenAI and Google DeepMind, both of …
Top 5 AI code review tools in 2025
A hands-on comparison of five AI code review tools – Qodo, Traycer, CodeRabbit, Sourcery, and CodeAnt AI, tested on the same codebase to see which one actually delivers.
You don’t need AI for everything: A reality check for developers
Alexandra Spalato, fractional AI officer, shares a practical framework to help devs decide when and how to use AI and agents.
What is new about Programmatic Tool Calling (PTC)?
Anthropic has released "Programmatic Tool Calling" (PTC below) in public beta as a new feature of the Claude (the model) API. Linked article: "Introducing advanced tool use on the Claude Developer Platform" (Claude can now discover, learn, and execute tools dynamically to enable agents that take action in the real world. Here’s how.) In short, this is a feature in which Claude generates the Tool-calling logic as Python code and runs it inside a sandbox provided by Anthropic. With conventional Tool Use, Claude decided on the next action every time a Tool was called, and every result was added to the context window. Chaining 10 Tool calls meant 10 rounds of inference and …
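The contrast described above can be sketched as a toy model that makes no API calls. In the conventional loop, every intermediate tool result is appended to the context. In the programmatic variant, a script chains the tools and only its final value comes back. The tool functions and the context list are hypothetical stand-ins, not Anthropic's API.

```python
# Hypothetical tools; the first returns a bulky intermediate result.
def fetch_orders(customer: str) -> list[dict]:
    return [{"customer": customer, "id": i, "total": 10 * i} for i in range(1, 11)]

def sum_totals(orders: list[dict]) -> int:
    return sum(order["total"] for order in orders)

def conventional_tool_use(customer: str) -> tuple[int, list[str]]:
    """One model turn per tool call; every result lands in the context."""
    context: list[str] = []
    orders = fetch_orders(customer)
    context.append(repr(orders))  # the full intermediate result enters the context
    total = sum_totals(orders)
    context.append(repr(total))
    return total, context

def programmatic_tool_calling(customer: str) -> tuple[int, list[str]]:
    """A single generated script chains the tools; only its output is kept."""
    context: list[str] = []

    def generated_script() -> int:  # stands in for model-written code run in a sandbox
        return sum_totals(fetch_orders(customer))

    context.append(repr(generated_script()))
    return int(context[0]), context

if __name__ == "__main__":
    for run in (conventional_tool_use, programmatic_tool_calling):
        total, context = run("acme")
        print(run.__name__, total, sum(len(entry) for entry in context))
```

Both paths compute the same total. The difference is how much text accumulates in the context along the way.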
Google Antigravity Exfiltrates Data
PromptArmor demonstrate a concerning prompt injection chain in Google's new Antigravity IDE: In this attack chain, we illustrate that a poisoned web source (an integration guide) can manipulate Gemini into …
Constant-time support lands in LLVM: Protecting cryptographic code at the compiler level
Substantial LLVM contribution from Trail of Bits. Timing attacks against cryptography algorithms are a gnarly problem: if an attacker can precisely time a cryptographic algorithm they can often derive details …
Post-hoc Rationalization: Is LLM reasoning just an "excuse"?
This is the AI Shift TECH BLOG. We share information about AI technology and how to put it to use.
llm-anthropic 0.23
New plugin release adding support for Claude Opus 4.5, including the new thinking_effort option: llm install -U llm-anthropic llm -m claude-opus-4.5 -o thinking_effort low 'muse on pelicans' This took longer …
LLM SVG Generation Benchmark
Here's a delightful project by Tom Gally, inspired by my pelican SVG benchmark. He asked Claude to help create more prompts of the form Generate an SVG of [A] [doing] …
Unified FP8: Moving Beyond Mixed Precision for Stable and Accelerated MoE RL
TL;DR: We have implemented fully FP8-based sampling and training in RL. Experiments show that for MoE models, the larger the model, the more ...
Quoting Claude Opus 4.5 system prompt
If the person is unnecessarily rude, mean, or insulting to Claude, Claude doesn't need to apologize and can insist on kindness and dignity from the person it’s talking with. Even …
Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult
Anthropic released Claude Opus 4.5 this morning, which they call “best model in the world for coding, agents, and computer use”. This is their attempt to retake the crown for …
🥇Top AI Papers of the Week
The Top AI Papers of the Week (November 17 - 23)
Agent design is still hard
Armin Ronacher presents a cornucopia of lessons learned from building agents over the past few months. There are several agent abstraction libraries available now (my own LLM library is edging …
LMSYS Fellowship Program
We're proud to launch the LMSYS Fellowship Program! This year, the program will provide funding to full-time PhD students in the United States who ...
Olmo 3 is a fully open LLM
Olmo is the LLM series from Ai2, the Allen Institute for AI. Unlike most open weight models these are notable for including the full training data, training process and checkpoints along …
🤖AI Agents Weekly: Gemini 3, Nano Banana Pro, Antigravity, Agent-R1 RL Framework, Meta's SAM 3, OLMo 3
Gemini 3, Nano Banana Pro, Antigravity, Agent-R1 RL Framework, Meta's SAM 3, OLMo 3
Nano Banana Pro aka gemini-3-pro-image-preview is the best available image generation model
Hot on the heels of Tuesday’s Gemini 3 Pro release, today it’s Nano Banana Pro, also known as Gemini 3 Pro Image. I’ve had a few days of preview access …
Quoting Nicholas Carlini
Previously, when malware developers wanted to go and monetize their exploits, they would do exactly one thing: encrypt every file on a person's computer and request a ransom to decrypt …
Building more with GPT-5.1-Codex-Max
Hot on the heels of yesterday's Gemini 3 Pro release comes a new model from OpenAI called GPT-5.1-Codex-Max. (Remember when GPT-5 was meant to bring in a new era of …
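What kind of AI editor is Antigravity?
Google announced Antigravity, a new AI coding editor, at the same time as Gemini 3. Linked page: Google Antigravity - Build the new way. Under the hood, it is Google's own application, built by licensing the Codeium engine inside Windsurf, which is a fork of Microsoft's VS Code, which in turn runs on Electron, a framework that embeds Google's Chromium and V8 engine (full circle!). That said, while the surface-level implementation leans toward a heavily modified Windsurf, the product design swings closer to Kiro: its own abstraction layers, such as agent-centric task orientation and artifact management, are pushed to the forefront. What is interesting about Antigravity's design is that it is not just another VS Code fork, but the "context engineering" philosophy of how to handle SDD (Spec-Driven Development) style intermediate artifacts …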
Introducing Miles - RL Framework To Fire Up Large-Scale MoE Training
"A journey of a thousand miles is made one small step at a time." We're excited to introduce Miles, an enterprise...
llm-gemini 0.27
New release of my LLM plugin for Google's Gemini models: Support for nested schemas in Pydantic, thanks Bill Pugh. #107 Now tests against Python 3.14. Support for YouTube URLs as …
MacWhisper has Automatic Speaker Recognition now
Inspired by this conversation on Hacker News I decided to upgrade MacWhisper to try out NVIDIA Parakeet and the new Automatic Speaker Recognition feature. It appears to work really well! …
Google Antigravity
Google's other major release today to accompany Gemini 3 Pro. At first glance Antigravity is yet another VS Code fork Cursor clone - it's a desktop application you install that …
Quoting Ethan Mollick
Three years ago, we were impressed that a machine could write a poem about otters. Less than 1,000 days later, I am debating statistical methodology with an agent that built …
Trying out Gemini 3 Pro with audio transcription and a new pelican benchmark
Google released Gemini 3 Pro today. Here’s the announcement from Sundar Pichai, Demis Hassabis, and Koray Kavukcuoglu, their developer blog announcement from Logan Kilpatrick, the Gemini 3 Pro Model Card, …
From data synthesis to use: seeing how far Autonomous AI Database alone can go
This is the AI Shift TECH BLOG. We share information about AI technology and how to put it to use.
The fate of “small” open source
Nolan Lawson asks if LLM assistance means that the category of tiny open source libraries like his own blob-util is destined to fade away. Why take on additional supply chain …
Verifiability
The impact of verifiability on the jagged frontier of LLMs
Real-time AI in Next.js: How to stream responses with the Vercel AI SDK
A practical tutorial on building real-time AI interactions in Next.js. Stream text, show reasoning, handle edge cases, and create a ChatGPT-style UX with the Vercel AI SDK.
Quoting Andrej Karpathy
With AI now, we are able to write new programs that we could never hope to write by hand before. We do it by specifying objectives (e.g. classification accuracy, reward …
🥇Top AI Papers of the Week
The Top AI Papers of the Week (November 10 - 16)
llm-anthropic 0.22
New release of my llm-anthropic plugin: Support for Claude's new structured outputs feature for Sonnet 4.5 and Opus 4.1. #54 Support for the web search tool using -o web_search 1 …
🤖 AI Agents Weekly: Omnilingual ASR, GPT-5.1, SIMA 2, Context Engineering Whitepaper, Mini-Agent, Marble World Model
Omnilingual ASR, GPT-5.1, SIMA 2, Context Engineering Whitepaper, Mini-Agent, Marble World Model
parakeet-mlx
Neat MLX project by Senstella bringing NVIDIA's Parakeet ASR (Automatic Speech Recognition, like Whisper) model to Apple's MLX framework. It's packaged as a Python CLI tool, so you can …
GPT-5.1 Instant and GPT-5.1 Thinking System Card Addendum
I was confused about whether the new "adaptive thinking" feature of GPT-5.1 meant they were moving away from the "router" mechanism where GPT-5 in ChatGPT automatically selected a model for …
🚀 AutoRound Meets SGLang: Enabling Quantized Model Inference with AutoRound
Introducing GPT-5.1 for developers
OpenAI announced GPT-5.1 yesterday, calling it a smarter, more conversational ChatGPT. Today they've added it to their API. We actually got four new models today: gpt-5.1, gpt-5.1-chat-latest, gpt-5.1-codex, gpt-5.1-codex-mini. There …
Nano Banana can be prompt engineered for extremely nuanced AI image generation
Max Woolf provides an exceptional deep dive into Google's Nano Banana aka Gemini 2.5 Flash Image model, still the best available image manipulation LLM tool three months after its initial …
Quoting Nov 12th letter from OpenAI to Judge Ona T. Wang
On Monday, this Court entered an order requiring OpenAI to hand over to the New York Times and its co-plaintiffs 20 million ChatGPT user conversations [...] OpenAI is unaware of …
What happens if AI labs train for pelicans riding bicycles?
Almost every time I share a new example of an SVG of a pelican riding a bicycle a variant of this question pops up: how do you know the labs …
How I used Mastra to build a prize-winning RAG agent
A developer's retrospective on creating an AI video transcription agent with Mastra, an open-source TypeScript framework for building AI agents.
Quoting Steve Krouse
The fact that MCP is a different surface from your normal API allows you to ship MUCH faster to MCP. This has been unlocked by inference at runtime. Normal APIs …
10 Best AI Tools for Product Managers in 2026
Top 10 AI tools I actually use as a PM: from user calls to PRDs to prototypes. Real workflows, measurable time savings, and honest takes on what works.
Agentic Pelican on a Bicycle
Robert Glaser took my pelican riding a bicycle benchmark and applied an agentic loop to it, seeing if vision models could draw a better pelican if they got the chance …
Six coding agents at once
I've been upgrading a ton of Datasette plugins recently for compatibility with the Datasette 1.0a20 release from last week - 35 so far. A lot of the work is very …
The next phase of dev: Building for MCP and the open web
MCP is the ultimate bridge that redefines how AI connects to the open web. Here's how it lets agents act across APIs and automate workflows.
Quoting Netflix
Netflix asks partners to consider the following guiding principles before leveraging GenAI in any creative workflow: The outputs do not replicate or substantially recreate identifiable characteristics of unowned or copyrighted …
15 Best AI Tools for Designers in 2026
Discover the best AI tools designers are using in 2026 to speed up workflows, generate designs, and connect directly with real design systems.
You’ve authenticated your user, but have you authorized your agent?
Secure AI agents beyond login screens with Auth0’s Auth for GenAI; from token management and human approval to fine-grained authorization.
FTC’s AI chatbot crackdown: A developer compliance guide
Learn how to build a fully compliant AI chatbot with FTC-mandated safeguards – age verification, safety monitoring, consent systems, and audit logging.
Pelican on a Bike - Raytracer Edition
beetle_b ran this prompt against a bunch of recent LLMs: Write a POV-Ray file that shows a pelican riding on a bicycle. This turns out to be a harder challenge …
🥇Top AI Papers of the Week
The Top AI Papers of the Week (November 3 - 9)
Reverse engineering Codex CLI to get GPT-5-Codex-Mini to draw me a pelican
OpenAI partially released a new model yesterday called GPT-5-Codex-Mini, which they describe as "a more compact and cost-efficient version of GPT-5-Codex". It’s currently only available via their Codex CLI tool …
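The context pressure problem with MCP tools, and how to address it
As MCP spreads, a problem has emerged: large numbers of tool definitions crowd the LLM's context. This article explains practical solutions, with examples from Claude Skills and Cloudflare Code Mode: providing only minimal information through progressive disclosure, making tool calls more efficient through code execution with MCP, and reducing context with a single search tool.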
Quoting Kenton Varda
The big advantage of MCP over OpenAPI is that it is very clear about auth. [...] Maybe an agent could read the docs and write code to auth. But we …
🤖 AI Agents Weekly: Context Engineering 2.0, Kimi K2 Thinking, Windsurf Codemaps, Google File Search, Tool-to-Agent Retrieval
Context Engineering 2.0, Kimi K2 Thinking, Windsurf Codemaps, Google File Search, Tool-to-Agent Retrieval
Quoting Josh Cohenzadeh
I have AiDHD. It has never been easier to build an MVP and in turn, it has never been harder to keep focus. When new features always feel like they're …
Could LLMs encourage new programming languages?
My hunch is that existing LLMs make it easier to build a new programming language in a way that captures new developers. Most programming languages are similar enough to existing …
Autogen vs. Crew AI: Choosing the right agentic framework
Build autonomous AI agents with Autogen and Crew AI. Learn how agentic AI enables multi-agent systems, tools, and workflows in action.
Using Codex CLI with gpt-oss:120b on an NVIDIA DGX Spark via Tailscale
Inspired by a YouTube comment I wrote up how I run OpenAI's Codex CLI coding agent against the gpt-oss:120b model running in Ollama on my NVIDIA DGX Spark via a …
You should write an agent
Thomas Ptacek on the Fly blog: Agents are the most surprising programming experience I’ve had in my career. Not because I’m awed by the magnitude of their powers — I …
Quoting Ben Stolovitz
My trepidation extends to complex literature searches. I use LLMs as secondary librarians when I’m doing research. They reliably find primary sources (articles, papers, etc.) that I miss in my …
SGLang Diffusion: Accelerating Video and Image Generation
We are excited to introduce SGLang Diffusion, which brings SGLang's state-of-the-art performance to accelerate image and video generation for diffusion mo...
Kimi K2 Thinking
Chinese AI lab Moonshot's Kimi K2 established itself as one of the largest open weight models - 1 trillion parameters - back in July. They've now released the Thinking version, …
Quoting Nathan Lambert
At the start of the year, most people loosely following AI probably knew of 0 [Chinese] AI labs. Now, and towards wrapping up 2025, I’d say all of DeepSeek, Qwen, …
AI dev tool power rankings & comparison [Nov 2025]
Compare the top AI development tools and models of November 2025. View updated rankings, feature breakdowns, and find the best fit for you.
Code research projects with async coding agents like Claude Code and Codex
I’ve been experimenting with a pattern for LLM usage recently that’s working out really well: asynchronous code research tasks. Pick a research question, spin up an asynchronous coding agent and …
Fusion 1.0 - The First AI Agent for Product, Design, and Code
Fusion 1.0 is the first AI agent for product, design and code that builds, learns and ships features across your stack from idea to production.
Quoting @belligerentbarbies
I'm worried that they put co-pilot in Excel because Excel is the beast that drives our entire economy and do you know who has tamed that beast? Brenda. Who is …
Code execution with MCP: Building more efficient agents
When I wrote about Claude Skills I mentioned that I don't use MCP at all any more when working with coding agents - I find CLI utilities and libraries like …
MCP Colors: Systematically deal with prompt injection risk
Tim Kellogg proposes a neat way to think about prompt injection, especially with respect to MCP tools. Classify every tool with a color: red if it exposes the agent to …
I tried OpenAI’s AgentKit: Does it make Zapier and n8n obsolete?
Examine AgentKit, OpenAI's new tool for building agents. Conduct a side-by-side comparison with n8n by building AI agents with each tool.
A Jarvis for everyone: AI agents as new interfaces
Discover how AI agents and the Model Context Protocol (MCP) are redefining user interfaces, transforming apps into intelligent, conversational systems.
Quoting Steve Francia
Every time an engineer evaluates a language that isn’t “theirs,” their brain is literally working against them. They’re not just analyzing technical trade offs, they’re contemplating a version of themselves …
No Free Lunch: Deconstruct Efficient Attention with MiniMax M2
We are excited to announce day-one support for the new flagship model, MiniMax M2, on SGLang. The MiniMax M2 redefines efficiency for agents: it is a comp...
Quoting MiniMax
Interleaved thinking is essential for LLM agents: it means alternating between explicit reasoning and tool use, while carrying that reasoning forward between steps. This process significantly enhances planning, self-correction, and reliability …
Optimizing GPT-OSS on NVIDIA DGX Spark: Getting the Most Out of Your Spark
We’ve got some exciting updates about the NVIDIA DGX Spark! In the week following the official launch, we collaborated closely with NVIDI...
New prompt injection papers: Agents Rule of Two and The Attacker Moves Second
Two interesting new papers regarding LLM security and prompt injection came to my attention this weekend. Agents Rule of Two: A Practical Approach to AI Agent Security The first is …
PyCon US 2026 call for proposals is now open
PyCon US is coming to the US west coast! 2026 and 2027 will both be held in Long Beach, California - the 2026 conference is set for May 13th-19th next …
🥇Top AI Papers of the Week
The Top AI Papers of the Week (October 27 - November 2)
How I Use Every Claude Code Feature
Useful, detailed guide from Shrivu Shankar, a Claude Code power user. Lots of tips for both individual Claude Code usage and configuring it for larger team projects. I appreciated Shrivu's …
Claude Code Can Debug Low-level Cryptography
Go cryptography author Filippo Valsorda reports on some very positive results applying Claude Code to the challenge of implementing novel cryptography algorithms. After Claude was able to resolve a "fairly …
October 2025 sponsors-only newsletter
I just hit send on the October edition of my sponsors-only monthly newsletter. If you are a sponsor (or if you start a sponsorship now) you can access a copy …
🤖 AI Agents Weekly: MiniMax-M2, Cursor 2.0, SWE-1.5, Agent Data Protocol, Kimi CLI
MiniMax-M2, Cursor 2.0, SWE-1.5, Agent Data Protocol, Kimi CLI
Curiosity-driven blogging
My piece this morning about the Marimo acquisition is an example of a variant of a TIL - I didn't know much about CoreWeave, the acquiring company, so I poked …
Marimo is Joining CoreWeave
I don't usually cover startup acquisitions here, but this one feels relevant to several of my interests. Marimo (previously) provide an open source (Apache 2 licensed) notebook tool for Python, …
CoreWeave adds Marimo to their 2025 acquisition spree
I don't usually cover startup acquisitions here, but this one feels relevant to several of my interests. Marimo (previously) provide an open source (Apache 2 licensed) notebook tool for Python, …
Quoting François Chollet
To really understand a concept, you have to "invent" it yourself in some capacity. Understanding doesn't come from passive content consumption. It is always self-built. It is an active, high-agency, …
Introducing SWE-1.5: Our Fast Agent Model
Here's the second fast coding model released by a coding agent IDE in the same day - the first was Composer-1 by Cursor. This time it's Windsurf releasing SWE-1.5: Today …
MiniMax M2 & Agent: Ingenious in Simplicity
MiniMax M2 was released on Monday 27th October by MiniMax, a Chinese AI lab founded in December 2021. It's a very promising model. Their self-reported benchmark scores show it as …
Composer: Building a fast frontier model with RL
Cursor released Cursor 2.0 today, with a refreshed UI focused on agentic coding (and running agents in parallel) and a new model that's unique to Cursor called Composer 1. As far …
The Replay (10/29/25): Tiny AI agents, Next.js 16, and more
Discover what's new in The Replay, LogRocket's newsletter for dev and engineering leaders, in the October 29th issue.
Is Llama really as bad as people say? I put Meta’s AI to the test
Test out Meta's AI model, Llama, on a real CRUD frontend project, compare it with competing models, and walk through the setup process.
Small language models: Why the future of AI agents might be tiny
Rosario De Chiara discusses why small language models (SLMs) may outperform giants in specific real-world AI systems.
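A custom Plan subagent that uses Serena MCP tools
Introduction: With the Claude Code v2.0.28 update, the Plan feature became a subagent. The main benefit is that the context used during plan generation is split off, which reduces the main context. As a side effect, however, the tools available in Plan mode are now limited to the built-in tools, and tools provided by MCP servers (including the Serena tools) can no longer be used. After Serena users reported that "the tools are no longer being used," the author identified the problem and tried overriding the Plan agent to support the Serena tools. It worked well, so the findings are shared here. 💡 Note: this is not an officially supported method, so it may stop working in a future update. Thanks to Claude Code's flexible plugin mechanism, all sorts of customization are possible, which is fun. This article explains how to use a custom Plan subagent that uses the tools of the Serena MCP server. Overview: This custom Plan subagent C…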
SGLang-Jax: An Open-Source Solution for Native TPU Inference
We're excited to introduce SGLang-Jax, a state-of-the-art open-source inference engine built entirely on Jax and XLA. It leverages SGLang's high-performan...
You’re doing vibe coding wrong: Here’s how to do it right
Vibe coding isn’t just AI-assisted chaos. Here’s how to avoid insecure, unreadable code and turn your “vibes” into real developer productivity.
Exploring spec-driven development with the new GitHub Spec Kit
Bring order to AI-assisted coding with GitHub SpecKit — a toolkit for structured, spec-driven development using Copilot, Claude, or Cursor.
Quoting Aaron Boodman
Claude doesn't make me much faster on the work that I am an expert on. Maybe 15-20% depending on the day. It's the work that I don't know how to …
GenAI Image Editing Showdown
Useful collection of examples by Shaun Pedicini who tested Seedream 4, Gemini 2.5 Flash, Qwen-Image-Edit, FLUX.1 Kontext [dev], FLUX.1 Kontext [max], OmniGen2, and OpenAI gpt-image-1 across 12 image editing prompts. …
Sora might have a 'pervert' problem on its hands
Katie Notopoulos turned on the Sora 2 option where anyone can make a video featuring her cameo, and then: I found a stranger had made a video where I appeared …
🥇Top AI Papers of the Week
The Top AI Papers of the Week (October 20-26)
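Having agents perform specialized tasks with Claude Skills
Claude Skills is a new feature for creating and sharing custom skills that let Claude perform specific tasks. This article explains how Claude Skills works, how to create skills, and how they differ from MCP tools.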
Setting up a codebase for working with coding agents
Someone on Hacker News asked for tips on setting up a codebase to be more productive with AI coding tools. Here's my reply: Good automated tests which the coding agent …
🤖 AI Agents Weekly: DeepSeek-OCR, Claude Code on the Web, ChatGPT Atlas Browser,...
DeepSeek-OCR, Claude Code on the Web, ChatGPT Atlas Browser
Quoting Claude Docs
If you have an AGENTS.md file, you can source it in your CLAUDE.md using @AGENTS.md to maintain a single source of truth.
Visual Features Across Modalities: SVG and ASCII Art Reveal Cross-Modal Understanding
New model interpretability research from Anthropic, this time focused on SVG and ASCII art generation. We found that the same feature that activates over the eyes in an ASCII face …
claude_code_docs_map.md
Something I'm enjoying about Claude Code is that any time you ask it questions about itself it runs tool calls like these: In this case I'd asked it about its …
Quoting Geoffrey Litt
A lot of people say AI will make us all "managers" or "editors"...but I think this is a dangerously incomplete view! Personally, I'm trying to code like a surgeon. A …
OpenAI no longer has to preserve all of its ChatGPT data, with some exceptions
This is a relief: Federal judge Ona T. Wang filed a new order on October 9 that frees OpenAI of an obligation to "preserve and segregate all output log data …
Dane Stuckey (OpenAI CISO) on prompt injection risks for ChatGPT Atlas
My biggest complaint about the launch of the ChatGPT Atlas browser the other day was the lack of details on how OpenAI are addressing prompt injection attacks. The launch post …
The Replay (10/22/25): AI-assisted coding, Wasm 3.0, and more
Discover what's new in The Replay, LogRocket's newsletter for dev and engineering leaders, in the October 22nd issue.
Where AI-assisted coding accelerates development — and where it doesn’t
John Reilly discusses how software development has been changed by the innovations of AI: both the positives and the negatives.
Living dangerously with Claude
I gave a talk last night at Claude Code Anonymous in San Francisco, the unofficial meetup for coding agent enthusiasts. I decided to talk about a dichotomy I’ve been struggling …
SLOCCount in WebAssembly
This project/side-quest got a little bit out of hand. I remembered an old tool called SLOCCount which could count lines of code and produce an estimate for how much they …
Don't let Claude Code delete your session logs
Claude Code stores full logs of your sessions as newline-delimited JSON in ~/.claude/projects/encoded-directory/*.jsonl on your machine. I currently have 379MB of these! Here's an example jsonl file which I extracted …
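A small sketch for taking stock of those logs, assuming only the layout described above: one subdirectory per project under ~/.claude/projects, each holding newline-delimited JSON files. The function name and the returned keys are illustrative, and nothing here changes any Claude Code setting.

```python
import json
from pathlib import Path

def summarize_session_logs(projects_dir: Path) -> dict:
    """Count .jsonl files, total bytes, and parseable records under projects_dir."""
    files = sorted(projects_dir.glob("*/*.jsonl"))
    total_bytes = 0
    records = 0
    for path in files:
        total_bytes += path.stat().st_size
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # blank lines are not records
                try:
                    json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip malformed lines rather than abort
                records += 1
    return {"files": len(files), "bytes": total_bytes, "records": records}

if __name__ == "__main__":
    print(summarize_session_logs(Path.home() / ".claude" / "projects"))
```

Copying the same directory somewhere safe before experimenting is a cheap way to keep the history around.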
Accelerating Hybrid Inference in SGLang with KTransformers CPU Kernels
Background: Hybrid Inference for Sparse MoE Models ...
Unseeable prompt injections in screenshots: more vulnerabilities in Comet and other AI browsers
The Brave security team wrote about prompt injection against browser agents a few months ago (here are my notes on that). Here's their follow-up: What we’ve found confirms our initial …
Introducing ChatGPT Atlas
Last year OpenAI hired Chrome engineer Darin Fisher, which sparked speculation they might have their own browser in the pipeline. Today it arrived. ChatGPT Atlas is a Mac-only web browser …
Trying out ax, the TypeScript version of DSPy
This is the AI Shift TECH BLOG. We share information on AI technology and how to put it to use.
Quoting Bruce Schneier and Barath Raghavan
Prompt injection might be unsolvable in today’s LLMs. LLMs process token sequences, but no mechanism exists to mark token privileges. Every solution proposed introduces new injection vectors: Delimiter? Attackers include …
Claude Code for web - a new asynchronous coding agent from Anthropic
Anthropic launched Claude Code for web this morning. It’s an asynchronous coding agent—their answer to OpenAI’s Codex Cloud and Google’s Jules, and has a very similar shape. I had preview …
Getting DeepSeek-OCR working on an NVIDIA Spark via brute force using Claude Code
DeepSeek released a new model yesterday: DeepSeek-OCR, a 6.6GB model fine-tuned specifically for OCR. They released it as model weights that run using PyTorch and CUDA. I got it running …
Announcing Experimental Malware Scanning for the Hugging Face Ecosystem
Socket is launching experimental protection for the Hugging Face ecosystem, scanning for malware and malicious payload injections inside model files t...
Oracle AI World 2025 attendance report
This is the AI Shift TECH BLOG. We share information on AI technology and how to put it to use.
🥇Top AI Papers of the Week
The Top AI Papers of the Week (October 13-19)
TIL: Exploring OpenAI's deep research API model o4-mini-deep-research
I landed a PR by Manuel Solorzano adding pricing information to llm-prices.com for OpenAI's o4-mini-deep-research and o3-deep-research models, which they released in June and document here. I realized I'd never …
🤖 AI Agents Weekly: Claude Haiku 4.5, Deep Agents, SWE-grep, nanochat, Agent Skills, Veo 3.1 Fast, n8n AI Workflow Builder
Claude Haiku 4.5, Deep Agents, SWE-grep, nanochat, Agent Skills, Veo 3.1 Fast, n8n AI Workflow Builder
The AI water issue is fake
Andy Masley (previously): All U.S. data centers (which mostly support the internet, not AI) used 200–250 million gallons of freshwater daily in 2023. The U.S. consumes approximately 132 billion gallons …
Andrej Karpathy — AGI is still a decade away
Extremely high signal 2 hour 25 minute (!) conversation between Andrej Karpathy and Dwarkesh Patel. It starts with Andrej's claim that "the year of agents" is actually more likely to …
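What are Claude Skills?
Anthropic has announced a new feature for Claude: Claude Skills (Agent Skills). Claude Skills is a mechanism for extending a model with specific capabilities and knowledge through "skill folders" made up of Markdown files and scripts. (Linked page: "Claude Skills: Customize AI for your workflows". Build custom Skills to teach Claude specialized tasks. Create once, use everywhere, from spreadsheets to coding. Available across Claude.ai, API, and Code.) Claude had already updated the code execution environment available from its chat assistant in August. Until then it was used to run Python code on request for graph generation and data analysis, but it had become an environment where Bash commands could be run freely in a sandbox. Claude can now cre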
Quoting Alexander Fridriksson and Jay Miller
Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs. The main issue is that UUIDv7 incorporates a 48-bit …
Quoting Barry Zhang
Skills actually came out of a prototype I built demonstrating that Claude Code is a general-purpose agent :-) It was a natural conclusion once we realized that bash + filesystem …
Claude Skills are awesome, maybe a bigger deal than MCP
Anthropic this morning introduced Claude Skills, a new pattern for making new abilities available to their models: Claude can now use Skills to improve how it performs specific tasks. Skills …
ENISA’s 2025 Threat Landscape: AI Reshapes Cyber Attacks, from Phishing to Supply Chain Abuse
ENISA’s 2025 Threat Landscape report highlights how AI is reshaping cyber attacks, driving phishing, model poisoning, and software supply chain risks.
Deep Agents
On the future of AI Agents.
NVIDIA DGX Spark + Apple Mac Studio = 4x Faster LLM Inference with EXO 1.0
EXO Labs wired a 256GB M3 Ultra Mac Studio up to an NVIDIA DGX Spark and got a 2.8x performance boost serving Llama-3.1 8B (FP16) with an 8,192 token prompt. …
Quoting Riana Pfefferkorn
Pro se litigants account for the majority of the cases in the United States where a party submitted a court filing containing AI hallucinations. In a country where legal representation …
Coding without typing the code
Last year the most useful exercise for getting a feel for how good LLMs were at writing code was vibe coding (before that name had even been coined) - seeing …
Quoting Catherine Wu
While Sonnet 4.5 remains the default [in Claude Code], Haiku 4.5 now powers the Explore subagent which can rapidly gather context on your codebase to build apps even faster. You …
Introducing Claude Haiku 4.5
Anthropic released Claude Haiku 4.5 today, the cheapest member of the Claude 4.5 family that started with Sonnet 4.5 a couple of weeks ago. It's priced at $1/million input tokens …
Quoting Claude Haiku 4.5 System Card
Previous system cards have reported results on an expanded version of our earlier agentic misalignment evaluation suite: three families of exotic scenarios meant to elicit the model to commit blackmail, …
Want to run your AI model locally? Here’s what you should know
As costs and privacy concerns grow, enterprises are shifting from cloud to local AI. Learn what it takes to run models locally, and why it matters.
A summary of biases in LLM-as-a-Judge
This is the AI Shift TECH BLOG. We share information on AI technology and how to put it to use.
NVIDIA DGX Spark: great hardware, early days for the ecosystem
NVIDIA sent me a preview unit of their new DGX Spark desktop “AI supercomputer”. I’ve never had hardware to review before! You can consider this my first ever sponsored post …
Just Talk To It - the no-bs Way of Agentic Engineering
Peter Steinberger's long, detailed description of his current process for using Codex CLI and GPT-5 Codex. This is information dense and full of actionable tips, plus plenty of strong opinions …
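Having AI write technical articles: the "too perfect" paradox reached after 9 iterations
This article describes an attempt to have AI write technical articles. The author used Claude Code to build a system that repeatedly cycles through article generation, review, and style guide improvement. Quality was initially expected to be around 70 to 80 percent, but after 9 iterations it reached a rating of 9.0/10. In particular, the author ran into the "too perfect paradox", in which an article that is too perfect ends up feeling AI-written. The system consists of three agents (Writer Agent, Reviewer Agent, Style Guide Updater), each functioning independently. Over the iterations, the importance of a metacognitive shift and of imperfection became clear, and ultimately incorporating human-like imperfection produced more natural articles. • The goal of having AI write technical articles is to reach a level of quality indistinguishable from a human's. • Claude Code was used to build a cycle of article generation, review, and style guide improvement. • Quality improved over the iterations, finally earning a rating of 9.0/10. • The author faced the problem that an article that is too perfect ends up feeling AI-written. • The system consists of three agents (Writer, Reviewer, Style Guide Updater), each functioning independently. • The importance of a metacognitive shift and of imperfection became clear and contributed to more natural article generation. • Incorporating imperfection led to more human-like articles.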
NVIDIA and SGLang Accelerating SemiAnalysis InferenceMAX and GB200 Together
The SGLang and NVIDIA teams have a strong track record of collaboration, consistently delivering inference optimizations and system-level improvements to ...
nanochat
Really interesting new project from Andrej Karpathy, described at length in this discussion post. It provides a full ChatGPT-style LLM, including training, inference and a web UI, that can be …
AI dev tool power rankings & comparison [Oct 2025]
Compare the top AI development tools and models of September 2025. View updated rankings, feature breakdowns, and find the best fit for you.
NVIDIA DGX Spark In-Depth Review: A New Standard for Local AI Inference
Thanks to NVIDIA’s early access program, we are thrilled to get our hands on the NVIDIA DGX™ Spark. It’s quite an unconventional system, as NVIDIA rarely ...
🥇Top AI Papers of the Week
The Top AI Papers of the Week (October 6-12)
Claude Code sub-agents
Claude Code includes the ability to run sub-agents, where a separate agent loop with a fresh token context is dispatched to achieve a goal and report back when it's done. …
Vibing a Non-Trivial Ghostty Feature
Mitchell Hashimoto provides a comprehensive answer to the frequent demand for a detailed description of shipping a non-trivial production feature to an existing project using AI-assistance. In this case it's …
🤖 AI Agents Weekly: AgentKit, Gemini 2.5 Computer Use, State of AI Report 2025, Agentic Context Engineering, CodeMender
AgentKit, Gemini 2.5 Computer Use, State of AI Report 2025, Agentic Context Engineering, CodeMender
Note on 11th October 2025
I'm beginning to suspect that a key skill in working effectively with coding agents is developing an intuition for when you don't need to closely review every line of code …
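Connecting a custom app to the Apps SDK for operating apps directly inside ChatGPT
Apps in ChatGPT is a feature that calls external apps from within a ChatGPT chat according to the flow of the conversation, enabling interactive operation. Each app provides its own UI components, and users can operate apps from the chat screen with a seamless experience. This article walks through the steps of using the Apps SDK to build a simple app that actually runs inside ChatGPT.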
simonw/claude-skills
One of the tips I picked up from Jesse Vincent's Claude Code Superpowers post (previously) was this: Skills are what give your agents Superpowers. The first time they really popped …
Superpowers: How I'm using coding agents in October 2025
A follow-up to Jesse Vincent's post about September, but this is a really significant piece in its own right. Jesse is one of the most creative users of coding agents …
A Retrospective Survey of 2024/2025 Open Source Supply Chain Compromises
Filippo Valsorda surveyed 18 incidents from the past year of open source supply chain attacks, where package updates were infected with malware thanks to a compromise of the project itself. …
Video of GPT-OSS 20B running on a phone
GPT-OSS 20B is a very good model. At launch OpenAI claimed: The gpt-oss-20b model delivers similar results to OpenAI o3‑mini on common benchmarks and can run on edge devices with …
Evaluating context compression methods for AI agents (AI Shift internship report)
This is the AI Shift TECH BLOG. We share information on AI technology and how to put it to use.
Quoting Gergely Orosz
I get a feeling that working with multiple AI agents is something that comes VERY natural to most senior+ engineers or tech lead who worked at a large company You …
LangChain.js is overrated; Build your AI agent with a simple fetch call
Skip the LangChain.js overhead: How to build a Retrieval-Augmented Generation (RAG) AI agent from scratch using just the native `fetch()` API.