Simon Willison's Blog

simonwillison.net/
342 articles
Last updated: November 14, 11:01

Introducing GPT-5.1 for developers

OpenAI announced GPT-5.1 yesterday, calling it a smarter, more conversational ChatGPT. Today they've added it to their API. We actually got four new models today: gpt-5.1, gpt-5.1-chat-latest, gpt-5.1-codex, and gpt-5.1-codex-mini. There …

Simon Willison's Blog
api tool
Nano Banana can be prompt engineered for extremely nuanced AI image generation

Max Woolf provides an exceptional deep dive into Google's Nano Banana aka Gemini 2.5 Flash Image model, still the best available image manipulation LLM tool three months after its initial …

Simon Willison's Blog
tool
Quoting Nov 12th letter from OpenAI to Judge Ona T. Wang

On Monday, this Court entered an order requiring OpenAI to hand over to the New York Times and its co-plaintiffs 20 million ChatGPT user conversations [...] OpenAI is unaware of …

Simon Willison's Blog
security
What happens if AI labs train for pelicans riding bicycles?

Almost every time I share a new example of an SVG of a pelican riding a bicycle a variant of this question pops up: how do you know the labs …

Simon Willison's Blog
platform
Quoting Steve Krouse

The fact that MCP is a different surface from your normal API allows you to ship MUCH faster to MCP. This has been unlocked by inference at runtime. Normal APIs …

Simon Willison's Blog
api
Agentic Pelican on a Bicycle

Robert Glaser took my pelican riding a bicycle benchmark and applied an agentic loop to it, seeing if vision models could draw a better pelican if they got the chance …

Simon Willison's Blog
platform
Six coding agents at once

I've been upgrading a ton of Datasette plugins recently for compatibility with the Datasette 1.0a20 release from last week - 35 so far. A lot of the work is very …

Simon Willison's Blog
tool
Quoting Netflix

Netflix asks partners to consider the following guiding principles before leveraging GenAI in any creative workflow: The outputs do not replicate or substantially recreate identifiable characteristics of unowned or copyrighted …

Simon Willison's Blog
platform
Pelican on a Bike - Raytracer Edition

beetle_b ran this prompt against a bunch of recent LLMs: Write a POV-Ray file that shows a pelican riding on a bicycle. This turns out to be a harder challenge …

Simon Willison's Blog
platform
Reverse engineering Codex CLI to get GPT-5-Codex-Mini to draw me a pelican

OpenAI partially released a new model yesterday called GPT-5-Codex-Mini, which they describe as "a more compact and cost-efficient version of GPT-5-Codex". It’s currently only available via their Codex CLI tool …

Simon Willison's Blog
api tool
Quoting Kenton Varda

The big advantage of MCP over OpenAPI is that it is very clear about auth. [...] Maybe an agent could read the docs and write code to auth. But we …

Simon Willison's Blog
api security
Quoting Josh Cohenzadeh

I have AiDHD. It has never been easier to build an MVP and in turn, it has never been harder to keep focus. When new features always feel like they're …

Simon Willison's Blog
api tool
Could LLMs encourage new programming languages?

My hunch is that existing LLMs make it easier to build a new programming language in a way that captures new developers. Most programming languages are similar enough to existing …

Simon Willison's Blog
library tool
Using Codex CLI with gpt-oss:120b on an NVIDIA DGX Spark via Tailscale

Inspired by a YouTube comment I wrote up how I run OpenAI's Codex CLI coding agent against the gpt-oss:120b model running in Ollama on my NVIDIA DGX Spark via a …

Simon Willison's Blog
tool
You should write an agent

Thomas Ptacek on the Fly blog: Agents are the most surprising programming experience I’ve had in my career. Not because I’m awed by the magnitude of their powers — I …

Simon Willison's Blog
platform
Quoting Ben Stolovitz

My trepidation extends to complex literature searches. I use LLMs as secondary librarians when I’m doing research. They reliably find primary sources (articles, papers, etc.) that I miss in my …

Simon Willison's Blog
platform
Kimi K2 Thinking

Chinese AI lab Moonshot's Kimi K2 established itself as one of the largest open weight models - 1 trillion parameters - back in July. They've now released the Thinking version, …

Simon Willison's Blog
platform
Quoting Nathan Lambert

At the start of the year, most people loosely following AI probably knew of 0 [Chinese] AI labs. Now, and towards wrapping up 2025, I’d say all of DeepSeek, Qwen, …

Simon Willison's Blog
platform
Code research projects with async coding agents like Claude Code and Codex

I’ve been experimenting with a pattern for LLM usage recently that’s working out really well: asynchronous code research tasks. Pick a research question, spin up an asynchronous coding agent and …

Simon Willison's Blog
api tool
Quoting @belligerentbarbies

I'm worried that they put co-pilot in Excel because Excel is the beast that drives our entire economy and do you know who has tamed that beast? Brenda. Who is …

Simon Willison's Blog
api tool
Code execution with MCP: Building more efficient agents

When I wrote about Claude Skills I mentioned that I don't use MCP at all any more when working with coding agents - I find CLI utilities and libraries like …

Simon Willison's Blog
api tool
MCP Colors: Systematically deal with prompt injection risk

Tim Kellogg proposes a neat way to think about prompt injection, especially with respect to MCP tools. Classify every tool with a color: red if it exposes the agent to …

Simon Willison's Blog
security
Quoting Steve Francia

Every time an engineer evaluates a language that isn’t “theirs,” their brain is literally working against them. They’re not just analyzing technical trade offs, they’re contemplating a version of themselves …

Simon Willison's Blog
tool
Quoting MiniMax

Interleaved thinking is essential for LLM agents: it means alternating between explicit reasoning and tool use, while carrying that reasoning forward between steps. This process significantly enhances planning, self‑correction, and reliability …

Simon Willison's Blog
api tool
New prompt injection papers: Agents Rule of Two and The Attacker Moves Second

Two interesting new papers regarding LLM security and prompt injection came to my attention this weekend. Agents Rule of Two: A Practical Approach to AI Agent Security The first is …

Simon Willison's Blog
security
PyCon US 2026 call for proposals is now open

PyCon US is coming to the US west coast! 2026 and 2027 will both be held in Long Beach, California - the 2026 conference is set for May 13th-19th next …

Simon Willison's Blog
api tool
How I Use Every Claude Code Feature

Useful, detailed guide from Shrivu Shankar, a Claude Code power user. Lots of tips for both individual Claude Code usage and configuring it for larger team projects. I appreciated Shrivu's …

Simon Willison's Blog
tool
Claude Code Can Debug Low-level Cryptography

Go cryptography author Filippo Valsorda reports on some very positive results applying Claude Code to the challenge of implementing novel cryptography algorithms. After Claude was able to resolve a "fairly …

Simon Willison's Blog
security tool
October 2025 sponsors-only newsletter

I just hit send on the October edition of my sponsors-only monthly newsletter. If you are a sponsor (or if you start a sponsorship now) you can access a copy …

Simon Willison's Blog
api tool
Curiosity-driven blogging

My piece this morning about the Marimo acquisition is an example of a variant of a TIL - I didn't know much about CoreWeave, the acquiring company, so I poked …

Simon Willison's Blog
tool
CoreWeave adds Marimo to their 2025 acquisition spree

I don't usually cover startup acquisitions here, but this one feels relevant to several of my interests. Marimo (previously) provide an open source (Apache 2 licensed) notebook tool for Python, …

Simon Willison's Blog
library tool
Marimo is Joining CoreWeave

I don't usually cover startup acquisitions here, but this one feels relevant to several of my interests. Marimo (previously) provide an open source (Apache 2 licensed) notebook tool for Python, …

Simon Willison's Blog
library tool
Quoting François Chollet

To really understand a concept, you have to "invent" it yourself in some capacity. Understanding doesn't come from passive content consumption. It is always self-built. It is an active, high-agency, …

Simon Willison's Blog
tool
Introducing SWE-1.5: Our Fast Agent Model

Here's the second fast coding model released by a coding agent IDE in the same day - the first was Composer-1 by Cursor. This time it's Windsurf releasing SWE-1.5: Today …

Simon Willison's Blog
api cloud tool
MiniMax M2 & Agent: Ingenious in Simplicity

MiniMax M2 was released on Monday 27th October by MiniMax, a Chinese AI lab founded in December 2021. It's a very promising model. Their self-reported benchmark scores show it as …

Simon Willison's Blog
tool
Composer: Building a fast frontier model with RL

Cursor released Cursor 2.0 today, with a refreshed UI focused on agentic coding (and running agents in parallel) and a new model that's unique to Cursor called Composer 1. As far …

Simon Willison's Blog
library tool
Quoting Aaron Boodman

Claude doesn't make me much faster on the work that I am an expert on. Maybe 15-20% depending on the day. It's the work that I don't know how to …

Simon Willison's Blog
platform
GenAI Image Editing Showdown

Useful collection of examples by Shaun Pedicini who tested Seedream 4, Gemini 2.5 Flash, Qwen-Image-Edit, FLUX.1 Kontext [dev], FLUX.1 Kontext [max], OmniGen2, and OpenAI gpt-image-1 across 12 image editing prompts. …

Simon Willison's Blog
tool
Sora might have a 'pervert' problem on its hands

Katie Notopoulos turned on the Sora 2 option where anyone can make a video featuring her cameo, and then: I found a stranger had made a video where I appeared …

Simon Willison's Blog
tool
Setting up a codebase for working with coding agents

Someone on Hacker News asked for tips on setting up a codebase to be more productive with AI coding tools. Here's my reply: Good automated tests which the coding agent …

Simon Willison's Blog
api library tool
Quoting Claude Docs

If you have an AGENTS.md file, you can source it in your CLAUDE.md using @AGENTS.md to maintain a single source of truth.

Simon Willison's Blog
tool
Visual Features Across Modalities: SVG and ASCII Art Reveal Cross-Modal Understanding

New model interpretability research from Anthropic, this time focused on SVG and ASCII art generation. We found that the same feature that activates over the eyes in an ASCII face …

Simon Willison's Blog
tool
claude_code_docs_map.md

Something I'm enjoying about Claude Code is that any time you ask it questions about itself it runs tool calls like these: In this case I'd asked it about its …

Simon Willison's Blog
api tool
Quoting Geoffrey Litt

A lot of people say AI will make us all "managers" or "editors"...but I think this is a dangerously incomplete view! Personally, I'm trying to code like a surgeon. A …

Simon Willison's Blog
api tool
OpenAI no longer has to preserve all of its ChatGPT data, with some exceptions

This is a relief: Federal judge Ona T. Wang filed a new order on October 9 that frees OpenAI of an obligation to "preserve and segregate all output log data …

Simon Willison's Blog
platform
Dane Stuckey (OpenAI CISO) on prompt injection risks for ChatGPT Atlas

My biggest complaint about the launch of the ChatGPT Atlas browser the other day was the lack of details on how OpenAI are addressing prompt injection attacks. The launch post …

Simon Willison's Blog
api security tool
Living dangerously with Claude

I gave a talk last night at Claude Code Anonymous in San Francisco, the unofficial meetup for coding agent enthusiasts. I decided to talk about a dichotomy I’ve been struggling …

Simon Willison's Blog
api tool
SLOCCount in WebAssembly

This project/side-quest got a little bit out of hand. I remembered an old tool called SLOCCount which could count lines of code and produce an estimate for how much they …

Simon Willison's Blog
tool
Don't let Claude Code delete your session logs

Claude Code stores full logs of your sessions as newline-delimited JSON in ~/.claude/projects/encoded-directory/*.jsonl on your machine. I currently have 379MB of these! Here's an example jsonl file which I extracted …
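
To get a rough picture of how much session data has accumulated, something like the following Python sketch could walk a projects directory laid out as described above (one sub-directory per project, each holding `*.jsonl` files). The function name `summarize_session_logs` and the returned keys are hypothetical, chosen for illustration; the only things taken from the entry are the directory layout and the newline-delimited JSON format.

```python
import json
from pathlib import Path


def summarize_session_logs(projects_dir: Path) -> dict:
    """Summarize newline-delimited JSON logs found in <projects_dir>/<project>/*.jsonl.

    Hypothetical helper: returns the number of files, their total size in
    bytes, the number of lines that parsed as JSON, and the number skipped.
    """
    summary = {"files": 0, "bytes": 0, "records": 0, "skipped": 0}
    if not projects_dir.is_dir():
        return summary  # nothing to report if the directory does not exist

    for path in sorted(projects_dir.glob("*/*.jsonl")):
        summary["files"] += 1
        summary["bytes"] += path.stat().st_size
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # ignore blank lines between records
                try:
                    json.loads(line)
                except json.JSONDecodeError:
                    summary["skipped"] += 1  # malformed line, count and move on
                else:
                    summary["records"] += 1
    return summary
```

Calling `summarize_session_logs(Path.home() / ".claude" / "projects")` would report on the location mentioned in the entry; the function only reads files and never modifies or deletes them.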

Simon Willison's Blog
tool
Unseeable prompt injections in screenshots: more vulnerabilities in Comet and other AI browsers

The Brave security team wrote about prompt injection against browser agents a few months ago (here are my notes on that). Here's their follow-up: What we’ve found confirms our initial …

Simon Willison's Blog
security
Introducing ChatGPT Atlas

Last year OpenAI hired Chrome engineer Darin Fisher, which sparked speculation they might have their own browser in the pipeline. Today it arrived. ChatGPT Atlas is a Mac-only web browser …

Simon Willison's Blog
tool ui
Quoting Bruce Schneier and Barath Raghavan

Prompt injection might be unsolvable in today’s LLMs. LLMs process token sequences, but no mechanism exists to mark token privileges. Every solution proposed introduces new injection vectors: Delimiter? Attackers include …

Simon Willison's Blog
security
Claude Code for web - a new asynchronous coding agent from Anthropic

Anthropic launched Claude Code for web this morning. It’s an asynchronous coding agent—their answer to OpenAI’s Codex Cloud and Google’s Jules, and has a very similar shape. I had preview …

Simon Willison's Blog
api tool
Getting DeepSeek-OCR working on an NVIDIA Spark via brute force using Claude Code

DeepSeek released a new model yesterday: DeepSeek-OCR, a 6.6GB model fine-tuned specifically for OCR. They released it as model weights that run using PyTorch and CUDA. I got it running …

Simon Willison's Blog
api tool
TIL: Exploring OpenAI's deep research API model o4-mini-deep-research

I landed a PR by Manuel Solorzano adding pricing information to llm-prices.com for OpenAI's o4-mini-deep-research and o3-deep-research models, which they released in June and document here. I realized I'd never …

Simon Willison's Blog
api tool
The AI water issue is fake

Andy Masley (previously): All U.S. data centers (which mostly support the internet, not AI) used 200-250 million gallons of freshwater daily in 2023. The U.S. consumes approximately 132 billion gallons …
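
The two figures quoted here can be put side by side with a little arithmetic. The sketch below assumes, since the excerpt is cut off, that the 132 billion gallon figure is also a daily freshwater total, so the two numbers are directly comparable; if the truncated sentence uses a different unit the ratio would not hold. The helper name `share_percent` is made up for this example.

```python
def share_percent(part_gallons: float, total_gallons: float) -> float:
    """Return part as a percentage of total."""
    return 100.0 * part_gallons / total_gallons


# Figures from the excerpt; the "per day" unit for the total is an assumption.
DATA_CENTER_LOW = 200e6    # gallons per day, lower bound
DATA_CENTER_HIGH = 250e6   # gallons per day, upper bound
US_TOTAL = 132e9           # gallons, assumed to be per day

low = share_percent(DATA_CENTER_LOW, US_TOTAL)
high = share_percent(DATA_CENTER_HIGH, US_TOTAL)
print(f"Data centers: {low:.2f}% to {high:.2f}% of the total")
# prints "Data centers: 0.15% to 0.19% of the total"
```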

Simon Willison's Blog
api cloud infra
Andrej Karpathy — AGI is still a decade away

Extremely high signal 2 hour 25 minute (!) conversation between Andrej Karpathy and Dwarkesh Patel. It starts with Andrej's claim that "the year of agents" is actually more likely to …

Simon Willison's Blog
library tool
Quoting Alexander Fridriksson and Jay Miller

Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs. The main issue is that UUIDv7 incorporates a 48-bit …
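
The concern is easier to see with a concrete example. In the UUIDv7 layout defined by RFC 9562, the leading 48 bits hold a Unix timestamp in milliseconds, so anyone who sees the identifier can read back roughly when it was created. The Python sketch below builds a UUIDv7 by hand (so it does not depend on a recent standard library) and then recovers the timestamp; `make_uuid7` and `uuid7_timestamp` are illustrative names, not functions from any library.

```python
import os
import uuid
from datetime import datetime, timezone


def make_uuid7(unix_ms: int) -> uuid.UUID:
    """Build a UUIDv7 per the RFC 9562 bit layout (illustrative helper)."""
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    rand_a = rand >> 68                            # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 bits
    value = (unix_ms & ((1 << 48) - 1)) << 80      # 48-bit millisecond timestamp
    value |= 0x7 << 76                             # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                            # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


def uuid7_timestamp(u: uuid.UUID) -> datetime:
    """Recover the creation time embedded in the leading 48 bits."""
    unix_ms = u.int >> 80
    return datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc)


if __name__ == "__main__":
    example = make_uuid7(1_700_000_000_000)
    print(example, "->", uuid7_timestamp(example).isoformat())
```

The random bits change on every run, but the recovered timestamp is always the one that was put in, which is exactly the information an exposed primary key would leak.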

Simon Willison's Blog
api database security
Quoting Barry Zhang

Skills actually came out of a prototype I built demonstrating that Claude Code is a general-purpose agent :-) It was a natural conclusion once we realized that bash + filesystem …

Simon Willison's Blog
platform
Claude Skills are awesome, maybe a bigger deal than MCP

Anthropic this morning introduced Claude Skills, a new pattern for making new abilities available to their models: Claude can now use Skills to improve how it performs specific tasks. Skills …

Simon Willison's Blog
api tool
NVIDIA DGX Spark + Apple Mac Studio = 4x Faster LLM Inference with EXO 1.0

EXO Labs wired a 256GB M3 Ultra Mac Studio up to an NVIDIA DGX Spark and got a 2.8x performance boost serving Llama-3.1 8B (FP16) with an 8,192 token prompt. …

Simon Willison's Blog
framework tool
Quoting Riana Pfefferkorn

Pro se litigants account for the majority of the cases in the United States where a party submitted a court filing containing AI hallucinations. In a country where legal representation …

Simon Willison's Blog
platform
Coding without typing the code

Last year the most useful exercise for getting a feel for how good LLMs were at writing code was vibe coding (before that name had even been coined) - seeing …

Simon Willison's Blog
platform
Quoting Catherine Wu

While Sonnet 4.5 remains the default [in Claude Code], Haiku 4.5 now powers the Explore subagent which can rapidly gather context on your codebase to build apps even faster. You …

Simon Willison's Blog
platform
Introducing Claude Haiku 4.5

Anthropic released Claude Haiku 4.5 today, the cheapest member of the Claude 4.5 family that started with Sonnet 4.5 a couple of weeks ago. It's priced at $1/million input tokens …

Simon Willison's Blog
platform
Quoting Claude Haiku 4.5 System Card

Previous system cards have reported results on an expanded version of our earlier agentic misalignment evaluation suite: three families of exotic scenarios meant to elicit the model to commit blackmail, …

Simon Willison's Blog
platform
NVIDIA DGX Spark: great hardware, early days for the ecosystem

NVIDIA sent me a preview unit of their new DGX Spark desktop “AI supercomputer”. I’ve never had hardware to review before! You can consider this my first ever sponsored post …

Simon Willison's Blog
cloud tool
Just Talk To It - the no-bs Way of Agentic Engineering

Peter Steinberger's long, detailed description of his current process for using Codex CLI and GPT-5 Codex. This is information dense and full of actionable tips, plus plenty of strong opinions …

Simon Willison's Blog
api tool