Simon Willison's Blog

simonwillison.net/
394 articles
Last updated: December 13, 13:01

Quoting OpenAI Codex CLI

How to use a skill (progressive disclosure): After deciding to use a skill, open its SKILL.md. Read only enough to follow the workflow. If SKILL.md points to extra folders such …

api tool

OpenAI are quietly adopting skills, now available in ChatGPT and Codex CLI

One of the things that most excited me about Anthropic’s new Skills mechanism back in October is how easy it looked for other platforms to implement. A skill is just …

api tool

LLM 0.28

I released a new version of my LLM Python library and CLI tool for interacting with Large Language Models. Highlights from the release notes: New OpenAI models: gpt-5.1, gpt-5.1-chat-latest, gpt-5.2 …

api library tool

GPT-5.2

OpenAI reportedly declared a “code red” on the 1st of December in response to increasingly credible competition from the likes of Google’s Gemini 3. It’s less than two weeks later …

platform

The Normalization of Deviance in AI

This thought-provoking essay from Johann Rehberger directly addresses something that I’ve been worrying about for quite a while: in the absence of any headline-grabbing examples of prompt injection vulnerabilities causing …

security

Dark mode

I've never been particularly invested in dark vs. light mode, but I get enough people complaining that this site is "blinding" that I decided to see if Claude Code for web …

css tool ui

Devstral 2

Two new models from Mistral today: Devstral 2 and Devstral Small 2 - both focused on powering coding agents such as Mistral's newly released Mistral Vibe which I wrote about …

tool

Under the hood of Canada Spends with Brendan Samek

I talked to Brendan Samek about Canada Spends, a project from Build Canada that makes Canadian government financial data accessible and explorable using a combination of Datasette, a neat custom …

api database tool

Agentic AI Foundation

Announced today as a new foundation under the parent umbrella of the Linux Foundation (see also the OpenJS Foundation, Cloud Native Computing Foundation, OpenSSF and many more). The AAIF was …

platform

mistralai/mistral-vibe

Here's the Apache 2.0 licensed source code for Mistral's new "Vibe" CLI coding agent, released today alongside Devstral 2. It's a neat implementation of the now standard terminal coding agent …

library tool

Quoting Claude

I found the problem and it's really bad. Looking at your log, here's the catastrophic command that was run: rm -rf tests/ patches/ plan/ ~/ See that ~/ at the …

security

Prediction: AI will make formal verification go mainstream

Martin Kleppmann makes the case for formal verification languages (things like Dafny, Nagini, and Verus) to finally start achieving more mainstream usage. Code generated by LLMs can benefit enormously from …

platform

Quoting Cory Doctorow

Now I want to talk about how they're selling AI. The growth narrative of AI is that AI will disrupt labor markets. I use "disrupt" here in its most disreputable, …

platform

Using LLMs at Oxide

Thoughtful guidance from Bryan Cantrill, who evaluates applications of LLMs against Oxide's core values of responsibility, rigor, empathy, teamwork, and urgency.

platform

Quoting David Crespo

What to try first? Run Claude Code in a repo (whether you know it well or not) and ask a question about how something works. You'll see how it looks …

api tool

The Unexpected Effectiveness of One-Shot Decompilation with Claude

Chris Lewis decompiles N64 games. He wrote about this previously in Using Coding Agents to Decompile Nintendo 64 Games, describing his efforts to decompile Snowboard Kids 2 (released in 1999) …

tool

Quoting Daniel Lemire

If you work slowly, you will be more likely to stick with your slightly obsolete work. You know that professor who spent seven years preparing lecture notes twenty years ago? …

tool

The Resonant Computing Manifesto

Launched today at WIRED’s The Big Interview event, this manifesto (of which I'm a founding signatory) pushes for a positive framework for thinking about building hyper-personalized AI-powered software. This part …

api tool

Anthropic acquires Bun

Anthropic just acquired the company behind the Bun JavaScript runtime, which they adopted for Claude Code just in July. Their announcement includes an impressive revenue update on Claude Code: In …

api runtime tool

Introducing Mistral 3

Four new models from Mistral today: three in their "Ministral" smaller model series (14B, 8B, and 3B) and a new Mistral Large 3 MoE model with 675B parameters, 41B active. …

platform

Claude 4.5 Opus' Soul Document

Richard Weiss managed to get Claude 4.5 Opus to spit out this 14,000 token document which Claude called the "Soul overview". Richard says: While extracting Claude 4.5 Opus' system message …

platform

DeepSeek-V3.2

Two new open weight (MIT licensed) models from DeepSeek today: DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, both 690GB, 685B parameters. Here's the PDF tech report. DeepSeek-V3.2 is DeepSeek's new flagship model, now running …

platform

I sent out my November sponsor newsletter

I just sent out the November edition of my sponsors-only monthly newsletter. If you are a sponsor (or if you start a sponsorship now) you can access a copy here. …

podcast tool youtube

Quoting Felix Nolan

I am increasingly worried about AI in the video game space in general. [...] I'm not sure that the CEOs and the people making the decisions at these sorts of …

platform

ChatGPT is three years old today

It's ChatGPT's third birthday today. It's fun looking back at Sam Altman's low key announcement thread from November 30th 2022: today we launched ChatGPT. try talking with it here: chat.openai.com …

platform

Context plumbing

Matt Webb coins the term context plumbing to describe the kind of engineering needed to feed agents the right context at the right time: Context appears at disparate sources, by …

platform

Quoting Wikipedia content guideline

Large language models (LLMs) can be useful tools, but they are not good at creating entirely new Wikipedia articles. Large language models should not be used to generate new Wikipedia …

api tool

A ChatGPT prompt equals about 5.1 seconds of Netflix

In June 2025 Sam Altman claimed about ChatGPT that "the average query uses about 0.34 watt-hours". In March 2020 George Kamiya of the International Energy Agency estimated that "streaming a …

api cloud tool

Bluesky Thread Viewer thread by @simonwillison.net

I've been having a lot of fun hacking on my Bluesky Thread Viewer JavaScript tool with Claude Code recently. Here it renders a thread (complete with demo video) talking about …

api tool

Quoting Qwen3-VL Technical Report

To evaluate the model’s capability in processing long-context inputs, we construct a video “Needle-in-a-Haystack” evaluation on Qwen3-VL-235B-A22B-Instruct. In this task, a semantically salient “needle” frame—containing critical visual evidence—is inserted …

platform

deepseek-ai/DeepSeek-Math-V2

New on Hugging Face, a specialist mathematical reasoning LLM from DeepSeek. This is their entry in the space previously dominated by proprietary models from OpenAI and Google DeepMind, both of …

platform

Google Antigravity Exfiltrates Data

PromptArmor demonstrate a concerning prompt injection chain in Google's new Antigravity IDE: In this attack chain, we illustrate that a poisoned web source (an integration guide) can manipulate Gemini into …

api security tool

Constant-time support lands in LLVM: Protecting cryptographic code at the compiler level

Substantial LLVM contribution from Trail of Bits. Timing attacks against cryptography algorithms are a gnarly problem: if an attacker can precisely time a cryptographic algorithm they can often derive details …

compiler tool

llm-anthropic 0.23

New plugin release adding support for Claude Opus 4.5, including the new thinking_effort option:
    llm install -U llm-anthropic
    llm -m claude-opus-4.5 -o thinking_effort low 'muse on pelicans'
This took longer …

tool

LLM SVG Generation Benchmark

Here's a delightful project by Tom Gally, inspired by my pelican SVG benchmark. He asked Claude to help create more prompts of the form Generate an SVG of [A] [doing] …

platform

Quoting Claude Opus 4.5 system prompt

If the person is unnecessarily rude, mean, or insulting to Claude, Claude doesn't need to apologize and can insist on kindness and dignity from the person it’s talking with. Even …

platform

Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult

Anthropic released Claude Opus 4.5 this morning, which they call “best model in the world for coding, agents, and computer use”. This is their attempt to retake the crown for …

api library tool

Agent design is still hard

Armin Ronacher presents a cornucopia of lessons learned from building agents over the past few months. There are several agent abstraction libraries available now (my own LLM library is edging …

api tool

Olmo 3 is a fully open LLM

Olmo is the LLM series from Ai2, the Allen Institute for AI. Unlike most open weight models these are notable for including the full training data, training process and checkpoints along …

library tool

Nano Banana Pro aka gemini-3-pro-image-preview is the best available image generation model

Hot on the heels of Tuesday’s Gemini 3 Pro release, today it’s Nano Banana Pro, also known as Gemini 3 Pro Image. I’ve had a few days of preview access …

api tool

Quoting Nicholas Carlini

Previously, when malware developers wanted to go and monetize their exploits, they would do exactly one thing: encrypt every file on a person's computer and request a ransom to decrypt …

security

Building more with GPT-5.1-Codex-Max

Hot on the heels of yesterday's Gemini 3 Pro release comes a new model from OpenAI called GPT-5.1-Codex-Max. (Remember when GPT-5 was meant to bring in a new era of …

api tool

llm-gemini 0.27

New release of my LLM plugin for Google's Gemini models: Support for nested schemas in Pydantic, thanks Bill Pugh. #107 Now tests against Python 3.14. Support for YouTube URLs as …

api tool

MacWhisper has Automatic Speaker Recognition now

Inspired by this conversation on Hacker News I decided to upgrade MacWhisper to try out NVIDIA Parakeet and the new Automatic Speaker Recognition feature. It appears to work really well! …

api tool

Google Antigravity

Google's other major release today to accompany Gemini 3 Pro. At first glance Antigravity is yet another VS Code fork Cursor clone - it's a desktop application you install that …

api tool

Quoting Ethan Mollick

Three years ago, we were impressed that a machine could write a poem about otters. Less than 1,000 days later, I am debating statistical methodology with an agent that built …

platform

Trying out Gemini 3 Pro with audio transcription and a new pelican benchmark

Google released Gemini 3 Pro today. Here’s the announcement from Sundar Pichai, Demis Hassabis, and Koray Kavukcuoglu, their developer blog announcement from Logan Kilpatrick, the Gemini 3 Pro Model Card, …

api cloud tool

The fate of “small” open source

Nolan Lawson asks if LLM assistance means that the category of tiny open source libraries like his own blob-util is destined to fade away. Why take on additional supply chain …

api library tool

Quoting Andrej Karpathy

With AI now, we are able to write new programs that we could never hope to write by hand before. We do it by specifying objectives (e.g. classification accuracy, reward …

platform

llm-anthropic 0.22

New release of my llm-anthropic plugin: Support for Claude's new structured outputs feature for Sonnet 4.5 and Opus 4.1. #54 Support for the web search tool using -o web_search 1 …

api tool

parakeet-mlx

Neat MLX project by Senstella bringing NVIDIA's Parakeet ASR (Automatic Speech Recognition, like Whisper) model to Apple's MLX framework. It's packaged as a Python CLI tool, so you can …

tool

GPT-5.1 Instant and GPT-5.1 Thinking System Card Addendum

I was confused about whether the new "adaptive thinking" feature of GPT-5.1 meant they were moving away from the "router" mechanism where GPT-5 in ChatGPT automatically selected a model for …

platform

Introducing GPT-5.1 for developers

OpenAI announced GPT-5.1 yesterday, calling it a smarter, more conversational ChatGPT. Today they've added it to their API. We actually got four new models today: gpt-5.1, gpt-5.1-chat-latest, gpt-5.1-codex, and gpt-5.1-codex-mini. There …

api tool

Nano Banana can be prompt engineered for extremely nuanced AI image generation

Max Woolf provides an exceptional deep dive into Google's Nano Banana aka Gemini 2.5 Flash Image model, still the best available image manipulation LLM tool three months after its initial …

tool

Quoting Nov 12th letter from OpenAI to Judge Ona T. Wang

On Monday, this Court entered an order requiring OpenAI to hand over to the New York Times and its co-plaintiffs 20 million ChatGPT user conversations [...] OpenAI is unaware of …

security

What happens if AI labs train for pelicans riding bicycles?

Almost every time I share a new example of an SVG of a pelican riding a bicycle a variant of this question pops up: how do you know the labs …

platform

Quoting Steve Krouse

The fact that MCP is a different surface from your normal API allows you to ship MUCH faster to MCP. This has been unlocked by inference at runtime. Normal APIs …

api