Simon Willison's Blog

Simon Willison's Blog

simonwillison.net/
342
Articles
11月14日 11:01
Last updated
Introducing GPT-5.1 for developers

Introducing GPT-5.1 for developers

OpenAI announced GPT-5.1 yesterday, calling it a smarter, more conversational ChatGPT. Today they've added it to their API. We actually got four new models today: gpt-5.1 gpt-5.1-chat-latest gpt-5.1-codex gpt-5.1-codex-mini There …

Simon Willison's Blog
api tool
Nano Banana can be prompt engineered for extremely nuanced AI image generation

Nano Banana can be prompt engineered for extremely nuanced AI image generation

Max Woolf provides an exceptional deep dive into Google's Nano Banana aka Gemini 2.5 Flash Image model, still the best available image manipulation LLM tool three months after its initial …

Simon Willison's Blog
tool
No Image

Quoting Nov 12th letter from OpenAI to Judge Ona T. Wang

On Monday, this Court entered an order requiring OpenAI to hand over to the New York Times and its co-plaintiffs 20 million ChatGPT user conversations [...] OpenAI is unaware of …

Simon Willison's Blog
security
What happens if AI labs train for pelicans riding bicycles?

What happens if AI labs train for pelicans riding bicycles?

Almost every time I share a new example of an SVG of a pelican riding a bicycle a variant of this question pops up: how do you know the labs …

Simon Willison's Blog
platform
No Image

Quoting Steve Krouse

The fact that MCP is a difference surface from your normal API allows you to ship MUCH faster to MCP. This has been unlocked by inference at runtime Normal APIs …

Simon Willison's Blog
api
Agentic Pelican on a Bicycle

Agentic Pelican on a Bicycle

Robert Glaser took my pelican riding a bicycle benchmark and applied an agentic loop to it, seeing if vision models could draw a better pelican if they got the chance …

Simon Willison's Blog
platform
Six coding agents at once

Six coding agents at once

I've been upgrading a ton of Datasette plugins recently for compatibility with the Datasette 1.0a20 release from last week - 35 so far. A lot of the work is very …

Simon Willison's Blog
tool
No Image

Quoting Netflix

Netflix asks partners to consider the following guiding principles before leveraging GenAI in any creative workflow: The outputs do not replicate or substantially recreate identifiable characteristics of unowned or copyrighted …

Simon Willison's Blog
platform
Pelican on a Bike - Raytracer Edition

Pelican on a Bike - Raytracer Edition

beetle_b ran this prompt against a bunch of recent LLMs: Write a POV-Ray file that shows a pelican riding on a bicycle. This turns out to be a harder challenge …

Simon Willison's Blog
platform
Reverse engineering Codex CLI to get GPT-5-Codex-Mini to draw me a pelican

Reverse engineering Codex CLI to get GPT-5-Codex-Mini to draw me a pelican

OpenAI partially released a new model yesterday called GPT-5-Codex-Mini, which they describe as "a more compact and cost-efficient version of GPT-5-Codex". It’s currently only available via their Codex CLI tool …

Simon Willison's Blog
api tool
No Image

Quoting Kenton Varda

The big advantage of MCP over OpenAPI is that it is very clear about auth. [...] Maybe an agent could read the docs and write code to auth. But we …

Simon Willison's Blog
api security
No Image

Quoting Josh Cohenzadeh

I have AiDHD It has never been easier to build an MVP and in turn, it has never been harder to keep focus. When new features always feel like they're …

Simon Willison's Blog
api tool
No Image

Could LLMs encourage new programming languages?

My hunch is that existing LLMs make it easier to build a new programming language in a way that captures new developers. Most programming languages are similar enough to existing …

Simon Willison's Blog
library tool
No Image

Using Codex CLI with gpt-oss:120b on an NVIDIA DGX Spark via Tailscale

Inspired by a YouTube comment I wrote up how I run OpenAI's Codex CLI coding agent against the gpt-oss:120b model running in Ollama on my NVIDIA DGX Spark via a …

Simon Willison's Blog
tool
No Image

You should write an agent

Thomas Ptacek on the Fly blog: Agents are the most surprising programming experience I’ve had in my career. Not because I’m awed by the magnitude of their powers — I …

Simon Willison's Blog
platform
No Image

Quoting Ben Stolovitz

My trepidation extends to complex literature searches. I use LLMs as secondary librarians when I’m doing research. They reliably find primary sources (articles, papers, etc.) that I miss in my …

Simon Willison's Blog
platform
Kimi K2 Thinking

Kimi K2 Thinking

Chinese AI lab Moonshot's Kimi K2 established itself as one of the largest open weight models - 1 trillion parameters - back in July. They've now released the Thinking version, …

Simon Willison's Blog
platform
No Image

Quoting Nathan Lambert

At the start of the year, most people loosely following AI probably knew of 0 [Chinese] AI labs. Now, and towards wrapping up 2025, I’d say all of DeepSeek, Qwen, …

Simon Willison's Blog
platform
Code research projects with async coding agents like Claude Code and Codex

Code research projects with async coding agents like Claude Code and Codex

I’ve been experimenting with a pattern for LLM usage recently that’s working out really well: asynchronous code research tasks. Pick a research question, spin up an asynchronous coding agent and …

Simon Willison's Blog
api tool
No Image

Quoting @belligerentbarbies

I'm worried that they put co-pilot in Excel because Excel is the beast that drives our entire economy and do you know who has tamed that beast? Brenda. Who is …

Simon Willison's Blog
api tool
No Image

Code execution with MCP: Building more efficient agents

When I wrote about Claude Skills I mentioned that I don't use MCP at all any more when working with coding agents - I find CLI utilities and libraries like …

Simon Willison's Blog
api tool
No Image

MCP Colors: Systematically deal with prompt injection risk

Tim Kellogg proposes a neat way to think about prompt injection, especially with respect to MCP tools. Classify every tool with a color: red if it exposes the agent to …

Simon Willison's Blog
security
No Image

Quoting Steve Francia

Every time an engineer evaluates a language that isn’t “theirs,” their brain is literally working against them. They’re not just analyzing technical trade offs, they’re contemplating a version of themselves …

Simon Willison's Blog
tool
No Image

Quoting MiniMax

Interleaved thinking is essential for LLM agents: it means alternating between explicit reasoning and tool use, while carrying that reasoning forward between steps.This process significantly enhances planning, self‑correction, and reliability …

Simon Willison's Blog
api tool
New prompt injection papers: Agents Rule of Two and The Attacker Moves Second

New prompt injection papers: Agents Rule of Two and The Attacker Moves Second

Two interesting new papers regarding LLM security and prompt injection came to my attention this weekend. Agents Rule of Two: A Practical Approach to AI Agent Security The first is …

Simon Willison's Blog
security
No Image

PyCon US 2026 call for proposals is now open

PyCon US is coming to the US west coast! 2026 and 2027 will both be held in Long Beach, California - the 2026 conference is set for May 13th-19th next …

Simon Willison's Blog
api tool
No Image

How I Use Every Claude Code Feature

Useful, detailed guide from Shrivu Shankar, a Claude Code power user. Lots of tips for both individual Claude Code usage and configuring it for larger team projects. I appreciated Shrivu's …

Simon Willison's Blog
tool
No Image

Claude Code Can Debug Low-level Cryptography

Go cryptography author Filippo Valsorda reports on some very positive results applying Claude Code to the challenge of implementing novel cryptography algorithms. After Claude was able to resolve a "fairly …

Simon Willison's Blog
security tool
No Image

October 2025 sponsors-only newsletter

I just hit send on the October edition of my sponsors-only monthly newsletter. If you are a sponsor (or if you start a sponsorship now) you can access a copy …

Simon Willison's Blog
api tool
No Image

Curiosity-driven blogging

My piece this morning about the Marimo acquisition is an example of a variant of a TIL - I didn't know much about CoreWeave, the acquiring company, so I poked …

Simon Willison's Blog
tool
No Image

CoreWeave adds Marimo to their 2025 acquisition spree

I don't usually cover startup acquisitions here, but this one feels relevant to several of my interests. Marimo (previously) provide an open source (Apache 2 licensed) notebook tool for Python, …

Simon Willison's Blog
library tool
No Image

Marimo is Joining CoreWeave

I don't usually cover startup acquisitions here, but this one feels relevant to several of my interests. Marimo (previously) provide an open source (Apache 2 licensed) notebook tool for Python, …

Simon Willison's Blog
library tool
No Image

Quoting François Chollet

To really understand a concept, you have to "invent" it yourself in some capacity. Understanding doesn't come from passive content consumption. It is always self-built. It is an active, high-agency, …

Simon Willison's Blog
tool
Introducing SWE-1.5: Our Fast Agent Model

Introducing SWE-1.5: Our Fast Agent Model

Here's the second fast coding model released by a coding agent IDE in the same day - the first was Composer-1 by Cursor. This time it's Windsurf releasing SWE-1.5: Today …

Simon Willison's Blog
api cloud tool
MiniMax M2 & Agent: Ingenious in Simplicity

MiniMax M2 & Agent: Ingenious in Simplicity

MiniMax M2 was released on Monday 27th October by MiniMax, a Chinese AI lab founded in December 2021. It's a very promising model. Their self-reported benchmark scores show it as …

Simon Willison's Blog
tool
Composer: Building a fast frontier model with RL

Composer: Building a fast frontier model with RL

Cursor released Cursor 2.0 today, with a refreshed UI focused on agentic coding (and running agents in parallel) and a new model that's unique to Cursor called Composer 1. As far …

Simon Willison's Blog
library tool
No Image

Quoting Aaron Boodman

Claude doesn't make me much faster on the work that I am an expert on. Maybe 15-20% depending on the day. It's the work that I don't know how to …

Simon Willison's Blog
platform
No Image

GenAI Image Editing Showdown

Useful collection of examples by Shaun Pedicini who tested Seedream 4, Gemini 2.5 Flash, Qwen-Image-Edit, FLUX.1 Kontext [dev], FLUX.1 Kontext [max], OmniGen2, and OpenAI gpt-image-1 across 12 image editing prompts. …

Simon Willison's Blog
tool
No Image

Sora might have a 'pervert' problem on its hands

Katie Notopoulos turned on the Sora 2 option where anyone can make a video featuring her cameo, and then: I found a stranger had made a video where I appeared …

Simon Willison's Blog
tool
No Image

Setting up a codebase for working with coding agents

Someone on Hacker News asked for tips on setting up a codebase to be more productive with AI coding tools. Here's my reply: Good automated tests which the coding agent …

Simon Willison's Blog
api library tool
No Image

Quoting Claude Docs

If you have an AGENTS.md file, you can source it in your CLAUDE.md using @AGENTS.md to maintain a single source of truth.

Simon Willison's Blog
tool
Visual Features Across Modalities: SVG and ASCII Art Reveal Cross-Modal Understanding

Visual Features Across Modalities: SVG and ASCII Art Reveal Cross-Modal Understanding

New model interpretability research from Anthropic, this time focused on SVG and ASCII art generation. We found that the same feature that activates over the eyes in an ASCII face …

Simon Willison's Blog
tool
claude_code_docs_map.md

claude_code_docs_map.md

Something I'm enjoying about Claude Code is that any time you ask it questions about itself it runs tool calls like these: In this case I'd asked it about its …

Simon Willison's Blog
api tool
No Image

Quoting Geoffrey Litt

A lot of people say AI will make us all "managers" or "editors"...but I think this is a dangerously incomplete view! Personally, I'm trying to code like a surgeon. A …

Simon Willison's Blog
api tool
No Image

OpenAI no longer has to preserve all of its ChatGPT data, with some exceptions

This is a relief: Federal judge Ona T. Wang filed a new order on October 9 that frees OpenAI of an obligation to "preserve and segregate all output log data …

Simon Willison's Blog
platform
No Image

Dane Stuckey (OpenAI CISO) on prompt injection risks for ChatGPT Atlas

My biggest complaint about the launch of the ChatGPT Atlas browser the other day was the lack of details on how OpenAI are addressing prompt injection attacks. The launch post …

Simon Willison's Blog
api security tool
Living dangerously with Claude

Living dangerously with Claude

I gave a talk last night at Claude Code Anonymous in San Francisco, the unofficial meetup for coding agent enthusiasts. I decided to talk about a dichotomy I’ve been struggling …

Simon Willison's Blog
api tool
SLOCCount in WebAssembly

SLOCCount in WebAssembly

This project/side-quest got a little bit out of hand. I remembered an old tool called SLOCCount which could count lines of code and produce an estimate for how much they …

Simon Willison's Blog
tool
No Image

Don't let Claude Code delete your session logs

Claude Code stores full logs of your sessions as newline-delimited JSON in ~/.claude/projects/encoded-directory/*.jsonl on your machine. I currently have 379MB of these! Here's an example jsonl file which I extracted …

Simon Willison's Blog
tool
Unseeable prompt injections in screenshots: more vulnerabilities in Comet and other AI browsers

Unseeable prompt injections in screenshots: more vulnerabilities in Comet and other AI browsers

The Brave security team wrote about prompt injection against browser agents a few months ago (here are my notes on that). Here's their follow-up: What we’ve found confirms our initial …

Simon Willison's Blog
security
Introducing ChatGPT Atlas

Introducing ChatGPT Atlas

Last year OpenAI hired Chrome engineer Darin Fisher, which sparked speculation they might have their own browser in the pipeline. Today it arrived. ChatGPT Atlas is a Mac-only web browser …

Simon Willison's Blog
tool ui
No Image

Quoting Bruce Schneier and Barath Raghavan

Prompt injection might be unsolvable in today’s LLMs. LLMs process token sequences, but no mechanism exists to mark token privileges. Every solution proposed introduces new injection vectors: Delimiter? Attackers include …

Simon Willison's Blog
security
Claude Code for web - a new asynchronous coding agent from Anthropic

Claude Code for web - a new asynchronous coding agent from Anthropic

Anthropic launched Claude Code for web this morning. It’s an asynchronous coding agent—their answer to OpenAI’s Codex Cloud and Google’s Jules, and has a very similar shape. I had preview …

Simon Willison's Blog
api tool
Getting DeepSeek-OCR working on an NVIDIA Spark via brute force using Claude Code

Getting DeepSeek-OCR working on an NVIDIA Spark via brute force using Claude Code

DeepSeek released a new model yesterday: DeepSeek-OCR, a 6.6GB model fine-tuned specifically for OCR. They released it as model weights that run using PyTorch and CUDA. I got it running …

Simon Willison's Blog
api tool
TIL: Exploring OpenAI's deep research API model o4-mini-deep-research

TIL: Exploring OpenAI's deep research API model o4-mini-deep-research

I landed a PR by Manuel Solorzano adding pricing information to llm-prices.com for OpenAI's o4-mini-deep-research and o3-deep-research models, which they released in June and document here. I realized I'd never …

Simon Willison's Blog
api tool
No Image

The AI water issue is fake

Andy Masley (previously): All U.S. data centers (which mostly support the internet, not AI) used 200--250 million gallons of freshwater daily in 2023. The U.S. consumes approximately 132 billion gallons …

Simon Willison's Blog
api cloud infra
No Image

Andrej Karpathy — AGI is still a decade away

Extremely high signal 2 hour 25 minute (!) conversation between Andrej Karpathy and Dwarkesh Patel. It starts with Andrej's claim that "the year of agents" is actually more likely to …

Simon Willison's Blog
library tool
No Image

Quoting Alexander Fridriksson and Jay Miller

Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs. The main issue is that UUIDv7 incorporates a 48-bit …

Simon Willison's Blog
api database security
No Image

Quoting Barry Zhang

Skills actually came out of a prototype I built demonstrating that Claude Code is a general-purpose agent :-) It was a natural conclusion once we realized that bash + filesystem …

Simon Willison's Blog
platform
Claude Skills are awesome, maybe a bigger deal than MCP

Claude Skills are awesome, maybe a bigger deal than MCP

Anthropic this morning introduced Claude Skills, a new pattern for making new abilities available to their models: Claude can now use Skills to improve how it performs specific tasks. Skills …

Simon Willison's Blog
api tool
No Image

NVIDIA DGX Spark + Apple Mac Studio = 4x Faster LLM Inference with EXO 1.0

EXO Labs wired a 256GB M3 Ultra Mac Studio up to an NVIDIA DGX Spark and got a 2.8x performance boost serving Llama-3.1 8B (FP16) with an 8,192 token prompt. …

Simon Willison's Blog
framework tool
No Image

Quoting Riana Pfefferkorn

Pro se litigants account for the majority of the cases in the United States where a party submitted a court filing containing AI hallucinations. In a country where legal representation …

Simon Willison's Blog
platform
No Image

Coding without typing the code

Last year the most useful exercise for getting a feel for how good LLMs were at writing code was vibe coding (before that name had even been coined) - seeing …

Simon Willison's Blog
platform
No Image

Quoting Catherine Wu

While Sonnet 4.5 remains the default [in Claude Code], Haiku 4.5 now powers the Explore subagent which can rapidly gather context on your codebase to build apps even faster. You …

Simon Willison's Blog
platform
Introducing Claude Haiku 4.5

Introducing Claude Haiku 4.5

Anthropic released Claude Haiku 4.5 today, the cheapest member of the Claude 4.5 family that started with Sonnet 4.5 a couple of weeks ago. It's priced at $1/million input tokens …

Simon Willison's Blog
platform
No Image

Quoting Claude Haiku 4.5 System Card

Previous system cards have reported results on an expanded version of our earlier agentic misalignment evaluation suite: three families of exotic scenarios meant to elicit the model to commit blackmail, …

Simon Willison's Blog
platform
NVIDIA DGX Spark: great hardware, early days for the ecosystem

NVIDIA DGX Spark: great hardware, early days for the ecosystem

NVIDIA sent me a preview unit of their new DGX Spark desktop “AI supercomputer”. I’ve never had hardware to review before! You can consider this my first ever sponsored post …

Simon Willison's Blog
cloud tool
No Image

Just Talk To It - the no-bs Way of Agentic Engineering

Peter Steinberger's long, detailed description of his current process for using Codex CLI and GPT-5 Codex. This is information dense and full of actionable tips, plus plenty of strong opinions …

Simon Willison's Blog
api tool
No Image

nanochat

Really interesting new project from Andrej Karpathy, described at length in this discussion post. It provides a full ChatGPT-style LLM, including training, inference and a web Ui, that can be …

Simon Willison's Blog
tool
No Image

Claude Code sub-agents

Claude Code includes the ability to run sub-agents, where a separate agent loop with a fresh token context is dispatched to achieve a goal and report back when it's done. …

Simon Willison's Blog
api tool
No Image

Vibing a Non-Trivial Ghostty Feature

Mitchell Hashimoto provides a comprehensive answer to the frequent demand for a detailed description of shipping a non-trivial production feature to an existing project using AI-assistance. In this case it's …

Simon Willison's Blog
api library tool
No Image

Note on 11th October 2025

I'm beginning to suspect that a key skill in working effectively with coding agents is developing an intuition for when you don't need to closely review every line of code …

Simon Willison's Blog
platform
No Image

simonw/claude-skills

One of the tips I picked up from Jesse Vincent's Claude Code Superpowers post (previously) was this: Skills are what give your agents Superpowers. The first time they really popped …

Simon Willison's Blog
api tool
Superpowers: How I'm using coding agents in October 2025

Superpowers: How I'm using coding agents in October 2025

A follow-up to Jesse Vincent's post about September, but this is a really significant piece in its own right. Jesse is one of the most creative users of coding agents …

Simon Willison's Blog
api tool
No Image

A Retrospective Survey of 2024/2025 Open Source Supply Chain Compromises

Filippo Valsorda surveyed 18 incidents from the past year of open source supply chain attacks, where package updates were infected with malware thanks to a compromise of the project itself. …

Simon Willison's Blog
security
No Image

Video of GPT-OSS 20B running on a phone

GPT-OSS 20B is a very good model. At launch OpenAI claimed: The gpt-oss-20b model delivers similar results to OpenAI o3‑mini on common benchmarks and can run on edge devices with …

Simon Willison's Blog
tool
No Image

Quoting Gergely Orosz

I get a feeling that working with multiple AI agents is something that comes VERY natural to most senior+ engineers or tech lead who worked at a large company You …

Simon Willison's Blog
platform
No Image

Claude can write complete Datasette plugins now

This isn’t necessarily surprising, but it’s worth noting anyway. Claude Sonnet 4.5 is capable of building a full Datasette plugin now. I’ve seen models complete aspects of this in the …

Simon Willison's Blog
api tool
No Image

Quoting Simon Højberg

The cognitive debt of LLM-laden coding extends beyond disengagement of our craft. We’ve all heard the stories. Hyped up, vibed up, slop-jockeys with attention spans shorter than the framework-hopping JavaScript …

Simon Willison's Blog
platform
Gemini 2.5 Computer Use can solve Google's own CAPTCHAs

Gemini 2.5 Computer Use can solve Google's own CAPTCHAs

Google just introduced a new Gemini 2.5 Computer Use model, specially designed to help operate a GUI interface by interacting with visible elements using a virtual mouse and keyboard. I …

Simon Willison's Blog
framework tool
No Image

Vibe engineering

I feel like vibe coding is pretty well established now as covering the fast, loose and irresponsible way of building software with AI—entirely prompt-driven, and with no attention paid to …

Simon Willison's Blog
library tool
No Image

Deloitte to pay money back to Albanese government after using AI in $440,000 report

Ouch: Deloitte will provide a partial refund to the federal government over a $440,000 report that contained several errors, after admitting it used generative artificial intelligence to help produce it. …

Simon Willison's Blog
platform
No Image

a system that can do work independently on behalf of the user

I've settled on agents as meaning "LLMs calling tools in a loop to achieve a goal" but OpenAI continue to muddy the waters with much more vague definitions. Swyx spotted …

Simon Willison's Blog
platform
gpt-image-1-mini

gpt-image-1-mini

OpenAI released a new image model today: gpt-image-1-mini, which they describe as "A smaller image generation model that’s 80% less expensive than the large model." They released it very quietly …

Simon Willison's Blog
api tool
GPT-5 pro

GPT-5 pro

Here's OpenAI's model documentation for their GPT-5 pro model, released to their API today at their DevDay event. It has similar base characteristics to GPT-5: both share a September 30, …

Simon Willison's Blog
api
No Image

OpenAI DevDay 2025 live blog

I’m at OpenAI DevDay in Fort Mason, San Francisco today. As I did last year, I’m going to be live blogging the announcements from the kenote. Unlike last year, this …

Simon Willison's Blog
platform
No Image

Embracing the parallel coding agent lifestyle

For a while now I’ve been hearing from engineers who run multiple coding agents at once—firing up several Claude Code or Codex CLI instances at the same time, sometimes in …

Simon Willison's Blog
tool
Let the LLM Write the Prompts: An Intro to DSPy in Compound Al Pipelines

Let the LLM Write the Prompts: An Intro to DSPy in Compound Al Pipelines

I've had trouble getting my head around DSPy in the past. This half hour talk by Drew Breunig at the recent Databricks Data + AI Summit is the clearest explanation …

Simon Willison's Blog
platform
No Image

Sora 2 prompt injection

It turns out Sora 2 is vulnerable to prompt injection! When you onboard to Sora you get the option to create your own "cameo" - a virtual video recreation of …

Simon Willison's Blog
security
Daniel Stenberg's note on AI assisted curl bug reports

Daniel Stenberg's note on AI assisted curl bug reports

Curl maintainer Daniel Stenberg on Mastodon: Joshua Rogers sent us a massive list of potential issues in #curl that he found using his set of AI assisted tools. Code analyzer …

Simon Willison's Blog
api tool
No Image

Quoting Nadia Eghbal

When attention is being appropriated, producers need to weigh the costs and benefits of the transaction. To assess whether the appropriation of attention is net-positive, it’s useful to distinguish between …

Simon Willison's Blog
api tool
aavetis/PRarena

aavetis/PRarena

Albert Avetisian runs this repository on GitHub which uses the Github Search API to track the number of PRs that can be credited to a collection of different coding agents. …

Simon Willison's Blog
api cloud tool
Two more Chinese pelicans

Two more Chinese pelicans

Two new models from Chinese AI labs in the past few days. I tried them both out using llm-openrouter: DeepSeek-V3.2-Exp from DeepSeek. Announcement, Tech Report, Hugging Face (690GB, MIT license). …

Simon Willison's Blog
platform
No Image

September monthly sponsors newsletter

I just sent out the September edition of my sponsors-only monthly newsletter. If you are a sponsor (or if you start a sponsorship now) you can access a copy here. …

Simon Willison's Blog
api tool
No Image

Sora 2

Having watched this morning's Sora 2 introduction video, the most notable feature (aside from audio generation - original Sora was silent, Google's Veo 3 supported audio in May 2025) looks …

Simon Willison's Blog
tool
No Image

Designing agentic loops

Coding agents like Anthropic’s Claude Code and OpenAI’s Codex CLI represent a genuine step change in how useful LLMs can be for producing working code. These agents can now directly …

Simon Willison's Blog
api tool
Claude Sonnet 4.5 is probably the "best coding model in the world" (at least for now)

Claude Sonnet 4.5 is probably the "best coding model in the world" (at least for now)

Anthropic released Claude Sonnet 4.5 today, with a very bold set of claims: Claude Sonnet 4.5 is the best coding model in the world. It’s the strongest model for building …

Simon Willison's Blog
api tool
No Image

Armin Ronacher: 90%

The idea of AI writing "90% of the code" to-date has mostly been expressed by people who sell AI tooling. Over the last few months, I've increasingly seen the same …

Simon Willison's Blog
api tool
No Image

Quoting Scott Aaronson

Given a week or two to try out ideas and search the literature, I’m pretty sure that Freek and I could’ve solved this problem ourselves. Instead, though, I simply asked …

Simon Willison's Blog
platform
No Image

Quoting Nick Turley

We’ve seen the strong reactions to 4o responses and want to explain what is happening. We’ve started testing a new safety routing system in ChatGPT. As we previously mentioned, when …

Simon Willison's Blog
platform
Video models are zero-shot learners and reasoners

Video models are zero-shot learners and reasoners

Fascinating new paper from Google DeepMind which makes a very convincing case that their Veo 3 model - and generative video models in general - serve a similar role in …

Simon Willison's Blog
tool
No Image

ForcedLeak: AI Agent risks exposed in Salesforce AgentForce

Classic lethal trifecta image exfiltration bug reported against Salesforce AgentForce by Sasi Levi and Noma Security. Here the malicious instructions come in via the Salesforce Web-to-Lead feature. When a Salesforce …

Simon Willison's Blog
api cloud security
No Image

How to stop AI’s “lethal trifecta”

This is the second mention of the lethal trifecta in the Economist in just the last week! Their earlier coverage was Why AI systems may never be secure on September …

Simon Willison's Blog
security
No Image

GitHub Copilot CLI is now in public preview

GitHub now have their own entry in the coding terminal CLI agent space: Copilot CLI. It's the same basic shape as Claude Code, Codex CLI, Gemini CLI and a growing …

Simon Willison's Blog
api tool
Improved Gemini 2.5 Flash and Flash-Lite

Improved Gemini 2.5 Flash and Flash-Lite

Two new preview models from Google - updates to their fast and inexpensive Flash and Flash Lite families: The latest version of Gemini 2.5 Flash-Lite was trained and built based …

Simon Willison's Blog
api tool
No Image

Don't hide your best documentation

If you hide the system prompt and tool descriptions for your LLM agent, what you're actually doing is deliberately hiding the most useful documentation describing your service from your most …

Simon Willison's Blog
platform
No Image

Quoting Stanford CS221 Autumn 2025

[2 points] Learn basic NumPy operations with an AI tutor! Use an AI chatbot (e.g., ChatGPT, Claude, Gemini, or Stanford AI Playground) to teach yourself how to do basic vector …

Simon Willison's Blog
tool
No Image

Cross-Agent Privilege Escalation: When Agents Free Each Other

Here's a clever new form of AI exploit from Johann Rehberger, who has coined the term Cross-Agent Privilege Escalation to describe an attack where multiple coding agents - GitHub Copilot …

Simon Willison's Blog
security
GPT-5-Codex

GPT-5-Codex

OpenAI half-relased this model earlier this month, adding it to their Codex CLI tool but not their API. Today they've fixed that - the new model can now be accessed …

Simon Willison's Blog
api library tool
No Image

Qwen3-VL: Sharper Vision, Deeper Thought, Broader Action

I've been looking forward to this. Qwen 2.5 VL is one of the best available open weight vision LLMs, so I had high hopes for Qwen 3's vision models. Firstly, …

Simon Willison's Blog
platform
No Image

Why AI systems might never be secure

The Economist have a new piece out about LLM security, with this headline and subtitle: Why AI systems might never be secure A “lethal trifecta” of conditions opens them to …

Simon Willison's Blog
security
No Image

Quoting Kate Niederhoffer, Gabriella Rosen Kellerman, Angela Lee, Alex Liebscher, Kristina Rapuano and Jeffrey T. Hancock

We define workslop as AI generated work content that masquerades as good work, but lacks the substance to meaningfully advance a given task. Here’s how this happens. As AI tools …

Simon Willison's Blog
tool
Four new releases from Qwen

Four new releases from Qwen

It's been an extremely busy day for team Qwen. Within the last 24 hours (all links to Twitter, which seems to be their preferred platform for these announcements): Qwen3-Next-80B-A3B-Instruct-FP8 and …

Simon Willison's Blog
library tool
CompileBench: Can AI Compile 22-year-old Code?

CompileBench: Can AI Compile 22-year-old Code?

Interesting new LLM benchmark from Piotr Grabowski and Piotr Migdał: how well can different models handle compilation challenges such as cross-compiling gucr for ARM64 architecture? This is one of my …

Simon Willison's Blog
api tool
No Image

ChatGPT Is Blowing Up Marriages as Spouses Use AI to Attack Their Partners

Maggie Harrison Dupré for Futurism. It turns out having an always-available "marriage therapist" with a sycophantic instinct to always take your side is catastrophic for relationships. The tension in the …

Simon Willison's Blog
platform
No Image

Locally AI

Handy new iOS app by Adrien Grondin for running local LLMs on your phone. It just added support for the new iOS 26 Apple Foundation model, so you can install …

Simon Willison's Blog
mobile
No Image

llm-openrouter 0.5

New release of my LLM plugin for accessing models made available via OpenRouter. The release notes in full: Support for tool calling. Thanks, James Sanford. #43 Support for reasoning options, …

Simon Willison's Blog
api tool
Grok 4 Fast

Grok 4 Fast

New hosted vision-enabled reasoning model from xAI that's designed to be fast and extremely competitive on price. It has a 2 million token context window and "was trained end-to-end with …

Simon Willison's Blog
tool
No Image

Magistral 1.2

Mistral quietly released two new models yesterday: Magistral Small 1.2 (Apache 2.0, 96.1 GB on Hugging Face) and Magistral Medium 1.2 (not open weights same as Mistral's other "medium" models.) …

Simon Willison's Blog
platform
No Image

The Hidden Risk in Notion 3.0 AI Agents: Web Search Tool Abuse for Data Exfiltration

Abi Raghuram reports that Notion 3.0, released yesterday, introduces new prompt injection data exfiltration vulnerabilities thanks to enabling lethal trifecta attacks. Abi's attack involves a PDF with hidden text (white …

Simon Willison's Blog
security
No Image

Quoting Steve Jobs

Well, the types of computers we have today are tools. They’re responders: you ask a computer to do something and it will do it. The next stage is going to …

Simon Willison's Blog
tool
I think "agent" may finally have a widely enough agreed upon definition to be useful jargon now

I think "agent" may finally have a widely enough agreed upon definition to be useful jargon now

I’ve noticed something interesting over the past few weeks: I’ve started using the term “agent” in conversations where I don’t feel the need to then define it, roll my eyes …

Simon Willison's Blog
platform
No Image

Anthropic: A postmortem of three recent issues

Anthropic had a very bad month in terms of model reliability: Between August and early September, three infrastructure bugs intermittently degraded Claude's response quality. We've now resolved these issues and …

Simon Willison's Blog
platform
No Image

ICPC medals for OpenAI and Gemini

In July it was the International Math Olympiad (OpenAI, Gemini), today it's the International Collegiate Programming Contest (ICPC). Once again, both OpenAI and Gemini competed with models that achieved Gold …

Simon Willison's Blog
platform
No Image

Announcing the 2025 PSF Board Election Results!

I'm happy to share that I've been re-elected for second term on the board of directors of the Python Software Foundation. Jannis Leidel was also re-elected and Abigail Dogbe and …

Simon Willison's Blog
tool
GPT‑5-Codex and upgrades to Codex

GPT‑5-Codex and upgrades to Codex

OpenAI half-released a new model today: GPT‑5-Codex, a fine-tuned GPT-5 variant explicitly designed for their various AI-assisted programming tools. I say half-released because it's not yet available via their API, …

Simon Willison's Blog
api library tool
No Image

Models can prompt now

Here's an interesting example of models incrementally improving over time: I am finding that today's leading models are competent at writing prompts for themselves and each other. A year ago …

Simon Willison's Blog
platform
No Image

gpt-5 and gpt-5-mini rate limit updates

OpenAI have increased the rate limits for their two main GPT-5 models. These look significant: gpt-5 Tier 1: 30K → 500K TPM (1.5M batch) Tier 2: 450K → 1M (3M …

Simon Willison's Blog
api
No Image

Quoting Matt Webb

The trick with Claude Code is to give it large, but not too large, extremely well defined problems. (If the problems are too large then you are now vibe coding… …

Simon Willison's Blog
platform
No Image

Comparing the memory implementations of Claude and ChatGPT

Shlok Khemani has been doing excellent work reverse-engineering LLM systems and documenting his discoveries. Last week he wrote about ChatGPT memory. This week it's Claude. Claude's memory system has two …

Simon Willison's Blog
api tool
Qwen3-Next-80B-A3B: 🐧🦩 Who needs legs?!

Qwen3-Next-80B-A3B: 🐧🦩 Who needs legs?!

Qwen announced two new models via their Twitter account (nothing on their blog yet): Qwen3-Next-80B-A3B-Instruct and Qwen3-Next-80B-A3B-Thinking. They make some big claims on performance: Qwen3-Next-80B-A3B-Instruct approaches our 235B flagship. Qwen3-Next-80B-A3B-Thinking …

Simon Willison's Blog
tool
No Image

Defeating Nondeterminism in LLM Inference

A very common question I see about LLMs concerns why they can't be made to deliver the same response to the same prompt by setting a fixed random number seed. …

Simon Willison's Blog
library tool
No Image

Claude API: Web fetch tool

New in the Claude API: if you pass the web-fetch-2025-09-10 beta header you can add {"type": "web_fetch_20250910", "name": "web_fetch", "max_uses": 5} to your "tools" list and Claude will gain the …

Simon Willison's Blog
api tool
No Image

I Replaced Animal Crossing's Dialogue with a Live LLM by Hacking GameCube Memory

Brilliant retro-gaming project by Josh Fonseca, who figured out how to run 2002 Game Cube Animal Crossing in the Dolphin Emulator such that dialog with the characters was instead generated …

Simon Willison's Blog
api tool
No Image

Quoting Apple Security Engineering and Architecture

There has never been a successful, widespread malware attack against iPhone. The only system-level iOS attacks we observe in the wild come from mercenary spyware, which is vastly more complex …

Simon Willison's Blog
security
My review of Claude's new Code Interpreter, released under a very confusing name

My review of Claude's new Code Interpreter, released under a very confusing name

Today on the Anthropic blog: Claude can now create and edit files: Claude can now create and edit Excel spreadsheets, documents, PowerPoint slide decks, and PDFs directly in Claude.ai and …

Simon Willison's Blog
api tool
No Image

The 2025 PSF Board Election is Open!

The Python Software Foundation's annual board member election is taking place right now, with votes (from previously affirmed voting members) accepted from September 2nd, 2:00 pm UTC through Tuesday, September …

Simon Willison's Blog
api cloud platform
No Image

Geoffrey Huntley is cursed

Geoffrey Huntley vibe-coded an entirely new programming language using Claude: The programming language is called "cursed". It's cursed in its lexical structure, it's cursed in how it was built, it's …

Simon Willison's Blog
api library tool
Recreating the Apollo AI adoption rate chart with GPT-5, Python and Pyodide

Recreating the Apollo AI adoption rate chart with GPT-5, Python and Pyodide

Apollo Global Management’s “Chief Economist” Dr. Torsten Sløk released this interesting chart which appears to show a slowdown in AI adoption rates among large (>250 empoloyees) companies: Here’s the full …

Simon Willison's Blog
api library tool
No Image

Anthropic status: Model output quality

Anthropic previously reported model serving bugs that affected Claude Opus 4 and 4.1 for 56.5 hours. They've now fixed additional bugs affecting "a small percentage" of Sonnet 4 requests for …

Simon Willison's Blog
platform
No Image

Quoting TheSoftwareGuy

Having worked inside AWS I can tell you one big reason [that they don't document their internals] is the attitude/fear that anything we put in out public docs may end …

Simon Willison's Blog
cloud
No Image

Load Llama-3.2 WebGPU in your browser from a local folder

Inspired by a comment on Hacker News I decided to see if it was possible to modify the transformers.js-examples/tree/main/llama-3.2-webgpu Llama 3.2 chat demo (online here, I wrote about it last …

Simon Willison's Blog
tool
No Image

Quoting James Luan

I recently spoke with the CTO of a popular AI note-taking app who told me something surprising: they spend twice as much on vector search as they do on OpenAI …

Simon Willison's Blog
api
No Image

Is the LLM response wrong, or have you just failed to iterate it?

More from Mike Caulfield (see also the SIFT method). He starts with a fantastic example of Google's AI mode usually correctly handling a common piece of misinformation but occasionally falling …

Simon Willison's Blog
platform
No Image

Quoting Anil Dash

I agree with the intellectual substance of virtually every common critique of AI. And it's very clear that turning those critiques into a competition about who can frame them in …

Simon Willison's Blog
platform
No Image

The SIFT method

The SIFT method is "an evaluation strategy developed by digital literacy expert, Mike Caulfield, to help determine whether online content can be trusted for credible or reliable sources of information." …

Simon Willison's Blog
tool
AI mode is good, actually

AI mode is good, actually

When I wrote about how good ChatGPT with GPT-5 is at search yesterday I nearly added a note about how comparatively disappointing Google's efforts around this are. I'm glad I …

Simon Willison's Blog
api cloud tool
GPT-5 Thinking in ChatGPT (aka Research Goblin) is shockingly good at search

GPT-5 Thinking in ChatGPT (aka Research Goblin) is shockingly good at search

“Don’t use chatbots as search engines” was great advice for several years... until it wasn’t. I wrote about how good OpenAI’s o3 was at using its Bing-backed search tool back …

Simon Willison's Blog
api tool
No Image

Quoting Jason Liu

I am once again shocked at how much better image retrieval performance you can get if you embed highly opinionated summaries of an image, a summary that came out of …

Simon Willison's Blog
api
Kimi-K2-Instruct-0905

Kimi-K2-Instruct-0905

New not-quite-MIT licensed model from Chinese Moonshot AI, a follow-up to the highly regarded Kimi-K2 model they released in July. This one is an incremental improvement - I've seen it …

Simon Willison's Blog
library tool
No Image

Anthropic to pay $1.5 billion to authors in landmark AI settlement

I wrote about the details of this case when it was found that Anthropic's training on book content was fair use, but they needed to have purchased individual copies of …

Simon Willison's Blog
platform
Introducing EmbeddingGemma

Introducing EmbeddingGemma

Brand new open weights (under the slightly janky Gemma license) 308M parameter embedding model from Google: Based on the Gemma 3 architecture, EmbeddingGemma is trained on 100+ languages and is …

Simon Willison's Blog
library tool
No Image

Highlighted tools

Any time I share my collection of tools built using vibe coding and AI-assisted development (now at 124, here's the definitive list) someone will inevitably complain that they're mostly trivial. …

Simon Willison's Blog
tool
Beyond Vibe Coding

Beyond Vibe Coding

Back in May I wrote Two publishers and three authors fail to understand what “vibe coding” means where I called out the authors of two forthcoming books on "vibe coding" …

Simon Willison's Blog
tool
No Image

gov.uscourts.dcd.223205.1436.0_1.pdf

Here's the 230 page PDF ruling on the 2023 United States v. Google LLC federal antitrust case - the case that could have resulted in Google selling off Chrome and …

Simon Willison's Blog
api cloud tool
Rich Pixels

Rich Pixels

Neat Python library by Darren Burns adding pixel image support to the Rich terminal library, using tricks to render an image using full or half-height colored blocks. Here's the key …

Simon Willison's Blog
library tool
No Image

August 2025 newsletter

I just sent out my August 2025 sponsors-only newsletter summarizing the past month in LLMs and my other work. Topics included GPT-5, gpt-oss, image editing models (Qwen-Image-Edit and Gemini Nano …

Simon Willison's Blog
platform
No Image

Introducing gpt-realtime

Released a few days ago (August 28th), gpt-realtime is OpenAI's new "most advanced speech-to-speech model". It looks like this is a replacement for the older gpt-4o-realtime-preview model that was released …

Simon Willison's Blog
platform
Cloudflare Radar: AI Insights

Cloudflare Radar: AI Insights

Cloudflare launched this dashboard back in February, incorporating traffic analysis from Cloudflare's network along with insights from their popular 1.1.1.1 DNS service. I found this chart particularly interesting, showing which …

Simon Willison's Blog
cloud
No Image

Claude Opus 4.1 and Opus 4 degraded quality

Notable because often when people complain of degraded model quality it turns out to be unfounded - Anthropic in the past have emphasized that they don't change the model weights …

Simon Willison's Blog
platform
No Image

Quoting Benj Edwards

LLMs are intelligence without agency—what we might call "vox sine persona": voice without person. Not the voice of someone, not even the collective voice of many someones, but a voice …

Simon Willison's Blog
platform
The perils of vibe coding

The perils of vibe coding

I was interviewed by Elaine Moore for this opinion piece in the Financial Times, which ended up in the print edition of the paper too! I picked up a copy …

Simon Willison's Blog
api tool
No Image

Lossy encyclopedia

Since I love collecting questionable analogies for LLMs, here's a new one I just came up with: an LLM is a lossy encyclopedia. They have a huge array of facts …

Simon Willison's Blog
platform
No Image

Python: The Documentary

New documentary about the origins of the Python programming language - 84 minutes long, built around extensive interviews with Guido van Rossum and others who were there at the start …

Simon Willison's Blog
youtube
No Image

Quoting Bruce Schneier

We simply don’t know to defend against these attacks. We have zero agentic AI systems that are secure against these attacks. Any AI that is working in an adversarial environment—and …

Simon Willison's Blog
security
No Image

Piloting Claude for Chrome

Two days ago I said: I strongly expect that the entire concept of an agentic browser extension is fatally flawed and cannot be built safely. Today Anthropic announced their own …

Simon Willison's Blog
api security tool
No Image

Will Smith’s concert crowds are real, but AI is blurring the lines

Great piece from Andy Baio demonstrating quite how convoluted the usage ethics and backlash against generative AI has become. Will Smith has been accused of using AI to misleadingly inflate …

Simon Willison's Blog
platform
No Image

Agentic Browser Security: Indirect Prompt Injection in Perplexity Comet

The security team from Brave took a look at Comet, the LLM-powered "agentic browser" extension from Perplexity, and unsurprisingly found security holes you can drive a truck through. The vulnerability …

Simon Willison's Blog
api security tool
No Image

ChatGPT release notes: Project-only memory

The feature I've most wanted from ChatGPT's memory feature (the newer version of memory that automatically includes relevant details from summarized prior conversations) just landed: With project-only memory enabled, ChatGPT …

Simon Willison's Blog
platform
DeepSeek 3.1

DeepSeek 3.1

The latest model from DeepSeek, a 685B monster (like DeepSeek v3 before it) but this time it's a hybrid reasoning model. DeepSeek claim: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, …

Simon Willison's Blog
platform
No Image

Quoting The Bluesky Team

Mississippi's approach would fundamentally change how users access Bluesky. The Supreme Court’s recent decision leaves us facing a hard reality: comply with Mississippi’s age assurance law—and make every Mississippi Bluesky …

Simon Willison's Blog
security
No Image

too many model context protocol servers and LLM allocations on the dance floor

Useful reminder from Geoffrey Huntley of the infrequently discussed significant token cost of using MCP. Geoffrey estimate estimates that the usable context window something like Amp or Cursor is around …

Simon Willison's Blog
api tool
No Image

Quoting potatolicious

Most classical engineering fields deal with probabilistic system components all of the time. In fact I'd go as far as to say that inability to deal with probabilistic components is …

Simon Willison's Blog
platform
No Image

Quoting Matt Garman

I was at a leadership group and people were telling me "We think that with AI we can replace all of our junior people in our company." I was like, …

Simon Willison's Blog
platform
No Image

Quoting Mustafa Suleyman

Simply put, my central worry is that many people will start to believe in the illusion of AIs as conscious entities so strongly that they’ll soon advocate for AI rights, …

Simon Willison's Blog
platform
No Image

Quoting u/AssafMalkiIL

what’s the point of vibe coding if at the end of the day i still gotta pay a dev to look at the code anyway. sure it feels kinda cool …

Simon Willison's Blog
tool
David Ho on BlueSky: A pelican tried to eat my bike

David Ho on BlueSky: A pelican tried to eat my bike

David Ho caught video footage of one of the pelicans in St James's Park expressing deep curiosity in his bicycle. I think it wants to ride it.

Simon Willison's Blog
tool
Qwen-Image-Edit: Image Editing with Higher Quality and Efficiency

Qwen-Image-Edit: Image Editing with Higher Quality and Efficiency

As promised in their August 4th release of the Qwen image generation model, Qwen have now followed it up with a separate model, Qwen-Image-Edit, which can take an image and …

Simon Willison's Blog
tool
llama.cpp guide: running gpt-oss with llama.cpp

llama.cpp guide: running gpt-oss with llama.cpp

Really useful official guide to running the OpenAI gpt-oss models using llama-server from llama.cpp - which provides an OpenAI-compatible localhost API and a neat web interface for interacting with the …

Simon Willison's Blog
tool
No Image

PyPI: Preventing Domain Resurrection Attacks

Domain resurrection attacks are a nasty vulnerability in systems that use email verification to allow people to recover their accounts. If somebody lets their domain name expire an attacker might …

Simon Willison's Blog
api security
No Image

r/ChatGPTPro: What is the most profitable thing you have done with ChatGPT?

This Reddit thread - with 279 replies - offers a neat targeted insight into the kinds of things people are using ChatGPT for. Lots of variety here but two themes …

Simon Willison's Blog
platform
No Image

Google Gemini URL Context

New feature in the Gemini API: you can now enable a url_context tool which the models can use to request the contents of URLs as part of replying to a …

Simon Willison's Blog
api tool
No Image

TIL: Running a gpt-oss eval suite against LM Studio on a Mac

The other day I learned that OpenAI published a set of evals as part of their gpt-oss model release, described in their cookbook on Verifying gpt-oss implementations. I decided to …

Simon Willison's Blog
tool
No Image

Quoting Sam Altman

Most of what we're building out at this point is the inference [...] We're profitable on inference. If we didn't pay for training, we'd be a very profitable company.

Simon Willison's Blog
platform
No Image

GPT-5 has a hidden system prompt

It looks like GPT-5 when accessed via the OpenAI API may have its own hidden system prompt, independent from the system prompt you can specify in an API call. At …

Simon Willison's Blog
api
The Summer of Johann: prompt injections as far as the eye can see

The Summer of Johann: prompt injections as far as the eye can see

Independent AI researcher Johann Rehberger (previously) has had an absurdly busy August. Under the heading The Month of AI Bugs he has been publishing one report per day across an …

Simon Willison's Blog
api security tool
No Image

Meta’s AI rules have let bots hold ‘sensual’ chats with kids, offer false medical info

This is grim. Reuters got hold of a leaked copy Meta's internal "GenAI: Content Risk Standards" document: Running to more than 200 pages, the document defines what Meta staff and …

Simon Willison's Blog
security
Open weight LLMs exhibit inconsistent performance across providers

Open weight LLMs exhibit inconsistent performance across providers

Artificial Analysis published a new benchmark the other day, this time focusing on how an individual model—OpenAI’s gpt-oss-120b—performs across different hosted providers. The results showed some surprising differences. Here’s the …

Simon Willison's Blog
api cloud tool
No Image

Quoting Steve Wozniak

I gave all my Apple wealth away because wealth and power are not what I live for. I have a lot of fun and happiness. I funded a lot of …

Simon Willison's Blog
platform
No Image

Quoting Cory Doctorow

NERD HARDER! is the answer every time a politician gets a technological idée-fixe about how to solve a social problem by creating a technology that can't exist. It's the answer …

Simon Willison's Blog
security
No Image

Introducing Gemma 3 270M: The compact model for hyper-efficient AI

New from Google: Gemma 3 270M, a compact, 270-million parameter model designed from the ground up for task-specific fine-tuning with strong instruction-following and text structuring capabilities already trained in. This …

Simon Willison's Blog
framework tool
No Image

Screaming in the Cloud: AI’s Security Crisis: Why Your Assistant Might Betray You

I recorded this podcast conversation with Corey Quinn a few weeks ago: On this episode of Screaming in the Cloud, Corey Quinn talks with Simon Willison, founder of Datasette and …

Simon Willison's Blog
api security tool
How Does A Blind Model See The Earth?

How Does A Blind Model See The Earth?

Fun, creative new micro-eval. Split the world into a sampled collection of latitude longitude points and for each one ask a model: If this location is over land, say 'Land'. …

Simon Willison's Blog
platform
simonw/codespaces-llm

simonw/codespaces-llm

GitHub Codespaces provides full development environments in your browser, and is free to use with anyone with a GitHub account. Each environment has a full Linux container and a browser-based …

Simon Willison's Blog
api tool
No Image

Claude Sonnet 4 now supports 1M tokens of context

Gemini and OpenAI both have million token models, so it's good to see Anthropic catching up. This is 5x the previous 200,000 context length limit of the various Claude Sonnet …

Simon Willison's Blog
api tool
No Image

Quoting Nick Turley

I think there's been a lot of decisions over time that proved pretty consequential, but we made them very quickly as we have to. [...] [On pricing] I had this …

Simon Willison's Blog
api tool
No Image

LLM 0.27, the annotated release notes: GPT-5 and improved tool calling

I shipped LLM 0.27 today, adding support for the new GPT-5 family of models from OpenAI plus a flurry of improvements to the tool calling features introduced in LLM 0.26. …

Simon Willison's Blog
api library tool
No Image

Reddit will block the Internet Archive

Well this sucks. Jay Peters for the Verge: Reddit says that it has caught AI companies scraping its data from the Internet Archive’s Wayback Machine, so it’s going to start …

Simon Willison's Blog
api security
No Image

Codex upgrade

If you've been experimenting with OpenAI's Codex CLI and have been frustrated that it's not possible to select text and copy it to the clipboard, at least when running in …

Simon Willison's Blog
library tool
qwen-image-mps

qwen-image-mps

Ivan Fioravanti built this Python CLI script for running the Qwen/Qwen-Image image generation model on an Apple silicon Mac, optionally using the Qwen-Image-Lightning LoRA to dramatically speed up generation. Ivan …

Simon Willison's Blog
tool