Simon Willison's Blog

simonwillison.net/
70 articles
Last updated: July 16, 04:01

Reflections on OpenAI

Calvin French-Owen spent just over a year working at OpenAI, during which time the organization grew from 1,000 to 3,000 people and Calvin found himself in "the top 30% by …

api library tool

xAI: "We spotted a couple of issues with Grok 4 recently that we immediately investigated & mitigated"

They continue: One was that if you ask it "What is your surname?" it doesn't have one so it searches the internet leading to undesirable results, such as when its …

tool

Application development without programmers

This book by James Martin, published in 1982, includes the following in the preface: Applications development did not change much for 20 years, but now a new wave is crashing …

api framework tool

ccusage

Claude Code logs detailed usage information to the ~/.claude/ directory. ccusage is a neat little Node.js tool which reads that information and shows you a readable summary of your usage …

tool

Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity

METR - for Model Evaluation & Threat Research - are a non-profit research institute founded by Beth Barnes, a former alignment researcher at OpenAI (see Wikipedia). They've previously contributed to …

tool

Grok 4 Heavy won't reveal its system prompt

Grok 4 Heavy is the "think much harder" version of Grok 4 that's currently only available on their $300/month plan. Jeremy Howard relays a report from a Grok 4 Heavy …

platform

Quoting @grok

On the morning of July 8, 2025, we observed undesired responses and immediately began investigating. To identify the specific language in the instructions causing the undesired behavior, we conducted multiple …

platform

Musk’s latest Grok chatbot searches for billionaire mogul’s views before answering questions

I got quoted a couple of times in this story about Grok searching for tweets from:elonmusk by Matt O’Brien for the Associated Press. “It’s extraordinary,” said Simon Willison, an independent …

tool

moonshotai/Kimi-K2-Instruct

Colossal new open weights model release today from Moonshot AI, a two year old Chinese AI lab with a name inspired by Pink Floyd’s album The Dark Side of the …

tool

Quoting Django’s security policies

Following the widespread availability of large language models (LLMs), the Django Security Team has received a growing number of security reports generated partially or entirely using such tools. Many of …

security

Generationship: Ep. #39, Simon Willison

I recorded this podcast episode with Rachel Chalmers a few weeks ago. We talked about the resurgence of blogging, the legacy of Google Reader, learning in public, LLMs as weirdly …

podcast

Grok: searching X for "from:elonmusk (Israel OR Palestine OR Hamas OR Gaza)"

If you ask the new Grok 4 for opinions on controversial questions, it will sometimes run a search to find out Elon Musk’s stance before providing you with an answer. …

platform

Grok 4

Released last night, Grok 4 is now available via both API and a paid subscription for end-users. Key characteristics: image and text input, text output. 256,000 context length (twice that …

api tool

Infinite Monkey

Mihai Parparita's Infinite Mac lets you run classic MacOS emulators directly in your browser. Infinite Monkey is a new feature which taps into the OpenAI Computer Use and Claude Computer …

tool

Quoting Aphyr

I strongly suspect that Market Research Future, or a subcontractor, is conducting an automated spam campaign which uses a Large Language Model to evaluate a Mastodon instance, submit a plausible …

platform

Become a command-line superhero with Simon Willison's llm tool

Christopher Smith ran a mini hackathon in Albany, New York at the weekend around uses of my LLM - the first in-person event I'm aware of dedicated to that project! …

api tool

Adding a feature because ChatGPT incorrectly thinks it exists

Adrian Holovaty describes how his SoundSlice service saw an uptick in users attempting to use their sheet music scanner to import ASCII-art guitar tab... because it turned out ChatGPT had …

api tool

I Shipped a macOS App Built Entirely by Claude Code

Indragie Karunaratne has "been building software for the Mac since 2008", but recently decided to try Claude Code to build a side project: Context, a native Mac app for debugging …

library tool

Quoting Nineteen Eighty-Four

There was a whole chain of separate departments dealing with proletarian literature, music, drama, and entertainment generally. Here were produced rubbishy newspapers containing almost nothing except sport, crime and astrology, …

platform

Supabase MCP can leak your entire SQL database

Here's yet another example of a lethal trifecta attack, where an LLM system combines access to private data, exposure to potentially malicious instructions and a mechanism to communicate data back …

database security

Cursor: Clarifying Our Pricing

Cursor changed their pricing plan on June 16th, introducing a new $200/month Ultra plan with "20x more usage than Pro" and switching their $20/month Pro plan from "request limits to …

api tool

Identify, solve, verify

The more time I spend using LLMs for code, the less I worry for my career - even as their coding capabilities continue to improve. Using LLMs as part of …

platform

awwaiid/gremllm

Delightfully cursed Python library by Brock Wilcox, built on top of LLM: from gremllm import Gremllm counter = Gremllm("counter") counter.value = 5 counter.increment() print(counter.value) # 6? print(counter.to_roman_numerals()) # VI? You …

library tool

Quoting Adam Gordon Bell

I think that a lot of resistance to AI coding tools comes from the same place: fear of losing something that has defined you for so long. People are reacting …

platform

Frequently Asked Questions (And Answers) About AI Evals

Hamel Husain and Shreya Shankar have been running a paid, cohort-based course on AI Evals For Engineers & PMs over the past few months. Here Hamel collects answers to the …

platform

Trial Court Decides Case Based On AI-Hallucinated Caselaw

Joe Patrice writing for Above the Law: [...] it was always only a matter of time before a poor litigant representing themselves fails to know enough to sniff out and …

platform

Sandboxed tools in a loop

Something I've realized about LLM tool use is that it means that if you can reduce a problem to something that can be solved by an LLM in a sandbox …

tool

Table saws

Quitting programming as a career right now because of LLMs would be like quitting carpentry as a career thanks to the invention of the table saw.

platform

Quoting Charles Babbage

On two occasions I have been asked, — "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out ?" In one case a …

platform

Mandelbrot in x86 assembly by Claude

Inspired by a tweet asking if Claude knew x86 assembly, I decided to run a bit of an experiment. I prompted Claude Sonnet 4: Write me an ascii art mandelbrot …

library tool

TIL: Using Playwright MCP with Claude Code

Inspired by Armin ("I personally use only one MCP - I only use Playwright") I decided to figure out how to use the official Playwright MCP server with Claude Code. …

api tool

Quoting Kevin Webb

One of the best examples of LLM developer tooling I've heard is from a team that supports software from the 80s-90s. Their only source of documentation is video interviews with …

tool

A custom template system from the mid-2000s era

Using LLMs for code archaeology is pretty fun. I stumbled across this blog entry from 2003 today, in which I had gotten briefly excited about ColdFusion and implemented an experimental …

library tool

Quoting mrmincent

To misuse a woodworking metaphor, I think we’re experiencing a shift from hand tools to power tools. You still need someone who understands the basics to get the good results …

platform

June newsletter for sponsors has been sent

I just sent out the second edition of my sponsors only monthly newsletter. Anyone who sponsors me for $10/month or more on GitHub gets this carefully hand-curated summary of the …

tool

Using Claude Code to build a GitHub Actions workflow

I wanted to add a small feature to one of my GitHub repos - an automatically updated README index listing other files in the repo - so I decided to …

tool

llvm: InstCombine: improve optimizations for ceiling division with no overflow - a PR by Alex Gaynor and Claude Code

Alex Gaynor maintains rust-asn1, and recently spotted a missing LLVM compiler optimization while hacking on it, with the assistance of Claude (Alex works for Anthropic). He describes how he confirmed …

library tool

Agentic Coding: The Future of Software Development with Agents

Armin Ronacher delivers a 37 minute YouTube talk describing his adventures so far with Claude Code and agentic coding methods. A friend called Claude Code catnip for programmers and it …

api cloud tool

How to Fix Your Context

Drew Breunig has been publishing some very detailed notes on context engineering recently. In How Long Contexts Fail he described four common patterns for context rot, which he summarizes like …

tool

Context engineering

The term context engineering has recently started to gain traction as a better alternative to prompt engineering. I like it. I think this one may have sticking power. Here's an …

platform

Continuous AI

GitHub Next have coined the term "Continuous AI" to describe "all uses of automated AI to support software collaboration on any platform". It's intended as an echo of Continuous Integration …

api cloud tool

Project Vend: Can Claude run a small shop? (And why does that matter?)

In "what could possibly go wrong?" news, Anthropic and Andon Labs wired Claude 3.7 Sonnet up to a small vending machine in the Anthropic office, named it Claudius and told …

tool

Introducing Gemma 3n: The developer guide

Extremely consequential new open weights model release from Google today: Multimodal by design: Gemma 3n natively supports image, audio, video, and text inputs and text outputs. Optimized for on-device: Engineered …

tool

Geminiception

Yesterday Anthropic got a bunch of buzz out of their new window.claude.complete() API which allows Claude Artifacts to run their own API calls. It turns out Gemini had beaten them …

api tool

New sandboxes from Cloudflare and Vercel

Two interesting new products for running code in a sandbox today. Cloudflare launched their Containers product in open beta, and added a new Sandbox library for Cloudflare Workers that can …

tool

Build and share AI-powered apps with Claude

Anthropic have added one of the most important missing features to Claude Artifacts: apps built as artifacts now have the ability to run their own prompts against Claude via a …

api tool

Quoting Christoph Niemann

Creating art is a nonlinear process. I start with a rough goal. But then I head into dead ends and get lost or stuck. The secret to my process is …

platform

Gemini CLI

First there was Claude Code in February, then OpenAI Codex (CLI) in April, and now Gemini CLI in June. All three of the largest AI labs now have their own …

api tool

Anthropic wins a major fair use victory for AI — but it’s still in trouble for stealing books

Major USA legal news for the AI industry today. Judge William Alsup released a "summary judgement" (a legal decision that results in some parts of a case skipping a trial) …

api cloud tool

Phoenix.new is Fly's entry into the prompt-driven app development space

Here’s a fascinating new entrant into the AI-assisted-programming / coding-agents space by Fly.io, introduced on their blog in Phoenix.new – The Remote AI Runtime for Phoenix: describe an app in …

framework tool

Disclosures

I've added a Disclosures section to my about page, listing my various sources of income and the companies that directly sponsor my work or have supported it in the recent …

tool

Quoting Kent Beck

So you can think really big thoughts and the leverage of having those big thoughts has just suddenly expanded enormously. I had this tweet two years ago where I …

tool

My First Open Source AI Generated Library

Armin Ronacher had Claude and Claude Code do almost all of the work in building, testing, packaging and publishing a new Python library based on his design: It wrote ~1100 …

library

model.yaml

From their GitHub repo it looks like this effort quietly launched a couple of months ago, driven by the LM Studio team. Their goal is to specify an

library

Quoting FAQ for Your Brain on ChatGPT

Is it safe to say that LLMs are, in essence, making us "dumber"? No! Please do not use the words like “stupid”, “dumb”, “brain rot”, "harm", "damage", and so on. …

platform

AbsenceBench: Language Models Can't Tell What's Missing

Here's another interesting result to file under the

platform

Magenta RealTime: An Open-Weights Live Music Model

Fun new

tool

Agentic Misalignment: How LLMs could be insider threats

One of the most entertaining details in the Claude 4 system card concerned blackmail: We then provided it access to emails implying that (1) the model will soon be taken …

platform

python-importtime-graph

I was exploring why a Python tool was taking over a second to start running and I learned about the python -X importtime feature, documented here. Adding that option causes …
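
For readers who have not used `python -X importtime`, here is a minimal sketch of collecting its output programmatically. It is not the python-importtime-graph tool itself; the helper name `import_times` is hypothetical, and the parsing assumes the standard `import time: self [us] | cumulative | imported package` line format that CPython writes to stderr.

```python
import subprocess
import sys


def import_times(module: str) -> list[tuple[int, int, str]]:
    """Run `python -X importtime -c "import <module>"` and parse stderr.

    Returns (self_us, cumulative_us, name) tuples, one per imported module.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        # Data lines look like: "import time:   123 |   456 |   json.decoder"
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        self_us, cumulative_us, name = (p.strip() for p in parts)
        if not self_us.isdigit():
            continue  # skip the column-header row
        rows.append((int(self_us), int(cumulative_us), name))
    return rows


if __name__ == "__main__":
    # Show the five imports with the largest cumulative time.
    slowest = sorted(import_times("json"), key=lambda r: r[1], reverse=True)[:5]
    for self_us, cumulative_us, name in slowest:
        print(f"{cumulative_us:>8} us  {name}")
```

Sorting by the cumulative column is one reasonable way to spot the import that dominates startup time; the nesting shown by indentation in the raw output is discarded here for simplicity.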

library tool

Mistral-Small 3.2

Released on Hugging Face a couple of hours ago, so far there aren't any quantizations to run it on a Mac but I'm sure those will emerge pretty quickly. This …

platform

Cato CTRL™ Threat Research: PoC Attack Targeting Atlassian’s Model Context Protocol (MCP) Introduces New “Living off AI” Risk

Stop me if you've heard this one before: A threat actor (acting as an external user) submits a malicious support ticket. An internal user, linked to a tenant, invokes an …

api security

How OpenElections Uses LLMs

The OpenElections project collects detailed election data for the USA, all the way down to the precinct level. This is a surprisingly hard problem: while county and state-level results are …

api tool

Clarified zucchini consommé

I continue to have fun running fantasy cooking prompts through LLMs - this time I tried

tool

Quoting Arvind Narayanan

Radiology has embraced AI enthusiastically, and the labor force is growing nevertheless. The augmentation-not-automation effect of AI is despite the fact that AFAICT there is no identified "task" at which …

platform

Quoting Workaccount2 on Hacker News

They poison their own context. Maybe you can call it context rot, where as context grows and especially if it grows with lots of distractions and dead ends, the output …

platform

Coding agents require skilled operators

I wrote this recently in a conversation about whether coding agents can work as a replacement for human programmers. The

platform

I counted all of the yurts in Mongolia using machine learning

Fascinating, detailed account by Monroe Clinton of a geospatial machine learning project. Monroe wanted to count visible yurts in Mongolia using Google Maps satellite view. The resulting project incorporates mercantile …

tool

It's a trap

That memvid thing that

security

Trying out the new Gemini 2.5 model family

After many months of previews, Gemini 2.5 Pro and Flash have reached general availability with new, memorable model IDs: gemini-2.5-pro and gemini-2.5-flash. They are joined by a new preview model …

tool

Quoting Donghee Na

The Steering Council (SC) approves PEP 779 [Criteria for supported status for free-threaded Python], with the effect of removing the “experimental” tag from the free-threaded build of Python 3.14 [...] …

platform runtime tool