Simon Willison's Blog

Simon Willison's Blog

simonwillison.net/
241
Articles
9月27日 09:01
Last updated
No Image

ForcedLeak: AI Agent risks exposed in Salesforce AgentForce

Classic lethal trifecta image exfiltration bug reported against Salesforce AgentForce by Sasi Levi and Noma Security. Here the malicious instructions come in via the Salesforce Web-to-Lead feature. When a Salesforce …

Simon Willison's Blog
api cloud security
No Image

How to stop AI’s “lethal trifecta”

This is the second mention of the lethal trifecta in the Economist in just the last week! Their earlier coverage was Why AI systems may never be secure on September …

Simon Willison's Blog
security
No Image

GitHub Copilot CLI is now in public preview

GitHub now have their own entry in the coding terminal CLI agent space: Copilot CLI. It's the same basic shape as Claude Code, Codex CLI, Gemini CLI and a growing …

Simon Willison's Blog
api tool
Improved Gemini 2.5 Flash and Flash-Lite

Improved Gemini 2.5 Flash and Flash-Lite

Two new preview models from Google - updates to their fast and inexpensive Flash and Flash Lite families: The latest version of Gemini 2.5 Flash-Lite was trained and built based …

Simon Willison's Blog
api tool
No Image

Don't hide your best documentation

If you hide the system prompt and tool descriptions for your LLM agent, what you're actually doing is deliberately hiding the most useful documentation describing your service from your most …

Simon Willison's Blog
platform
No Image

Quoting Stanford CS221 Autumn 2025

[2 points] Learn basic NumPy operations with an AI tutor! Use an AI chatbot (e.g., ChatGPT, Claude, Gemini, or Stanford AI Playground) to teach yourself how to do basic vector …

Simon Willison's Blog
tool
No Image

Cross-Agent Privilege Escalation: When Agents Free Each Other

Here's a clever new form of AI exploit from Johann Rehberger, who has coined the term Cross-Agent Privilege Escalation to describe an attack where multiple coding agents - GitHub Copilot …

Simon Willison's Blog
security
GPT-5-Codex

GPT-5-Codex

OpenAI half-relased this model earlier this month, adding it to their Codex CLI tool but not their API. Today they've fixed that - the new model can now be accessed …

Simon Willison's Blog
api library tool
No Image

Qwen3-VL: Sharper Vision, Deeper Thought, Broader Action

I've been looking forward to this. Qwen 2.5 VL is one of the best available open weight vision LLMs, so I had high hopes for Qwen 3's vision models. Firstly, …

Simon Willison's Blog
platform
No Image

Why AI systems might never be secure

The Economist have a new piece out about LLM security, with this headline and subtitle: Why AI systems might never be secure A “lethal trifecta” of conditions opens them to …

Simon Willison's Blog
security
No Image

Quoting Kate Niederhoffer, Gabriella Rosen Kellerman, Angela Lee, Alex Liebscher, Kristina Rapuano and Jeffrey T. Hancock

We define workslop as AI generated work content that masquerades as good work, but lacks the substance to meaningfully advance a given task. Here’s how this happens. As AI tools …

Simon Willison's Blog
tool
Four new releases from Qwen

Four new releases from Qwen

It's been an extremely busy day for team Qwen. Within the last 24 hours (all links to Twitter, which seems to be their preferred platform for these announcements): Qwen3-Next-80B-A3B-Instruct-FP8 and …

Simon Willison's Blog
library tool
CompileBench: Can AI Compile 22-year-old Code?

CompileBench: Can AI Compile 22-year-old Code?

Interesting new LLM benchmark from Piotr Grabowski and Piotr Migdał: how well can different models handle compilation challenges such as cross-compiling gucr for ARM64 architecture? This is one of my …

Simon Willison's Blog
api tool
No Image

ChatGPT Is Blowing Up Marriages as Spouses Use AI to Attack Their Partners

Maggie Harrison Dupré for Futurism. It turns out having an always-available "marriage therapist" with a sycophantic instinct to always take your side is catastrophic for relationships. The tension in the …

Simon Willison's Blog
platform
No Image

Locally AI

Handy new iOS app by Adrien Grondin for running local LLMs on your phone. It just added support for the new iOS 26 Apple Foundation model, so you can install …

Simon Willison's Blog
mobile
No Image

llm-openrouter 0.5

New release of my LLM plugin for accessing models made available via OpenRouter. The release notes in full: Support for tool calling. Thanks, James Sanford. #43 Support for reasoning options, …

Simon Willison's Blog
api tool
Grok 4 Fast

Grok 4 Fast

New hosted vision-enabled reasoning model from xAI that's designed to be fast and extremely competitive on price. It has a 2 million token context window and "was trained end-to-end with …

Simon Willison's Blog
tool
No Image

Magistral 1.2

Mistral quietly released two new models yesterday: Magistral Small 1.2 (Apache 2.0, 96.1 GB on Hugging Face) and Magistral Medium 1.2 (not open weights same as Mistral's other "medium" models.) …

Simon Willison's Blog
platform
No Image

The Hidden Risk in Notion 3.0 AI Agents: Web Search Tool Abuse for Data Exfiltration

Abi Raghuram reports that Notion 3.0, released yesterday, introduces new prompt injection data exfiltration vulnerabilities thanks to enabling lethal trifecta attacks. Abi's attack involves a PDF with hidden text (white …

Simon Willison's Blog
security
No Image

Quoting Steve Jobs

Well, the types of computers we have today are tools. They’re responders: you ask a computer to do something and it will do it. The next stage is going to …

Simon Willison's Blog
tool
I think "agent" may finally have a widely enough agreed upon definition to be useful jargon now

I think "agent" may finally have a widely enough agreed upon definition to be useful jargon now

I’ve noticed something interesting over the past few weeks: I’ve started using the term “agent” in conversations where I don’t feel the need to then define it, roll my eyes …

Simon Willison's Blog
platform
No Image

Anthropic: A postmortem of three recent issues

Anthropic had a very bad month in terms of model reliability: Between August and early September, three infrastructure bugs intermittently degraded Claude's response quality. We've now resolved these issues and …

Simon Willison's Blog
platform
No Image

ICPC medals for OpenAI and Gemini

In July it was the International Math Olympiad (OpenAI, Gemini), today it's the International Collegiate Programming Contest (ICPC). Once again, both OpenAI and Gemini competed with models that achieved Gold …

Simon Willison's Blog
platform
No Image

Announcing the 2025 PSF Board Election Results!

I'm happy to share that I've been re-elected for second term on the board of directors of the Python Software Foundation. Jannis Leidel was also re-elected and Abigail Dogbe and …

Simon Willison's Blog
tool
GPT‑5-Codex and upgrades to Codex

GPT‑5-Codex and upgrades to Codex

OpenAI half-released a new model today: GPT‑5-Codex, a fine-tuned GPT-5 variant explicitly designed for their various AI-assisted programming tools. I say half-released because it's not yet available via their API, …

Simon Willison's Blog
api library tool
No Image

Models can prompt now

Here's an interesting example of models incrementally improving over time: I am finding that today's leading models are competent at writing prompts for themselves and each other. A year ago …

Simon Willison's Blog
platform
No Image

gpt-5 and gpt-5-mini rate limit updates

OpenAI have increased the rate limits for their two main GPT-5 models. These look significant: gpt-5 Tier 1: 30K → 500K TPM (1.5M batch) Tier 2: 450K → 1M (3M …

Simon Willison's Blog
api
No Image

Quoting Matt Webb

The trick with Claude Code is to give it large, but not too large, extremely well defined problems. (If the problems are too large then you are now vibe coding… …

Simon Willison's Blog
platform
No Image

Comparing the memory implementations of Claude and ChatGPT

Shlok Khemani has been doing excellent work reverse-engineering LLM systems and documenting his discoveries. Last week he wrote about ChatGPT memory. This week it's Claude. Claude's memory system has two …

Simon Willison's Blog
api tool
Qwen3-Next-80B-A3B: 🐧🦩 Who needs legs?!

Qwen3-Next-80B-A3B: 🐧🦩 Who needs legs?!

Qwen announced two new models via their Twitter account (nothing on their blog yet): Qwen3-Next-80B-A3B-Instruct and Qwen3-Next-80B-A3B-Thinking. They make some big claims on performance: Qwen3-Next-80B-A3B-Instruct approaches our 235B flagship. Qwen3-Next-80B-A3B-Thinking …

Simon Willison's Blog
tool
No Image

Defeating Nondeterminism in LLM Inference

A very common question I see about LLMs concerns why they can't be made to deliver the same response to the same prompt by setting a fixed random number seed. …

Simon Willison's Blog
library tool
No Image

Claude API: Web fetch tool

New in the Claude API: if you pass the web-fetch-2025-09-10 beta header you can add {"type": "web_fetch_20250910", "name": "web_fetch", "max_uses": 5} to your "tools" list and Claude will gain the …

Simon Willison's Blog
api tool
No Image

I Replaced Animal Crossing's Dialogue with a Live LLM by Hacking GameCube Memory

Brilliant retro-gaming project by Josh Fonseca, who figured out how to run 2002 Game Cube Animal Crossing in the Dolphin Emulator such that dialog with the characters was instead generated …

Simon Willison's Blog
api tool
No Image

Quoting Apple Security Engineering and Architecture

There has never been a successful, widespread malware attack against iPhone. The only system-level iOS attacks we observe in the wild come from mercenary spyware, which is vastly more complex …

Simon Willison's Blog
security
My review of Claude's new Code Interpreter, released under a very confusing name

My review of Claude's new Code Interpreter, released under a very confusing name

Today on the Anthropic blog: Claude can now create and edit files: Claude can now create and edit Excel spreadsheets, documents, PowerPoint slide decks, and PDFs directly in Claude.ai and …

Simon Willison's Blog
api tool
No Image

The 2025 PSF Board Election is Open!

The Python Software Foundation's annual board member election is taking place right now, with votes (from previously affirmed voting members) accepted from September 2nd, 2:00 pm UTC through Tuesday, September …

Simon Willison's Blog
api cloud platform
No Image

Geoffrey Huntley is cursed

Geoffrey Huntley vibe-coded an entirely new programming language using Claude: The programming language is called "cursed". It's cursed in its lexical structure, it's cursed in how it was built, it's …

Simon Willison's Blog
api library tool
Recreating the Apollo AI adoption rate chart with GPT-5, Python and Pyodide

Recreating the Apollo AI adoption rate chart with GPT-5, Python and Pyodide

Apollo Global Management’s “Chief Economist” Dr. Torsten Sløk released this interesting chart which appears to show a slowdown in AI adoption rates among large (>250 empoloyees) companies: Here’s the full …

Simon Willison's Blog
api library tool
No Image

Anthropic status: Model output quality

Anthropic previously reported model serving bugs that affected Claude Opus 4 and 4.1 for 56.5 hours. They've now fixed additional bugs affecting "a small percentage" of Sonnet 4 requests for …

Simon Willison's Blog
platform
No Image

Quoting TheSoftwareGuy

Having worked inside AWS I can tell you one big reason [that they don't document their internals] is the attitude/fear that anything we put in out public docs may end …

Simon Willison's Blog
cloud
No Image

Load Llama-3.2 WebGPU in your browser from a local folder

Inspired by a comment on Hacker News I decided to see if it was possible to modify the transformers.js-examples/tree/main/llama-3.2-webgpu Llama 3.2 chat demo (online here, I wrote about it last …

Simon Willison's Blog
tool
No Image

Quoting James Luan

I recently spoke with the CTO of a popular AI note-taking app who told me something surprising: they spend twice as much on vector search as they do on OpenAI …

Simon Willison's Blog
api
No Image

Is the LLM response wrong, or have you just failed to iterate it?

More from Mike Caulfield (see also the SIFT method). He starts with a fantastic example of Google's AI mode usually correctly handling a common piece of misinformation but occasionally falling …

Simon Willison's Blog
platform
No Image

Quoting Anil Dash

I agree with the intellectual substance of virtually every common critique of AI. And it's very clear that turning those critiques into a competition about who can frame them in …

Simon Willison's Blog
platform
No Image

The SIFT method

The SIFT method is "an evaluation strategy developed by digital literacy expert, Mike Caulfield, to help determine whether online content can be trusted for credible or reliable sources of information." …

Simon Willison's Blog
tool
AI mode is good, actually

AI mode is good, actually

When I wrote about how good ChatGPT with GPT-5 is at search yesterday I nearly added a note about how comparatively disappointing Google's efforts around this are. I'm glad I …

Simon Willison's Blog
api cloud tool
GPT-5 Thinking in ChatGPT (aka Research Goblin) is shockingly good at search

GPT-5 Thinking in ChatGPT (aka Research Goblin) is shockingly good at search

“Don’t use chatbots as search engines” was great advice for several years... until it wasn’t. I wrote about how good OpenAI’s o3 was at using its Bing-backed search tool back …

Simon Willison's Blog
api tool
No Image

Quoting Jason Liu

I am once again shocked at how much better image retrieval performance you can get if you embed highly opinionated summaries of an image, a summary that came out of …

Simon Willison's Blog
api
Kimi-K2-Instruct-0905

Kimi-K2-Instruct-0905

New not-quite-MIT licensed model from Chinese Moonshot AI, a follow-up to the highly regarded Kimi-K2 model they released in July. This one is an incremental improvement - I've seen it …

Simon Willison's Blog
library tool
No Image

Anthropic to pay $1.5 billion to authors in landmark AI settlement

I wrote about the details of this case when it was found that Anthropic's training on book content was fair use, but they needed to have purchased individual copies of …

Simon Willison's Blog
platform
Introducing EmbeddingGemma

Introducing EmbeddingGemma

Brand new open weights (under the slightly janky Gemma license) 308M parameter embedding model from Google: Based on the Gemma 3 architecture, EmbeddingGemma is trained on 100+ languages and is …

Simon Willison's Blog
library tool
No Image

Highlighted tools

Any time I share my collection of tools built using vibe coding and AI-assisted development (now at 124, here's the definitive list) someone will inevitably complain that they're mostly trivial. …

Simon Willison's Blog
tool
Beyond Vibe Coding

Beyond Vibe Coding

Back in May I wrote Two publishers and three authors fail to understand what “vibe coding” means where I called out the authors of two forthcoming books on "vibe coding" …

Simon Willison's Blog
tool
No Image

gov.uscourts.dcd.223205.1436.0_1.pdf

Here's the 230 page PDF ruling on the 2023 United States v. Google LLC federal antitrust case - the case that could have resulted in Google selling off Chrome and …

Simon Willison's Blog
api cloud tool
Rich Pixels

Rich Pixels

Neat Python library by Darren Burns adding pixel image support to the Rich terminal library, using tricks to render an image using full or half-height colored blocks. Here's the key …

Simon Willison's Blog
library tool
No Image

August 2025 newsletter

I just sent out my August 2025 sponsors-only newsletter summarizing the past month in LLMs and my other work. Topics included GPT-5, gpt-oss, image editing models (Qwen-Image-Edit and Gemini Nano …

Simon Willison's Blog
platform
No Image

Introducing gpt-realtime

Released a few days ago (August 28th), gpt-realtime is OpenAI's new "most advanced speech-to-speech model". It looks like this is a replacement for the older gpt-4o-realtime-preview model that was released …

Simon Willison's Blog
platform
Cloudflare Radar: AI Insights

Cloudflare Radar: AI Insights

Cloudflare launched this dashboard back in February, incorporating traffic analysis from Cloudflare's network along with insights from their popular 1.1.1.1 DNS service. I found this chart particularly interesting, showing which …

Simon Willison's Blog
cloud
No Image

Claude Opus 4.1 and Opus 4 degraded quality

Notable because often when people complain of degraded model quality it turns out to be unfounded - Anthropic in the past have emphasized that they don't change the model weights …

Simon Willison's Blog
platform
No Image

Quoting Benj Edwards

LLMs are intelligence without agency—what we might call "vox sine persona": voice without person. Not the voice of someone, not even the collective voice of many someones, but a voice …

Simon Willison's Blog
platform
The perils of vibe coding

The perils of vibe coding

I was interviewed by Elaine Moore for this opinion piece in the Financial Times, which ended up in the print edition of the paper too! I picked up a copy …

Simon Willison's Blog
api tool
No Image

Lossy encyclopedia

Since I love collecting questionable analogies for LLMs, here's a new one I just came up with: an LLM is a lossy encyclopedia. They have a huge array of facts …

Simon Willison's Blog
platform
No Image

Python: The Documentary

New documentary about the origins of the Python programming language - 84 minutes long, built around extensive interviews with Guido van Rossum and others who were there at the start …

Simon Willison's Blog
youtube
No Image

Quoting Bruce Schneier

We simply don’t know to defend against these attacks. We have zero agentic AI systems that are secure against these attacks. Any AI that is working in an adversarial environment—and …

Simon Willison's Blog
security
No Image

Piloting Claude for Chrome

Two days ago I said: I strongly expect that the entire concept of an agentic browser extension is fatally flawed and cannot be built safely. Today Anthropic announced their own …

Simon Willison's Blog
api security tool
No Image

Will Smith’s concert crowds are real, but AI is blurring the lines

Great piece from Andy Baio demonstrating quite how convoluted the usage ethics and backlash against generative AI has become. Will Smith has been accused of using AI to misleadingly inflate …

Simon Willison's Blog
platform
No Image

Agentic Browser Security: Indirect Prompt Injection in Perplexity Comet

The security team from Brave took a look at Comet, the LLM-powered "agentic browser" extension from Perplexity, and unsurprisingly found security holes you can drive a truck through. The vulnerability …

Simon Willison's Blog
api security tool
No Image

ChatGPT release notes: Project-only memory

The feature I've most wanted from ChatGPT's memory feature (the newer version of memory that automatically includes relevant details from summarized prior conversations) just landed: With project-only memory enabled, ChatGPT …

Simon Willison's Blog
platform
DeepSeek 3.1

DeepSeek 3.1

The latest model from DeepSeek, a 685B monster (like DeepSeek v3 before it) but this time it's a hybrid reasoning model. DeepSeek claim: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, …

Simon Willison's Blog
platform
No Image

Quoting The Bluesky Team

Mississippi's approach would fundamentally change how users access Bluesky. The Supreme Court’s recent decision leaves us facing a hard reality: comply with Mississippi’s age assurance law—and make every Mississippi Bluesky …

Simon Willison's Blog
security
No Image

too many model context protocol servers and LLM allocations on the dance floor

Useful reminder from Geoffrey Huntley of the infrequently discussed significant token cost of using MCP. Geoffrey estimate estimates that the usable context window something like Amp or Cursor is around …

Simon Willison's Blog
api tool
No Image

Quoting potatolicious

Most classical engineering fields deal with probabilistic system components all of the time. In fact I'd go as far as to say that inability to deal with probabilistic components is …

Simon Willison's Blog
platform
No Image

Quoting Matt Garman

I was at a leadership group and people were telling me "We think that with AI we can replace all of our junior people in our company." I was like, …

Simon Willison's Blog
platform
No Image

Quoting Mustafa Suleyman

Simply put, my central worry is that many people will start to believe in the illusion of AIs as conscious entities so strongly that they’ll soon advocate for AI rights, …

Simon Willison's Blog
platform
No Image

Quoting u/AssafMalkiIL

what’s the point of vibe coding if at the end of the day i still gotta pay a dev to look at the code anyway. sure it feels kinda cool …

Simon Willison's Blog
tool
David Ho on BlueSky: A pelican tried to eat my bike

David Ho on BlueSky: A pelican tried to eat my bike

David Ho caught video footage of one of the pelicans in St James's Park expressing deep curiosity in his bicycle. I think it wants to ride it.

Simon Willison's Blog
tool
Qwen-Image-Edit: Image Editing with Higher Quality and Efficiency

Qwen-Image-Edit: Image Editing with Higher Quality and Efficiency

As promised in their August 4th release of the Qwen image generation model, Qwen have now followed it up with a separate model, Qwen-Image-Edit, which can take an image and …

Simon Willison's Blog
tool
llama.cpp guide: running gpt-oss with llama.cpp

llama.cpp guide: running gpt-oss with llama.cpp

Really useful official guide to running the OpenAI gpt-oss models using llama-server from llama.cpp - which provides an OpenAI-compatible localhost API and a neat web interface for interacting with the …

Simon Willison's Blog
tool
No Image

PyPI: Preventing Domain Resurrection Attacks

Domain resurrection attacks are a nasty vulnerability in systems that use email verification to allow people to recover their accounts. If somebody lets their domain name expire an attacker might …

Simon Willison's Blog
api security
No Image

r/ChatGPTPro: What is the most profitable thing you have done with ChatGPT?

This Reddit thread - with 279 replies - offers a neat targeted insight into the kinds of things people are using ChatGPT for. Lots of variety here but two themes …

Simon Willison's Blog
platform
No Image

Google Gemini URL Context

New feature in the Gemini API: you can now enable a url_context tool which the models can use to request the contents of URLs as part of replying to a …

Simon Willison's Blog
api tool
No Image

TIL: Running a gpt-oss eval suite against LM Studio on a Mac

The other day I learned that OpenAI published a set of evals as part of their gpt-oss model release, described in their cookbook on Verifying gpt-oss implementations. I decided to …

Simon Willison's Blog
tool
No Image

Quoting Sam Altman

Most of what we're building out at this point is the inference [...] We're profitable on inference. If we didn't pay for training, we'd be a very profitable company.

Simon Willison's Blog
platform
No Image

GPT-5 has a hidden system prompt

It looks like GPT-5 when accessed via the OpenAI API may have its own hidden system prompt, independent from the system prompt you can specify in an API call. At …

Simon Willison's Blog
api
The Summer of Johann: prompt injections as far as the eye can see

The Summer of Johann: prompt injections as far as the eye can see

Independent AI researcher Johann Rehberger (previously) has had an absurdly busy August. Under the heading The Month of AI Bugs he has been publishing one report per day across an …

Simon Willison's Blog
api security tool
No Image

Meta’s AI rules have let bots hold ‘sensual’ chats with kids, offer false medical info

This is grim. Reuters got hold of a leaked copy Meta's internal "GenAI: Content Risk Standards" document: Running to more than 200 pages, the document defines what Meta staff and …

Simon Willison's Blog
security
Open weight LLMs exhibit inconsistent performance across providers

Open weight LLMs exhibit inconsistent performance across providers

Artificial Analysis published a new benchmark the other day, this time focusing on how an individual model—OpenAI’s gpt-oss-120b—performs across different hosted providers. The results showed some surprising differences. Here’s the …

Simon Willison's Blog
api cloud tool
No Image

Quoting Steve Wozniak

I gave all my Apple wealth away because wealth and power are not what I live for. I have a lot of fun and happiness. I funded a lot of …

Simon Willison's Blog
platform
No Image

Quoting Cory Doctorow

NERD HARDER! is the answer every time a politician gets a technological idée-fixe about how to solve a social problem by creating a technology that can't exist. It's the answer …

Simon Willison's Blog
security
No Image

Introducing Gemma 3 270M: The compact model for hyper-efficient AI

New from Google: Gemma 3 270M, a compact, 270-million parameter model designed from the ground up for task-specific fine-tuning with strong instruction-following and text structuring capabilities already trained in. This …

Simon Willison's Blog
framework tool
No Image

Screaming in the Cloud: AI’s Security Crisis: Why Your Assistant Might Betray You

I recorded this podcast conversation with Corey Quinn a few weeks ago: On this episode of Screaming in the Cloud, Corey Quinn talks with Simon Willison, founder of Datasette and …

Simon Willison's Blog
api security tool
How Does A Blind Model See The Earth?

How Does A Blind Model See The Earth?

Fun, creative new micro-eval. Split the world into a sampled collection of latitude longitude points and for each one ask a model: If this location is over land, say 'Land'. …

Simon Willison's Blog
platform
simonw/codespaces-llm

simonw/codespaces-llm

GitHub Codespaces provides full development environments in your browser, and is free to use with anyone with a GitHub account. Each environment has a full Linux container and a browser-based …

Simon Willison's Blog
api tool
No Image

Claude Sonnet 4 now supports 1M tokens of context

Gemini and OpenAI both have million token models, so it's good to see Anthropic catching up. This is 5x the previous 200,000 context length limit of the various Claude Sonnet …

Simon Willison's Blog
api tool
No Image

Quoting Nick Turley

I think there's been a lot of decisions over time that proved pretty consequential, but we made them very quickly as we have to. [...] [On pricing] I had this …

Simon Willison's Blog
api tool
No Image

LLM 0.27, the annotated release notes: GPT-5 and improved tool calling

I shipped LLM 0.27 today, adding support for the new GPT-5 family of models from OpenAI plus a flurry of improvements to the tool calling features introduced in LLM 0.26. …

Simon Willison's Blog
api library tool
No Image

Reddit will block the Internet Archive

Well this sucks. Jay Peters for the Verge: Reddit says that it has caught AI companies scraping its data from the Internet Archive’s Wayback Machine, so it’s going to start …

Simon Willison's Blog
api security
No Image

Codex upgrade

If you've been experimenting with OpenAI's Codex CLI and have been frustrated that it's not possible to select text and copy it to the clipboard, at least when running in …

Simon Willison's Blog
library tool
qwen-image-mps

qwen-image-mps

Ivan Fioravanti built this Python CLI script for running the Qwen/Qwen-Image image generation model on an Apple silicon Mac, optionally using the Qwen-Image-Lightning LoRA to dramatically speed up generation. Ivan …

Simon Willison's Blog
tool
No Image

AI for data engineers with Simon Willison

I recorded an episode last week with Claire Giordano for the Talking Postgres podcast. The topic was "AI for data engineers" but we ended up covering an enjoyable range of …

Simon Willison's Blog
api cloud tool
Qwen3-4B-Thinking: "This is art - pelicans don't ride bikes!"

Qwen3-4B-Thinking: "This is art - pelicans don't ride bikes!"

I’ve fallen a few days behind keeping up with Qwen. They released two new 4B models last week: Qwen3-4B-Instruct-2507 and its thinking equivalent Qwen3-4B-Thinking-2507. These are relatively tiny models that …

Simon Willison's Blog
api tool
No Image

Quoting Sam Altman

the percentage of users using reasoning models each day is significantly increasing; for example, for free users we went from <1% to 7%, and for plus users from 7% to …

Simon Willison's Blog
platform
No Image

Quoting Ethan Mollick

The issue with GPT-5 in a nutshell is that unless you pay for model switching & know to use GPT-5 Thinking or Pro, when you ask “GPT-5” you sometimes get …

Simon Willison's Blog
platform
No Image

Quoting Thomas Dohmke

You know what else we noticed in the interviews? Developers rarely mentioned “time saved” as the core benefit of working in this new way with agents. They were all about …

Simon Willison's Blog
platform
No Image

When a Jira Ticket Can Steal Your Secrets

Zenity Labs describe a classic lethal trifecta attack, this time against Cursor, MCP, Jira and Zendesk. They also have a short video demonstrating the issue. Zendesk support emails are often …

Simon Willison's Blog
api security tool
My Lethal Trifecta talk at the Bay Area AI Security Meetup

My Lethal Trifecta talk at the Bay Area AI Security Meetup

I gave a talk on Wednesday at the Bay Area AI Security Meetup about prompt injection, the lethal trifecta and the challenges of securing systems that use MCP. It wasn’t …

Simon Willison's Blog
api security tool
No Image

Quoting @pearlmania500

I have a toddler. My biggest concern is that he doesn't eat rocks off the ground and you're talking to me about ChatGPT psychosis? Why do we even have that? …

Simon Willison's Blog
platform
No Image

Quoting Sam Altman

GPT-5 rollout updates: We are going to double GPT-5 rate limits for ChatGPT Plus users as we finish rollout. We will let Plus users choose to continue to use 4o. …

Simon Willison's Blog
platform
No Image

The surprise deprecation of GPT-4o for ChatGPT consumers

I’ve been dipping into the r/ChatGPT subreddit recently to see how people are reacting to the GPT-5 launch, and so far the vibes there are not good. This AMA thread …

Simon Willison's Blog
api tool
Previewing GPT-5 at OpenAI's office

Previewing GPT-5 at OpenAI's office

A couple of weeks ago I was invited to OpenAI's headquarters for a "preview event", for which I had to sign both an NDA and a video release waiver. I …

Simon Willison's Blog
youtube
GPT-5: Key characteristics, pricing and model card

GPT-5: Key characteristics, pricing and model card

I’ve had preview access to the new GPT-5 model family for the past two weeks, and have been using GPT-5 as my daily-driver. It’s my new favorite model. It’s still …

Simon Willison's Blog
platform
No Image

Jules, our asynchronous coding agent, is now available for everyone

I wrote about the Jules beta back in May. Google's version of the OpenAI Codex PR-submitting hosted coding tool graduated from beta today. I'm mainly linking to this now because …

Simon Willison's Blog
api tool
No Image

Qwen3-4B Instruct and Thinking

Yet another interesting model from Qwen - these are tiny compared to their other recent releases (just 4B parameters, 7.5GB on Hugging Face and even smaller when quantized) but with …

Simon Willison's Blog
platform
No Image

Quoting Artificial Analysis

gpt-oss-120b is the most intelligent American open weights model, comes behind DeepSeek R1 and Qwen3 235B in intelligence but offers efficiency benefits [...] We’re seeing the 120B beat o3-mini but …

Simon Willison's Blog
platform
No Image

No, AI is not Making Engineers 10x as Productive

Colton Voege on "curing your AI 10x engineer imposter syndrome". There's a lot of rhetoric out there suggesting that if you can't 10x your productivity through tricks like running a …

Simon Willison's Blog
tool
OpenAI's new open weight (Apache 2) models are really good

OpenAI's new open weight (Apache 2) models are really good

The long promised OpenAI open weight models are here, and they are very impressive. They’re available under proper open source licenses—Apache 2.0—and come in two sizes, 120B and 20B. OpenAI’s …

Simon Willison's Blog
api tool
Claude Opus 4.1

Claude Opus 4.1

Surprise new model from Anthropic today - Claude Opus 4.1, which they describe as "a drop-in replacement for Opus 4". My favorite thing about this model is the version number …

Simon Willison's Blog
platform
No Image

Quoting greyduet on r/teachers

I teach HS Science in the south. I can only speak for my district, but a few teacher work days in the wave of enthusiasm I'm seeing for AI tools …

Simon Willison's Blog
tool
ChatGPT agent's user-agent

ChatGPT agent's user-agent

I was exploring how ChatGPT agent works today. I learned some interesting things about how it exposes its identity through HTTP headers, then made a huge blunder in thinking it …

Simon Willison's Blog
api tool
Usage charts for my LLM tool against OpenRouter

Usage charts for my LLM tool against OpenRouter

OpenRouter proxies requests to a large number of different LLMs and provides high level statistics of which models are the most popular among their users. Tools that call OpenRouter can …

Simon Willison's Blog
api tool
Qwen-Image: Crafting with Native Text Rendering

Qwen-Image: Crafting with Native Text Rendering

Not content with releasing six excellent open weights LLMs in July, Qwen are kicking off August with their first ever image generation model. Qwen-Image is a 20 billion parameter MMDiT …

Simon Willison's Blog
library tool
Quoting @himbodhisattva

Quoting @himbodhisattva

for services that wrap GPT-3, is it possible to do the equivalent of sql injection? like, a prompt-injection attack? make it think it's completed the task and then get access …

Simon Willison's Blog
security
No Image

I Saved a PNG Image To A Bird

Benn Jordan provides one of the all time great YouTube video titles, and it's justified. He drew an image in an audio spectrogram, played that sound to a talented starling …

Simon Willison's Blog
tool youtube
No Image

Quoting Nick Turley

This week, ChatGPT is on track to reach 700M weekly active users — up from 500M at the end of March and 4× since last year.

Simon Willison's Blog
api cloud
XBai o4

XBai o4

Yet another open source (Apache 2.0) LLM from a Chinese AI lab. This model card claims: XBai o4 excels in complex reasoning capabilities and has now completely surpassed OpenAI-o3-mini in …

Simon Willison's Blog
tool
No Image

Faster inference

Two interesting examples of inference speed as a flagship feature of LLM services today. First, Cerebras announced two new monthly plans for their extremely high speed hosted model service: Cerebras …

Simon Willison's Blog
api tool
Deep Think in the Gemini app

Deep Think in the Gemini app

Google released Gemini 2.5 Deep Think this morning, exclusively to their Ultra ($250/month) subscribers: It is a variation of the model that recently achieved the gold-medal standard at this year's …

Simon Willison's Blog
platform
No Image

July newsletter for sponors is out

This morning I sent out the third edition of my LLM digest newsletter for my $10/month and higher sponsors on GitHub. It included the following section headers: Claude Code Model …

Simon Willison's Blog
tool
No Image

Quoting Logan Kilpatrick

Gemini Deep Think, our SOTA model with parallel thinking that won the IMO Gold Medal 🥇, is now available in the Gemini App for Ultra subscribers!! [...] Quick correction: this …

Simon Willison's Blog
platform
Reverse engineering some updates to Claude

Reverse engineering some updates to Claude

Anthropic released two major new features for their consumer-facing Claude apps in the past couple of days. Sadly, they don’t do a very good job of updating the release notes …

Simon Willison's Blog
api tool
No Image

Quoting Christina Wodtke

The old timers who built the early web are coding with AI like it's 1995. Think about it: They gave blockchain the sniff test and walked away. Ignored crypto (and …

Simon Willison's Blog
platform
No Image

More model releases on 31st July

Here are a few more model releases from today, to round out a very busy July: Cohere released Command A Vision, their first multi-modal (image input) LLM. Like their others …

Simon Willison's Blog
library tool
Trying out Qwen3 Coder Flash using LM Studio and Open WebUI and LLM

Trying out Qwen3 Coder Flash using LM Studio and Open WebUI and LLM

Qwen just released their sixth model(!) for this July called Qwen3-Coder-30B-A3B-Instruct—listed as Qwen3-Coder-Flash in their chat.qwen.ai interface. It’s 30.5B total parameters with 3.3B active at any one time. This means …

Simon Willison's Blog
library tool
Ollama's new app

Ollama's new app

Ollama has been one of my favorite ways to run local models for a while - it makes it really easy to download models, and it's smart about keeping them …

Simon Willison's Blog
tool
No Image

Quoting Steve Krouse

When you vibe code, you are incurring tech debt as fast as the LLM can spit it out. Which is why vibe coding is perfect for prototypes and throwaway projects: …

Simon Willison's Blog
platform
The best available open weight LLMs now come from China

The best available open weight LLMs now come from China

Something that has become undeniable this month is that the best available open weight models now come from the Chinese AI labs. I continue to have a lot of love …

Simon Willison's Blog
platform
Qwen3-30B-A3B-Thinking-2507

Qwen3-30B-A3B-Thinking-2507

Yesterday was Qwen3-30B-A3B-Instruct-2507. Qwen are clearly committed to their new split between reasoning and non-reasoning models (a reversal from Qwen 3 in April), because today they released the new reasoning …

Simon Willison's Blog
platform
No Image

OpenAI: Introducing study mode

New ChatGPT feature, which can be triggered by typing /study or by visiting chatgpt.com/studymode. OpenAI say: Under the hood, study mode is powered by custom system instructions we’ve written in …

Simon Willison's Blog
platform
Qwen/Qwen3-30B-A3B-Instruct-2507

Qwen/Qwen3-30B-A3B-Instruct-2507

New model update from Qwen, improving on their previous Qwen3-30B-A3B release from late April. In their tweet they said: Smarter, faster, and local deployment-friendly. ✨ Key Enhancements: ✅ Enhanced reasoning, …

Simon Willison's Blog
platform
No Image

Quoting Nilay Patel

Our plan is to build direct traffic to our site. and newsletters just one kind of direct traffic in the end. I don’t intend to ever rely on someone else’s …

Simon Willison's Blog
tool
No Image

Quoting Anthropic

We’re rolling out new weekly rate limits for Claude Pro and Max in late August. We estimate they’ll apply to less than 5% of subscribers based on current usage. [...] …

Simon Willison's Blog
platform
GLM-4.5: Reasoning, Coding, and Agentic Abililties

GLM-4.5: Reasoning, Coding, and Agentic Abililties

Another day, another significant new open weight model release from a Chinese frontier AI lab. This time it's Z.ai - who rebranded (at least in English) from Zhipu AI a …

Simon Willison's Blog
tool
No Image

Enough AI copilots! We need AI HUDs

Geoffrey Litt compares Copilots - AI assistants that you engage in dialog with and work with you to complete a task - with HUDs, Head-Up Displays, which enhance your working …

Simon Willison's Blog
tool
No Image

Official statement from Tea on their data leak

Tea is a dating safety app for women that lets them share notes about potential dates. The other day it was subject to a truly egregious data leak caused by …

Simon Willison's Blog
api security
Qwen3-235B-A22B-Thinking-2507

Qwen3-235B-A22B-Thinking-2507

The third Qwen model release week, following Qwen3-235B-A22B-Instruct-2507 on Monday 21st and Qwen3-Coder-480B-A35B-Instruct on Tuesday 22nd. Those two were both non-reasoning models - a change from the previous models in …

Simon Willison's Blog
platform
Using GitHub Spark to reverse engineer GitHub Spark

Using GitHub Spark to reverse engineer GitHub Spark

GitHub Spark was released in public preview yesterday. It’s GitHub’s implementation of the prompt-to-app pattern also seen in products like Claude Artifacts, Lovable, Vercel v0, Val Town Townie and Fly.io’s …

Simon Willison's Blog
api framework tool
No Image

Quoting Recurse Center

[...] You learn best and most effectively when you are learning something that you care about. Your work becomes meaningful and something you can be proud of only when you …

Simon Willison's Blog
platform
Instagram Reel: Veo 3 paid preview

Instagram Reel: Veo 3 paid preview

@googlefordevs on Instagram published this reel featuring Christina Warren with prompting tips for the new Veo 3 paid preview (mp4 copy here). (Christine checked first if I minded them using …

Simon Willison's Blog
tool
TimeScope: How Long Can Your Video Large Multimodal Model Go?

TimeScope: How Long Can Your Video Large Multimodal Model Go?

New open source benchmark for evaluating vision LLMs on how well they handle long videos: TimeScope probes the limits of long-video capabilities by inserting several short (~5-10 second) video clips---our …

Simon Willison's Blog
api tool
1KB JS Numbers Station

1KB JS Numbers Station

Terence Eden built a neat and weird 1023 byte JavaScript demo that simulates a numbers station using the browser SpeechSynthesisUtterance, which I hadn't realized is supported by every modern browser …

Simon Willison's Blog
api tool
No Image

Quoting Dave White

like, one day you discover you can talk to dogs. it's fun and interesting so you do it more, learning the intricacies of their language and their deepest customs. you …

Simon Willison's Blog
platform
No Image

Quoting ICML 2025

Submitting a paper with a "hidden" prompt is scientific misconduct if that prompt is intended to obtain a favorable review from an LLM. The inclusion of such a prompt is …

Simon Willison's Blog
platform
Qwen3-Coder: Agentic Coding in the World

Qwen3-Coder: Agentic Coding in the World

It turns out that as I was typing up my notes on Qwen3-235B-A22B-Instruct-2507 the Qwen team were unleashing something much bigger: Today, we’re announcing Qwen3-Coder, our most agentic code model …

Simon Willison's Blog
api cloud tool
Qwen/Qwen3-235B-A22B-Instruct-2507

Qwen/Qwen3-235B-A22B-Instruct-2507

Significant new model release from Qwen, published yesterday without much fanfare. This is a follow-up to their April release of the full Qwen 3 model family, which included a Qwen3-235B-A22B …

Simon Willison's Blog
platform
Subliminal Learning: Language Models Transmit Behavioral Traits via Hidden Signals in Data

Subliminal Learning: Language Models Transmit Behavioral Traits via Hidden Signals in Data

This new alignment paper from Anthropic wins my prize for best illustrative figure so far this year: The researchers found that fine-tuning a model on data generated by another model …

Simon Willison's Blog
platform
Our contribution to a global environmental standard for AI

Our contribution to a global environmental standard for AI

Mistral have released environmental impact numbers for their largest model, Mistral Large 2, in more detail than I have seen from any of the other large AI labs. The methodology …

Simon Willison's Blog
platform
No Image

Gemini 2.5 Flash-Lite is now stable and generally available

The last remaining member of the Gemini 2.5 trio joins Pro and Flash in General Availability today. Gemini 2.5 Flash-Lite is the cheapest of the 2.5 family, at $0.10/million input …

Simon Willison's Blog
api tool
No Image

Textual v4.0.0: The Streaming Release

Will McGugan may no longer be running a commercial company around Textual, but that hasn't stopped his progress on the open source project. He recently released v4 of his Python …

Simon Willison's Blog
api library tool
No Image

tidwall/pogocache

New project from Josh Baker, author of the excellent tg C geospatial libarry (covered previously) and various other interesting projects: Pogocache is fast caching software built from scratch with a …

Simon Willison's Blog
platform
No Image

Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad

OpenAI beat them to the punch in terms of publicity by publishing their results on Saturday, but a team from Google Gemini achieved an equally impressive result on this year's …

Simon Willison's Blog
platform
No Image

Quoting Daniel Litt

An AI tool that gets gold on the IMO is obviously immensely impressive. Does it mean math is “solved”? Is an AI-generated proof of the Riemann hypothesis clearly on the …

Simon Willison's Blog
platform
No Image

Coding with LLMs in the summer of 2025 (an update)

Salvatore Sanfilippo describes his current AI-assisted development workflow. He's all-in on LLMs for code review, exploratory prototyping, pair-design and writing "part of the code under your clear specifications", but warns …

Simon Willison's Blog
api tool
No Image

Quoting Armin Ronacher

Every day someone becomes a programmer because they figured out how to make ChatGPT build something. Lucky for us: in many of those cases the AI picks Python. We should …

Simon Willison's Blog
library tool
No Image

Quoting Tim Sweeney

There’s a bigger opportunity in computer science and programming (academically conveyed or self-taught) now than ever before, by far, in my opinion. The move to AI is like replacing shovels …

Simon Willison's Blog
platform
No Image

OpenAI's gold medal performance on the International Math Olympiad

OpenAI research scientist Alexander Wei: I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s …

Simon Willison's Blog
platform
No Image

New tags

A few months I added a tool to my blog for bulk-applying tags to old content. It works as an extension to my existing search interface, letting me run searches …

Simon Willison's Blog
tool
No Image

Quoting Steve Yegge

So one of my favorite things to do is give my coding agents more and more permissions and freedom, just to see how far I can push their productivity without …

Simon Willison's Blog
tool
No Image

Quoting Paul Kedrosky

One analyst recently speculated (via Ed Conard) that, based on Nvidia's latest datacenter sales figures, AI capex may be ~2% of US GDP in 2025, given a standard multiplier. [...] …

Simon Willison's Blog
cloud infra
No Image

How to run an LLM on your laptop

I talked to Grace Huckins for this piece from MIT Technology Review on running local models. Apparently she enjoyed my dystopian backup plan! Simon Willison has a plan for the …

Simon Willison's Blog
tool
No Image

Voxtral

Mistral released their first audio-input models yesterday: Voxtral Small and Voxtral Mini. These state‑of‑the‑art speech understanding models are available in two sizes—a 24B variant for production-scale applications and a 3B …

Simon Willison's Blog
api tool
No Image

common-pile/caselaw_access_project

Enormous openly licensed (I believe this is almost all public domain) training dataset of US legal cases: This dataset contains 6.7 million cases from the Caselaw Access Project and Court …

Simon Willison's Blog
api cloud tool
No Image

Reflections on OpenAI

Calvin French-Owen spent just over a year working at OpenAI, during which time the organization grew from 1,000 to 3,000 people and Calvin found himself in "the top 30% by …

Simon Willison's Blog
api library tool
No Image

xAI: "We spotted a couple of issues with Grok 4 recently that we immediately investigated & mitigated"

They continue: One was that if you ask it "What is your surname?" it doesn't have one so it searches the internet leading to undesirable results, such as when its …

Simon Willison's Blog
tool
Application development without programmers

Application development without programmers

This book by James Martin, published in 1982, includes the following in the preface: Applications development did not change much for 20 years, but now a new wave is crashing …

Simon Willison's Blog
api framework tool
No Image

ccusage

Claude Code logs detailed usage information to the ~/.claude/ directory. ccusage is a neat little Node.js tool which reads that information and shows you a readable summary of your usage …

Simon Willison's Blog
tool
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity

Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity

METR - for Model Evaluation & Threat Research - are a non-profit research institute founded by Beth Barnes, a former alignment researcher at OpenAI (see Wikipedia). They've previously contributed to …

Simon Willison's Blog
tool
Grok 4 Heavy won't reveal its system prompt

Grok 4 Heavy won't reveal its system prompt

Grok 4 Heavy is the "think much harder" version of Grok 4 that's currenly only available on their $300/month plan. Jeremy Howard relays a report from a Grok 4 Heavy …

Simon Willison's Blog
platform
No Image

Quoting @grok

On the morning of July 8, 2025, we observed undesired responses and immediately began investigating. To identify the specific language in the instructions causing the undesired behavior, we conducted multiple …

Simon Willison's Blog
platform
No Image

Musk’s latest Grok chatbot searches for billionaire mogul’s views before answering questions

I got quoted a couple of times in this story about Grok searching for tweets from:elonmusk by Matt O’Brien for the Associated Press. “It’s extraordinary,” said Simon Willison, an independent …

Simon Willison's Blog
tool
moonshotai/Kimi-K2-Instruct

moonshotai/Kimi-K2-Instruct

Colossal new open weights model release today from Moonshot AI, a two year old Chinese AI lab with a name inspired by Pink Floyd’s album The Dark Side of the …

Simon Willison's Blog
tool
No Image

Quoting Django’s security policies

Following the widespread availability of large language models (LLMs), the Django Security Team has received a growing number of security reports generated partially or entirely using such tools. Many of …

Simon Willison's Blog
security
No Image

Generationship: Ep. #39, Simon Willison

I recorded this podcast episode with Rachel Chalmers a few weeks ago. We talked about the resurgence of blogging, the legacy of Google Reader, learning in public, LLMs as weirdly …

Simon Willison's Blog
podcast
Grok: searching X for "from:elonmusk (Israel OR Palestine OR Hamas OR Gaza)"

Grok: searching X for "from:elonmusk (Israel OR Palestine OR Hamas OR Gaza)"

If you ask the new Grok 4 for opinions on controversial questions, it will sometimes run a search to find out Elon Musk’s stance before providing you with an answer. …

Simon Willison's Blog
platform
Grok 4

Grok 4

Released last night, Grok 4 is now available via both API and a paid subscription for end-users. Key characteristics: image and text input, text output. 256,000 context length (twice that …

Simon Willison's Blog
api tool
Infinite Monkey

Infinite Monkey

Mihai Parparita's Infinite Mac lets you run classic MacOS emulators directly in your browser. Infinite Monkey is a new feature which taps into the OpenAI Computer Use and Claude Computer …

Simon Willison's Blog
tool
Quoting Aphyr

Quoting Aphyr

I strongly suspect that Market Research Future, or a subcontractor, is conducting an automated spam campaign which uses a Large Language Model to evaluate a Mastodon instance, submit a plausible …

Simon Willison's Blog
platform
Become a command-line superhero with Simon Willison's llm tool

Become a command-line superhero with Simon Willison's llm tool

Christopher Smith ran a mini hackathon in Albany New York at the weekend around uses of my LLM - the first in-person event I'm aware of dedicated to that project! …

Simon Willison's Blog
api tool
Adding a feature because ChatGPT incorrectly thinks it exists

Adding a feature because ChatGPT incorrectly thinks it exists

Adrian Holovaty describes how his SoundSlice service saw an uptick in users attempting to use their sheet music scanner to import ASCII-art guitar tab... because it turned out ChatGPT had …

Simon Willison's Blog
api tool
I Shipped a macOS App Built Entirely by Claude Code

I Shipped a macOS App Built Entirely by Claude Code

Indragie Karunaratne has "been building software for the Mac since 2008", but recently decided to try Claude Code to build a side project: Context, a native Mac app for debugging …

Simon Willison's Blog
library tool
Quoting Nineteen Eighty-Four

Quoting Nineteen Eighty-Four

There was a whole chain of separate departments dealing with proletarian literature, music, drama, and entertainment generally. Here were produced rubbishy newspapers containing almost nothing except sport, crime and astrology, …

Simon Willison's Blog
platform
No Image

Supabase MCP can leak your entire SQL database

Here's yet another example of a lethal trifecta attack, where an LLM system combines access to private data, exposure to potentially malicious instructions and a mechanism to communicate data back …

Simon Willison's Blog
database security
No Image

Cursor: Clarifying Our Pricing

Cursor changed their pricing plan on June 16th, introducing a new $200/month Ultra plan with "20x more usage than Pro" and switching their $20/month Pro plan from "request limits to …

Simon Willison's Blog
api tool
No Image

Identify, solve, verify

The more time I spend using LLMs for code, the less I worry for my career - even as their coding capabilities continue to improve. Using LLMs as part of …

Simon Willison's Blog
platform
No Image

awwaiid/gremllm

Delightfully cursed Python library by Brock Wilcox, built on top of LLM: from gremllm import Gremllm counter = Gremllm("counter") counter.value = 5 counter.increment() print(counter.value) # 6? print(counter.to_roman_numerals()) # VI? You …

Simon Willison's Blog
library tool
No Image

Quoting Adam Gordon Bell

I think that a lot of resistance to AI coding tools comes from the same place: fear of losing something that has defined you for so long. People are reacting …

Simon Willison's Blog
platform
No Image

Frequently Asked Questions (And Answers) About AI Evals

Hamel Husain and Shreya Shankar have been running a paid, cohort-based course on AI Evals For Engineers & PMs over the past few months. Here Hamel collects answers to the …

Simon Willison's Blog
platform
No Image

Trial Court Decides Case Based On AI-Hallucinated Caselaw

Joe Patrice writing for Above the Law: [...] it was always only a matter of time before a poor litigant representing themselves fails to know enough to sniff out and …

Simon Willison's Blog
platform
No Image

Sandboxed tools in a loop

Something I've realized about LLM tool use is that it means that if you can reduce a problem to something that can be solved by an LLM in a sandbox …

Simon Willison's Blog
tool
No Image

Table saws

Quitting programming as a career right now because of LLMs would be like quitting carpentry as a career thanks to the invention of the table saw.

Simon Willison's Blog
platform
No Image

Quoting Charles Babbage

On two occasions I have been asked, — "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out ?" In one case a …

Simon Willison's Blog
platform