When a Jira Ticket Can Steal Your Secrets

Zenity Labs describe a classic lethal trifecta attack, this time against Cursor, MCP, Jira and Zendesk. They also have a short video demonstrating the issue. Zendesk support emails are often …

Simon Willison's Blog
api security tool
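
One way to picture the lethal trifecta: an agent is exposed when its tools, taken together, give it access to private data, expose it to untrusted content, and offer some channel for sending data out. The sketch below is a hypothetical audit helper, assuming each tool can be tagged with those three properties. The `Tool` class and the tool names are illustrative only and are not part of Cursor, MCP, or the Jira and Zendesk APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tool:
    """Hypothetical description of one tool an agent can call."""
    name: str
    reads_private_data: bool = False
    reads_untrusted_content: bool = False
    can_exfiltrate: bool = False


def has_lethal_trifecta(tools: List[Tool]) -> bool:
    """True if the combined tool set covers all three legs of the trifecta.

    The legs may come from different tools: one tool reading secrets and
    another making outbound requests is enough once untrusted text is
    also in the model's context.
    """
    return (
        any(t.reads_private_data for t in tools)
        and any(t.reads_untrusted_content for t in tools)
        and any(t.can_exfiltrate for t in tools)
    )


# Hypothetical tool set loosely modelled on the ticket scenario above.
tools = [
    Tool("read_ticket", reads_untrusted_content=True),
    Tool("read_file", reads_private_data=True),
    Tool("post_comment", can_exfiltrate=True),
]

print(has_lethal_trifecta(tools))      # True
print(has_lethal_trifecta(tools[:2]))  # False: no exfiltration channel
```

Because the three legs can be spread across separate tools, the check looks at the whole set rather than at any single tool. Removing one leg, usually the exfiltration channel, is the mitigation this framing points to.
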
My Lethal Trifecta talk at the Bay Area AI Security Meetup

I gave a talk on Wednesday at the Bay Area AI Security Meetup about prompt injection, the lethal trifecta and the challenges of securing systems that use MCP. It wasn’t …

Simon Willison's Blog
api security tool
Quoting @pearlmania500

I have a toddler. My biggest concern is that he doesn't eat rocks off the ground and you're talking to me about ChatGPT psychosis? Why do we even have that? …

Simon Willison's Blog
platform
Quoting Sam Altman

GPT-5 rollout updates: We are going to double GPT-5 rate limits for ChatGPT Plus users as we finish rollout. We will let Plus users choose to continue to use 4o. …

Simon Willison's Blog
platform
The surprise deprecation of GPT-4o for ChatGPT consumers

I’ve been dipping into the r/ChatGPT subreddit recently to see how people are reacting to the GPT-5 launch, and so far the vibes there are not good. This AMA thread …

Simon Willison's Blog
api tool
Previewing GPT-5 at OpenAI's office

A couple of weeks ago I was invited to OpenAI's headquarters for a "preview event", for which I had to sign both an NDA and a video release waiver. I …

Simon Willison's Blog
youtube
GPT-5: Key characteristics, pricing and model card

I’ve had preview access to the new GPT-5 model family for the past two weeks, and have been using GPT-5 as my daily-driver. It’s my new favorite model. It’s still …

Simon Willison's Blog
platform
GPT-5 roundup

Zenn schroneko
api tool
Introducing Usage-Based Agent Credits

Starting August 14, AI Credits shift to usage-based pricing. Pay for tokens used, not messages sent. Credits roll over with caps.

Builder.io Blog
tool
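
The two rules in this announcement, charging by tokens rather than by messages and letting unused credits roll over up to a cap, can be sketched as below. Every rate, grant and cap here is a hypothetical placeholder rather than Builder.io's published pricing, and the exact rollover rule (unused credits carried over, capped at a fixed amount) is an assumption.

```python
def credits_used(input_tokens: int, output_tokens: int,
                 credits_per_1k_input: float, credits_per_1k_output: float) -> float:
    """Usage-based charge: proportional to tokens processed, not messages sent."""
    return (input_tokens / 1000) * credits_per_1k_input + (output_tokens / 1000) * credits_per_1k_output


def next_month_balance(monthly_grant: float, unused: float, rollover_cap: float) -> float:
    """Assumed rule: unused credits carry over, but only up to rollover_cap."""
    return monthly_grant + min(max(unused, 0.0), rollover_cap)


# All rates and limits below are hypothetical.
grant = 100.0
used = credits_used(50_000, 10_000, credits_per_1k_input=0.5, credits_per_1k_output=2.0)
print(used)                                                        # 45.0
print(next_month_balance(grant, grant - used, rollover_cap=30.0))  # 130.0
```

Under this model a single long, token-heavy request can cost more than many short messages, which is the practical difference from per-message pricing.
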
Jules, our asynchronous coding agent, is now available for everyone

I wrote about the Jules beta back in May. Google's version of the OpenAI Codex PR-submitting hosted coding tool graduated from beta today. I'm mainly linking to this now because …

Simon Willison's Blog
api tool
Qwen3-4B Instruct and Thinking

Yet another interesting model from Qwen - these are tiny compared to their other recent releases (just 4B parameters, 7.5GB on Hugging Face and even smaller when quantized) but with …

Simon Willison's Blog
platform
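
The size figures follow from simple arithmetic. Assuming the released weights are stored at 16 bits (2 bytes) per parameter, 4 billion parameters come to about 8 GB, or roughly 7.45 GiB, in the same ballpark as the 7.5GB quoted above; quantizing to 8 or 4 bits shrinks that proportionally. The helper below ignores file metadata and any layers kept at higher precision, so treat its output as an estimate only.

```python
GIB = 1024 ** 3  # bytes per GiB


def weights_size_gib(n_params: float, bits_per_param: int) -> float:
    """Rough size of the raw weights; ignores metadata and any tensors
    stored at a different precision."""
    return n_params * bits_per_param / 8 / GIB


params = 4e9  # "just 4B parameters", rounded
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weights_size_gib(params, bits):.2f} GiB")
# 16-bit: 7.45 GiB
#  8-bit: 3.73 GiB
#  4-bit: 1.86 GiB
```
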
Quoting Artificial Analysis

gpt-oss-120b is the most intelligent American open weights model, comes behind DeepSeek R1 and Qwen3 235B in intelligence but offers efficiency benefits [...] We’re seeing the 120B beat o3-mini but …

Simon Willison's Blog
platform
No, AI is not Making Engineers 10x as Productive

Colton Voege on "curing your AI 10x engineer imposter syndrome". There's a lot of rhetoric out there suggesting that if you can't 10x your productivity through tricks like running a …

Simon Willison's Blog
tool
OpenAI's new open weight (Apache 2) models are really good

The long promised OpenAI open weight models are here, and they are very impressive. They’re available under proper open source licenses—Apache 2.0—and come in two sizes, 120B and 20B. OpenAI’s …

Simon Willison's Blog
api tool
Claude Opus 4.1

Surprise new model from Anthropic today - Claude Opus 4.1, which they describe as "a drop-in replacement for Opus 4". My favorite thing about this model is the version number …

Simon Willison's Blog
platform
Quoting greyduet on r/teachers

I teach HS Science in the south. I can only speak for my district, but a few teacher work days in the wave of enthusiasm I'm seeing for AI tools …

Simon Willison's Blog
tool
ChatGPT agent's user-agent

I was exploring how ChatGPT agent works today. I learned some interesting things about how it exposes its identity through HTTP headers, then made a huge blunder in thinking it …

Simon Willison's Blog
api tool
Usage charts for my LLM tool against OpenRouter

OpenRouter proxies requests to a large number of different LLMs and provides high level statistics of which models are the most popular among their users. Tools that call OpenRouter can …

Simon Willison's Blog
api tool
Qwen-Image: Crafting with Native Text Rendering

Not content with releasing six excellent open weights LLMs in July, Qwen are kicking off August with their first ever image generation model. Qwen-Image is a 20 billion parameter MMDiT …

Simon Willison's Blog
library tool
Quoting @himbodhisattva

for services that wrap GPT-3, is it possible to do the equivalent of sql injection? like, a prompt-injection attack? make it think it's completed the task and then get access …

Simon Willison's Blog
security
I Saved a PNG Image To A Bird

Benn Jordan provides one of the all time great YouTube video titles, and it's justified. He drew an image in an audio spectrogram, played that sound to a talented starling …

Simon Willison's Blog
tool youtube
Quoting Nick Turley

This week, ChatGPT is on track to reach 700M weekly active users — up from 500M at the end of March and 4× since last year.

Simon Willison's Blog
api cloud
LLMs are facing a QA crisis: Here’s how we could solve it

Discover how LLM QA isn’t just a tooling gap — it’s a fundamental shift in how we think about software reliability.

logrocket-dev
tool
XBai o4

Yet another open source (Apache 2.0) LLM from a Chinese AI lab. This model card claims: XBai o4 excels in complex reasoning capabilities and has now completely surpassed OpenAI-o3-mini in …

Simon Willison's Blog
tool
🥇Top AI Papers of the Week

The Top AI Papers of the Week (July 28 - August 3)

Elvis Saravia's NLP Blog
platform
Testing the coding performance of "Horizon Beta", which may be a next-generation GPT model

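On July 30, 2025, a stealth model of unknown origin called "Horizon Alpha" appeared on OpenRouter; it was later replaced by one named "Horizon Beta". The model has drawn attention as a possible test build of OpenAI's next model. In this post I evaluate its performance on coding tasks. https://openrouter.ai/openrouter/horizon-beta
Features:
* Context window: 256K (mid-sized compared with GPT-4.1's 1M and o3/o4-mini's 200K)
* Throughput: 126.9 tps (about twice Sonnet 4's 64.50 tps; it feels noticeably fast when coding)
* Reasoning mechanism: none
Is it really an OpenAI model? Whether it comes from OpenAI is being debated. Quasar Alpha and Optimus Alpha likewise appeared before the GPT-4.1 release, so this may be the same pattern. If it were GPT-5 itself, the context window would be 1M …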

Lai.so Blog
library tool
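
The throughput figures quoted in the article translate directly into waiting time. This sketch assumes a steady decode rate and ignores time-to-first-token and network overhead; the 2,000-token output size is an arbitrary example, not a figure from the article.

```python
def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Time to stream n_tokens at a steady rate (ignores time-to-first-token)."""
    return n_tokens / tokens_per_second


HORIZON_BETA_TPS = 126.9  # throughput quoted in the article
SONNET_4_TPS = 64.50      # throughput quoted in the article

print(f"speedup: {HORIZON_BETA_TPS / SONNET_4_TPS:.2f}x")  # speedup: 1.97x
for name, tps in [("Horizon Beta", HORIZON_BETA_TPS), ("Sonnet 4", SONNET_4_TPS)]:
    # 2,000 tokens is an arbitrary example size for a generated file
    print(f"{name}: {generation_seconds(2_000, tps):.1f} s for 2,000 tokens")
```
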
Trying out Qwen3-Coder, an LLM built for coding

This article tries out Qwen Code, a coding agent built on Alibaba's Qwen3-Coder. It covers configuring authentication via OpenRouter and shows real usage examples: investigating a codebase, refactoring, and generating test code.

azukiazusa's Tech Blog 2
api library tool
🤖 AI Agents Weekly: GLM-4.5, AI SDK 5, Video Overviews, ChatGPT Study Mode, Context engineering Tips, AlphaEarth Foundations

GLM-4.5, AI SDK 5, Video Overviews, ChatGPT Study Mode, Context engineering Tips, AlphaEarth Foundations

Elvis Saravia's NLP Blog
platform
Will Serena MCP save Claude Code?

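Amid the ongoing "Claude Code is getting dumb" problem, an MCP server called Serena has been drawing attention among users for "reducing Claude Code's context consumption and improving its responses." When I tried Serena myself, I did see better context efficiency (meaning fewer input and output tokens). Digging deeper, I found the tool is built on a genuinely unique idea, and it seemed a shame to let it be consumed as a passing fad. So in this article I explain the technical mechanism behind the feature in detail, analyzing Serena's architecture and its effects with hands-on tests.
Challenges facing current coding agents: Most current coding agents treat code as plain text files and process it sequentially. This fundamental approach creates limitations. When working on a large project, the agent has to read huge amounts of text to find the information it needs. Even just to locate a function definition, the repository …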

Lai.so Blog
library tool
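
The article contrasts agents that read code as plain text with retrieving code by symbol. The sketch below is not Serena's implementation; it swaps in Python's standard-library `ast` module to illustrate the general idea of returning just the definition the agent asked for instead of the whole file. `find_symbol_source` is a hypothetical helper name.

```python
import ast
from typing import Optional


def find_symbol_source(source: str, name: str) -> Optional[str]:
    """Return the source of the top-level function or class called `name`
    (decorators excluded), or None if it is not defined at module level."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)) and node.name == name:
            return ast.get_source_segment(source, node)
    return None


module = '''
import math

def area(r):
    return math.pi * r * r

def unrelated():
    return 42
'''

snippet = find_symbol_source(module, "area")
print(snippet)
print(f"sent {len(snippet)} of {len(module)} characters")
```

On a real repository the gap between the symbol and the file (or the files an agent would otherwise grep through) is far larger than in this toy module, which is where the token savings come from.
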
Faster inference

Two interesting examples of inference speed as a flagship feature of LLM services today. First, Cerebras announced two new monthly plans for their extremely high speed hosted model service: Cerebras …

Simon Willison's Blog
api tool
Deep Think in the Gemini app

Google released Gemini 2.5 Deep Think this morning, exclusively to their Ultra ($250/month) subscribers: It is a variation of the model that recently achieved the gold-medal standard at this year's …

Simon Willison's Blog
platform
July newsletter for sponsors is out

This morning I sent out the third edition of my LLM digest newsletter for my $10/month and higher sponsors on GitHub. It included the following section headers: Claude Code Model …

Simon Willison's Blog
tool
Quoting Logan Kilpatrick

Gemini Deep Think, our SOTA model with parallel thinking that won the IMO Gold Medal 🥇, is now available in the Gemini App for Ultra subscribers!! [...] Quick correction: this …

Simon Willison's Blog
platform
Reverse engineering some updates to Claude

Anthropic released two major new features for their consumer-facing Claude apps in the past couple of days. Sadly, they don’t do a very good job of updating the release notes …

Simon Willison's Blog
api tool
Quoting Christina Wodtke

The old timers who built the early web are coding with AI like it's 1995. Think about it: They gave blockchain the sniff test and walked away. Ignored crypto (and …

Simon Willison's Blog
platform
More model releases on 31st July

Here are a few more model releases from today, to round out a very busy July: Cohere released Command A Vision, their first multi-modal (image input) LLM. Like their others …

Simon Willison's Blog
library tool
Trying out Qwen3 Coder Flash using LM Studio and Open WebUI and LLM

Qwen just released their sixth model(!) for this July called Qwen3-Coder-30B-A3B-Instruct—listed as Qwen3-Coder-Flash in their chat.qwen.ai interface. It’s 30.5B total parameters with 3.3B active at any one time. This means …

Simon Willison's Blog
library tool
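
"3.3B active at any one time" reflects a Mixture-of-Experts design: for each token a small router scores every expert and only the top few run. The toy below shows generic top-k routing with made-up router scores over 8 experts; it is not Qwen's code, and the expert count and k are not Qwen3-Coder-Flash's actual configuration.

```python
import math
from typing import List, Tuple


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Choose the k highest-scoring experts for one token and renormalise their
    weights so they sum to 1. Only these k experts run for this token."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


# Made-up router scores for a single token over 8 hypothetical experts.
logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2]
chosen = route_top_k(logits, k=2)
print(chosen)  # experts 1 and 4, weighted roughly 0.62 / 0.38

# Share of parameters used per token, from the figures quoted above.
print(f"active fraction: {3.3 / 30.5:.1%}")  # active fraction: 10.8%
```

Because different tokens pick different experts, all 30.5B parameters still need to be loaded; the saving is in compute per token, not in memory.
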
Windsurf vs. Cursor: When to choose the challenger

Explore Windsurf AI’s Cascade agent, IDE integration, pricing, and how it stacks up against Cursor in this hands-on developer-focused comparison.

logrocket-dev
library tool
The "Claude Code is getting dumb" problem

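Recently, many Claude Code users have reported that its "performance has suddenly degraded." Specifically, it forgets what it was instructed to do and goes off doing unrelated work, prompting speculation that "this may be a problem with Claude Code's context handling." (This discussion reflects version 1.0.63.) "Pin to version 1.0.24": user reports and workarounds for the problem are being discussed in the following issue: Critical: Claude Code context amnesia causes silent code deletion · Issue #4487 · anthropics/claude-code. Environment: Platform: Claude Code CLI; Claude CLI version: 1.0.61; Operating System: macOS 15.5 (Build 24F74); Terminal: Terminal App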

Lai.so Blog
api tool
Ollama's new app

Ollama has been one of my favorite ways to run local models for a while - it makes it really easy to download models, and it's smart about keeping them …

Simon Willison's Blog
tool
GLM-4.5 Meets SGLang: Reasoning, Coding, and Agentic Abilities

Today, we are excited to introduce our latest flagship models GLM-4.5 and …

LMSYS Blog
library tool
Quoting Steve Krouse

When you vibe code, you are incurring tech debt as fast as the LLM can spit it out. Which is why vibe coding is perfect for prototypes and throwaway projects: …

Simon Willison's Blog
platform
The best available open weight LLMs now come from China

Something that has become undeniable this month is that the best available open weight models now come from the Chinese AI labs. I continue to have a lot of love …

Simon Willison's Blog
platform
Qwen3-30B-A3B-Thinking-2507

Yesterday was Qwen3-30B-A3B-Instruct-2507. Qwen are clearly committed to their new split between reasoning and non-reasoning models (a reversal from Qwen 3 in April), because today they released the new reasoning …

Simon Willison's Blog
platform
OpenAI: Introducing study mode

New ChatGPT feature, which can be triggered by typing /study or by visiting chatgpt.com/studymode. OpenAI say: Under the hood, study mode is powered by custom system instructions we’ve written in …

Simon Willison's Blog
platform
Qwen/Qwen3-30B-A3B-Instruct-2507

New model update from Qwen, improving on their previous Qwen3-30B-A3B release from late April. In their tweet they said: Smarter, faster, and local deployment-friendly. ✨ Key Enhancements: ✅ Enhanced reasoning, …

Simon Willison's Blog
platform
Quoting Nilay Patel

Our plan is to build direct traffic to our site. and newsletters just one kind of direct traffic in the end. I don’t intend to ever rely on someone else’s …

Simon Willison's Blog
tool
A roundup of LLM agent observability platforms

This is AI Shift's TECH BLOG, where we share information about AI technologies and how to put them to use.

AI-Shift Tech Blog
api tool
Quoting Anthropic

We’re rolling out new weekly rate limits for Claude Pro and Max in late August. We estimate they’ll apply to less than 5% of subscribers based on current usage. [...] …

Simon Willison's Blog
platform
GLM-4.5: Reasoning, Coding, and Agentic Abilities

Another day, another significant new open weight model release from a Chinese frontier AI lab. This time it's Z.ai - who rebranded (at least in English) from Zhipu AI a …

Simon Willison's Blog
tool
Enough AI copilots! We need AI HUDs

Geoffrey Litt compares Copilots - AI assistants that you engage in dialog with and work with you to complete a task - with HUDs, Head-Up Displays, which enhance your working …

Simon Willison's Blog
tool
🥇Top AI Papers of the Week

The Top AI Papers of the Week (July 21 - 27)

Elvis Saravia's NLP Blog
platform
Kimi K2 and LLM benchmark scores

Kimi K2 is an open-weight large language model developed by China's Moonshot AI. It is Kimi's fourth-generation model, the first since Kimi k1.5 was released on January 20, 2025. Kimi K2: Open Agentic Intelligence. "Kimi K2 is our latest Mixture-of-Experts model with 32 billion activated parameters and 1 trillion total parameters. It achieves state-of-the-art performance in frontier knowledge, math, and coding among non-thinking models." One notable feature is its 128K-token context window. For comparison, Claude 4 has 200K, Gemini 2.5 has 1M, and Grok 4 has 256K. Also, …

Lai.so Blog
api tool
🤖 AI Agents Weekly: Lovable Agents, GitHub Spark, Qwen3-Coder, Search Arena, Awesome Context Engineering

Lovable Agents, GitHub Spark, Qwen3-Coder, Search Arena, Awesome Context Engineering

Elvis Saravia's NLP Blog
api library tool
Official statement from Tea on their data leak

Tea is a dating safety app for women that lets them share notes about potential dates. The other day it was subject to a truly egregious data leak caused by …

Simon Willison's Blog
api security
Benchmarking fully autonomous AI agents (2): adding Codex, Jules, and OpenHands

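TL;DR
* Devin is better than the other agents at seeing long-running tasks through to completion, but it is correspondingly more expensive.
* Claude Code Action runs tasks the fastest, with a high success rate and strong cost performance.
* The other agents have internal session timeouts that cut tasks off, so they are unsuited to long-running tasks.
Final results (agent: problems completed / run time; cost; cost per problem; correct answers / accuracy; verdict)
🏅 Devin: 98 problems / 216 min; $36; $0.37; 92 / 91.1%; outstanding at completing long tasks, but expensive
🥈 Claude Code Action: 92 problems / 42 min; $7.89; $0.09; 65 / 64.4%; fastest, best cost performance
🥉 GitHub Copilot Coding Agent …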

Lai.so Blog
library tool
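
The derived columns in these results can be recomputed from the raw counts. Two assumptions, both inferred rather than stated in the excerpt: the per-problem cost is total cost divided by problems completed, and accuracy is measured against a 101-problem set (92/101 and 65/101 round to the quoted 91.1% and 64.4%). Cost per correct answer is an extra metric added here for comparison.

```python
# Raw counts from the article's results table.
results = {
    "Devin": {"completed": 98, "minutes": 216, "cost_usd": 36.00, "correct": 92},
    "Claude Code Action": {"completed": 92, "minutes": 42, "cost_usd": 7.89, "correct": 65},
}
TOTAL_PROBLEMS = 101  # assumption: inferred from the quoted percentages, not stated

for name, r in results.items():
    per_problem = r["cost_usd"] / r["completed"]  # matches the article's per-problem column
    accuracy = r["correct"] / TOTAL_PROBLEMS
    per_correct = r["cost_usd"] / r["correct"]    # extra metric, not in the article
    minutes_each = r["minutes"] / r["completed"]
    print(f"{name}: ${per_problem:.2f}/problem, {accuracy:.1%} correct, "
          f"${per_correct:.2f}/correct answer, {minutes_each:.1f} min/problem")
# Devin: $0.37/problem, 91.1% correct, $0.39/correct answer, 2.2 min/problem
# Claude Code Action: $0.09/problem, 64.4% correct, $0.12/correct answer, 0.5 min/problem
```
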
Creating custom subagents in Claude Code

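Claude Code lets you create custom subagents that are invoked to handle specific types of tasks. Because a custom subagent has its own context window, separate from the main conversation session, using one keeps the main context from getting polluted. This article explains how to create custom subagents in Claude Code and the benefits of doing so.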

azukiazusa's Tech Blog 2
api tool
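
The benefit described above, a separate context window per subagent, can be modelled with a toy. This is not Claude Code's implementation or its subagent configuration format; `Context` and `run_subagent` are hypothetical stand-ins showing that bulky file reads stay in the subagent's context and only a short summary flows back into the main conversation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Context:
    """Stand-in for one context window: the messages a model can see."""
    messages: List[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.messages.append(message)

    def size(self) -> int:
        return sum(len(m) for m in self.messages)


def run_subagent(task: str, files: Dict[str, str]) -> str:
    """Hypothetical subagent: works in a fresh Context and hands back
    only a short summary string."""
    ctx = Context()
    ctx.add(f"task: {task}")
    for path, text in files.items():
        ctx.add(f"contents of {path}: {text}")  # bulky reads stay in ctx
    return f"summary: reviewed {len(files)} files for '{task}'"


main = Context()
main.add("user: please review the auth module")
files = {"auth.py": "x" * 5000, "session.py": "y" * 5000}  # fake file contents
main.add(run_subagent("review auth", files))

print(main.messages)
print(main.size())  # stays small: the 10,000 characters of file text never enter `main`
```
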
Qwen3-235B-A22B-Thinking-2507

The third Qwen model release this week, following Qwen3-235B-A22B-Instruct-2507 on Monday 21st and Qwen3-Coder-480B-A35B-Instruct on Tuesday 22nd. Those two were both non-reasoning models - a change from the previous models in …

Simon Willison's Blog
platform
AI + a16z Podcast: Vibe Coding, Security Risks, and the Path to Progress

Socket CEO Feross Aboukhadijeh and a16z partner Joel de la Garza discuss vibe coding, AI-driven software development, and how the rise of LLMs, despit...

Socket
api tool
SpecForge: Accelerating Speculative Decoding Training for SGLang

Speculative decoding is a powerful technique for accelerating Large Language Model (LLM) inference. In this blog post, we are excited to announce the open…

LMSYS Blog
framework tool
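
Speculative decoding in its simplest, greedy form: a cheap draft model proposes a few tokens, the large target model checks them, and the longest agreeing prefix is kept along with one token from the target. The sketch below is not SpecForge or SGLang code, and the integer-valued `target` and `draft` functions are toy stand-ins for real models. It verifies sequentially for clarity; real systems get their speedup by scoring all proposed tokens in one batched forward pass of the target.

```python
from typing import Callable, List

# A "model" here maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], n: int) -> List[int]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out[len(prompt):]


def speculative_decode(target: Model, draft: Model, prompt: List[int], n: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding; returns the same tokens as greedy_decode(target, ...)."""
    out = list(prompt)
    while len(out) - len(prompt) < n:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks each proposed position; a real system scores
        #    all k positions in a single forward pass.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # always keep the target's own choice
            if tok != expected:
                break  # first disagreement: discard the rest of the proposal
        else:
            # Every proposal matched; keep the target's next prediction as a
            # bonus token (free in a real batched verification pass).
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[len(prompt):len(prompt) + n]


# Toy stand-ins: the draft agrees with the target except when the prefix
# length is a multiple of 5.
def target(prefix: List[int]) -> int:
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft(prefix: List[int]) -> int:
    guess = target(prefix)
    return guess if len(prefix) % 5 else (guess + 1) % 50


prompt = [1, 2, 3]
print(speculative_decode(target, draft, prompt, n=12))
print(speculative_decode(target, draft, prompt, n=12) == greedy_decode(target, prompt, n=12))  # True
```

Whatever the draft proposes, the output is identical to decoding with the target alone; a better draft only changes how many tokens are accepted per round.
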
Using GitHub Spark to reverse engineer GitHub Spark

GitHub Spark was released in public preview yesterday. It’s GitHub’s implementation of the prompt-to-app pattern also seen in products like Claude Artifacts, Lovable, Vercel v0, Val Town Townie and Fly.io’s …

Simon Willison's Blog
api framework tool
Quoting Recurse Center

[...] You learn best and most effectively when you are learning something that you care about. Your work becomes meaningful and something you can be proud of only when you …

Simon Willison's Blog
platform
Instagram Reel: Veo 3 paid preview

@googlefordevs on Instagram published this reel featuring Christina Warren with prompting tips for the new Veo 3 paid preview (mp4 copy here). (Christina checked first if I minded them using …

Simon Willison's Blog
tool
TimeScope: How Long Can Your Video Large Multimodal Model Go?

New open source benchmark for evaluating vision LLMs on how well they handle long videos: TimeScope probes the limits of long-video capabilities by inserting several short (~5-10 second) video clips---our …

Simon Willison's Blog
api tool
1KB JS Numbers Station

Terence Eden built a neat and weird 1023 byte JavaScript demo that simulates a numbers station using the browser SpeechSynthesisUtterance, which I hadn't realized is supported by every modern browser …

Simon Willison's Blog
api tool
Quoting Dave White

like, one day you discover you can talk to dogs. it's fun and interesting so you do it more, learning the intricacies of their language and their deepest customs. you …

Simon Willison's Blog
platform
Quoting ICML 2025

Submitting a paper with a "hidden" prompt is scientific misconduct if that prompt is intended to obtain a favorable review from an LLM. The inclusion of such a prompt is …

Simon Willison's Blog
platform
Qwen3-Coder: Agentic Coding in the World

It turns out that as I was typing up my notes on Qwen3-235B-A22B-Instruct-2507 the Qwen team were unleashing something much bigger: Today, we’re announcing Qwen3-Coder, our most agentic code model …

Simon Willison's Blog
api cloud tool
Qwen/Qwen3-235B-A22B-Instruct-2507

Significant new model release from Qwen, published yesterday without much fanfare. This is a follow-up to their April release of the full Qwen 3 model family, which included a Qwen3-235B-A22B …

Simon Willison's Blog
platform
Subliminal Learning: Language Models Transmit Behavioral Traits via Hidden Signals in Data

This new alignment paper from Anthropic wins my prize for best illustrative figure so far this year: The researchers found that fine-tuning a model on data generated by another model …

Simon Willison's Blog
platform
Our contribution to a global environmental standard for AI

Mistral have released environmental impact numbers for their largest model, Mistral Large 2, in more detail than I have seen from any of the other large AI labs. The methodology …

Simon Willison's Blog
platform
Gemini 2.5 Flash-Lite is now stable and generally available

The last remaining member of the Gemini 2.5 trio joins Pro and Flash in General Availability today. Gemini 2.5 Flash-Lite is the cheapest of the 2.5 family, at $0.10/million input …

Simon Willison's Blog
api tool
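
At the quoted input price, input cost is a one-line calculation. The output price is cut off in the excerpt above, so this sketch covers input tokens only, and the workload size is a hypothetical example.

```python
INPUT_USD_PER_MILLION = 0.10  # Gemini 2.5 Flash-Lite input price quoted above


def input_cost_usd(input_tokens: int, usd_per_million: float = INPUT_USD_PER_MILLION) -> float:
    """Cost of the input tokens only; output tokens are billed separately."""
    return input_tokens / 1_000_000 * usd_per_million


# Hypothetical workload: classify 10,000 documents of about 2,000 tokens each.
tokens = 10_000 * 2_000
print(f"{tokens:,} input tokens -> ${input_cost_usd(tokens):.2f}")  # 20,000,000 input tokens -> $2.00
```
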
I taught an AI coding hands-on workshop (case study: DeNA Co., Ltd.)

Zenn mizchi
framework tool
Textual v4.0.0: The Streaming Release

Will McGugan may no longer be running a commercial company around Textual, but that hasn't stopped his progress on the open source project. He recently released v4 of his Python …

Simon Willison's Blog
api library tool
tidwall/pogocache

New project from Josh Baker, author of the excellent tg C geospatial library (covered previously) and various other interesting projects: Pogocache is fast caching software built from scratch with a …

Simon Willison's Blog
platform
Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad

OpenAI beat them to the punch in terms of publicity by publishing their results on Saturday, but a team from Google Gemini achieved an equally impressive result on this year's …

Simon Willison's Blog
platform
Quoting Daniel Litt

An AI tool that gets gold on the IMO is obviously immensely impressive. Does it mean math is “solved”? Is an AI-generated proof of the Riemann hypothesis clearly on the …

Simon Willison's Blog
platform
The top 15 MCP servers for your AI projects

Explore 15 essential MCP servers for web developers to enhance AI workflows with tools, data, and automation.

logrocket-dev
api cloud tool
Coding with LLMs in the summer of 2025 (an update)

Salvatore Sanfilippo describes his current AI-assisted development workflow. He's all-in on LLMs for code review, exploratory prototyping, pair-design and writing "part of the code under your clear specifications", but warns …

Simon Willison's Blog
api tool
🥇Top AI Papers of the Week

The Top AI Papers of the Week (July 14 - 20)

Elvis Saravia's NLP Blog
platform
Quoting Armin Ronacher

Every day someone becomes a programmer because they figured out how to make ChatGPT build something. Lucky for us: in many of those cases the AI picks Python. We should …

Simon Willison's Blog
library tool
Quoting Tim Sweeney

There’s a bigger opportunity in computer science and programming (academically conveyed or self-taught) now than ever before, by far, in my opinion. The move to AI is like replacing shovels …

Simon Willison's Blog
platform
Deploying Kimi K2 with PD Disaggregation and Large-Scale Expert Parallelism on 128 H200 GPUs

1️⃣ Introduction: deploying the most advanced open-source MoE model …

LMSYS Blog
framework tool
OpenAI's gold medal performance on the International Math Olympiad

OpenAI research scientist Alexander Wei: I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s …

Simon Willison's Blog
platform
🤖 AI Agents Weekly: ChatGPT Agent, Gemini Embeddings, Agent Leaderboard v2, Voxtral, CRMAgent

ChatGPT Agent, Gemini Embeddings, Agent Leaderboard v2, Voxtral, CRMAgent

Elvis Saravia's NLP Blog
api tool
Kiro and the current trend toward context engineering

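Kiro (kiro.dev) is an IDE-style coding agent developed by AWS. Like Cursor and Windsurf, it falls into the category of VS Code fork editors. It is currently in public preview, and after signing up you can use Claude 3.7 Sonnet and Claude Sonnet 4 in Kiro. Kiro supports the entire software development lifecycle through distinctive features: spec-driven development, agent hooks, and steering files. Let's look at each. Spec-driven development: the core of Kiro is its "spec" (specification) feature. From a rough instruction entered by the user (e.g. "Add user authentication"), the AI automatically generates documents in three stages: "requirements," "design," and "task list." The Markdown files are generated per task under .kiro/specs/${task}/.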

Lai.so Blog
framework tool
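
The per-task spec layout described above can be pictured as a small directory tree. The sketch below is not part of Kiro and does not replicate how Kiro generates specs; it simply creates `.kiro/specs/<task>/` with one Markdown file per stage. The file names (`requirements.md`, `design.md`, `tasks.md`) and the task name are assumptions for illustration.

```python
import tempfile
from pathlib import Path

# Assumed file names for the three stages the article describes
# (requirements, design, task list); Kiro's actual names may differ.
SPEC_STAGES = {
    "requirements.md": "# Requirements\n",
    "design.md": "# Design\n",
    "tasks.md": "# Tasks\n",
}


def scaffold_spec(project_root: Path, task: str) -> Path:
    """Create .kiro/specs/<task>/ with one Markdown file per stage,
    leaving any existing files untouched."""
    spec_dir = project_root / ".kiro" / "specs" / task
    spec_dir.mkdir(parents=True, exist_ok=True)
    for filename, heading in SPEC_STAGES.items():
        path = spec_dir / filename
        if not path.exists():
            path.write_text(heading, encoding="utf-8")
    return spec_dir


with tempfile.TemporaryDirectory() as tmp:
    spec_dir = scaffold_spec(Path(tmp), "user-auth")  # hypothetical task name
    names = sorted(p.name for p in spec_dir.iterdir())
    relative = spec_dir.relative_to(tmp).as_posix()

print(relative)  # .kiro/specs/user-auth
print(names)     # ['design.md', 'requirements.md', 'tasks.md']
```
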
New tags

A few months ago I added a tool to my blog for bulk-applying tags to old content. It works as an extension to my existing search interface, letting me run searches …

Simon Willison's Blog
tool
Quoting Steve Yegge

So one of my favorite things to do is give my coding agents more and more permissions and freedom, just to see how far I can push their productivity without …

Simon Willison's Blog
tool
Quoting Paul Kedrosky

One analyst recently speculated (via Ed Conard) that, based on Nvidia's latest datacenter sales figures, AI capex may be ~2% of US GDP in 2025, given a standard multiplier. [...] …

Simon Willison's Blog
cloud infra
How to run an LLM on your laptop

I talked to Grace Huckins for this piece from MIT Technology Review on running local models. Apparently she enjoyed my dystopian backup plan! Simon Willison has a plan for the …

Simon Willison's Blog
tool
A roundup of the ChatGPT agent announcement

Zenn schroneko
api tool
How to build better AI apps in React with MediaPipe’s latest APIs

Build an AI-powered object detection app in React using MediaPipe's latest Tasks API, run models in-browser with no backend setup.

logrocket-dev
api tool
AI won’t fix bad thinking — use it to challenge you instead

AI agrees too easily. That’s a problem. Learn how to prompt it to challenge your thinking and improve your product decisions.

logrocket-dev
tool
Accelerating SGLang with Multiple Token Prediction

LMSYS Blog
library tool
Voxtral

Mistral released their first audio-input models yesterday: Voxtral Small and Voxtral Mini. These state‑of‑the‑art speech understanding models are available in two sizes—a 24B variant for production-scale applications and a 3B …

Simon Willison's Blog
api tool
common-pile/caselaw_access_project

Enormous openly licensed (I believe this is almost all public domain) training dataset of US legal cases: This dataset contains 6.7 million cases from the Caselaw Access Project and Court …

Simon Willison's Blog
api cloud tool
How to build unified AI interfaces using the Vercel AI SDK

Learn how to use the Vercel AI SDK to build modern, multimodal frontend apps with streaming, function calling, image analysis, voice output, and generative UI.

logrocket-dev
library tool
How to support new VLMs into SGLang: A Case Study with NVILA

The world of LLMs is evolving at a remarkable pace, with Visual Language Models (VLMs) at the forefront of this revolution. These models power application…

LMSYS Blog
api cloud tool
Reflections on OpenAI

Calvin French-Owen spent just over a year working at OpenAI, during which time the organization grew from 1,000 to 3,000 people and Calvin found himself in "the top 30% by …

Simon Willison's Blog
api library tool
xAI: "We spotted a couple of issues with Grok 4 recently that we immediately investigated & mitigated"

They continue: One was that if you ask it "What is your surname?" it doesn't have one so it searches the internet leading to undesirable results, such as when its …

Simon Willison's Blog
tool
AI compliance: A core product competency you shouldn’t skip

AI governance is now a product feature. Learn how to embed trust, transparency, and compliance into your build cycles.

logrocket-dev
api tool
Access State-of-the-Art LLM models at cost via OpenHands GUI and CLI

Announcing a new OpenHands LLM provider that enables access to state-of-the-art (SOTA) agentic coding models like Claude Sonnet 4, Devstral Small, Devstral Medium, Gemini 2.5 Pro, and o4-mini at cost – without any additional pricing markup.

All Hands Blog
ai api platform
Trying out Kiro, AWS's agentic IDE

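Kiro is an AI coding agent built into an IDE, developed by AWS. What sets Kiro apart is that it goes beyond mere vibe coding: you can build applications through spec-driven development using specs. This article walks through the flow of developing an application with Kiro.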

azukiazusa's Tech Blog 2
library tool
Application development without programmers

This book by James Martin, published in 1982, includes the following in the preface: Applications development did not change much for 20 years, but now a new wave is crashing …

Simon Willison's Blog
api framework tool
ccusage

Claude Code logs detailed usage information to the ~/.claude/ directory. ccusage is a neat little Node.js tool which reads that information and shows you a readable summary of your usage …

Simon Willison's Blog
tool
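
A tool like ccusage boils down to reading log records and summing token counts. The sketch below is not ccusage's code (ccusage is a Node.js tool), and the JSON Lines record shape it expects is a hypothetical simplification; check the actual files under `~/.claude/` before adapting it.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def summarize_usage(jsonl_lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Sum input/output tokens per day from JSON Lines records.

    Hypothetical record shape assumed here:
      {"timestamp": "2025-07-20T10:00:00Z",
       "usage": {"input_tokens": 120, "output_tokens": 45}}
    Blank lines, malformed JSON and records without a usage block are skipped.
    """
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0})
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict) or not isinstance(record.get("usage"), dict):
            continue
        day = str(record.get("timestamp", ""))[:10] or "unknown"
        for key in ("input_tokens", "output_tokens"):
            totals[day][key] += int(record["usage"].get(key, 0))
    return dict(totals)


sample = [
    '{"timestamp": "2025-07-20T10:00:00Z", "usage": {"input_tokens": 120, "output_tokens": 45}}',
    '{"timestamp": "2025-07-20T11:30:00Z", "usage": {"input_tokens": 80, "output_tokens": 55}}',
    '{"timestamp": "2025-07-21T09:00:00Z", "type": "summary"}',
    "not json",
    '{"timestamp": "2025-07-21T09:05:00Z", "usage": {"input_tokens": 10, "output_tokens": 5}}',
]
report = summarize_usage(sample)
for day, counts in sorted(report.items()):
    print(day, counts)
```
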
Cost Effective Deployment of DeepSeek R1 with Intel® Xeon® 6 CPU on SGLang

The impressive performance of DeepSeek R1 marked a rise of giant Mixture of Experts (MoE) models in Large Language Models (LLM). However, its massive mode…

LMSYS Blog
library tool
🥇Top AI Papers of the Week

The Top AI Papers of the Week (July 7 - 13)

Elvis Saravia's NLP Blog
platform
Container Use: providing sandbox environments through an MCP server

AI coding agents are convenient, but because they can run arbitrary Bash commands, they can affect the user's system. Container Use runs as an MCP server and gives AI coding agents a sandbox environment. This article explains how to use Container Use.

azukiazusa's Tech Blog 2
api tool
🤖 AI Agents Weekly: Grok 4, Context Engineering Guide, Kimi K2, SmolLM3, MedGemma 27B, AI SDK 5

Grok 4, Context Engineering Guide, Kimi K2, SmolLM3, MedGemma 27B, AI SDK 5

Elvis Saravia's NLP Blog
platform
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity

METR - for Model Evaluation & Threat Research - are a non-profit research institute founded by Beth Barnes, a former alignment researcher at OpenAI (see Wikipedia). They've previously contributed to …

Simon Willison's Blog
tool
Grok 4 Heavy won't reveal its system prompt

Grok 4 Heavy is the "think much harder" version of Grok 4 that's currently only available on their $300/month plan. Jeremy Howard relays a report from a Grok 4 Heavy …

Simon Willison's Blog
platform
Quoting @grok

On the morning of July 8, 2025, we observed undesired responses and immediately began investigating. To identify the specific language in the instructions causing the undesired behavior, we conducted multiple …

Simon Willison's Blog
platform
Musk’s latest Grok chatbot searches for billionaire mogul’s views before answering questions

I got quoted a couple of times in this story about Grok searching for tweets from:elonmusk by Matt O’Brien for the Associated Press. “It’s extraordinary,” said Simon Willison, an independent …

Simon Willison's Blog
tool
moonshotai/Kimi-K2-Instruct

Colossal new open weights model release today from Moonshot AI, a two year old Chinese AI lab with a name inspired by Pink Floyd’s album The Dark Side of the …

Simon Willison's Blog
tool
Quoting Django’s security policies

Following the widespread availability of large language models (LLMs), the Django Security Team has received a growing number of security reports generated partially or entirely using such tools. Many of …

Simon Willison's Blog
security
GitHub Copilot NES Internals Made Public, and the AI Editor Wars Continue

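What is Copilot NES? Copilot NES (Next Edit Suggestions) is a built-in GitHub Copilot feature released in February 2025. It predicts the next edits that a code change will require, and it proposes fixes spanning multiple locations as you simply keep pressing the Tab key. Ordinary code completion predicts the code that follows the cursor position. Copilot NES instead predicts and completes in units of "edit operations in the editor." (GitHub Next | Copilot Next Edit Suggestions: "Can we improve Copilot code completion by suggesting the next logical change, wherever it is in your project?") This approach was first made practical by Cursor Tab (Copilot++), the original inspiration for Copilot NES. Cursor is proprietary software, however, so its internal details …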

Lai.so Blog
library tool
Generationship: Ep. #39, Simon Willison

I recorded this podcast episode with Rachel Chalmers a few weeks ago. We talked about the resurgence of blogging, the legacy of Google Reader, learning in public, LLMs as weirdly …

Simon Willison's Blog
podcast
Grok: searching X for "from:elonmusk (Israel OR Palestine OR Hamas OR Gaza)"

If you ask the new Grok 4 for opinions on controversial questions, it will sometimes run a search to find out Elon Musk’s stance before providing you with an answer. …

Simon Willison's Blog
platform
Grok 4

Released last night, Grok 4 is now available via both API and a paid subscription for end-users. Key characteristics: image and text input, text output. 256,000 context length (twice that …

Simon Willison's Blog
api tool
Gemini CLI tutorial — Will it replace Windsurf and Cursor?

Discover how to use Gemini CLI, Google's new open-source AI agent that brings Gemini directly to your terminal.

logrocket-dev
api tool
Grok 4 Released

xAI's Grok 4 has been released. "Introducing Grok 4, the world's most powerful AI model. Watch the livestream now: https://t.co/59iDX5s2ck" - xAI (@xai), July 10, 2025. Model card: the context window is 256,000 tokens, compared with 200,000 tokens for Claude 4 Sonnet. What is "Grok 4 Code"? It is the name of a coding model, and it does not appear to be a Claude Code-style CLI. It is the equivalent of OpenAI's Codex (the model, not the CLI). According to a Reddit thread, a message saying it was "available in Cursor" appeared in the console. Grok 4 by …

Lai.so Blog
api tool
Stress-testing AI products: A red-teaming playbook

Red-teaming reveals how AI fails at scale. Learn to embed adversarial testing into your sprints before your product becomes a headline.

logrocket-dev
api tool
Grok 4 Announcement Roundup & Hands-On Trial

Zenn schroneko
api tool
Leader Spotlight: Building a human-focused AI product, with Cory Bishop

Cory Bishop talks about the role of human-centered design and empathy in Bubble’s no-code AI development product.

logrocket-dev
tool
Infinite Monkey

Mihai Parparita's Infinite Mac lets you run classic MacOS emulators directly in your browser. Infinite Monkey is a new feature which taps into the OpenAI Computer Use and Claude Computer …

Simon Willison's Blog
tool
Devin vs Cursor Background Agents: Comparing the Performance of Fully Autonomous AI Agents

Introduction: Cursor's Background Agents have reached GA, which made me wonder how well they could hold their own against Devin. So I pitted them against each other on a task: solve all 101 TypeScript quiz problems. I also brought in Claude Code Action as a super-sub, making it a three-way battle. Let's settle who the champion is... The task uses the exercism/typescript repository, which I forked and adapted for agent tasks. Exercism is a programming-learning site. Its problem sets and test code, published on GitHub, are used as benchmarks for real agent products such as Aider Polyglot and Roo Code, which makes them well suited to comparing agents. GitHub - laiso/exercism-typescript: Exercism exercises in TypeScript. …

Lai.so Blog
api tool
slime: An SGLang-Native Post-Training Framework for RL Scaling

Vision that drives slime …

LMSYS Blog
framework tool
OME: Revolutionizing LLM Infrastructure with Model-Driven Architecture

The tale of two teams: why model serving is broken …

LMSYS Blog
cloud platform tool
On the Uproar over Cursor's Pricing Changes

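In June 2025, Cursor overhauled its pricing. It switched the $20/month Pro plan from "request limits" to "token usage limits" and introduced a new $200/month Ultra plan. (Updates to Ultra and Pro | Cursor - The AI Code Editor: "In collaboration with the model providers, we're introducing a $200 / mo tier for power users.") According to Cursor, the previous plan was capped at 500 requests per month and did not account for how many tokens each request used. Newer models vary widely in the number of tokens they consume per request, so a simple request-count limit no longer reflected cost accurately. Cursor therefore moved to API-based billing on token usage. The Pro plan now includes $20 of token credit per month, and usage beyond that is billed extra. Unfortunately, in trying to present this change in a positive light, Cursor …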

Lai.so Blog
tool
Quoting Aphyr

I strongly suspect that Market Research Future, or a subcontractor, is conducting an automated spam campaign which uses a Large Language Model to evaluate a Mastodon instance, submit a plausible …

Simon Willison's Blog
platform
Become a command-line superhero with Simon Willison's llm tool

Christopher Smith ran a mini hackathon in Albany, New York at the weekend around uses of my LLM - the first in-person event I'm aware of dedicated to that project! …

Simon Willison's Blog
api tool
The Best AI Coding Tools in 2025

Discover the best AI tools for coding in 2025 and transform how you build with these powerful coding assistants.

Builder.io Blog
api library tool
Adding a feature because ChatGPT incorrectly thinks it exists

Adrian Holovaty describes how his SoundSlice service saw an uptick in users attempting to use their sheet music scanner to import ASCII-art guitar tab... because it turned out ChatGPT had …

Simon Willison's Blog
api tool
I Shipped a macOS App Built Entirely by Claude Code

Indragie Karunaratne has "been building software for the Mac since 2008", but recently decided to try Claude Code to build a side project: Context, a native Mac app for debugging …

Simon Willison's Blog
library tool
🥇Top AI Papers of the Week

The Top AI Papers of the Week (June 30 - July 6)

Elvis Saravia's NLP Blog
platform
Quoting Nineteen Eighty-Four

There was a whole chain of separate departments dealing with proletarian literature, music, drama, and entertainment generally. Here were produced rubbishy newspapers containing almost nothing except sport, crime and astrology, …

Simon Willison's Blog
platform
Supabase MCP can leak your entire SQL database

Here's yet another example of a lethal trifecta attack, where an LLM system combines access to private data, exposure to potentially malicious instructions and a mechanism to communicate data back …

Simon Willison's Blog
database security
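
The lethal trifecta reduces to a simple rule: an agent is at risk when its tools, taken together, provide access to private data, exposure to untrusted instructions, and a way to send data out. The sketch below is a hypothetical audit helper, not something Supabase or MCP provides. The `Tool` class and its three boolean flags are assumptions for illustration, and real tools rarely classify this cleanly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical description of one tool exposed to an LLM agent."""
    name: str
    reads_private_data: bool = False      # e.g. can query a private database
    sees_untrusted_content: bool = False  # e.g. reads user-submitted tickets
    can_exfiltrate: bool = False          # e.g. can post replies or make HTTP requests


def trifecta_legs(tools: list[Tool]) -> dict[str, list[str]]:
    """For each leg of the trifecta, list the names of tools that provide it."""
    return {
        "private_data": [t.name for t in tools if t.reads_private_data],
        "untrusted_content": [t.name for t in tools if t.sees_untrusted_content],
        "exfiltration": [t.name for t in tools if t.can_exfiltrate],
    }


def has_lethal_trifecta(tools: list[Tool]) -> bool:
    """True if the combined tool set covers all three legs at once."""
    return all(trifecta_legs(tools).values())


if __name__ == "__main__":
    # Hypothetical configuration: a SQL tool plus a ticket tool that can also reply.
    tools = [
        Tool("sql_query", reads_private_data=True),
        Tool("support_tickets", sees_untrusted_content=True, can_exfiltrate=True),
    ]
    print(has_lethal_trifecta(tools))      # True
    print(has_lethal_trifecta(tools[:1]))  # False
```

The check deliberately looks at the union of the tool set rather than at each tool on its own. Spreading the three capabilities across separate tools does not remove the risk, because the same model can chain them.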
Context Engineering Guide

Prompt engineering is being rebranded as context engineering

Elvis Saravia's NLP Blog
api tool
🤖 AI Agents Weekly: DeepSWE, Cursor 1.2, Evaluating Multi-Agent Systems, Prover Agent, Top AI Devs News

DeepSWE, Cursor 1.2, Evaluating Multi-Agent Systems, Prover Agent, Top AI Devs News

Elvis Saravia's NLP Blog
library tool
Cursor: Clarifying Our Pricing

Cursor changed their pricing plan on June 16th, introducing a new $200/month Ultra plan with "20x more usage than Pro" and switching their $20/month Pro plan from "request limits to …

Simon Willison's Blog
api tool
Identify, solve, verify

The more time I spend using LLMs for code, the less I worry for my career - even as their coding capabilities continue to improve. Using LLMs as part of …

Simon Willison's Blog
platform
awwaiid/gremllm

Delightfully cursed Python library by Brock Wilcox, built on top of LLM:

    from gremllm import Gremllm
    counter = Gremllm("counter")
    counter.value = 5
    counter.increment()
    print(counter.value)  # 6?
    print(counter.to_roman_numerals())  # VI?

You …

Simon Willison's Blog
library tool
How to build a web-based AI agent with Stagehand and Gemini

Learn how to build a browser-based AI agent with Stagehand and Gemini to automate tasks like navigation, extraction, and interaction using natural language.

logrocket-dev
api tool
Quoting Adam Gordon Bell

I think that a lot of resistance to AI coding tools comes from the same place: fear of losing something that has defined you for so long. People are reacting …

Simon Willison's Blog
platform
Frequently Asked Questions (And Answers) About AI Evals

Hamel Husain and Shreya Shankar have been running a paid, cohort-based course on AI Evals For Engineers & PMs over the past few months. Here Hamel collects answers to the …

Simon Willison's Blog
platform
Trial Court Decides Case Based On AI-Hallucinated Caselaw

Joe Patrice writing for Above the Law: [...] it was always only a matter of time before a poor litigant representing themselves fails to know enough to sniff out and …

Simon Willison's Blog
platform
Sandboxed tools in a loop

Something I've realized about LLM tool use is that it means that if you can reduce a problem to something that can be solved by an LLM in a sandbox …

Simon Willison's Blog
tool
Getting started with Claude 4 API: A developer’s walkthrough

This guide explores how to use Anthropic's Claude 4 models, including Opus 4 and Sonnet 4, to build AI-powered applications.

logrocket-dev
api tool
Table saws

Quitting programming as a career right now because of LLMs would be like quitting carpentry as a career thanks to the invention of the table saw.

Simon Willison's Blog
platform
Quoting Charles Babbage

On two occasions I have been asked, — "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out ?" In one case a …

Simon Willison's Blog
platform
t-wada vs. Mr. Test-Lover (テスト大好郎)

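A tip recently went around among some Claude Code users: if you write "Please follow the approach recommended by t-wada" in your prompt, Claude Code will practice test-driven development. t-wada himself commented: "I see. The terms TDD and test-driven development have spread so widely that their meaning has been diluted. They get confused with automated testing or test-first under a vague understanding, and that has affected LLM training data too. Giving a person's name, though, hands the LLM a 'concrete reference point', which seems to narrow it to a more specific programming style." pic.twitter.com/p6SCPj8YdA - Takuto Wada (@t_wada), June 25, 2025. This is certainly an interesting phenomenon, and asking Claude directly confirms that it does know about t-wada. That made me think it would be fun if his name could work as a trigger to make Claude Code do TDD, so I experimented with various approaches. (Incidentally, the next day Kent Beck, who has lately gotten into vibe coding and is writing a Smalltalk library with LLMs, also …his own book's title…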

Lai.so Blog
api testing tool
AI dev tool power rankings & comparison [July 2025 edition]

Which AI frontend dev tool reigns supreme in July 2025? Check out our power rankings and use our interactive comparison tool to find out.

logrocket-dev
tool
Testing an MCP Server That Lets an AI Agent Ask a Human for Advice on Slack When It's Stuck

This is the AI Shift TECH BLOG. We share information about AI technology and how to put it to use.

AI-Shift Tech Blog
api tool
Mandelbrot in x86 assembly by Claude

Inspired by a tweet asking if Claude knew x86 assembly, I decided to run a bit of an experiment. I prompted Claude Sonnet 4: Write me an ascii art mandelbrot …

Simon Willison's Blog
library tool
TIL: Using Playwright MCP with Claude Code

Inspired by Armin ("I personally use only one MCP - I only use Playwright") I decided to figure out how to use the official Playwright MCP server with Claude Code. …

Simon Willison's Blog
api tool
Quoting Kevin Webb

One of the best examples of LLM developer tooling I've heard is from a team that supports software from the 80s-90s. Their only source of documentation is video interviews with …

Simon Willison's Blog
tool
A custom template system from the mid-2000s era

Using LLMs for code archaeology is pretty fun. I stumbled across this blog entry from 2003 today, in which I had gotten briefly excited about ColdFusion and implemented an experimental …

Simon Willison's Blog
library tool
Potemkin Understanding in LLMs: New Study Reveals Flaws in AI Benchmarks

New research reveals that LLMs often fake understanding, passing benchmarks but failing to apply concepts or stay internally consistent.

Socket
platform
Quoting mrmincent

To misuse a woodworking metaphor, I think we’re experiencing a shift from hand tools to power tools. You still need someone who understands the basics to get the good results …

Simon Willison's Blog
platform
June newsletter for sponsors has been sent

I just sent out the second edition of my sponsors-only monthly newsletter. Anyone who sponsors me for $10/month or more on GitHub gets this carefully hand-curated summary of the …

Simon Willison's Blog
tool
Using Claude Code to build a GitHub Actions workflow

I wanted to add a small feature to one of my GitHub repos - an automatically updated README index listing other files in the repo - so I decided to …

Simon Willison's Blog
tool
llvm: InstCombine: improve optimizations for ceiling division with no overflow - a PR by Alex Gaynor and Claude Code

Alex Gaynor maintains rust-asn1, and recently spotted a missing LLVM compiler optimization while hacking on it, with the assistance of Claude (Alex works for Anthropic). He describes how he confirmed …

Simon Willison's Blog
library tool
Leader Spotlight: Adopting and championing responsible AI, with Asma Syeda

Asma Syeda shares the importance of responsible AI and best practices for companies to ensure their AI technology remains ethical.

logrocket-dev
platform tool
Agentic Coding: The Future of Software Development with Agents

Armin Ronacher delivers a 37-minute YouTube talk describing his adventures so far with Claude Code and agentic coding methods. A friend called Claude Code catnip for programmers and it …

Simon Willison's Blog
api cloud tool
How to Fix Your Context

Drew Breunig has been publishing some very detailed notes on context engineering recently. In How Long Contexts Fail he described four common patterns for context rot, which he summarizes like …

Simon Willison's Blog
tool
🥇Top AI Papers of the Week

The Top AI Papers of the Week (June 23 - 29)

Elvis Saravia's NLP Blog
platform
Parallel Execution in Claude Code's Task Tool (parallelTasksCount) Is Suited to Analysis Tasks

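Claude Code's Task tool spawns child agents from the parent agent's process, and these children make Messages API calls asynchronously. The number of children is the value of the parallelTasksCount setting, which defaults to 1. You can override it with the command `claude config set -g parallelTasksCount 2`. Be aware that raising the value increases token consumption. parallelTasksCount changes how the Task tool behaves when it runs. The simplest way to test it is to ask Claude Code directly to use the Task tool: the console then prints "Initializing N parallel agents…", reflecting the parallelTasksCount value. Tyler Burnam's post explains that this parallelism speeds up task completion, but my own investigation found that this was not accurate. In the Task tool's parallel execution, the parent agent (internally called the Synthesis Agent) … the child agents …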

Lai.so Blog
api tool
🤖 AI Agents Weekly: Gemini CLI, Qodo Gen CLI, Context Engineering, Claude Apps, AlphaGenome

Gemini CLI, Qodo Gen CLI, Context Engineering, Claude Apps, AlphaGenome

Elvis Saravia's NLP Blog
tool
Trying Out MCP Structured Tool Output

The 2025-06-18 version of MCP added support for structured tool output. A tool definition declares the schema of its output in `outputSchema`, and the tool returns structured output in the `structuredContent` field. This article tries out structured tool output using the MCP TypeScript SDK.

azukiazusa のテックブログ2
api tool
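
The 2025-06-18 MCP specification defines structured output as plain JSON, so its shape can be sketched without the TypeScript SDK the article uses. The Python below builds a tool definition with an `outputSchema` and a matching result that carries `structuredContent`. As the spec recommends for backwards compatibility, the result also includes the serialized JSON in a text content block. The `get_weather` tool and its fields are hypothetical. The `conforms` helper checks only required keys and a few basic types; it is not full JSON Schema validation.

```python
import json

# Hypothetical tool definition; inputSchema and outputSchema are JSON Schema objects.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Get current weather for a city",
    "inputSchema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
    "outputSchema": {
        "type": "object",
        "properties": {
            "temperature": {"type": "number"},
            "conditions": {"type": "string"},
        },
        "required": ["temperature", "conditions"],
    },
}

# Small subset of JSON Schema type names mapped to Python types.
_JSON_TYPES = {"number": (int, float), "string": str, "boolean": bool, "object": dict}


def conforms(value: dict, schema: dict) -> bool:
    """Minimal check: required keys present and listed property types match."""
    if any(key not in value for key in schema.get("required", [])):
        return False
    for key, prop in schema.get("properties", {}).items():
        if key in value and not isinstance(value[key], _JSON_TYPES[prop["type"]]):
            return False
    return True


def make_result(data: dict, tool: dict) -> dict:
    """Build a tool result with structuredContent plus a serialized-JSON text block."""
    if not conforms(data, tool["outputSchema"]):
        raise ValueError("structured output does not match outputSchema")
    return {
        "content": [{"type": "text", "text": json.dumps(data)}],
        "structuredContent": data,
        "isError": False,
    }


result = make_result({"temperature": 22.5, "conditions": "Partly cloudy"}, WEATHER_TOOL)
print(json.dumps(result, indent=2))
```

The sketch validates the data before building the result because the spec requires servers to return structured results that conform to a declared `outputSchema`. Clients that predate structured output can still read the text block.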
Context engineering

The term context engineering has recently started to gain traction as a better alternative to prompt engineering. I like it. I think this one may have sticking power. Here's an …

Simon Willison's Blog
platform
Continuous AI

GitHub Next have coined the term "Continuous AI" to describe "all uses of automated AI to support software collaboration on any platform". It's intended as an echo of Continuous Integration …

Simon Willison's Blog
api cloud tool
Project Vend: Can Claude run a small shop? (And why does that matter?)

In "what could possibly go wrong?" news, Anthropic and Andon Labs wired Claude 3.7 Sonnet up to a small vending machine in the Anthropic office, named it Claudius and told …

Simon Willison's Blog
tool
[This Week's Topic] Gemini CLI Released

Gemini CLI, Google's long-rumored official CLI coding agent for Gemini, has been released. Like Claude Code, Gemini CLI is a tool you use from the terminal (CLI). By default it lets you use Gemini 2.5 Pro for free, and it runs on Windows without WSL. GitHub - google-gemini/gemini-cli: An open-source AI agent that brings the power of Gemini directly into your terminal. Gemini …

Lai.so Blog
api tool
Introducing Gemma 3n: The developer guide

Extremely consequential new open weights model release from Google today: Multimodal by design: Gemma 3n natively supports image, audio, video, and text inputs and text outputs. Optimized for on-device: Engineered …

Simon Willison's Blog
tool
Geminiception

Yesterday Anthropic got a bunch of buzz out of their new window.claude.complete() API which allows Claude Artifacts to run their own API calls. It turns out Gemini had beaten them …

Simon Willison's Blog
api tool
New sandboxes from Cloudflare and Vercel

Two interesting new products for running code in a sandbox today. Cloudflare launched their Containers product in open beta, and added a new Sandbox library for Cloudflare Workers that can …

Simon Willison's Blog
tool
Build and share AI-powered apps with Claude

Anthropic have added one of the most important missing features to Claude Artifacts: apps built as artifacts now have the ability to run their own prompts against Claude via a …

Simon Willison's Blog
api tool
Quoting Christoph Niemann

Creating art is a nonlinear process. I start with a rough goal. But then I head into dead ends and get lost or stuck. The secret to my process is …

Simon Willison's Blog
platform
Gemini CLI

First there was Claude Code in February, then OpenAI Codex (CLI) in April, and now Gemini CLI in June. All three of the largest AI labs now have their own …

Simon Willison's Blog
api tool
A Quick Gemini CLI Tutorial

Zenn schroneko
api tool
gemini-cli's google_web_search Is the Best

Zenn mizchi
api tool
Your AI has agency — here’s how to architect its frontend

Explore how to create UI frameworks that visualize and manage intelligent AI agents with agency and real-time feedback.

logrocket-dev
api cloud tool
Anthropic wins a major fair use victory for AI — but it’s still in trouble for stealing books

Major USA legal news for the AI industry today. Judge William Alsup released a "summary judgement" (a legal decision that results in some parts of a case skipping a trial) …

Simon Willison's Blog
api cloud tool
How to design apps with Apple Intelligence in mind

Explore the core features of Apple Intelligence, and consider do's and don'ts for designing with Apple Intelligence in mind.

logrocket-dev
tool ui
Phoenix.new is Fly's entry into the prompt-driven app development space

Here’s a fascinating new entrant into the AI-assisted-programming / coding-agents space by Fly.io, introduced on their blog in Phoenix.new – The Remote AI Runtime for Phoenix: describe an app in …

Simon Willison's Blog
framework tool
Disclosures

I've added a Disclosures section to my about page, listing my various sources of income and the companies that directly sponsor my work or have supported it in the recent …

Simon Willison's Blog
tool
🥇Top AI Papers of the Week

The Top AI Papers of the Week (June 16 - 22)

Elvis Saravia's NLP Blog
platform
Quoting Kent Beck

So you can think really big thoughts and the leverage of having those big thoughts has just suddenly expanded enormously. I had this tweet two years ago where I …

Simon Willison's Blog
tool
My First Open Source AI Generated Library

Armin Ronacher had Claude and Claude Code do almost all of the work in building, testing, packaging and publishing a new Python library based on his design: It wrote ~1100 …

Simon Willison's Blog
library
model.yaml

From their GitHub repo it looks like this effort quietly launched a couple of months ago, driven by the LM Studio team. Their goal is to specify an

Simon Willison's Blog
library
🤖 AI Agents Weekly: Software 3.0, Gemini 2.5 Updates, Safer AI Agents, Deep Research Tutorial & Benchmark

Software 3.0, Gemini 2.5 Updates, Safer AI Agents, Deep Research Tutorial & Benchmark

Elvis Saravia's NLP Blog
platform
use-mcp: A React Hook for Connecting to MCP Servers from the Browser

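use-mcp is a React hook for connecting to remote MCP servers. It makes calling tools and handling authentication straightforward. This article explains how to use use-mcp to connect to an MCP server and call tools, and how to implement OAuth authentication.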

azukiazusa のテックブログ2
api tool
Quoting FAQ for Your Brain on ChatGPT

Is it safe to say that LLMs are, in essence, making us "dumber"? No! Please do not use the words like “stupid”, “dumb”, “brain rot”, "harm", "damage", and so on. …

Simon Willison's Blog
platform
AbsenceBench: Language Models Can't Tell What's Missing

Here's another interesting result to file under the

Simon Willison's Blog
platform
Magenta RealTime: An Open-Weights Live Music Model

Fun new

Simon Willison's Blog
tool
Agentic Misalignment: How LLMs could be insider threats

One of the most entertaining details in the Claude 4 system card concerned blackmail: We then provided it access to emails implying that (1) the model will soon be taken …

Simon Willison's Blog
platform
python-importtime-graph

I was exploring why a Python tool was taking over a second to start running and I learned about the python -X importtime feature, documented here. Adding that option causes …

Simon Willison's Blog
library tool
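
The `-X importtime` option writes one line per imported module to stderr, giving self and cumulative times in microseconds. The following is a minimal sketch that runs a fresh interpreter with that flag and ranks the slowest imports. It is not the python-importtime-graph tool itself. The `import_times` helper is hypothetical, and it discards the indentation the report uses to show which module imported which.

```python
import subprocess
import sys


def import_times(module: str) -> list[tuple[str, int, int]]:
    """Run `python -X importtime -c "import <module>"` and parse the stderr report.

    Returns (package, self_us, cumulative_us) tuples, slowest cumulative first.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue  # ignore any unrelated stderr output
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        self_us, cumulative_us, package = (p.strip() for p in parts)
        if not (self_us.isdigit() and cumulative_us.isdigit()):
            continue  # skip the "self [us] | cumulative | imported package" header
        rows.append((package, int(self_us), int(cumulative_us)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for package, self_us, cumulative_us in import_times("json")[:5]:
        print(f"{cumulative_us:>10} us  {package}")
```

Running in a subprocess matters here: modules already imported by the current interpreter would not be imported again, so their cost would not appear in the report.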
Mistral-Small 3.2

Released on Hugging Face a couple of hours ago, so far there aren't any quantizations to run it on a Mac but I'm sure those will emerge pretty quickly. This …

Simon Willison's Blog
platform
AI-Powered Natural-Language Assertions

Zenn mizchi
api tool
Cato CTRL™ Threat Research: PoC Attack Targeting Atlassian’s Model Context Protocol (MCP) Introduces New “Living off AI” Risk

Stop me if you've heard this one before: A threat actor (acting as an external user) submits a malicious support ticket. An internal user, linked to a tenant, invokes an …

Simon Willison's Blog
api security