Introducing GPT-5.1 for developers

OpenAI announced GPT-5.1 yesterday, calling it a smarter, more conversational ChatGPT. Today they've added it to their API. We actually got four new models today: gpt-5.1, gpt-5.1-chat-latest, gpt-5.1-codex and gpt-5.1-codex-mini. There …

Simon Willison's Blog
api tool
Nano Banana can be prompt engineered for extremely nuanced AI image generation

Max Woolf provides an exceptional deep dive into Google's Nano Banana aka Gemini 2.5 Flash Image model, still the best available image manipulation LLM tool three months after its initial …

Simon Willison's Blog
tool
Quoting Nov 12th letter from OpenAI to Judge Ona T. Wang

On Monday, this Court entered an order requiring OpenAI to hand over to the New York Times and its co-plaintiffs 20 million ChatGPT user conversations [...] OpenAI is unaware of …

Simon Willison's Blog
security
What happens if AI labs train for pelicans riding bicycles?

Almost every time I share a new example of an SVG of a pelican riding a bicycle a variant of this question pops up: how do you know the labs …

Simon Willison's Blog
platform
How I used Mastra to build a prize-winning RAG agent

A developer's retrospective on creating an AI video transcription agent with Mastra, an open-source TypeScript framework for building AI agents.

logrocket-dev
framework tool
Quoting Steve Krouse

The fact that MCP is a different surface from your normal API allows you to ship MUCH faster to MCP. This has been unlocked by inference at runtime. Normal APIs …

Simon Willison's Blog
api
10 Best AI Tools for Product Managers in 2026

Top 10 AI tools I actually use as a PM: from user calls to PRDs to prototypes. Real workflows, measurable time savings, and honest takes on what works.

Builder.io Blog
api cloud tool
Agentic Pelican on a Bicycle

Robert Glaser took my pelican riding a bicycle benchmark and applied an agentic loop to it, seeing if vision models could draw a better pelican if they got the chance …

Simon Willison's Blog
platform
Six coding agents at once

I've been upgrading a ton of Datasette plugins recently for compatibility with the Datasette 1.0a20 release from last week - 35 so far. A lot of the work is very …

Simon Willison's Blog
tool
The next phase of dev: Building for MCP and the open web

MCP is the ultimate bridge that redefines how AI connects to the open web. Here's how it lets agents act across APIs and automate workflows.

logrocket-dev
api tool
Quoting Netflix

Netflix asks partners to consider the following guiding principles before leveraging GenAI in any creative workflow: The outputs do not replicate or substantially recreate identifiable characteristics of unowned or copyrighted …

Simon Willison's Blog
platform
15 Best AI Tools for Designers in 2026

Discover the best AI tools designers are using in 2026 to speed up workflows, generate designs, and connect directly with real design systems.

Builder.io Blog
tool ui
You’ve authenticated your user, but have you authorized your agent?

Secure AI agents beyond login screens with Auth0’s Auth for GenAI; from token management and human approval to fine-grained authorization.

logrocket-dev
api security tool
FTC’s AI chatbot crackdown: A developer compliance guide

Learn how to build a fully compliant AI chatbot with FTC-mandated safeguards – age verification, safety monitoring, consent systems, and audit logging.

logrocket-dev
api security tool
Pelican on a Bike - Raytracer Edition

beetle_b ran this prompt against a bunch of recent LLMs: Write a POV-Ray file that shows a pelican riding on a bicycle. This turns out to be a harder challenge …

Simon Willison's Blog
platform
🥇Top AI Papers of the Week

The Top AI Papers of the Week (November 3 - 9)

Elvis Saravia's NLP Blog
platform
Reverse engineering Codex CLI to get GPT-5-Codex-Mini to draw me a pelican

OpenAI partially released a new model yesterday called GPT-5-Codex-Mini, which they describe as "a more compact and cost-efficient version of GPT-5-Codex". It’s currently only available via their Codex CLI tool …

Simon Willison's Blog
api tool
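The problem of MCP tools crowding the context window, and how to solve it

As MCP spreads, a problem has emerged: large numbers of tool definitions crowd the LLM's context. This article explains practical solutions, including providing minimal information through progressive disclosure, making tool calls more efficient through code execution with MCP, and reducing context with a single search tool, with examples from Claude Skills and Cloudflare Code Mode.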

azukiazusa のテックブログ2
api library tool
Quoting Kenton Varda

The big advantage of MCP over OpenAPI is that it is very clear about auth. [...] Maybe an agent could read the docs and write code to auth. But we …

Simon Willison's Blog
api security
🤖 AI Agents Weekly: Context Engineering 2.0, Kimi K2 Thinking, Windsurf Codemaps, Google File Search, Tool-to-Agent Retrieval

Context Engineering 2.0, Kimi K2 Thinking, Windsurf Codemaps, Google File Search, Tool-to-Agent Retrieval

Elvis Saravia's NLP Blog
platform
Quoting Josh Cohenzadeh

I have AiDHD. It has never been easier to build an MVP and in turn, it has never been harder to keep focus. When new features always feel like they're …

Simon Willison's Blog
api tool
Could LLMs encourage new programming languages?

My hunch is that existing LLMs make it easier to build a new programming language in a way that captures new developers. Most programming languages are similar enough to existing …

Simon Willison's Blog
library tool
Autogen vs. Crew AI: Choosing the right agentic framework

Build autonomous AI agents with Autogen and Crew AI. Learn how agentic AI enables multi-agent systems, tools, and workflows in action.

logrocket-dev
framework tool
Using Codex CLI with gpt-oss:120b on an NVIDIA DGX Spark via Tailscale

Inspired by a YouTube comment I wrote up how I run OpenAI's Codex CLI coding agent against the gpt-oss:120b model running in Ollama on my NVIDIA DGX Spark via a …

Simon Willison's Blog
tool
You should write an agent

Thomas Ptacek on the Fly blog: Agents are the most surprising programming experience I’ve had in my career. Not because I’m awed by the magnitude of their powers — I …

Simon Willison's Blog
platform
Quoting Ben Stolovitz

My trepidation extends to complex literature searches. I use LLMs as secondary librarians when I’m doing research. They reliably find primary sources (articles, papers, etc.) that I miss in my …

Simon Willison's Blog
platform
SGLang Diffusion: Accelerating Video and Image Generation

We are excited to introduce SGLang Diffusion, which brings SGLang's state-of-the-art performance to accelerate image and video generation for diffusion mo...

LMSYS Blog
library tool
Kimi K2 Thinking

Chinese AI lab Moonshot's Kimi K2 established itself as one of the largest open weight models - 1 trillion parameters - back in July. They've now released the Thinking version, …

Simon Willison's Blog
platform
Quoting Nathan Lambert

At the start of the year, most people loosely following AI probably knew of 0 [Chinese] AI labs. Now, and towards wrapping up 2025, I’d say all of DeepSeek, Qwen, …

Simon Willison's Blog
platform
AI dev tool power rankings & comparison [Nov 2025]

Compare the top AI development tools and models of November 2025. View updated rankings, feature breakdowns, and find the best fit for you.

logrocket-dev
tool
Code research projects with async coding agents like Claude Code and Codex

I’ve been experimenting with a pattern for LLM usage recently that’s working out really well: asynchronous code research tasks. Pick a research question, spin up an asynchronous coding agent and …

Simon Willison's Blog
api tool
Fusion 1.0 - The First AI Agent for Product, Design, and Code

Fusion 1.0 is the first AI agent for product, design and code that builds, learns and ships features across your stack from idea to production.

Builder.io Blog
api cloud tool
Quoting @belligerentbarbies

I'm worried that they put co-pilot in Excel because Excel is the beast that drives our entire economy and do you know who has tamed that beast? Brenda. Who is …

Simon Willison's Blog
api tool
Code execution with MCP: Building more efficient agents

When I wrote about Claude Skills I mentioned that I don't use MCP at all any more when working with coding agents - I find CLI utilities and libraries like …

Simon Willison's Blog
api tool
MCP Colors: Systematically deal with prompt injection risk

Tim Kellogg proposes a neat way to think about prompt injection, especially with respect to MCP tools. Classify every tool with a color: red if it exposes the agent to …

Simon Willison's Blog
security
I tried OpenAI’s AgentKit: Does it make Zapier and n8n obsolete?

Examine AgentKit, OpenAI's new tool for building agents. Conduct a side-by-side comparison with n8n by building AI agents with each tool.

logrocket-dev
api tool
A Jarvis for everyone: AI agents as new interfaces

Discover how AI agents and the Model Context Protocol (MCP) are redefining user interfaces, transforming apps into intelligent, conversational systems.

logrocket-dev
tool ui
Quoting Steve Francia

Every time an engineer evaluates a language that isn’t “theirs,” their brain is literally working against them. They’re not just analyzing technical trade offs, they’re contemplating a version of themselves …

Simon Willison's Blog
tool
No Free Lunch: Deconstruct Efficient Attention with MiniMax M2

We are excited to announce day-one support for the new flagship model, MiniMax M2, on SGLang. The MiniMax M2 redefines efficiency for agents: it is a comp...

LMSYS Blog
api tool
Quoting MiniMax

Interleaved thinking is essential for LLM agents: it means alternating between explicit reasoning and tool use, while carrying that reasoning forward between steps. This process significantly enhances planning, self‑correction, and reliability …

Simon Willison's Blog
api tool
Optimizing GPT-OSS on NVIDIA DGX Spark: Getting the Most Out of Your Spark

We’ve got some exciting updates about the NVIDIA DGX Spark! In the week following the official launch, we collaborated closely with NVIDI...

LMSYS Blog
api tool
New prompt injection papers: Agents Rule of Two and The Attacker Moves Second

Two interesting new papers regarding LLM security and prompt injection came to my attention this weekend. Agents Rule of Two: A Practical Approach to AI Agent Security The first is …

Simon Willison's Blog
security
PyCon US 2026 call for proposals is now open

PyCon US is coming to the US west coast! 2026 and 2027 will both be held in Long Beach, California - the 2026 conference is set for May 13th-19th next …

Simon Willison's Blog
api tool
🥇Top AI Papers of the Week

The Top AI Papers of the Week (October 27 - November 2)

Elvis Saravia's NLP Blog
platform
How I Use Every Claude Code Feature

Useful, detailed guide from Shrivu Shankar, a Claude Code power user. Lots of tips for both individual Claude Code usage and configuring it for larger team projects. I appreciated Shrivu's …

Simon Willison's Blog
tool
Claude Code Can Debug Low-level Cryptography

Go cryptography author Filippo Valsorda reports on some very positive results applying Claude Code to the challenge of implementing novel cryptography algorithms. After Claude was able to resolve a "fairly …

Simon Willison's Blog
security tool
October 2025 sponsors-only newsletter

I just hit send on the October edition of my sponsors-only monthly newsletter. If you are a sponsor (or if you start a sponsorship now) you can access a copy …

Simon Willison's Blog
api tool
🤖 AI Agents Weekly: MiniMax-M2, Cursor 2.0, SWE-1.5, Agent Data Protocol, Kimi CLI

MiniMax-M2, Cursor 2.0, SWE-1.5, Agent Data Protocol, Kimi CLI

Elvis Saravia's NLP Blog
library tool
Curiosity-driven blogging

My piece this morning about the Marimo acquisition is an example of a variant of a TIL - I didn't know much about CoreWeave, the acquiring company, so I poked …

Simon Willison's Blog
tool
Marimo is Joining CoreWeave

I don't usually cover startup acquisitions here, but this one feels relevant to several of my interests. Marimo (previously) provide an open source (Apache 2 licensed) notebook tool for Python, …

Simon Willison's Blog
library tool
CoreWeave adds Marimo to their 2025 acquisition spree

I don't usually cover startup acquisitions here, but this one feels relevant to several of my interests. Marimo (previously) provide an open source (Apache 2 licensed) notebook tool for Python, …

Simon Willison's Blog
library tool
Quoting François Chollet

To really understand a concept, you have to "invent" it yourself in some capacity. Understanding doesn't come from passive content consumption. It is always self-built. It is an active, high-agency, …

Simon Willison's Blog
tool
Introducing SWE-1.5: Our Fast Agent Model

Here's the second fast coding model released by a coding agent IDE in the same day - the first was Composer-1 by Cursor. This time it's Windsurf releasing SWE-1.5: Today …

Simon Willison's Blog
api cloud tool
MiniMax M2 & Agent: Ingenious in Simplicity

MiniMax M2 was released on Monday 27th October by MiniMax, a Chinese AI lab founded in December 2021. It's a very promising model. Their self-reported benchmark scores show it as …

Simon Willison's Blog
tool
Composer: Building a fast frontier model with RL

Cursor released Cursor 2.0 today, with a refreshed UI focused on agentic coding (and running agents in parallel) and a new model that's unique to Cursor called Composer 1. As far …

Simon Willison's Blog
library tool
The Replay (10/29/25): Tiny AI agents, Next.js 16, and more

Discover what's new in The Replay, LogRocket's newsletter for dev and engineering leaders, in the October 29th issue.

logrocket-dev
framework library ui
Is Llama really as bad as people say? I put Meta’s AI to the test

Test out Meta's AI model, Llama, on a real CRUD frontend project, compare it with competing models, and walk through the setup process.

logrocket-dev
framework tool
Small language models: Why the future of AI agents might be tiny

Rosario De Chiara discusses why small language models (SLMs) may outperform giants in specific real-world AI systems.

logrocket-dev
platform tool
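A custom Plan sub-agent using the Serena MCP tools

Introduction: With the Claude Code v2.0.28 update, the Plan feature was turned into a sub-agent. The main benefit is that the context used during plan generation is split off, which reduces the main context, but as a side effect the tools available in Plan mode are now limited to built-in tools, and tools provided by MCP servers (including the Serena tools) can no longer be used. After receiving reports from Serena users that "the tools are no longer being used", the author identified this problem and tried supporting the Serena tools by overriding the Plan agent. It worked well, so the findings are shared here. 💡 Note: this is not an officially supported method, so it may stop working in a future update. Thanks to Claude Code's flexible plugin mechanism, all sorts of customization are possible, which is fun. This article explains how to use a custom Plan sub-agent that uses the tools of the Serena MCP server. Overview: This custom Plan sub-agent, C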

Lai.so Blog
api tool
SGLang-Jax: An Open-Source Solution for Native TPU Inference

We're excited to introduce SGLang-Jax, a state-of-the-art open-source inference engine built entirely on Jax and XLA. It leverages SGLang's high-performan...

LMSYS Blog
library tool
You’re doing vibe coding wrong: Here’s how to do it right

Vibe coding isn’t just AI-assisted chaos. Here’s how to avoid insecure, unreadable code and turn your “vibes” into real developer productivity.

logrocket-dev
api tool
Exploring spec-driven development with the new GitHub Spec Kit

Bring order to AI-assisted coding with GitHub Spec Kit, a toolkit for structured, spec-driven development using Copilot, Claude, or Cursor.

logrocket-dev
api tool
Quoting Aaron Boodman

Claude doesn't make me much faster on the work that I am an expert on. Maybe 15-20% depending on the day. It's the work that I don't know how to …

Simon Willison's Blog
platform
GenAI Image Editing Showdown

Useful collection of examples by Shaun Pedicini who tested Seedream 4, Gemini 2.5 Flash, Qwen-Image-Edit, FLUX.1 Kontext [dev], FLUX.1 Kontext [max], OmniGen2, and OpenAI gpt-image-1 across 12 image editing prompts. …

Simon Willison's Blog
tool
Sora might have a 'pervert' problem on its hands

Katie Notopoulos turned on the Sora 2 option where anyone can make a video featuring her cameo, and then: I found a stranger had made a video where I appeared …

Simon Willison's Blog
tool
🥇Top AI Papers of the Week

The Top AI Papers of the Week (October 20-26)

Elvis Saravia's NLP Blog
platform
Having agents perform specialized tasks with Claude Skills

Claude Skills is a new feature that lets you create and share custom skills for Claude to perform specific tasks. This article explains how Claude Skills work, how to create them, and how they differ from MCP tools.

azukiazusa のテックブログ2
api tool
Setting up a codebase for working with coding agents

Someone on Hacker News asked for tips on setting up a codebase to be more productive with AI coding tools. Here's my reply: Good automated tests which the coding agent …

Simon Willison's Blog
api library tool
🤖 AI Agents Weekly: DeepSeek-OCR, Claude Code on the Web, ChatGPT Atlas Browser,...

DeepSeek-OCR, Claude Code on the Web, ChatGPT Atlas Browser

Elvis Saravia's NLP Blog
api tool
Quoting Claude Docs

If you have an AGENTS.md file, you can source it in your CLAUDE.md using @AGENTS.md to maintain a single source of truth.

Simon Willison's Blog
tool
Visual Features Across Modalities: SVG and ASCII Art Reveal Cross-Modal Understanding

New model interpretability research from Anthropic, this time focused on SVG and ASCII art generation. We found that the same feature that activates over the eyes in an ASCII face …

Simon Willison's Blog
tool
claude_code_docs_map.md

Something I'm enjoying about Claude Code is that any time you ask it questions about itself it runs tool calls like these: In this case I'd asked it about its …

Simon Willison's Blog
api tool
Quoting Geoffrey Litt

A lot of people say AI will make us all "managers" or "editors"...but I think this is a dangerously incomplete view! Personally, I'm trying to code like a surgeon. A …

Simon Willison's Blog
api tool
OpenAI no longer has to preserve all of its ChatGPT data, with some exceptions

This is a relief: Federal judge Ona T. Wang filed a new order on October 9 that frees OpenAI of an obligation to "preserve and segregate all output log data …

Simon Willison's Blog
platform
Dane Stuckey (OpenAI CISO) on prompt injection risks for ChatGPT Atlas

My biggest complaint about the launch of the ChatGPT Atlas browser the other day was the lack of details on how OpenAI are addressing prompt injection attacks. The launch post …

Simon Willison's Blog
api security tool
The Replay (10/22/25): AI-assisted coding, Wasm 3.0, and more

Discover what's new in The Replay, LogRocket's newsletter for dev and engineering leaders, in the October 22nd issue.

logrocket-dev
framework tool
Where AI-assisted coding accelerates development — and where it doesn’t

John Reilly discusses how software development has been changed by the innovations of AI: both the positives and the negatives.

logrocket-dev
library tool
Living dangerously with Claude

I gave a talk last night at Claude Code Anonymous in San Francisco, the unofficial meetup for coding agent enthusiasts. I decided to talk about a dichotomy I’ve been struggling …

Simon Willison's Blog
api tool
SLOCCount in WebAssembly

This project/side-quest got a little bit out of hand. I remembered an old tool called SLOCCount which could count lines of code and produce an estimate for how much they …

Simon Willison's Blog
tool
Don't let Claude Code delete your session logs

Claude Code stores full logs of your sessions as newline-delimited JSON in ~/.claude/projects/encoded-directory/*.jsonl on your machine. I currently have 379MB of these! Here's an example jsonl file which I extracted …

Simon Willison's Blog
tool
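
Because these logs are newline-delimited JSON, a few lines of Python are enough to see how much is stored. This is a minimal sketch, not anything taken from the post: the function name `summarize_jsonl_logs` is hypothetical, the directory layout is only what the excerpt describes, and nothing is assumed about the fields inside each record.

```python
import json
from pathlib import Path


def summarize_jsonl_logs(root: Path) -> dict:
    """Count files, bytes and parseable records in *.jsonl files under root."""
    summary = {"files": 0, "bytes": 0, "records": 0, "bad_lines": 0}
    for path in sorted(root.rglob("*.jsonl")):
        summary["files"] += 1
        summary["bytes"] += path.stat().st_size
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                if not line.strip():
                    continue  # blank lines carry no record
                try:
                    json.loads(line)  # each non-blank line should be one JSON value
                    summary["records"] += 1
                except json.JSONDecodeError:
                    summary["bad_lines"] += 1
    return summary
```

Calling `summarize_jsonl_logs(Path.home() / ".claude" / "projects")` would report totals for the location named in the excerpt, assuming that directory exists on your machine.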
Unseeable prompt injections in screenshots: more vulnerabilities in Comet and other AI browsers

The Brave security team wrote about prompt injection against browser agents a few months ago (here are my notes on that). Here's their follow-up: What we’ve found confirms our initial …

Simon Willison's Blog
security
Introducing ChatGPT Atlas

Last year OpenAI hired Chrome engineer Darin Fisher, which sparked speculation they might have their own browser in the pipeline. Today it arrived. ChatGPT Atlas is a Mac-only web browser …

Simon Willison's Blog
tool ui
Trying out ax, the TypeScript version of DSPy

This is the AI Shift TECH BLOG. We share information about AI technology and how to put it to use.

AI-Shift Tech Blog
library tool
Quoting Bruce Schneier and Barath Raghavan

Prompt injection might be unsolvable in today’s LLMs. LLMs process token sequences, but no mechanism exists to mark token privileges. Every solution proposed introduces new injection vectors: Delimiter? Attackers include …

Simon Willison's Blog
security
Claude Code for web - a new asynchronous coding agent from Anthropic

Anthropic launched Claude Code for web this morning. It’s an asynchronous coding agent—their answer to OpenAI’s Codex Cloud and Google’s Jules, and has a very similar shape. I had preview …

Simon Willison's Blog
api tool
Getting DeepSeek-OCR working on an NVIDIA Spark via brute force using Claude Code

DeepSeek released a new model yesterday: DeepSeek-OCR, a 6.6GB model fine-tuned specifically for OCR. They released it as model weights that run using PyTorch and CUDA. I got it running …

Simon Willison's Blog
api tool
Announcing Experimental Malware Scanning for the Hugging Face Ecosystem

Socket is launching experimental protection for the Hugging Face ecosystem, scanning for malware and malicious payload injections inside model files t...

Socket
api tool
Oracle AI World 2025 attendance report

This is the AI Shift TECH BLOG. We share information about AI technology and how to put it to use.

AI-Shift Tech Blog
cloud tool
🥇Top AI Papers of the Week

The Top AI Papers of the Week (October 13-19)

Elvis Saravia's NLP Blog
platform
TIL: Exploring OpenAI's deep research API model o4-mini-deep-research

I landed a PR by Manuel Solorzano adding pricing information to llm-prices.com for OpenAI's o4-mini-deep-research and o3-deep-research models, which they released in June and document here. I realized I'd never …

Simon Willison's Blog
api tool
🤖 AI Agents Weekly: Claude Haiku 4.5, Deep Agents, SWE-grep, nanochat, Agent Skills, Veo 3.1 Fast, n8n AI Workflow Builder

Claude Haiku 4.5, Deep Agents, SWE-grep, nanochat, Agent Skills, Veo 3.1 Fast, n8n AI Workflow Builder

Elvis Saravia's NLP Blog
framework tool
The AI water issue is fake

Andy Masley (previously): All U.S. data centers (which mostly support the internet, not AI) used 200-250 million gallons of freshwater daily in 2023. The U.S. consumes approximately 132 billion gallons …

Simon Willison's Blog
api cloud infra
Andrej Karpathy — AGI is still a decade away

Extremely high signal 2 hour 25 minute (!) conversation between Andrej Karpathy and Dwarkesh Patel. It starts with Andrej's claim that "the year of agents" is actually more likely to …

Simon Willison's Blog
library tool
What exactly are Claude Skills?

Anthropic has announced a new Claude feature, Claude Skills (Agent Skills). Claude Skills is a mechanism for extending a model with specific capabilities and knowledge through "skill folders" made up of Markdown files and scripts. Claude Skills: Customize AI for your workflows. Build custom Skills to teach Claude specialized tasks. Create once, use everywhere, from spreadsheets to coding. Available across Claude.ai, API, and Code. Claude had already updated the code execution environment available from its chat assistant back in August. Until then it was used to run Python code on request for tasks such as generating charts and analyzing data, but the update provided an environment where Bash commands can be run freely in a sandbox. Claude can now cre

Lai.so Blog
api tool
Quoting Alexander Fridriksson and Jay Miller

Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs. The main issue is that UUIDv7 incorporates a 48-bit …

Simon Willison's Blog
api database security
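
The excerpt above is cut off, but RFC 9562 defines the leading 48 bits of a UUIDv7 as a Unix timestamp in milliseconds, which is presumably the exposure being discussed. The sketch below shows how easily that field can be read back out of an ID. The helper names `make_example_uuid7` and `uuid7_unix_ms` are hypothetical, and the example UUID is assembled by hand from the RFC bit layout rather than with a library generator, since `uuid.uuid7()` is only available in newer Python versions.

```python
import uuid


def make_example_uuid7(unix_ms: int, rand_a: int = 0, rand_b: int = 0) -> uuid.UUID:
    """Assemble a UUIDv7 by hand following the RFC 9562 bit layout."""
    value = (unix_ms & ((1 << 48) - 1)) << 80  # 48-bit millisecond timestamp
    value |= 0x7 << 76                         # 4-bit version field
    value |= (rand_a & 0xFFF) << 64            # 12 bits of random data
    value |= 0b10 << 62                        # 2-bit variant field
    value |= rand_b & ((1 << 62) - 1)          # 62 bits of random data
    return uuid.UUID(int=value)


def uuid7_unix_ms(u: uuid.UUID) -> int:
    """Return the creation time embedded in a UUIDv7, in Unix milliseconds."""
    if u.version != 7:
        raise ValueError("not a version 7 UUID")
    return u.int >> 80  # drop the 80 low bits, leaving the timestamp


example = make_example_uuid7(1_700_000_000_000, rand_a=0xABC, rand_b=0x1234)
print(example, uuid7_unix_ms(example))
```

Anyone who sees such an ID can recover when the row was created, which is why an exposed primary key of this kind leaks more than a fully random UUIDv4 would.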
Quoting Barry Zhang

Skills actually came out of a prototype I built demonstrating that Claude Code is a general-purpose agent :-) It was a natural conclusion once we realized that bash + filesystem …

Simon Willison's Blog
platform
Claude Skills are awesome, maybe a bigger deal than MCP

Anthropic this morning introduced Claude Skills, a new pattern for making new abilities available to their models: Claude can now use Skills to improve how it performs specific tasks. Skills …

Simon Willison's Blog
api tool
ENISA’s 2025 Threat Landscape: AI Reshapes Cyber Attacks, from Phishing to Supply Chain Abuse

ENISA’s 2025 Threat Landscape report highlights how AI is reshaping cyber attacks, driving phishing, model poisoning, and software supply chain risks.

Socket
api cloud security
Deep Agents

On the future of AI Agents.

Elvis Saravia's NLP Blog
api tool
NVIDIA DGX Spark + Apple Mac Studio = 4x Faster LLM Inference with EXO 1.0

EXO Labs wired a 256GB M3 Ultra Mac Studio up to an NVIDIA DGX Spark and got a 2.8x performance boost serving Llama-3.1 8B (FP16) with an 8,192 token prompt. …

Simon Willison's Blog
framework tool
Quoting Riana Pfefferkorn

Pro se litigants account for the majority of the cases in the United States where a party submitted a court filing containing AI hallucinations. In a country where legal representation …

Simon Willison's Blog
platform
Coding without typing the code

Last year the most useful exercise for getting a feel for how good LLMs were at writing code was vibe coding (before that name had even been coined) - seeing …

Simon Willison's Blog
platform
Quoting Catherine Wu

While Sonnet 4.5 remains the default [in Claude Code], Haiku 4.5 now powers the Explore subagent which can rapidly gather context on your codebase to build apps even faster. You …

Simon Willison's Blog
platform
Introducing Claude Haiku 4.5

Anthropic released Claude Haiku 4.5 today, the cheapest member of the Claude 4.5 family that started with Sonnet 4.5 a couple of weeks ago. It's priced at $1/million input tokens …

Simon Willison's Blog
platform
Quoting Claude Haiku 4.5 System Card

Previous system cards have reported results on an expanded version of our earlier agentic misalignment evaluation suite: three families of exotic scenarios meant to elicit the model to commit blackmail, …

Simon Willison's Blog
platform
Want to run your AI model locally? Here’s what you should know

As costs and privacy concerns grow, enterprises are shifting from cloud to local AI. Learn what it takes to run models locally, and why it matters.

logrocket-dev
framework tool
A summary of biases in LLM-as-a-Judge

This is the AI Shift TECH BLOG. We share information about AI technology and how to put it to use.

AI-Shift Tech Blog
platform
NVIDIA DGX Spark: great hardware, early days for the ecosystem

NVIDIA sent me a preview unit of their new DGX Spark desktop “AI supercomputer”. I’ve never had hardware to review before! You can consider this my first ever sponsored post …

Simon Willison's Blog
cloud tool
Just Talk To It - the no-bs Way of Agentic Engineering

Peter Steinberger's long, detailed description of his current process for using Codex CLI and GPT-5 Codex. This is information dense and full of actionable tips, plus plenty of strong opinions …

Simon Willison's Blog
api tool
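Having AI write technical articles: the "too perfect" paradox reached after nine iterations

This article describes an attempt to have AI write technical articles. The author used Claude Code to build a system that repeatedly generates articles, reviews them, and improves a style guide. The author initially expected quality of around 70-80%, but after nine iterations the system reached a rating of 9.0/10. In particular, the author ran into the "too perfect paradox", in which an article that is too perfect ends up feeling AI-written. The system consists of three agents (Writer Agent, Reviewer Agent, Style Guide Updater), each of which works independently. Over the iterations, the importance of a metacognitive shift and of imperfection became clear, and in the end incorporating human-like imperfection led to more natural articles.
• The goal of having AI write technical articles is quality that cannot be distinguished from a human's.
• Claude Code is used to build a cycle of article generation, review, and style guide improvement.
• Quality improves over the iterations, finally reaching a rating of 9.0/10.
• The author runs into the problem that an article that is too perfect ends up feeling AI-written.
• The system consists of three agents (Writer, Reviewer, Style Guide Updater), each of which works independently.
• The importance of a metacognitive shift and of imperfection becomes clear and contributes to more natural articles.
• Incorporating imperfection leads to more human-like articles.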

zenn-uhyo
api library tool
NVIDIA and SGLang Accelerating SemiAnalysis InferenceMAX and GB200 Together

The SGLang and NVIDIA teams have a strong track record of collaboration, consistently delivering inference optimizations and system-level improvements to ...

LMSYS Blog
api framework tool
nanochat

Really interesting new project from Andrej Karpathy, described at length in this discussion post. It provides a full ChatGPT-style LLM, including training, inference and a web UI, that can be …

Simon Willison's Blog
tool
AI dev tool power rankings & comparison [Oct 2025]

Compare the top AI development tools and models of September 2025. View updated rankings, feature breakdowns, and find the best fit for you.

logrocket-dev
api cloud tool
NVIDIA DGX Spark In-Depth Review: A New Standard for Local AI Inference

Thanks to NVIDIA’s early access program, we are thrilled to get our hands on the NVIDIA DGX™ Spark. It’s quite an unconventional system, as NVIDIA rarely ...

LMSYS Blog
framework tool
🥇Top AI Papers of the Week

The Top AI Papers of the Week (October 6-12)

Elvis Saravia's NLP Blog
platform
Claude Code sub-agents

Claude Code includes the ability to run sub-agents, where a separate agent loop with a fresh token context is dispatched to achieve a goal and report back when it's done. …

Simon Willison's Blog
api tool
Vibing a Non-Trivial Ghostty Feature

Mitchell Hashimoto provides a comprehensive answer to the frequent demand for a detailed description of shipping a non-trivial production feature to an existing project using AI-assistance. In this case it's …

Simon Willison's Blog
api library tool
🤖 AI Agents Weekly: AgentKit, Gemini 2.5 Computer Use, State of AI Report 2025, Agentic Context Engineering, CodeMender

AgentKit, Gemini 2.5 Computer Use, State of AI Report 2025, Agentic Context Engineering, CodeMender

Elvis Saravia's NLP Blog
api framework tool
Note on 11th October 2025

I'm beginning to suspect that a key skill in working effectively with coding agents is developing an intuition for when you don't need to closely review every line of code …

Simon Willison's Blog
platform
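Connecting your own app to the Apps SDK, which lets you operate apps directly inside ChatGPT

Apps in ChatGPT is a feature that calls external apps as the conversation unfolds inside a ChatGPT chat, enabling interactive operations. Each app provides its own UI components, and users can operate apps from the chat screen in a seamless experience. This article walks through the steps to build a simple app that actually runs inside ChatGPT using the Apps SDK.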

azukiazusa のテックブログ2
api tool
simonw/claude-skills

One of the tips I picked up from Jesse Vincent's Claude Code Superpowers post (previously) was this: Skills are what give your agents Superpowers. The first time they really popped …

Simon Willison's Blog
api tool
Superpowers: How I'm using coding agents in October 2025

A follow-up to Jesse Vincent's post about September, but this is a really significant piece in its own right. Jesse is one of the most creative users of coding agents …

Simon Willison's Blog
api tool
A Retrospective Survey of 2024/2025 Open Source Supply Chain Compromises

Filippo Valsorda surveyed 18 incidents from the past year of open source supply chain attacks, where package updates were infected with malware thanks to a compromise of the project itself. …

Simon Willison's Blog
security
Video of GPT-OSS 20B running on a phone

GPT-OSS 20B is a very good model. At launch OpenAI claimed: The gpt-oss-20b model delivers similar results to OpenAI o3‑mini on common benchmarks and can run on edge devices with …

Simon Willison's Blog
tool
Evaluating context compression methods for AI agents (AI Shift internship report)

This is the AI Shift TECH BLOG, sharing information about AI technology and how to put it to use.

AI-Shift Tech Blog
platform
Quoting Gergely Orosz

I get a feeling that working with multiple AI agents is something that comes VERY natural to most senior+ engineers or tech lead who worked at a large company You …

Simon Willison's Blog
platform
LangChain.js is overrated; Build your AI agent with a simple fetch call

Skip the LangChain.js overhead: How to build a Retrieval-Augmented Generation (RAG) AI agent from scratch using just the native `fetch()` API.

logrocket-dev
api tool
An experiment in turn-taking detection using Deepgram Flux

This is the AI Shift TECH BLOG, sharing information about AI technology and how to put it to use.

AI-Shift Tech Blog
api tool
Claude can write complete Datasette plugins now

This isn’t necessarily surprising, but it’s worth noting anyway. Claude Sonnet 4.5 is capable of building a full Datasette plugin now. I’ve seen models complete aspects of this in the …

Simon Willison's Blog
api tool
Quoting Simon Højberg

The cognitive debt of LLM-laden coding extends beyond disengagement of our craft. We’ve all heard the stories. Hyped up, vibed up, slop-jockeys with attention spans shorter than the framework-hopping JavaScript …

Simon Willison's Blog
platform
Goodbye, messy data: An engineer’s guide to scalable data enrichment

Walk through building a data enrichment workflow that moves beyond simple lead gen to become a powerful internal tool for enterprises.

logrocket-dev
api cloud tool
Gemini 2.5 Computer Use can solve Google's own CAPTCHAs

Google just introduced a new Gemini 2.5 Computer Use model, specially designed to help operate a GUI interface by interacting with visible elements using a virtual mouse and keyboard. I …

Simon Willison's Blog
framework tool
Vibe engineering

I feel like vibe coding is pretty well established now as covering the fast, loose and irresponsible way of building software with AI—entirely prompt-driven, and with no attention paid to …

Simon Willison's Blog
library tool
DesignCoder and the future of AI-generated UI

Explore DesignCoder, a hierarchy-aware and self-correcting approach to AI-generated UI, and what it means for frontend devs and enterprises.

logrocket-dev
tool ui
Deloitte to pay money back to Albanese government after using AI in $440,000 report

Ouch: Deloitte will provide a partial refund to the federal government over a $440,000 report that contained several errors, after admitting it used generative artificial intelligence to help produce it. …

Simon Willison's Blog
platform
a system that can do work independently on behalf of the user

I've settled on agents as meaning "LLMs calling tools in a loop to achieve a goal" but OpenAI continue to muddy the waters with much more vague definitions. Swyx spotted …

Simon Willison's Blog
platform
gpt-image-1-mini

OpenAI released a new image model today: gpt-image-1-mini, which they describe as "A smaller image generation model that’s 80% less expensive than the large model." They released it very quietly …

Simon Willison's Blog
api tool
GPT-5 pro

Here's OpenAI's model documentation for their GPT-5 pro model, released to their API today at their DevDay event. It has similar base characteristics to GPT-5: both share a September 30, …

Simon Willison's Blog
api
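OpenAI DevDay 2025 announcement roundup

OpenAI DevDay 2025 was held in San Francisco, and a range of new features were announced. Highlights include a preview of the Apps SDK, which provides app functionality usable inside ChatGPT and lets developers reach more than 800 million ChatGPT users. Initial partners include Booking.com and Canva, and app review is scheduled to begin at the end of the year. Codex also became generally available, with Slack integration and admin tools added. In addition, GPT-5 API requests became 40% faster, and API support for Sora 2 and a new image generation model were announced. The OpenAI Cookbook gained a guide to an evaluation flywheel for ensuring prompt resilience.
• New features were announced at OpenAI DevDay 2025.
• The Apps SDK makes app functionality available inside ChatGPT.
• Booking.com and Canva are among the initial partner companies.
• Codex became generally available, with Slack integration added.
• GPT-5 API requests become 40% faster.
• API support for Sora 2 and a new image generation model were announced.
• A guide to ensuring prompt resilience was added to the OpenAI Cookbook.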

Zenn schroneko
api cloud tool
OpenAI DevDay 2025 live blog

I’m at OpenAI DevDay in Fort Mason, San Francisco today. As I did last year, I’m going to be live blogging the announcements from the keynote. Unlike last year, this …

Simon Willison's Blog
platform
🥇Top AI Papers of the Week

The Top AI Papers of the Week (September 29 - October 5)

Elvis Saravia's NLP Blog
tool
Embracing the parallel coding agent lifestyle

For a while now I’ve been hearing from engineers who run multiple coding agents at once—firing up several Claude Code or Codex CLI instances at the same time, sometimes in …

Simon Willison's Blog
tool
Let the LLM Write the Prompts: An Intro to DSPy in Compound AI Pipelines

I've had trouble getting my head around DSPy in the past. This half hour talk by Drew Breunig at the recent Databricks Data + AI Summit is the clearest explanation …

Simon Willison's Blog
platform
🤖 AI Agents Weekly: Claude Agent SDK, Sora 2, Claude Sonnet 4.5, Microsoft Agent Framework, GLM-4.6, Agentic Commerce Protocol

Claude Agent SDK, Sora 2, Claude Sonnet 4.5, Microsoft Agent Framework, GLM-4.6, Agentic Commerce Protocol

Elvis Saravia's NLP Blog
api tool
Providing hints to users with MCP tool annotations

In MCP, tool annotations let you give users hints about how a tool behaves. For example, setting `readOnlyHint` indicates that the tool does not modify data. This article configures tool annotations on an MCP server using the TypeScript SDK and checks how they are displayed in the Claude Code client.

azukiazusa のテックブログ2
tool
DeepSeek-V3.2-Exp released: an update that substantially improves cost efficiency

DeepSeek has announced a new version, DeepSeek-V3.2-Exp. Built on the preceding V3.1-Terminus, the model improves cost efficiency by introducing DeepSeek Sparse Attention (DSA), DeepSeek's own sparse attention mechanism (GitHub: deepseek-ai/DeepSeek-V3.2-Exp). Features: the sparse attention in DeepSeek-V3.2-Exp attends to only a subset of the input tokens, so the reduction in computation grows as the input gets longer. In the Transformer architecture, the required computation grows in proportion to the square of the input length; DSA instead indexes the input tokens internally and quickly estimates relevance to narrow down the tokens to attend to.

Lai.so Blog
api cloud tool
Sora 2 prompt injection

It turns out Sora 2 is vulnerable to prompt injection! When you onboard to Sora you get the option to create your own "cameo" - a virtual video recreation of …

Simon Willison's Blog
security
Daniel Stenberg's note on AI assisted curl bug reports

Curl maintainer Daniel Stenberg on Mastodon: Joshua Rogers sent us a massive list of potential issues in #curl that he found using his set of AI assisted tools. Code analyzer …

Simon Willison's Blog
api tool
Quoting Nadia Eghbal

When attention is being appropriated, producers need to weigh the costs and benefits of the transaction. To assess whether the appropriation of attention is net-positive, it’s useful to distinguish between …

Simon Willison's Blog
api tool
aavetis/PRarena

Albert Avetisian runs this repository on GitHub which uses the GitHub Search API to track the number of PRs that can be credited to a collection of different coding agents. …

Simon Willison's Blog
api cloud tool
Two more Chinese pelicans

Two new models from Chinese AI labs in the past few days. I tried them both out using llm-openrouter: DeepSeek-V3.2-Exp from DeepSeek. Announcement, Tech Report, Hugging Face (690GB, MIT license). …

Simon Willison's Blog
platform
Animals vs Ghosts

Today's frontier LLM research is not about building animals. It is about summoning ghosts. And a bit more on Sutton's Dwarkesh pod.

Andrej Karpathy's Blog
platform
A spec-first workflow for building with agentic AI

Andrew Evans gives his take on agentic AI and walks through a step-by-step method to build a spec-first workflow using Claude Code.

logrocket-dev
api cloud tool
September monthly sponsors newsletter

I just sent out the September edition of my sponsors-only monthly newsletter. If you are a sponsor (or if you start a sponsorship now) you can access a copy here. …

Simon Willison's Blog
api tool
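Sora 2 announcement roundup

OpenAI has announced Sora 2 and launched its video generation service. Sora 2 requires a ChatGPT Pro plan subscription and is currently available only in the United States and Canada. In the new iOS app, Sora, users can generate videos and remix other users' content. In particular, the "cameo" feature makes it possible to put yourself or your friends in a video. Sora 2 can produce natural motion grounded in the laws of physics and photorealistic visuals, and it can also generate speech and sound effects. With an emphasis on safety, generated videos carry a traceable watermark, and the app also includes a feature that checks on users' wellbeing as well as parental controls. Availability via API is planned for the future.
• OpenAI announced Sora 2 and launched its video generation service.
• Using Sora 2 requires a ChatGPT Pro plan subscription, and it is currently available only in the United States and Canada.
• The Sora iOS app supports generating videos and remixing other users' content.
• The "cameo" feature lets you put yourself or your friends in a video.
• Sora 2 can produce natural, physics-based motion and photorealistic visuals, and can also generate speech and sound effects.
• With an emphasis on safety, generated videos carry a traceable watermark.
• A feature that checks on users' wellbeing and parental controls are also included.
• Availability via API is planned for the future.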

Zenn schroneko
api mobile tool
Sora 2

Having watched this morning's Sora 2 introduction video, the most notable feature (aside from audio generation - original Sora was silent, Google's Veo 3 supported audio in May 2025) looks …

Simon Willison's Blog
tool
Designing agentic loops

Coding agents like Anthropic’s Claude Code and OpenAI’s Codex CLI represent a genuine step change in how useful LLMs can be for producing working code. These agents can now directly …

Simon Willison's Blog
api tool
[Today's topics] Sonnet 4.5, Cursor browser tool, Instant Checkout

Claude Sonnet 4.5 released. From "Introducing Claude Sonnet 4.5": Claude Sonnet 4.5 is the best coding model in the world, strongest model for building complex agents, and best model at using computers. * Announced as "the strongest coding model", with reports that it achieved more than 30 hours of autonomous coding. * Recorded a 77.2% task resolution rate on SWE-bench Verified (82% with parallel execution/Best for N) and can keep to a plan stably over long periods. * On the other hand, "GPT-5 …

Lai.so Blog
api tool
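Claude Sonnet 4.5 announcement roundup

Claude Sonnet 4.5 has been announced and is now available on every platform. The new model delivers major performance gains in building complex agents, computer use, reasoning, and math tasks, and can carry out complex tasks lasting more than 30 hours. A checkpoint feature has been added, making it possible to save work progress and roll back. Safety training has reduced the risk of the model over-complying with user instructions or giving false answers, and its defenses against prompt injection attacks have been strengthened. The Claude Agent SDK makes it possible to build general-purpose agents for a wide range of tasks beyond coding, and its defining trait is that it operates through an agent loop.
• Claude Sonnet 4.5 delivers performance gains in building complex agents and in computer use.
• A new checkpoint feature makes it possible to save work progress and roll back.
• Safety training has reduced the risk of over-complying with user instructions.
• Defenses against prompt injection attacks have been strengthened.
• The Claude Agent SDK makes it possible to build general-purpose agents for tasks beyond coding.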

Zenn schroneko
api cloud tool
Claude Sonnet 4.5 is probably the "best coding model in the world" (at least for now)

Anthropic released Claude Sonnet 4.5 today, with a very bold set of claims: Claude Sonnet 4.5 is the best coding model in the world. It’s the strongest model for building …

Simon Willison's Blog
api tool
Armin Ronacher: 90%

The idea of AI writing "90% of the code" to-date has mostly been expressed by people who sell AI tooling. Over the last few months, I've increasingly seen the same …

Simon Willison's Blog
api tool
Quoting Scott Aaronson

Given a week or two to try out ideas and search the literature, I’m pretty sure that Freek and I could’ve solved this problem ourselves. Instead, though, I simply asked …

Simon Willison's Blog
platform
SGLang Day 0 Support for DeepSeek-V3.2 with Sparse Attention

We are excited to announce that SGLang supports DeepSeek-V3.2 on Day 0! According to the DeepSeek ...

LMSYS Blog
library tool
Quoting Nick Turley

We’ve seen the strong reactions to 4o responses and want to explain what is happening. We’ve started testing a new safety routing system in ChatGPT. As we previously mentioned, when …

Simon Willison's Blog
platform
🥇Top AI Papers of the Week

The Top AI Papers of the Week (September 22-28)

Elvis Saravia's NLP Blog
platform
Codex vs Claude Code: which is the better AI coding agent?

A practical look at Codex vs Claude Code: agents, model choices, costs, and the workflows they enable in real projects.

Builder.io Blog
tool
PD-Multiplexing: Unlocking High-Goodput LLM Serving with GreenContext

This post highlights our initial efforts to support a new serving paradigm, PD-Multiplexing, in SGLang. It is designed t...

LMSYS Blog
library tool
Video models are zero-shot learners and reasoners

Fascinating new paper from Google DeepMind which makes a very convincing case that their Veo 3 model - and generative video models in general - serve a similar role in …

Simon Willison's Blog
tool
🤖 AI Agents Weekly: Code World Model, Gemini Robotics-ER 1.5, Figma MCP server, Overhearing LLM Agents, Qwen3-Max, Gamma API

Code World Model, Gemini Robotics-ER 1.5, Figma MCP server, Overhearing LLM Agents, Qwen3-Max, Gamma API

Elvis Saravia's NLP Blog
platform
GitHub Copilot CLI released

On September 25, 2025, GitHub released GitHub Copilot CLI as a public preview. From the GitHub Changelog post "GitHub Copilot CLI is now in public preview": We’re bringing the power of GitHub Copilot coding agent directly to your terminal. With GitHub Copilot CLI, you can work locally and…

Lai.so Blog
api tool
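Supporting AI agents' frontend development with Chrome DevTools MCP

When coding with autonomous AI agents, an iterative process of getting feedback from the results of running the generated code and then improving that code is essential. In frontend development, however, the generated code runs in the browser, so it is hard for an AI agent to run the code directly or retrieve the browser's console logs. Chrome DevTools MCP is a tool that addresses this problem.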

azukiazusa のテックブログ2
api tool
ForcedLeak: AI Agent risks exposed in Salesforce AgentForce

Classic lethal trifecta image exfiltration bug reported against Salesforce AgentForce by Sasi Levi and Noma Security. Here the malicious instructions come in via the Salesforce Web-to-Lead feature. When a Salesforce …

Simon Willison's Blog
api cloud security
How to stop AI’s “lethal trifecta”

This is the second mention of the lethal trifecta in the Economist in just the last week! Their earlier coverage was Why AI systems may never be secure on September …

Simon Willison's Blog
security
YANS2025 participation report

This is the AI Shift TECH BLOG, sharing information about AI technology and how to put it to use.

AI-Shift Tech Blog
platform
Together with SGLang: Best Practices for Serving DeepSeek-R1 on H20-96G

LMSYS Blog
framework tool
GitHub Copilot CLI is now in public preview

GitHub now have their own entry in the coding terminal CLI agent space: Copilot CLI. It's the same basic shape as Claude Code, Codex CLI, Gemini CLI and a growing …

Simon Willison's Blog
api tool
Improved Gemini 2.5 Flash and Flash-Lite

Two new preview models from Google - updates to their fast and inexpensive Flash and Flash Lite families: The latest version of Gemini 2.5 Flash-Lite was trained and built based …

Simon Willison's Blog
api tool
Don't hide your best documentation

If you hide the system prompt and tool descriptions for your LLM agent, what you're actually doing is deliberately hiding the most useful documentation describing your service from your most …

Simon Willison's Blog
platform
Deploying DeepSeek on GB200 NVL72 with PD and Large Scale EP (Part II): 3.8x Prefill, 4.8x Decode Throughput

The GB200 NVL72 is one of the most powerful pieces of hardware for deep learning. In this blog post, we share our progress to optimize the inference performance of ...

LMSYS Blog
library tool
Quoting Stanford CS221 Autumn 2025

[2 points] Learn basic NumPy operations with an AI tutor! Use an AI chatbot (e.g., ChatGPT, Claude, Gemini, or Stanford AI Playground) to teach yourself how to do basic vector …

Simon Willison's Blog
tool
Cross-Agent Privilege Escalation: When Agents Free Each Other

Here's a clever new form of AI exploit from Johann Rehberger, who has coined the term Cross-Agent Privilege Escalation to describe an attack where multiple coding agents - GitHub Copilot …

Simon Willison's Blog
security
6 easy ways to level up Claude Code

Walk through six tips and tricks that help you level up Claude Code to move beyond simply entering prompts into a text box.

logrocket-dev
api tool
GPT-5-Codex

OpenAI half-released this model earlier this month, adding it to their Codex CLI tool but not their API. Today they've fixed that - the new model can now be accessed …

Simon Willison's Blog
api library tool
Qwen3-VL: Sharper Vision, Deeper Thought, Broader Action

I've been looking forward to this. Qwen 2.5 VL is one of the best available open weight vision LLMs, so I had high hopes for Qwen 3's vision models. Firstly, …

Simon Willison's Blog
platform
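cagent: building AI agents with YAML files

cagent is an AI agent framework developed by Docker. It lets you declaratively define an agent's behavior, role, and tools in a YAML file, so you can build an agent without writing a single line of code. This article covers an overview of cagent, how to install it, how to write the YAML file, and the steps to get an agent actually running.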

azukiazusa のテックブログ2
api tool
Why AI systems might never be secure

The Economist have a new piece out about LLM security, with this headline and subtitle: Why AI systems might never be secure A “lethal trifecta” of conditions opens them to …

Simon Willison's Blog
security
Quoting Kate Niederhoffer, Gabriella Rosen Kellerman, Angela Lee, Alex Liebscher, Kristina Rapuano and Jeffrey T. Hancock

We define workslop as AI generated work content that masquerades as good work, but lacks the substance to meaningfully advance a given task. Here’s how this happens. As AI tools …

Simon Willison's Blog
tool
Four new releases from Qwen

It's been an extremely busy day for team Qwen. Within the last 24 hours (all links to Twitter, which seems to be their preferred platform for these announcements): Qwen3-Next-80B-A3B-Instruct-FP8 and …

Simon Willison's Blog
library tool
CompileBench: Can AI Compile 22-year-old Code?

Interesting new LLM benchmark from Piotr Grabowski and Piotr Migdał: how well can different models handle compilation challenges such as cross-compiling gucr for ARM64 architecture? This is one of my …

Simon Willison's Blog
api tool
ChatGPT Is Blowing Up Marriages as Spouses Use AI to Attack Their Partners

Maggie Harrison Dupré for Futurism. It turns out having an always-available "marriage therapist" with a sycophantic instinct to always take your side is catastrophic for relationships. The tension in the …

Simon Willison's Blog
platform
Enabling Deterministic Inference for SGLang

This post highlights our initial efforts to achieve deterministic inference in SGLang. By integrating batch invariant kernels released by Thinking Machine...

LMSYS Blog
api tool
Locally AI

Handy new iOS app by Adrien Grondin for running local LLMs on your phone. It just added support for the new iOS 26 Apple Foundation model, so you can install …

Simon Willison's Blog
mobile
🥇Top AI Papers of the Week

The Top AI Papers of the Week (September 15-21)

Elvis Saravia's NLP Blog
platform
GPT‑5 Codex released

OpenAI announced GPT‑5 Codex on September 15, 2025. GPT‑5 Codex is a model built on GPT‑5, with additional training and reinforcement suited to agentic coding. It is particularly strong at long-running autonomous work. We’re releasing new Codex features to make it a more effective coding collaborator: - A new IDE extension - Easily move tasks between the cloud and your local environment - Code reviews in GitHub - Revamped Codex CLI Powered by …

Lai.so Blog
api tool
llm-openrouter 0.5

New release of my LLM plugin for accessing models made available via OpenRouter. The release notes in full: Support for tool calling. Thanks, James Sanford. #43 Support for reasoning options, …

Simon Willison's Blog
api tool
Optimizing FP4 Mixed-Precision Inference on AMD GPUs

Haohui Mai (CausalFlow.ai), Lei Zhang (AMD)

LMSYS Blog
library tool
Grok 4 Fast

New hosted vision-enabled reasoning model from xAI that's designed to be fast and extremely competitive on price. It has a 2 million token context window and "was trained end-to-end with …

Simon Willison's Blog
tool
🤖 AI Agents Weekly: GPT-5-Codex, Grok 4 Fast, Tongyi DeepResearch, Magistral Small 1.2, Agent Payments Protocol (AP2)

GPT-5-Codex, Grok 4 Fast, Tongyi DeepResearch, Magistral Small 1.2, Agent Payments Protocol (AP2)

Elvis Saravia's NLP Blog
api tool
Trying out the Agent Payments Protocol (AP2) for AI agents

Today's payment systems assume that a human directly clicks a buy button on a trusted screen; they do not anticipate an autonomous AI agent paying on the user's behalf. In response, Google has proposed a new protocol called the Agent Payments Protocol (AP2), which makes it possible to securely initiate and process agent-led payments across platforms. This article walks through the steps I took to try out the AP2 sample code.

azukiazusa のテックブログ2
api security tool