Last updated: 2025/10/22 23:00

Where AI-assisted coding accelerates development — and where it doesn’t
John Reilly discusses how software development has been changed by the innovations of AI: both the positives and the negatives.

Living dangerously with Claude
I gave a talk last night at Claude Code Anonymous in San Francisco, the unofficial meetup for coding agent enthusiasts. I decided to talk about a dichotomy I’ve been struggling …

SLOCCount in WebAssembly
This project/side-quest got a little bit out of hand. I remembered an old tool called SLOCCount which could count lines of code and produce an estimate for how much they …
Don't let Claude Code delete your session logs
Claude Code stores full logs of your sessions as newline-delimited JSON in ~/.claude/projects/encoded-directory/*.jsonl on your machine. I currently have 379MB of these! Here's an example jsonl file which I extracted …

Accelerating Hybrid Inference in SGLang with KTransformers CPU Kernels
<h2><a id="background-hybrid-inference-for-sparse-moe-models" class="anchor" href="#background-hybrid-inference-for-sparse-moe-models" aria-hidden="true"><sv...

Unseeable prompt injections in screenshots: more vulnerabilities in Comet and other AI browsers
The Brave security team wrote about prompt injection against browser agents a few months ago (here are my notes on that). Here's their follow-up: What we’ve found confirms our initial …

Introducing ChatGPT Atlas
Last year OpenAI hired Chrome engineer Darin Fisher, which sparked speculation they might have their own browser in the pipeline. Today it arrived. ChatGPT Atlas is a Mac-only web browser …

TypeScript版DSPy、axを試してみた
AI ShiftのTECH BLOGです。AI技術の情報や活用方法などをご案内いたします。
Quoting Bruce Schneier and Barath Raghavan
Prompt injection might be unsolvable in today’s LLMs. LLMs process token sequences, but no mechanism exists to mark token privileges. Every solution proposed introduces new injection vectors: Delimiter? Attackers include …

Claude Code for web - a new asynchronous coding agent from Anthropic
Anthropic launched Claude Code for web this morning. It’s an asynchronous coding agent—their answer to OpenAI’s Codex Cloud and Google’s Jules, and has a very similar shape. I had preview …

Getting DeepSeek-OCR working on an NVIDIA Spark via brute force using Claude Code
DeepSeek released a new model yesterday: DeepSeek-OCR, a 6.6GB model fine-tuned specifically for OCR. They released it as model weights that run using PyTorch and CUDA. I got it running …

Announcing Experimental Malware Scanning for the Hugging Face Ecosystem
Socket is launching experimental protection for the Hugging Face ecosystem, scanning for malware and malicious payload injections inside model files t...

Oracle AI World 2025 参加レポート
AI ShiftのTECH BLOGです。AI技術の情報や活用方法などをご案内いたします。

🥇Top AI Papers of the Week
The Top AI Papers of the Week (October 13-19)

TIL: Exploring OpenAI's deep research API model o4-mini-deep-research
I landed a PR by Manuel Solorzano adding pricing information to llm-prices.com for OpenAI's o4-mini-deep-research and o3-deep-research models, which they released in June and document here. I realized I'd never …

🤖 AI Agents Weekly: Claude Haiku 4.5, Deep Agents, SWE-grep, nanochat, Agent Skills, Veo 3.1 Fast, n8n AI Workflow Builder
Claude Haiku 4.5, Deep Agents, SWE-grep, nanochat, Agent Skills, Veo 3.1 Fast, n8n AI Workflow Builder
The AI water issue is fake
Andy Masley (previously): All U.S. data centers (which mostly support the internet, not AI) used 200--250 million gallons of freshwater daily in 2023. The U.S. consumes approximately 132 billion gallons …
Andrej Karpathy — AGI is still a decade away
Extremely high signal 2 hour 25 minute (!) conversation between Andrej Karpathy and Dwarkesh Patel. It starts with Andrej's claim that "the year of agents" is actually more likely to …

Claude Skillsとは何なのか?
AnthropicがClaudeの新機能 Claude Skills (Agent Skills)を追加したと発表しました。Claude Skillsは、Markdownファイルとスクリプトで構成される「スキルフォルダ」を通じて、モデルに特定の機能や知識を拡張できる仕組みです。 Claude Skills: Customize AI for your workflowsBuild custom Skills to teach Claude specialized tasks. Create once, use everywhere—from spreadsheets to coding. Available across Claude.ai, API, and Code.Box logo もともとClaudeは8月にチャットアシスタントからのコード実行環境をアップデートしていました。それまでは指示に応じてPythonコードを実行しグラフ生成やデータ分析をする用途でしたが、Bashコマンドをサンドボックスで自由に実行できる環境ができていました。 Claude can now cre
Quoting Alexander Fridriksson and Jay Miller
Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs. The main issue is that UUIDv7 incorporates a 48-bit …
Quoting Barry Zhang
Skills actually came out of a prototype I built demonstrating that Claude Code is a general-purpose agent :-) It was a natural conclusion once we realized that bash + filesystem …

Claude Skills are awesome, maybe a bigger deal than MCP
Anthropic this morning introduced Claude Skills, a new pattern for making new abilities available to their models: Claude can now use Skills to improve how it performs specific tasks. Skills …

ENISA’s 2025 Threat Landscape: AI Reshapes Cyber Attacks, from Phishing to Supply Chain Abuse
ENISA’s 2025 Threat Landscape report highlights how AI is reshaping cyber attacks, driving phishing, model poisoning, and software supply chain risks.

Deep Agents
On the future of AI Agents.
NVIDIA DGX Spark + Apple Mac Studio = 4x Faster LLM Inference with EXO 1.0
EXO Labs wired a 256GB M3 Ultra Mac Studio up to an NVIDIA DGX Spark and got a 2.8x performance boost serving Llama-3.1 8B (FP16) with an 8,192 token prompt. …
Quoting Riana Pfefferkorn
Pro se litigants account for the majority of the cases in the United States where a party submitted a court filing containing AI hallucinations. In a country where legal representation …
Coding without typing the code
Last year the most useful exercise for getting a feel for how good LLMs were at writing code was vibe coding (before that name had even been coined) - seeing …
Quoting Catherine Wu
While Sonnet 4.5 remains the default [in Claude Code], Haiku 4.5 now powers the Explore subagent which can rapidly gather context on your codebase to build apps even faster. You …

Introducing Claude Haiku 4.5
Anthropic released Claude Haiku 4.5 today, the cheapest member of the Claude 4.5 family that started with Sonnet 4.5 a couple of weeks ago. It's priced at $1/million input tokens …
Quoting Claude Haiku 4.5 System Card
Previous system cards have reported results on an expanded version of our earlier agentic misalignment evaluation suite: three families of exotic scenarios meant to elicit the model to commit blackmail, …

Want to run your AI model locally? Here’s what you should know
As costs and privacy concerns grow, enterprises are shifting from cloud to local AI. Learn what it takes to run models locally, and why it matters.

LLM-as-a-Judgeにまつわるバイアスまとめ
AI ShiftのTECH BLOGです。AI技術の情報や活用方法などをご案内いたします。

NVIDIA DGX Spark: great hardware, early days for the ecosystem
NVIDIA sent me a preview unit of their new DGX Spark desktop “AI supercomputer”. I’ve never had hardware to review before! You can consider this my first ever sponsored post …
Just Talk To It - the no-bs Way of Agentic Engineering
Peter Steinberger's long, detailed description of his current process for using Codex CLI and GPT-5 Codex. This is information dense and full of actionable tips, plus plenty of strong opinions …

AIに技術記事を書かせる:9回の反復で到達した「完璧すぎる」という逆説
この記事では、AIに技術記事を書かせる試みについて述べられています。著者は、Claude Codeを使用して、記事生成、レビュー、スタイルガイドの改善を繰り返すシステムを構築しました。最初は品質が7〜8割程度と予想していましたが、9回の反復を経て9.0/10の評価に達しました。特に、完璧すぎる記事が逆にAIらしさを感じさせるという「完璧すぎる逆説」に直面しました。システムは3つのエージェント(Writer Agent、Reviewer Agent、Style Guide Updater)で構成され、各エージェントは独立して機能します。反復を重ねる中で、メタ認知的シフトや不完全さの重要性が明らかになり、最終的には人間らしい不完全さを取り入れることで、より自然な記事が生成されるようになりました。 • AIに技術記事を書かせる試みの目的は、人間と区別できないレベルの品質を目指すこと。 • Claude Codeを使用し、記事生成、レビュー、スタイルガイド改善のサイクルを構築。 • 反復を重ねる中で、品質が向上し、最終的に9.0/10の評価を得る。 • 完璧すぎる記事が逆にAIらしさを感じさせるという課題に直面。 • システムは3つのエージェント(Writer、Reviewer、Style Guide Updater)で構成され、各エージェントは独立して機能。 • メタ認知的シフトや不完全さの重要性が明らかになり、自然な記事生成に寄与。 • 不完全さを取り入れることで、より人間らしい記事が生成されるようになった。

NVIDIA and SGLang Accelerating SemiAnalysis InferenceMAX and GB200 Together
<p>The SGLang and NVIDIA teams have a strong track record of collaboration, consistently delivering inference optimizations and system-level improvements to ...
nanochat
Really interesting new project from Andrej Karpathy, described at length in this discussion post. It provides a full ChatGPT-style LLM, including training, inference and a web Ui, that can be …
![AI dev tool power rankings & comparison [Oct 2025]](https://blog.logrocket.com/wp-content/uploads/2025/07/ai_dev_tool_power_rankings_july_2025_web.png)
AI dev tool power rankings & comparison [Oct 2025]
Compare the top AI development tools and models of September 2025. View updated rankings, feature breakdowns, and find the best fit for you.

NVIDIA DGX Spark In-Depth Review: A New Standard for Local AI Inference
<p>Thanks to NVIDIA’s early access program, we are thrilled to get our hands on the NVIDIA DGX™ Spark. It’s quite an unconventional system, as NVIDIA rarely ...

🥇Top AI Papers of the Week
The Top AI Papers of the Week (October 6-12)
Claude Code sub-agents
Claude Code includes the ability to run sub-agents, where a separate agent loop with a fresh token context is dispatched to achieve a goal and report back when it's done. …
Vibing a Non-Trivial Ghostty Feature
Mitchell Hashimoto provides a comprehensive answer to the frequent demand for a detailed description of shipping a non-trivial production feature to an existing project using AI-assistance. In this case it's …

🤖 AI Agents Weekly: AgentKit, Gemini 2.5 Computer Use, State of AI Report 2025, Agentic Context Engineering, CodeMender
AgentKit, Gemini 2.5 Computer Use, State of AI Report 2025, Agentic Context Engineering, CodeMender
Note on 11th October 2025
I'm beginning to suspect that a key skill in working effectively with coding agents is developing an intuition for when you don't need to closely review every line of code …

ChatGPT 内でアプリを直接操作する Apps SDK に自作のアプリを接続する
Apps in ChatGPT は ChatGPT のチャット内で会話の流れに応じて外部のアプリを呼び出し、インタラクティブな操作を可能にする機能です。アプリごとに独自の UI コンポーネントを提供し、ユーザーはチャット画面からシームレスな体験でアプリを操作できます。この記事では Apps SDK を使用して、実際に ChatGPT 内で動作するシンプルなアプリを作成する手順を紹介します。
simonw/claude-skills
One of the tips I picked up from Jesse Vincent's Claude Code Superpowers post (previously) was this: Skills are what give your agents Superpowers. The first time they really popped …

Superpowers: How I'm using coding agents in October 2025
A follow-up to Jesse Vincent's post about September, but this is a really significant piece in its own right. Jesse is one of the most creative users of coding agents …
A Retrospective Survey of 2024/2025 Open Source Supply Chain Compromises
Filippo Valsorda surveyed 18 incidents from the past year of open source supply chain attacks, where package updates were infected with malware thanks to a compromise of the project itself. …
Video of GPT-OSS 20B running on a phone
GPT-OSS 20B is a very good model. At launch OpenAI claimed: The gpt-oss-20b model delivers similar results to OpenAI o3‑mini on common benchmarks and can run on edge devices with …

AIエージェントにおけるコンテキスト圧縮手法の評価 (AI Shiftインターン体験記)
AI ShiftのTECH BLOGです。AI技術の情報や活用方法などをご案内いたします。
Quoting Gergely Orosz
I get a feeling that working with multiple AI agents is something that comes VERY natural to most senior+ engineers or tech lead who worked at a large company You …

LangChain.js is overrated; Build your AI agent with a simple fetch call
Skip the LangChain.js overhead: How to build a Retrieval-Augmented Generation (RAG) AI agent from scratch using just the native `fetch()` API.

Deepgram Fluxを使ったターンテイキング認識の実験
AI ShiftのTECH BLOGです。AI技術の情報や活用方法などをご案内いたします。
Claude can write complete Datasette plugins now
This isn’t necessarily surprising, but it’s worth noting anyway. Claude Sonnet 4.5 is capable of building a full Datasette plugin now. I’ve seen models complete aspects of this in the …
Quoting Simon Højberg
The cognitive debt of LLM-laden coding extends beyond disengagement of our craft. We’ve all heard the stories. Hyped up, vibed up, slop-jockeys with attention spans shorter than the framework-hopping JavaScript …

Goodbye, messy data: An engineer’s guide to scalable data enrichment
Walk through building a data enrichment workflow that moves beyond simple lead gen to become a powerful internal tool for enterprises.

Gemini 2.5 Computer Use can solve Google's own CAPTCHAs
Google just introduced a new Gemini 2.5 Computer Use model, specially designed to help operate a GUI interface by interacting with visible elements using a virtual mouse and keyboard. I …
Vibe engineering
I feel like vibe coding is pretty well established now as covering the fast, loose and irresponsible way of building software with AI—entirely prompt-driven, and with no attention paid to …

DesignCoder and the future of AI-generated UI
Explore DesignCoder, a hierarchy-aware and self-correcting approach to AI-generated UI, and what it means for frontend devs and enterprises.
Deloitte to pay money back to Albanese government after using AI in $440,000 report
Ouch: Deloitte will provide a partial refund to the federal government over a $440,000 report that contained several errors, after admitting it used generative artificial intelligence to help produce it. …
a system that can do work independently on behalf of the user
I've settled on agents as meaning "LLMs calling tools in a loop to achieve a goal" but OpenAI continue to muddy the waters with much more vague definitions. Swyx spotted …

gpt-image-1-mini
OpenAI released a new image model today: gpt-image-1-mini, which they describe as "A smaller image generation model that’s 80% less expensive than the large model." They released it very quietly …

GPT-5 pro
Here's OpenAI's model documentation for their GPT-5 pro model, released to their API today at their DevDay event. It has similar base characteristics to GPT-5: both share a September 30, …

OpenAI DevDay 2025 発表まとめ
OpenAI DevDay 2025がサンフランシスコで開催され、様々な新機能が発表された。主な内容には、ChatGPT内で使用できるアプリ機能を提供するApps SDKのプレビュー版が含まれ、開発者は8億人以上のChatGPTユーザーにリーチできる。初期パートナーにはBooking.comやCanvaなどが名を連ね、年末にはアプリ機能の審査が開始される予定。また、Codexが正式リリースされ、Slackとの統合機能や管理ツールが追加された。さらに、GPT-5のAPIリクエストが40%高速化され、Sora 2のAPI対応や新しい画像生成モデルも発表された。OpenAIのクックブックには、プロンプトのレジリエンスを担保するための評価フライホイールのガイドが追加された。 • OpenAI DevDay 2025で新機能が発表された。 • Apps SDKにより、ChatGPT内でアプリ機能が利用可能になる。 • 初期パートナー企業としてBooking.comやCanvaが参加。 • Codexが正式リリースされ、Slackとの統合機能が追加された。 • GPT-5のAPIリクエストが40%高速化される。 • Sora 2のAPI対応や新しい画像生成モデルが発表された。 • OpenAIのクックブックにプロンプトのレジリエンスを担保するガイドが追加された。
OpenAI DevDay 2025 live blog
I’m at OpenAI DevDay in Fort Mason, San Francisco today. As I did last year, I’m going to be live blogging the announcements from the kenote. Unlike last year, this …

🥇Top AI Papers of the Week
The Top AI Papers of the Week (September 29 - October 5)
Embracing the parallel coding agent lifestyle
For a while now I’ve been hearing from engineers who run multiple coding agents at once—firing up several Claude Code or Codex CLI instances at the same time, sometimes in …

Let the LLM Write the Prompts: An Intro to DSPy in Compound Al Pipelines
I've had trouble getting my head around DSPy in the past. This half hour talk by Drew Breunig at the recent Databricks Data + AI Summit is the clearest explanation …

🤖 AI Agents Weekly: Claude Agent SDK, Sora 2, Claude Sonnet 4.5, Microsoft Agent Framework, GLM-4.6, Agentic Commerce Protocol
Claude Agent SDK, Sora 2, Claude Sonnet 4.5, Microsoft Agent Framework, GLM-4.6, Agentic Commerce Protocol

MCP のツールアノテーションでユーザーにヒントを提供する
MCP ではツールアノテーションを使用して、ユーザーにツールの動作に関するヒントを提供できます。例えば `readOnlyHint` を設定することで、ツールがデータを変更しないことを示すことができます。この記事では TypeScript SDK を使用して MCP サーバーでツールアノテーションを設定し、Claude Code クライアントでどのように表示されるかを確認します。

DeepSeek-V3.2-Expがリリース:コスト効率を大幅に改善したアップデート
DeepSeekは新バージョン DeepSeek-V3.2-Exp を発表しました。このモデルは、直前のV3.1-Terminusをベースに、DeepSeek Sparse Attention (DSA) と呼ばれるDeepSeek独自のSparse Attentionを導入してコスト効率を向上しています。 GitHub - deepseek-ai/DeepSeek-V3.2-ExpContribute to deepseek-ai/DeepSeek-V3.2-Exp development by creating an account on GitHub.GitHubdeepseek-ai 特徴 DeepSeek-V3.2-ExpのSparse Attentionは入力トークンの一部だけに注意を向ける仕組みで、入力長が増えるほど計算量削減の効果が大きくなります。 Transformerアーキテクチャは入力が長くなると必要な計算が二乗に比例して増える仕組みでしたが、DSAでは入力されたトークンを内部でインデックス化し、関連度を素早く見積もることで対象を絞り込み効率化します。
Sora 2 prompt injection
It turns out Sora 2 is vulnerable to prompt injection! When you onboard to Sora you get the option to create your own "cameo" - a virtual video recreation of …

Daniel Stenberg's note on AI assisted curl bug reports
Curl maintainer Daniel Stenberg on Mastodon: Joshua Rogers sent us a massive list of potential issues in #curl that he found using his set of AI assisted tools. Code analyzer …
Quoting Nadia Eghbal
When attention is being appropriated, producers need to weigh the costs and benefits of the transaction. To assess whether the appropriation of attention is net-positive, it’s useful to distinguish between …

aavetis/PRarena
Albert Avetisian runs this repository on GitHub which uses the Github Search API to track the number of PRs that can be credited to a collection of different coding agents. …

Two more Chinese pelicans
Two new models from Chinese AI labs in the past few days. I tried them both out using llm-openrouter: DeepSeek-V3.2-Exp from DeepSeek. Announcement, Tech Report, Hugging Face (690GB, MIT license). …

Animals vs Ghosts
Today's frontier LLM research is not about building animals. It is about summoning ghosts. And a bit more on Sutton's Dwarkesh pod.

A spec-first workflow for building with agentic AI
Andrew Evans gives his take on agentic AI and walks through a step-by-step method to build a spec-first workflow using Claude Code.
September monthly sponsors newsletter
I just sent out the September edition of my sponsors-only monthly newsletter. If you are a sponsor (or if you start a sponsorship now) you can access a copy here. …

Sora 2 発表関連情報まとめ
OpenAIがSora 2を発表し、動画生成サービスを提供開始しました。Sora 2は、ChatGPT Proプランの契約が必要で、現在はアメリカとカナダでのみ利用可能です。新しいiOSアプリSoraでは、ユーザーが動画を生成し、他のユーザーのコンテンツをリミックスすることができます。特に「カメオ機能」により、自分や友人を動画に出演させることが可能です。Sora 2は、物理法則に基づいた自然な動きやフォトリアルな表現ができ、音声や効果音の生成も行えます。安全性を重視し、生成動画にはトラッキング可能なウォーターマークが付与され、ユーザーの健康状態を確認する機能やペアレンタルコントロール機能も搭載されています。今後はAPI経由での提供も予定されています。 • OpenAIがSora 2を発表し、動画生成サービスを開始した。 • Sora 2を利用するにはChatGPT Proプランの契約が必要で、現在はアメリカとカナダでのみ使用可能。 • iOSアプリSoraでは、動画生成や他のユーザーのコンテンツのリミックスが可能。 • 「カメオ機能」により、自分や友人を動画に出演させることができる。 • Sora 2は物理法則に基づいた自然な動きやフォトリアルな表現が可能で、音声や効果音の生成も行える。 • 安全性を重視し、生成動画にはトラッキング可能なウォーターマークが付与されている。 • ユーザーの健康状態を確認する機能やペアレンタルコントロール機能も搭載。 • 今後はAPI経由での提供も予定されている。
Sora 2
Having watched this morning's Sora 2 introduction video, the most notable feature (aside from audio generation - original Sora was silent, Google's Veo 3 supported audio in May 2025) looks …
Designing agentic loops
Coding agents like Anthropic’s Claude Code and OpenAI’s Codex CLI represent a genuine step change in how useful LLMs can be for producing working code. These agents can now directly …

【今日の話題】Sonnet 4.5、Cursorブラウザツール、Instant Checkout
Claude Sonnet 4.5 がリリース Introducing Claude Sonnet 4.5Claude Sonnet 4.5 is the best coding model in the world, strongest model for building complex agents, and best model at using computers.logo * 「最強のコーディングモデル」として発表され、30時間以上の自律コーディングを達成したとの報告。 * SWE-bench Verified で 77.2%(並列実行/Best for N方式では82%、)の課題解決率を記録し、長時間安定して計画を維持できる。 * 一方で「GPT-5

Claude Sonnet 4.5 発表関連情報まとめ
Claude Sonnet 4.5が発表され、あらゆるプラットフォームで利用可能になった。新モデルは、複雑なエージェントの構築やコンピュータ操作、リーズニング、数学タスクにおいて大幅な性能向上を実現し、30時間を超える複雑なタスクを遂行できる。チェックポイント機能が追加され、作業の進捗状況を保管・ロールバック可能になった。安全性の学習により、ユーザの指示に過度に従ったり虚偽の回答をするリスクが低減され、プロンプトインジェクション攻撃に対する防御性能も強化された。Claude Agent SDKは、コーディング以外の幅広いタスクに対応する汎用エージェントの構築を可能にし、エージェントループを用いた動作が特徴。 • Claude Sonnet 4.5は複雑なエージェントの構築やコンピュータ操作において性能向上を実現した。 • 新たにチェックポイント機能が追加され、作業の進捗状況を保管・ロールバックできる。 • 安全性の学習により、ユーザの指示に過度に従うリスクが低減された。 • プロンプトインジェクション攻撃に対する防御性能が強化された。 • Claude Agent SDKはコーディング以外のタスクにも対応する汎用エージェントの構築を可能にする。

Claude Sonnet 4.5 is probably the "best coding model in the world" (at least for now)
Anthropic released Claude Sonnet 4.5 today, with a very bold set of claims: Claude Sonnet 4.5 is the best coding model in the world. It’s the strongest model for building …
Armin Ronacher: 90%
The idea of AI writing "90% of the code" to-date has mostly been expressed by people who sell AI tooling. Over the last few months, I've increasingly seen the same …
Quoting Scott Aaronson
Given a week or two to try out ideas and search the literature, I’m pretty sure that Freek and I could’ve solved this problem ourselves. Instead, though, I simply asked …

SGLang Day 0 Support for DeepSeek-V3.2 with Sparse Attention
<p>We are excited to announce that <strong>SGLang supports DeepSeek-V3.2 on Day 0</strong>! According to the DeepSeek <a href="https://github.com/deepseek-ai...
Quoting Nick Turley
We’ve seen the strong reactions to 4o responses and want to explain what is happening. We’ve started testing a new safety routing system in ChatGPT. As we previously mentioned, when …

🥇Top AI Papers of the Week
The Top AI Papers of the Week (September 22-28)
Codex vs Claude Code: which is the better AI coding agent?
A practical look at Codex vs Claude Code: agents, model choices, costs, and the workflows they enable in real projects.

PD-Multiplexing: Unlocking High-Goodput LLM Serving with GreenContext
<p>This post highlights our initial efforts to support <strong>a new serving paradigm, PD-Multiplexing, in</strong> <strong>SGLang.</strong> It is designed t...

Video models are zero-shot learners and reasoners
Fascinating new paper from Google DeepMind which makes a very convincing case that their Veo 3 model - and generative video models in general - serve a similar role in …

🤖 AI Agents Weekly: Code World Model, Gemini Robotics-ER 1.5, Figma MCP server, Overhearing LLM Agents, Qwen3-Max, Gamma API
Code World Model, Gemini Robotics-ER 1.5, Figma MCP server, Overhearing LLM Agents, Qwen3-Max, Gamma API

GitHub Copilot CLIがリリース
2025年9月25日、GitHubが「GitHub Copilot CLI」をパブリックプレビューとして公開しました。 GitHub Copilot CLI is now in public preview - GitHub ChangelogGitHub Copilot CLI is now in public preview We’re bringing the power of GitHub Copilot coding agent directly to your terminal. With GitHub Copilot CLI, you can work locally and…The GitHub BlogAllison

Chrome DevTools MCP で AI エージェントのフロントエンド開発をサポートする
自律的な AI エージェントを利用したコーディングでは、生成したコードを実行した結果からフィードバックを得て、コードを改善していく反復的なプロセスが重要です。しかし、フロントエンド開発では、生成したコードはブラウザ上で実行されるため、AI エージェントが直接コードを実行したり、ブラウザのコンソールログを取得したりすることは困難です。Chrome DevTools MCP はこの課題を解決するためのツールです。
ForcedLeak: AI Agent risks exposed in Salesforce AgentForce
Classic lethal trifecta image exfiltration bug reported against Salesforce AgentForce by Sasi Levi and Noma Security. Here the malicious instructions come in via the Salesforce Web-to-Lead feature. When a Salesforce …
How to stop AI’s “lethal trifecta”
This is the second mention of the lethal trifecta in the Economist in just the last week! Their earlier coverage was Why AI systems may never be secure on September …

YANS2025 参加報告
AI ShiftのTECH BLOGです。AI技術の情報や活用方法などをご案内いたします。
Together with SGLang: Best Practices for Serving DeepSeek-R1 on H20-96G
<h2><a id="introduction" class="anchor" href="#introduction" aria-hidden="true"><svg aria-hidden="true" class="octicon octicon-link" height="16" version="1.1...
GitHub Copilot CLI is now in public preview
GitHub now have their own entry in the coding terminal CLI agent space: Copilot CLI. It's the same basic shape as Claude Code, Codex CLI, Gemini CLI and a growing …

Improved Gemini 2.5 Flash and Flash-Lite
Two new preview models from Google - updates to their fast and inexpensive Flash and Flash Lite families: The latest version of Gemini 2.5 Flash-Lite was trained and built based …
Don't hide your best documentation
If you hide the system prompt and tool descriptions for your LLM agent, what you're actually doing is deliberately hiding the most useful documentation describing your service from your most …

Deploying DeepSeek on GB200 NVL72 with PD and Large Scale EP (Part II): 3.8x Prefill, 4.8x Decode Throughput
<p>The GB200 NVL72 is one of the most powerful hardware for deep learning. In this blog post, we share our progress to optimize the inference performance of ...
Quoting Stanford CS221 Autumn 2025
[2 points] Learn basic NumPy operations with an AI tutor! Use an AI chatbot (e.g., ChatGPT, Claude, Gemini, or Stanford AI Playground) to teach yourself how to do basic vector …
Cross-Agent Privilege Escalation: When Agents Free Each Other
Here's a clever new form of AI exploit from Johann Rehberger, who has coined the term Cross-Agent Privilege Escalation to describe an attack where multiple coding agents - GitHub Copilot …

6 easy ways to level up Claude Code
Walk through six tips and tricks that help you level up Claude Code to move beyond simply entering prompts into a text box.

GPT-5-Codex
OpenAI half-relased this model earlier this month, adding it to their Codex CLI tool but not their API. Today they've fixed that - the new model can now be accessed …
Qwen3-VL: Sharper Vision, Deeper Thought, Broader Action
I've been looking forward to this. Qwen 2.5 VL is one of the best available open weight vision LLMs, so I had high hopes for Qwen 3's vision models. Firstly, …

YAML ファイルで AI エージェントを構築する cagent
cagent は Docker 社が開発した AI エージェントフレームワークです。YAML ファイルでエージェントの振る舞い・役割・使用するツールを宣言的に定義でき、コードを 1 行も書かずにエージェントを構築できます。この記事では cagent の概要とインストール方法、YAML ファイルの書き方、実際にエージェントを動作させるまでの手順を解説します。
Why AI systems might never be secure
The Economist have a new piece out about LLM security, with this headline and subtitle: Why AI systems might never be secure A “lethal trifecta” of conditions opens them to …
Quoting Kate Niederhoffer, Gabriella Rosen Kellerman, Angela Lee, Alex Liebscher, Kristina Rapuano and Jeffrey T. Hancock
We define workslop as AI generated work content that masquerades as good work, but lacks the substance to meaningfully advance a given task. Here’s how this happens. As AI tools …

Four new releases from Qwen
It's been an extremely busy day for team Qwen. Within the last 24 hours (all links to Twitter, which seems to be their preferred platform for these announcements): Qwen3-Next-80B-A3B-Instruct-FP8 and …

CompileBench: Can AI Compile 22-year-old Code?
Interesting new LLM benchmark from Piotr Grabowski and Piotr Migdał: how well can different models handle compilation challenges such as cross-compiling gucr for ARM64 architecture? This is one of my …
ChatGPT Is Blowing Up Marriages as Spouses Use AI to Attack Their Partners
Maggie Harrison Dupré for Futurism. It turns out having an always-available "marriage therapist" with a sycophantic instinct to always take your side is catastrophic for relationships. The tension in the …

Enabling Deterministic Inference for SGLang
<p>This post highlights our initial efforts to achieve deterministic inference in SGLang. By integrating batch invariant kernels released by Thinking Machine...
Locally AI
Handy new iOS app by Adrien Grondin for running local LLMs on your phone. It just added support for the new iOS 26 Apple Foundation model, so you can install …

🥇Top AI Papers of the Week
The Top AI Papers of the Week (September 15-21)

GPT‑5 Codexがリリース
OpenAIが2025年9月15日にGPT‑5 Codexを発表しました。GPT‑5 CodexはGPT‑5を土台にして、エージェントのコーディング能力に適した学習と強化が加えられたモデルです。長時間の自律的な作業に特に強みがあります。 We’re releasing new Codex features to make it a more effective coding collaborator: - A new IDE extension - Easily move tasks between the cloud and your local environment - Code reviews in GitHub - Revamped Codex CLI Powered by
llm-openrouter 0.5
New release of my LLM plugin for accessing models made available via OpenRouter. The release notes in full: Support for tool calling. Thanks, James Sanford. #43 Support for reasoning options, …

Optimizing FP4 Mixed-Precision Inference on AMD GPUs
<p>Haohui Mai (CausalFlow.ai), Lei Zhang (AMD)</p> <h2><a id="introduction" class="anchor" href="#introduction" aria-hidden="true"><svg aria-hidden="true" cl...

Grok 4 Fast
New hosted vision-enabled reasoning model from xAI that's designed to be fast and extremely competitive on price. It has a 2 million token context window and "was trained end-to-end with …

🤖 AI Agents Weekly: GPT-5-Codex, Grok 4 Fast, Tongyi DeepResearch, Magistral Small 1.2, Agent Payments Protocol (AP2)
GPT-5-Codex, Grok 4 Fast, Tongyi DeepResearch, Magistral Small 1.2, Agent Payments Protocol (AP2)

AI エージェントのための Agent Payments Protocol (AP2) を試してみた
現状の決済システムでは人間が信頼できる画面上で直接購入ボタンをクリックすることを前提としており、自立型の AI エージェントがユーザーに代わって決済することは想定されていません。そこで Google により Agent Payments Protocol (AP2) と呼ばれる新しいプロトコルが提案されました。プラットフォーム間でエージェント主導の決済を安全に開始・処理することを可能にします。この記事では AP2 のサンプルコードを実際に試してみた手順を紹介します。
Magistral 1.2
Mistral quietly released two new models yesterday: Magistral Small 1.2 (Apache 2.0, 96.1 GB on Hugging Face) and Magistral Medium 1.2 (not open weights same as Mistral's other "medium" models.) …
The Hidden Risk in Notion 3.0 AI Agents: Web Search Tool Abuse for Data Exfiltration
Abi Raghuram reports that Notion 3.0, released yesterday, introduces new prompt injection data exfiltration vulnerabilities thanks to enabling lethal trifecta attacks. Abi's attack involves a PDF with hidden text (white …

Environment-aware model routing: Build smarter AI apps with AI SDK
Discover a handy pattern for routing LLM calls in an “environment-aware” manner, using AI SDK’s middleware.
Quoting Steve Jobs
Well, the types of computers we have today are tools. They’re responders: you ask a computer to do something and it will do it. The next stage is going to …

I think "agent" may finally have a widely enough agreed upon definition to be useful jargon now
I’ve noticed something interesting over the past few weeks: I’ve started using the term “agent” in conversations where I don’t feel the need to then define it, roll my eyes …
Anthropic: A postmortem of three recent issues
Anthropic had a very bad month in terms of model reliability: Between August and early September, three infrastructure bugs intermittently degraded Claude's response quality. We've now resolved these issues and …
ICPC medals for OpenAI and Gemini
In July it was the International Math Olympiad (OpenAI, Gemini), today it's the International Collegiate Programming Contest (ICPC). Once again, both OpenAI and Gemini competed with models that achieved Gold …

How to stop your AI agents from hallucinating: A guide to n8n’s Eval Node
Walk through a practical example of n8n's Eval feature, which helps developers reduce hallucinations and increase reliability of AI products.
Announcing the 2025 PSF Board Election Results!
I'm happy to share that I've been re-elected for second term on the board of directors of the Python Software Foundation. Jannis Leidel was also re-elected and Abigail Dogbe and …

Let’s kill vibe coding and bring back prompt engineering
Vibe coding is trending, but is it sustainable? Explore why prompt engineering still matters for building reliable, high-quality AI apps.

openai/codex でのプロジェクト固有MCPを設定する
この記事では、OpenAIのCodexを使用してプロジェクト固有のMCP(Model Context Protocol)を設定する方法について説明しています。CodexはグローバルにMCPを設定することしかできないため、プロジェクトごとに独立した設定が必要です。2つの手段が提案されており、1つ目は環境変数を使用して読み込みディレクトリを変更し、プロジェクト固有の設定をロードする方法です。しかし、この方法では認証情報が含まれるため、普段使いには適していません。2つ目の手段は、Codexのコマンドラインオプションを使用して直接TOML設定をロードする方法で、こちらの方が安全です。具体的なコマンドや設定例も示されており、実装に関する注意点も記載されています。 • CodexはグローバルにMCPを設定するが、プロジェクトごとに独立した設定が必要な場合がある。 • 手段1では環境変数を使用してプロジェクト固有のMCP設定をロードできるが、認証情報が含まれるため普段使いには不向き。 • 手段2では--configオプションを使用して直接TOMLをロードする方法があり、こちらが安全とされる。 • 具体的なコマンドや設定例が示されており、実装方法が詳細に説明されている。 • JSONからTOMLへの変換に関する注意点も記載されている。

GPT‑5-Codex and upgrades to Codex
OpenAI half-released a new model today: GPT‑5-Codex, a fine-tuned GPT-5 variant explicitly designed for their various AI-assisted programming tools. I say half-released because it's not yet available via their API, …
Models can prompt now
Here's an interesting example of models incrementally improving over time: I am finding that today's leading models are competent at writing prompts for themselves and each other. A year ago …

🥇Top AI Papers of the Week
The Top AI Papers of the Week (September 8-14)

メインブラウザをEdgeに切り替えた理由とAIブラウザの可能性
ChromeからEdgeに乗り換え 最近、筆者はAI統合型のブラウザを常用するべくメインブラウザをGoogle ChromeからMicrosoft Edgeに切り替えました。EdgeのCopilot Modeは8月にGPT-5が搭載され、かなり使い勝手が良くなりました。2年前にこの前哨戦となる「Bing AIチャットをデフォルトのウェブ検索にして使ってみた」を投稿したのですが、当時と比べると雲泥の差です。 この記事では、筆者がEdgeへの移行を検討するに至った背景や、実際の使用感について整理しました。また、AIブラウザの台頭に伴い、セキュリティ面での新たなリスクについても考えることになったのでそれを喚起します。 移行の動機 筆者がメインブラウザをChromeからEdgeに移行した最大の理由は、AI統合型のウェブブラウジングを日常にしたかったからでした。実は2年前にもプログラミングにAI機能を使いたいという理由で、エディタをJetBrainsから強制的にVSCode/Cursorに移行した経験があり、それを思い出します。 現在、ブラウザやOSとLLMの統合は急速に進んでいます

🤖 AI Agents Weekly: Agent 3, ChatGPT Developer Mode, MCP Registry, Writing Effective Tools for Agents, Qodo Aware
Agent 3, ChatGPT Developer Mode, MCP Registry, Writing Effective Tools for Agents, Qodo Aware

自然言語で CI/CD パイプラインを定義する Agentic Workflows
Agentic Workflows は自然言語で CI/CD パイプラインを定義できるツールとして GitHub Next が開発しています。自然言語で定義されたワークフローは GitHub CLI の拡張機能として提供される gh aw コマンドでコンパイルして実行できます。これは継続体なAI(Continuous AI)を実現するためのツールです。
gpt-5 and gpt-5-mini rate limit updates
OpenAI have increased the rate limits for their two main GPT-5 models. These look significant: gpt-5 Tier 1: 30K → 500K TPM (1.5M batch) Tier 2: 450K → 1M (3M …
Quoting Matt Webb
The trick with Claude Code is to give it large, but not too large, extremely well defined problems. (If the problems are too large then you are now vibe coding… …

今週の話題:Claudeの劣化問題の修正、Claude Code API差し替え、sonoma-alpha
AnthropicがClaudeの性能劣化に対応 Anthropicが公式に、8月からコミュニティで報告されていたClaude Sonnetの性能劣化を修正したと発表しました。原因は推論スタックのインフラ層にあり、独立したバグによるものであり「モデル本体の意図的な性能ダウン」や「需要対策によるダウングレード」は否定されています。 Model output qualityAnthropic’s Status Page - Model output quality.Model output quality 発表には、2025年8月下旬〜9月初旬にかけてSonnet 4系で品質劣化(degraded output quality)が発生し、8月5日〜9月4日には少数のSonnet 4.0リクエストに出力品質の低下が見られたという記載があります。Opus 4.1にはいまだ未解決の問題もあります。 8月中にはRedditでClaude Codeの応答劣化の件は炎上していました。有料プランの週次制限の開始あたりから加熱した印象です。一部ではCodex CLIに乗り換えようという声がありまし
Comparing the memory implementations of Claude and ChatGPT
Shlok Khemani has been doing excellent work reverse-engineering LLM systems and documenting his discoveries. Last week he wrote about ChatGPT memory. This week it's Claude. Claude's memory system has two …

Qwen3-Next-80B-A3B: 🐧🦩 Who needs legs?!
Qwen announced two new models via their Twitter account (nothing on their blog yet): Qwen3-Next-80B-A3B-Instruct and Qwen3-Next-80B-A3B-Thinking. They make some big claims on performance: Qwen3-Next-80B-A3B-Instruct approaches our 235B flagship. Qwen3-Next-80B-A3B-Thinking …
Defeating Nondeterminism in LLM Inference
A very common question I see about LLMs concerns why they can't be made to deliver the same response to the same prompt by setting a fixed random number seed. …
Claude API: Web fetch tool
New in the Claude API: if you pass the web-fetch-2025-09-10 beta header you can add {"type": "web_fetch_20250910", "name": "web_fetch", "max_uses": 5} to your "tools" list and Claude will gain the …

What you actually need to build and ship AI-powered apps in 2025
Discover what you actually need to build and ship AI-powered apps in 2025, with tips for which tools to choose and how to implement them.
I Replaced Animal Crossing's Dialogue with a Live LLM by Hacking GameCube Memory
Brilliant retro-gaming project by Josh Fonseca, who figured out how to run 2002 Game Cube Animal Crossing in the Dolphin Emulator such that dialog with the characters was instead generated …
![AI dev tool power rankings & comparison [Sept 2025]](https://blog.logrocket.com/wp-content/uploads/2025/07/ai_dev_tool_power_rankings_july_2025_web.png)
AI dev tool power rankings & comparison [Sept 2025]
Compare the top AI development tools and models of September 2025. View updated rankings, feature breakdowns, and find the best fit for you.

SGLang HiCache: Fast Hierarchical KV Caching with Your Favorite Storage Backends
<h2><a id="from-the-community" class="anchor" href="#from-the-community" aria-hidden="true"><svg aria-hidden="true" class="octicon octicon-link" height="16" ...
Quoting Apple Security Engineering and Architecture
There has never been a successful, widespread malware attack against iPhone. The only system-level iOS attacks we observe in the wild come from mercenary spyware, which is vastly more complex …

My review of Claude's new Code Interpreter, released under a very confusing name
Today on the Anthropic blog: Claude can now create and edit files: Claude can now create and edit Excel spreadsheets, documents, PowerPoint slide decks, and PDFs directly in Claude.ai and …

MCP is replacing the browser: Here’s how devs should prepare
Learn how MCP will replace the traditional browser, what this shift means for frontend devs, and how to start prepping for an AI-first future.
The 2025 PSF Board Election is Open!
The Python Software Foundation's annual board member election is taking place right now, with votes (from previously affirmed voting members) accepted from September 2nd, 2:00 pm UTC through Tuesday, September …
Geoffrey Huntley is cursed
Geoffrey Huntley vibe-coded an entirely new programming language using Claude: The programming language is called "cursed". It's cursed in its lexical structure, it's cursed in how it was built, it's …
Improve your AI code output with AGENTS.md (+ my best tips)
Stop re-prompting. Put the rules in AGENTS.md: do and don’ts, file-level tests, and real examples so agents ship code that matches your project.

Recreating the Apollo AI adoption rate chart with GPT-5, Python and Pyodide
Apollo Global Management’s “Chief Economist” Dr. Torsten Sløk released this interesting chart which appears to show a slowdown in AI adoption rates among large (>250 empoloyees) companies: Here’s the full …
Anthropic status: Model output quality
Anthropic previously reported model serving bugs that affected Claude Opus 4 and 4.1 for 56.5 hours. They've now fixed additional bugs affecting "a small percentage" of Sonnet 4 requests for …
Quoting TheSoftwareGuy
Having worked inside AWS I can tell you one big reason [that they don't document their internals] is the attitude/fear that anything we put in out public docs may end …
Load Llama-3.2 WebGPU in your browser from a local folder
Inspired by a comment on Hacker News I decided to see if it was possible to modify the transformers.js-examples/tree/main/llama-3.2-webgpu Llama 3.2 chat demo (online here, I wrote about it last …
Quoting James Luan
I recently spoke with the CTO of a popular AI note-taking app who told me something surprising: they spend twice as much on vector search as they do on OpenAI …
Is the LLM response wrong, or have you just failed to iterate it?
More from Mike Caulfield (see also the SIFT method). He starts with a fantastic example of Google's AI mode usually correctly handling a common piece of misinformation but occasionally falling …
Quoting Anil Dash
I agree with the intellectual substance of virtually every common critique of AI. And it's very clear that turning those critiques into a competition about who can frame them in …
The SIFT method
The SIFT method is "an evaluation strategy developed by digital literacy expert, Mike Caulfield, to help determine whether online content can be trusted for credible or reliable sources of information." …

🥇Top AI Papers of the Week
The Top AI Papers of the Week (September 1-7)

AI mode is good, actually
When I wrote about how good ChatGPT with GPT-5 is at search yesterday I nearly added a note about how comparatively disappointing Google's efforts around this are. I'm glad I …

仕様駆動開発を支える Spec Kit を試してみた
仕様駆動開発(Specification-Driven Development, SDD)は、AI コーディングエージェントを活用した新しいソフトウェア開発スタイルです。GitHub が提供する Spec Kit は、仕様駆動開発を支援するためのツールキットであり、AI との対話を通じて正確な受け入れ基準の定義とコード生成を支援します。この記事では Spec Kit を使用して仕様駆動開発を試してみます。

GPT-5 Thinking in ChatGPT (aka Research Goblin) is shockingly good at search
“Don’t use chatbots as search engines” was great advice for several years... until it wasn’t. I wrote about how good OpenAI’s o3 was at using its Bing-backed search tool back …
Quoting Jason Liu
I am once again shocked at how much better image retrieval performance you can get if you embed highly opinionated summaries of an image, a summary that came out of …

Kimi-K2-Instruct-0905
New not-quite-MIT licensed model from Chinese Moonshot AI, a follow-up to the highly regarded Kimi-K2 model they released in July. This one is an incremental improvement - I've seen it …

🤖 AI Agents Weekly: Universal Deep Research, GPT-4b micro, Self-Evolving Agents, Tracking Multi-Agent Failures
Universal Deep Research, GPT-4b micro, Self-Evolving Agents, Tracking Multi-Agent Failures
Anthropic to pay $1.5 billion to authors in landmark AI settlement
I wrote about the details of this case when it was found that Anthropic's training on book content was fair use, but they needed to have purchased individual copies of …

TypeScriptファーストなコーディングAIエージェントのベンチマーク「ts-bench」を公開しました
AIコーディングエージェントのTypeScriptコード編集能力を評価するための、手軽に再現可能なベンチマークプロジェクト「ts-bench」を公開しました。この記事では、筆者がなぜ ts-bench を作ったのか、今後どうしていきたいかについてお話しします。 GitHub - laiso/ts-benchContribute to laiso/ts-bench development by creating an account on GitHub.GitHublaiso ts-benchの仕組み ts-benchは、プログラミング学習プラットフォーム Exercism のTypeScript問題セットを利用します。各問題には、仕様を説明するドキュメント、エージェントが編集すべきソースコードのひな形、そして正解判定に使うテストコードが含まれています。 ベンチマークタスクは、各問題に対して以下の4つのステップを順番に実行します。 1. AIエージェントの実行: 問題の指示書をプロンプトとしてAIエージェントに渡し、ソースコードを編集させます。 2. テストファイルの復元

Introducing EmbeddingGemma
Brand new open weights (under the slightly janky Gemma license) 308M parameter embedding model from Google: Based on the Gemma 3 architecture, EmbeddingGemma is trained on 100+ languages and is …
Highlighted tools
Any time I share my collection of tools built using vibe coding and AI-assisted development (now at 124, here's the definitive list) someone will inevitably complain that they're mostly trivial. …

Beyond Vibe Coding
Back in May I wrote Two publishers and three authors fail to understand what “vibe coding” means where I called out the authors of two forthcoming books on "vibe coding" …

AI coding tools still suck at context — here’s how to work around it
Discover why you might be having difficulty with AI coding tools, and learn some practical strategies to work with AI more effectively.
gov.uscourts.dcd.223205.1436.0_1.pdf
Here's the 230 page PDF ruling on the 2023 United States v. Google LLC federal antitrust case - the case that could have resulted in Google selling off Chrome and …

AGENTS.md Gains Traction as an Open Format for AI Coding Agents
AGENTS.md is a fast-growing open format giving AI coding agents a shared, predictable way to understand project setup, style, and workflows.
Cursor vs Claude Code: The Ultimate Comparison Guide
Cursor or Claude Code? Both start at $20/mo but work differently. Compare features, hidden costs, and real workflows to pick the right AI coding tool.

Rich Pixels
Neat Python library by Darren Burns adding pixel image support to the Rich terminal library, using tricks to render an image using full or half-height colored blocks. Here's the key …
August 2025 newsletter
I just sent out my August 2025 sponsors-only newsletter summarizing the past month in LLMs and my other work. Topics included GPT-5, gpt-oss, image editing models (Qwen-Image-Edit and Gemini Nano …
Introducing gpt-realtime
Released a few days ago (August 28th), gpt-realtime is OpenAI's new "most advanced speech-to-speech model". It looks like this is a replacement for the older gpt-4o-realtime-preview model that was released …

Cloudflare Radar: AI Insights
Cloudflare launched this dashboard back in February, incorporating traffic analysis from Cloudflare's network along with insights from their popular 1.1.1.1 DNS service. I found this chart particularly interesting, showing which …

LongCat-Flash: Deploying Meituan's Agentic Model with SGLang
<h3><a id="1-introduction-deploying-meituans-agentic-open-source-moe-model" class="anchor" href="#1-introduction-deploying-meituans-agentic-open-source-moe-m...

🥇Top AI Papers of the Week
The Top AI Papers of the Week (August 25-31)

エンティティリンキングの性能改善のための効果的な絞り込み手法の検証
AI ShiftのTECH BLOGです。AI技術の情報や活用方法などをご案内いたします。
Claude Opus 4.1 and Opus 4 degraded quality
Notable because often when people complain of degraded model quality it turns out to be unfounded - Anthropic in the past have emphasized that they don't change the model weights …

🤖 AI Agents Weekly: Gemini 2.5 Flash Image, gpt-realtime, Anemoi Agent, Fine-tuning LLM Agents, Codex Updates, Agent Client Protocol
Gemini 2.5 Flash Image, gpt-realtime, Anemoi Agent, Fine-tuning LLM Agents, Codex Updates, Agent Client Protocol
Quoting Benj Edwards
LLMs are intelligence without agency—what we might call "vox sine persona": voice without person. Not the voice of someone, not even the collective voice of many someones, but a voice …

AI コーディングエージェントの管理を行う Vibe Kanban を試してみた
Vibe Kanban は、AI コーディングエージェントの管理を支援するためのツールです。カンバン方式の UI でタスク管理を行い、各タスクに対して AI エージェントを割り当てて人間がその進捗を管理できます。この記事では Vibe Kanban を使用して AI コーディングエージェントの管理を実際に試してみます。

The perils of vibe coding
I was interviewed by Elaine Moore for this opinion piece in the Financial Times, which ended up in the print edition of the paper too! I picked up a copy …

How to build a multimodal AI app with voice and vision in Next.js
Learn how to build multimodal AI interactions to process images, audio, and even real-time video streams, using Next.js and Gemini.
Lossy encyclopedia
Since I love collecting questionable analogies for LLMs, here's a new one I just came up with: an LLM is a lossy encyclopedia. They have a huge array of facts …
Python: The Documentary
New documentary about the origins of the Python programming language - 84 minutes long, built around extensive interviews with Guido van Rossum and others who were there at the start …

I tried out Kiro: Here’s what I learned
Check out Kiro, AWS's AI-powered IDE, see what makes it different from other AI coding tools, and explore whether it lives up to the hype.

Finetune and deploy GPT-OSS in MXFP4: ModelOpt+SGLang
<p>GPT-OSS, the first open-source model family from OpenAI's lab since GPT-2, demonstrates strong math, coding, and general capabilities even when compared w...
Quoting Bruce Schneier
We simply don’t know to defend against these attacks. We have zero agentic AI systems that are secure against these attacks. Any AI that is working in an adversarial environment—and …