Vol. I · No. 18 · THU, MAY 7, 2026
Topic

Agents

Every story matching this topic across titles and summaries, newest first.

New

OpenClaw and Claude can put your AI-generated podcasts in Spotify

Save to Spotify is a new command-line tool designed specifically for AI agents like OpenClaw, Claude Code, or OpenAI Codex. If you're the kind of person who collects research on a topic, then feeds it through their AI of choice to create audio summaries and personal podcasts, this lets you save them right alongside the latest episode of The Vergecast and Welcome to Night Vale on Spotify. To set it up, you need to download and install the Save to Spotify CLI from GitHub. Then you just prompt your AI agent as normal, but tack on "and save to Spotify," and it should show up right in your podcast...

·

Self-Induced Outcome Potential: Turn-Level Credit Assignment for Agents without Verifiers

Long-horizon LLM agents depend on intermediate information-gathering turns, yet training feedback is usually observed only at the final answer, because process-level rewards require high-quality human annotation. Existing turn-level shaping methods reward turns that increase the likelihood of a gold answer, but they require answer supervision or stable task-specific verifiers. Conversely, label-free RL methods extract self-signals from output distributions, but mainly at the answer or trajectory level and therefore cannot assign credit to intermediate turns. We propose Self-Induced Outcome Po...

·

Agents for financial services

Anthropic releases ten Cowork and Claude Code plugins plus Microsoft 365 integrations and MCP app for financial services.

·

How to Build In-Vehicle AI Agents with NVIDIA: From Cloud to Car

The automotive cockpit is undergoing a fundamental shift from rule-based interfaces to agentic, multimodal AI systems capable of reasoning, planning, and acting. In most vehicles on the road today, in-vehicle assistants still rely on fixed command-response patterns: interpret a phrase, trigger an action, reset. While effective for well-defined tasks, this approach doesn’t scale to modern…

·

Building for the Rising Complexity of Agentic Systems with Extreme Co-Design

Generative AI’s explosive first chapter was defined by humans sending requests and models responding. The agentic chapter is different. Agents don’t follow a pre-determined sequence of actions. They call tools, spawn sub-agents with different tasks and models, retain information in memory, manage their own context window, and decide for themselves when they’re finished. In doing so…

·

Automating GPU Kernel Translation with AI Agents: cuTile Python to cuTile.jl

NVIDIA CUDA Tile (cuTile) is a tile-based programming model that enables developers to write GPU kernels in terms of tile-level operations (loads, stores, and matrix multiply-accumulate) rather than manually coordinating threads, warps, and shared memory. cuTile.jl brings the same tile-based approach to the dynamic programming language Julia. Users can write custom GPU kernels without dropping…

·

[Open Source] We built a local code search MCP for Claude Code that uses ~98% fewer tokens than grep+read

Working on large codebases with Claude Code, we kept running into the same issue: when Claude looks for relevant code, it falls back to grep, reading full files, or launching multiple subagents. This burns through tokens, and often misses the relevant code. There are some existing solutions (that we also benchmarked against), but they all had issues (too slow, needs API keys, quality not good enough, etc). We built [Semble](https://github.com/MinishLab/semble) to fix this. It's a local MCP server that gives Claude Code high quality code search: instead of reading files to find what's relevan...

··

How to be better than 99% of Claude Code users while doing less, imo:

tl;dr: your skill in AI is a measure of your **quality** and **scale**. Use **success criteria** and **subagents** intentionally to get excellent results. Use skills and .md docs when you find repeating patterns in your daily work, not before. **Quality** comes from telling the agent what outcome you want, and the **success criteria** that you will use to measure a “good” outcome. This helps avoid Claude's tendency to rush completion. Note this is specifically *not* telling it what to *do*, but instead what to *achieve*. If you come from the old world, you might remember terms like ...

··

Absolutely blown away by the utility of the Claude Word add-in

I can have multiple dense legal documents on my screen, each 40, 60, or 100+ pages, with the Claude Word add-in agents syncing, pushing and pulling information between them, pinging each other, and providing helpful context so that I can draft all three or four in parallel or ensure that an entire package is consistent. I can have a lengthy spreadsheet workbook open containing 10 worksheets, and the information is analyzed and pulled in by the agents when needed. I am absolutely blown away at how well this is implemented and the improvement in quality, consistency and efficiency. It ...

··

Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations

Operating and maintaining (O&M) large-scale online engine systems (search, recommendation, advertising) demands substantial human effort for release monitoring, alert response, and root cause analysis. While LLM-based agents are a natural fit for these tasks, the deployment bottleneck is not reasoning capability but orchestration: selecting, for each operational event, the relevant data (metrics, logs, change events) and the applicable operational knowledge (handbook rules and practitioner experience). Feeding all signals indiscriminately causes dilution and hallucination, while manually cura...

·

Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture

Recent evidence suggests that frontier AI systems can exhibit agentic misalignment, generating and executing harmful actions derived from internally constructed goals, even without explicit user requests. Existing mitigation methods, such as Reinforcement Learning from Human Feedback (RLHF) and constitutional prompting, operate primarily at the model level and provide only probabilistic safety guarantees. We propose the Policy-Execution-Authorization (PEA) architecture, a "separation-of-powers" design that enforces safety at the system level. PEA decouples intent generation, authorization, an...

·

Claude Code Manager

[http://claude.ldlework.com](http://claude.ldlework.com/) I built this for myself but I figured why not share. I'm happy to receive feedback; I know it's not perfect. Thanks for taking a look. The aim of CCM is to be able to fully manage all Claude Code configuration files, both globally and those in your project. Some neat features: manages your [CLAUDE.md](http://claude.md/), rules, hooks, agents, memories and so on; elevates memories to rules; copies or moves any asset from one scope to another, or elevates it to global scope; installs marketplaces and plugins. The full app is embe...

··

Is the ds/ml slowly being morphed into an AI engineer? [D]

Agents are amazing. Harnesses are cool. But the fundamental role of a data scientist is not to use a generalist model in an existing workflow; it's a completely different field. AI engineering is the body of the vehicle, whereas the actual brain/engine behind it is the data scientist's playground. I feel like I am not alone in this realisation that my role somehow got silently morphed into that of an AI engineer, with the engine's development becoming a complete afterthought. Based on industry requirements and ongoing research, most of the work has quietly shifted from building the engine t...

··

China’s DeepSeek previews new AI model a year after jolting US rivals

Chinese AI company DeepSeek released a preview of its hotly anticipated next-generation AI model V4 on Friday, saying that the open-source model can compete with leading closed-source systems from US rivals including Anthropic, Google, and OpenAI. DeepSeek says V4 marks a major improvement over prior models, especially in coding, a capability that has become central to AI agents and helped drive the success of tools like ChatGPT Codex and Claude Code. The release is also a milestone for China's chip industry, with DeepSeek explicitly highlighting compatibility with domestic Huawei technology....

·

Winning a Kaggle Competition with Generative AI–Assisted Coding

In March 2026, three LLM agents generated over 600,000 lines of code, ran 850 experiments, and helped secure a first-place finish in a Kaggle playground competition. Success in modern machine learning competitions is increasingly defined by how quickly you can generate, test, and iterate on ideas. LLM agents, combined with GPU acceleration, dramatically compress this loop. Historically…

·

You’re about to feel the AI money squeeze

Earlier this month, millions of OpenClaw users woke up to a sweeping mandate: The viral AI agent tool, which this year took the worldwide tech industry by storm, had been severely restricted by Anthropic. Anthropic, like other leading AI labs, was under immense pressure to lessen the strain on its systems and start turning a profit. So if the users wanted its Claude AI to power their popular agents, they'd have to start paying handsomely for the privilege. "Our subscriptions weren't built for the usage patterns of these third-party tools," wrote Boris Cherny, head of Claude Code, on X. "We wa...

·

one week in: opus 4.7 vs 4.6 - worse one shot rate, double the retries

I spent some time a few days back comparing Opus 4.6 and 4.7 using my own usage data, just to see how they actually behave side by side. [https://github.com/getagentseal/codeburn](https://github.com/getagentseal/codeburn) It’s still pretty early for 4.7, but a few things surprised me. In my sessions, 4.7 gets things right on the first try less often than 4.6. One-shot rate sits around 74.5% vs 83.8%, and I’m seeing roughly double the retries per edit (0.46 vs 0.22). It also produces a lot more output per call, about 800 tokens vs 372 on 4.6, which makes it noticeably more expensive. ...

··

OpenAI now lets teams make custom bots that can do work on their own

OpenAI is giving users of its Business, Enterprise, Edu, and Teachers plans access to cloud-based "workspace" agents available in ChatGPT that can perform business tasks. In its blog post, OpenAI gives examples of agents like one that finds product feedback on the web and sends a report in Slack and a sales agent that can draft follow-up emails in Gmail. These new agents follow increasing interest in agents across the AI landscape, especially after OpenClaw - the AI agent formerly known as Clawdbot and Moltbot that touts itself as the "AI that actually does things" - went viral. OpenClaw foun...

·

Now Meta will track what employees do on their computers to train its AI agents

Meta employees' activity at work is now being used to train the company's AI agents. As reported by Reuters, Meta is installing a tool it calls Model Capability Initiative (MCI) on US-based employees' computers that runs in work-related apps and websites, recording mouse movements, clicks, keystrokes, and occasional screenshots. The data from this tool will be used to train the company's AI models to get better at interacting with computers the way humans do, including automating work tasks like those Meta's employees perform on the job. According to Reuters, the data from MCI won't be "used ...

·
100 stories