The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.


AEL: Agent Evolving Learning for Open-Ended Environments

Agent Evolving Learning framework enables LLM agents to accumulate and leverage experience across open-ended episodes via two-timescale Thompson sampling.

Wujiang Xu·14 days ago

The Verge AI· PRESS

You’re about to feel the AI money squeeze

Earlier this month, millions of OpenClaw users woke up to a sweeping mandate: The viral AI agent tool, which this year took the worldwide tech industry by storm, had been severely restricted by Anthropic. Anthropic, like other leading AI labs, was under immense pressure to lessen the strain on its systems and start turning a profit. So if the users wanted its Claude AI to power their popular agents, they'd have to start paying handsomely for the privilege. "Our subscriptions weren't built for the usage patterns of these third-party tools," wrote Boris Cherny, head of Claude Code, on X. "We wa...

Hayden Field·14 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Promoting Simple Agents: Ensemble Methods for Event-Log Prediction

N-gram models match LSTM/Transformer accuracy on event-log prediction with lower resources and better stability than neural baselines.

Benedikt Bollig·14 days ago

r/Anthropic· COMMUNITY

one week in: opus 4.7 vs 4.6 - worse one shot rate, double the retries

I spent some time a few days back comparing Opus 4.6 and 4.7 using my own usage data - just to see how they actually behave side by side. https://github.com/getagentseal/codeburn It’s still pretty early for 4.7, but a few things surprised me. In my sessions, 4.7 gets things right on the first try less often than 4.6. One-shot rate sits around 74.5% vs 83.8%, and I’m seeing roughly double the retries per edit (0.46 vs 0.22). It also produces a lot more output per call - about 800 tokens vs 372 on 4.6 - which makes it noticeably more expensive. ...

u/MurkyFlan567·14 days ago·50 pts / 3 comm

r/LocalLLaMA· COMMUNITY

Open-source dashboard to visualize AI coding agents (Claude Code)

Open-source Agent-Quest tool visualizes Claude Code agents in real time across parallel CLI sessions using a fantasy-themed 2D UI.

u/Redrock990·14 days ago·72 pts / 21 comm

The Verge AI· PRESS

OpenAI now lets teams make custom bots that can do work on their own

OpenAI is giving users of its Business, Enterprise, Edu, and Teachers plans access to cloud-based "workspace" agents available in ChatGPT that can perform business tasks. In its blog post, OpenAI gives examples of agents like one that finds product feedback on the web and sends a report in Slack and a sales agent that can draft follow-up emails in Gmail. These new agents follow increasing interest in agents across the AI landscape, especially after OpenClaw - the AI agent formerly known as Clawdbot and Moltbot that touts itself as the "AI that actually does things" - went viral. OpenClaw foun...

Jay Peters·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Interval POMDP Shielding for Imperfect-Perception Agents

Interval POMDP shielding method for autonomous systems with learned perception, using confidence intervals to block unsafe actions under sensor uncertainty.

William Scarbro·15 days ago

The Verge AI· PRESS

Now Meta will track what employees do on their computers to train its AI agents

Meta employees' activity at work is now being used to train the company's AI agents. As reported by Reuters, Meta is installing a tool it calls Model Capability Initiative (MCI) on US-based employees' computers that runs in work-related apps and websites, recording mouse movements, clicks, keystrokes, and occasional screenshots. The data from this tool will be used to train the company's AI models to get better at interacting with computers the way humans do, including automating work tasks like those Meta's employees perform on the job. According to Reuters, the data from MCI won't be "used ...

Stevie Bonifield·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Trust, Lies, and Long Memories: Emergent Social Dynamics and Reputation in Multi-Round Avalon with LLM Agents

Study of emergent multi-round behavior: across 188 repeated Avalon deception games, LLM agents develop reputation dynamics and social memory.

Suveen Ellawela·15 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Ask Only When Needed: Proactive Retrieval from Memory and Skills for Experience-Driven Lifelong Agents

ProactAgent enables lifelong learning agents to retrieve past experience and skills proactively, rather than passively, during task interaction.

Yuxuan Cai·15 days ago

MIT Tech Review· PRESS

AI needs a strong data fabric to deliver business value

Artificial intelligence is moving quickly in the enterprise, from experimentation to everyday use. Organizations are deploying copilots, agents, and predictive systems across finance, supply chains, human resources, and customer operations. By the end of 2025, half of companies used AI in at least three business functions, according to a recent survey. But as AI becomes…

MIT Technology Review Insights·15 days ago

OpenAI· FRONTIER

Workspace agents

OpenAI releases workspace agents for ChatGPT to automate workflows and integrate enterprise tools with cloud-based execution.

OpenAI·15 days ago·+ covered by others

MIT Tech Review· PRESS

Agent orchestration

When people say AI will speed up drug development or fear that it will bring about mass layoffs, what they have in mind—whether they know it or not—are AI agents. ChatGPT made large language models a mass consumer product. But to change the world, AI needs to do more than just talk back: It needs…

Will Douglas Heaven·16 days ago

Ars Technica AI· PRESS

Report: Meta will train AI agents by tracking employees' mouse, keyboard use

Move highlights the difficulty of finding high-quality interactive training data.

Kyle Orland·16 days ago

TechCrunch AI· PRESS

AI research lab NeoCognition lands $40M seed to build agents that learn like humans

Founded by an OSU researcher, the startup is developing AI agents that can become experts in any domain.

Marina Temkin·16 days ago

MIT Tech Review· PRESS

Building agent-first governance and security

As AI agents increasingly work alongside humans across organizations, companies could be inadvertently opening a new attack surface. Insecure agents can be manipulated to access sensitive systems and proprietary data, increasing enterprise risk. In some modern enterprises, non-human identities (NHI) are outpacing human identities, and that trend will explode with agentic AI. Solid governance and…

MIT Technology Review Insights·16 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

An AI Agent Execution Environment to Safeguard User Data

GAAP execution environment guarantees privacy for AI agents by enforcing data access controls against prompt injection and untrusted providers.

Robert Stanley·16 days ago

Simon Willison· ANALYST

Quoting Andreas Påhlsson-Notini

Andreas Påhlsson-Notini argues current AI agents inherit human flaws—lack of rigor, patience, focus—rather than transcending them.

Simon Willison·16 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression

Framework for long-horizon terminal agents using observational context compression; reduces quadratic token cost via self-evolving compression.

Jincheng Ren·16 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment

Multi-agent LLM framework addresses actor-observer cognitive bias via dialectical alignment in self-reflection loops.

Bobo Li·16 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents

Four-axis alignment framework (factual, reasoning, compliance, regulatory) for evaluating long-horizon enterprise AI agents in loan/claims/clinical domains.

Vasundra Srininvasan·16 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture The Flag Challenges

DeepRed benchmark evaluates LLM agents on realistic Capture The Flag challenges with partial-credit scoring beyond binary outcomes.

Ali Al-Kaswan·16 days ago

NVIDIA Dev Blog· INFRA

Maximizing Memory Efficiency to Run Bigger Models on NVIDIA Jetson

The boom in open source generative AI models is pushing beyond data centers into machines operating in the physical world. Developers are eager to deploy these models at the edge, enabling physical AI agents and autonomous robots to automate heavy-duty tasks. A key challenge is efficiently running multi-billion-parameter models on edge devices with limited memory. With ongoing constraints on…

Anshuman Bhat·17 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

ClawEnvKit: Automatic Environment Generation for Claw-Like Agents

ClawEnvKit generates diverse robotic manipulation environments from natural language via an automated parsing, generation, and verification pipeline.

Xirui Li·17 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

MASS-RAG: Multi-Agent Synthesis Retrieval-Augmented Generation

MASS-RAG: multi-agent synthesis for RAG with role-specialized agents for summarization, extraction, and reasoning over noisy/incomplete contexts.

Xingchen Xiao·17 days ago

NVIDIA Dev Blog· INFRA

Mitigating Indirect AGENTS.md Injection Attacks in Agentic Environments

AI tools are significantly accelerating software development and changing how developers work with code. These tools serve as real-time copilots, automating repetitive tasks, executing tasks, writing documentation, and more. OpenAI Codex, for example, is a coding agent designed to assist developers through tasks like code generation, debugging, and automated pull request (PR) creation.

Daniel Teixeira·17 days ago

MIT Tech Review· PRESS

Chinese tech workers are starting to train their AI doubles–and pushing back

Tech workers in China are being instructed by their bosses to train AI agents to replace them—and it’s prompting a wave of soul-searching among otherwise enthusiastic early adopters. Earlier this month a GitHub project called Colleague Skill, which claimed workers could use it to “distill” their colleagues’ skills and personality traits and replicate them with…

Caiwei Chen·17 days ago

Simon Willison· ANALYST

Headless everything for personal AI

Analysis of headless API-first architecture replacing GUI interaction for personal AI agents, citing Salesforce Agentforce.

Simon Willison·18 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

A Multi-Agent Approach for Claim Verification from Tabular Data Documents

MACE multi-agent framework verifies claims from tabular data using specialized Planner, Executor, and Verifier agents with zero-shot CoT.

Rudra Ranajee Saha·19 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Systematic Capability Benchmarking of Frontier Large Language Models for Offensive Cyber Tasks

Benchmark of 10 frontier LLM agents on NYU CTF Bench offensive cybersecurity tasks across 7 providers, 200 challenges.

Tyler H. Merves·19 days ago

30 matches
