AEL: Agent Evolving Learning for Open-Ended Environments
Agent Evolving Learning framework enables LLM agents to accumulate and leverage experience across open-ended episodes via two-timescale Thompson sampling.
Earlier this month, millions of OpenClaw users woke up to a sweeping mandate: The viral AI agent tool, which this year took the worldwide tech industry by storm, had been severely restricted by Anthropic. Anthropic, like other leading AI labs, was under immense pressure to lessen the strain on its systems and start turning a profit. So if the users wanted its Claude AI to power their popular agents, they'd have to start paying handsomely for the privilege. "Our subscriptions weren't built for the usage patterns of these third-party tools," wrote Boris Cherny, head of Claude Code, on X. "We wa...
N-gram models match LSTM/Transformer accuracy on event-log prediction with lower resources and better stability than neural baselines.
I spent some time a few days back comparing Opus 4.6 and 4.7 using my own usage data - just to see how they actually behave side by side. [https://github.com/getagentseal/codeburn](https://github.com/getagentseal/codeburn) It’s still pretty early for 4.7, but a few things surprised me. In my sessions, 4.7 gets things right on the first try less often than 4.6. The one-shot rate sits around 74.5% for 4.7 vs 83.8% for 4.6, and I’m seeing roughly double the retries per edit (0.46 vs 0.22). It also produces a lot more output per call - about 800 tokens vs 372 on 4.6 - which makes it noticeably more expensive. ...
Open-source Agent-Quest tool visualizes Claude Code agents in real time across parallel CLI sessions using a fantasy-themed 2D UI.
OpenAI is giving users of its Business, Enterprise, Edu, and Teachers plans access to cloud-based "workspace" agents available in ChatGPT that can perform business tasks. In its blog post, OpenAI gives examples of agents like one that finds product feedback on the web and sends a report in Slack and a sales agent that can draft follow-up emails in Gmail. These new agents follow increasing interest in agents across the AI landscape, especially after OpenClaw - the AI agent formerly known as Clawdbot and Moltbot that touts itself as the "AI that actually does things" - went viral. OpenClaw foun...
Interval POMDP shielding method for autonomous systems with learned perception, using confidence intervals to block unsafe actions under sensor uncertainty.
Meta employees' activity at work is now being used to train the company's AI agents. As reported by Reuters, Meta is installing a tool it calls Model Capability Initiative (MCI) on US-based employees' computers that runs in work-related apps and websites, recording mouse movements, clicks, keystrokes, and occasional screenshots. The data from this tool will be used to train the company's AI models to get better at interacting with computers the way humans do, including automating work tasks like those Meta's employees perform on the job. According to Reuters, the data from MCI won't be "used ...
LLM agents in repeated Avalon deception games develop reputation dynamics and social memory across 188 games, in a study of emergent multi-round behavior.
ProactAgent enables lifelong learning agents to retrieve past experience and skills proactively, rather than passively, during task interaction.
Artificial intelligence is moving quickly in the enterprise, from experimentation to everyday use. Organizations are deploying copilots, agents, and predictive systems across finance, supply chains, human resources, and customer operations. By the end of 2025, half of companies used AI in at least three business functions, according to a recent survey. But as AI becomes…
OpenAI releases workspace agents for ChatGPT to automate workflows and integrate enterprise tools with cloud-based execution.
When people say AI will speed up drug development or fear that it will bring about mass layoffs, what they have in mind—whether they know it or not—are AI agents. ChatGPT made large language models a mass consumer product. But to change the world, AI needs to do more than just talk back: It needs…
Move highlights the difficulty of finding high-quality interactive training data.
Founded by an OSU researcher, the startup is developing AI agents that can become experts in any domain.
As AI agents increasingly work alongside humans across organizations, companies could be inadvertently opening a new attack surface. Insecure agents can be manipulated to access sensitive systems and proprietary data, increasing enterprise risk. In some modern enterprises, non-human identities (NHI) are outpacing human identities, and that trend will explode with agentic AI. Solid governance and…
GAAP execution environment guarantees privacy for AI agents by enforcing data access controls against prompt injection and untrusted providers.
Andreas Påhlsson-Notini argues current AI agents inherit human flaws—lack of rigor, patience, focus—rather than transcending them.
Framework for long-horizon terminal agents using observational context compression; reduces quadratic token cost via self-evolving compression.
Multi-agent LLM framework addresses actor-observer cognitive bias via dialectical alignment in self-reflection loops.
Four-axis alignment framework (factual, reasoning, compliance, regulatory) for evaluating long-horizon enterprise AI agents in loan/claims/clinical domains.
DeepRed benchmark evaluates LLM agents on realistic Capture The Flag challenges with partial-credit scoring beyond binary outcomes.
The boom in open source generative AI models is pushing beyond data centers into machines operating in the physical world. Developers are eager to deploy these models at the edge, enabling physical AI agents and autonomous robots to automate heavy-duty tasks. A key challenge is efficiently running multi-billion-parameter models on edge devices with limited memory. With ongoing constraints on…
ClawEnvKit generates diverse robotic manipulation environments from natural language via automated parsing, generation, and verification pipeline.
MASS-RAG: multi-agent synthesis for RAG with role-specialized agents for summarization, extraction, and reasoning over noisy/incomplete contexts.
AI tools are significantly accelerating software development and changing how developers work with code. These tools serve as real-time copilots, automating repetitive tasks, executing tasks, writing documentation, and more. OpenAI Codex, for example, is a coding agent designed to assist developers through tasks like code generation, debugging, and automated pull request (PR) creation.
Tech workers in China are being instructed by their bosses to train AI agents to replace them—and it’s prompting a wave of soul-searching among otherwise enthusiastic early adopters. Earlier this month a GitHub project called Colleague Skill, which claimed workers could use it to “distill” their colleagues’ skills and personality traits and replicate them with…
Analysis of headless API-first architecture replacing GUI interaction for personal AI agents, citing Salesforce Agentforce.
MACE multi-agent framework verifies claims from tabular data using specialized Planner, Executor, Verifier agents with zero-shot CoT.
Benchmark of 10 frontier LLM agents on NYU CTF Bench offensive cybersecurity tasks across 7 providers, 200 challenges.