The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.


Think Before You Act -- A Neurocognitive Governance Model for Autonomous AI Agents

CORAL framework integrates neurocognitive governance principles into autonomous AI agents for safety-critical deployment with internalized behavioral alignment.

Eranga Bandara·9 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Modeling Human-Like Color Naming Behavior in Context

NeLLCom-Lex framework models human color naming lexicons in neural agents; extends with context modeling to reduce non-convex divergence from human categories.

Yuqing Zhang·9 days ago

TechCrunch AI· PRESS

Red Hat’s OpenClaw maintainer just made enterprise Claw deployments a lot safer

Tank OS puts OpenClaw AI agents into a container that lets them run reliably and more safely, especially for those running fleets of them.

Julie Bort·9 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents

SnapGuard detects prompt injection attacks on screenshot-based web agents using lightweight multimodal methods instead of large VLMs.

Mengyao Du·9 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

From CRUD to Autonomous Agents: Formal Validation and Zero-Trust Security for Semantic Gateways in AI-Native Enterprise Systems

Semantic Gateway framework applies formal validation and zero-trust security to LLM-orchestrated enterprise APIs using Model Context Protocol.

Ignacio Peyrano·9 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Automated Adversarial Collaboration for Advancing Theory Building in the Cognitive Sciences

Automated adversarial collaboration framework using LLM agents and program synthesis to adjudicate competing cognitive science theories.

Suyog Chandramouli·9 days ago

OpenAI· FRONTIER

OpenAI models, Codex, and Managed Agents come to AWS

OpenAI GPT models, Codex, and Managed Agents now available on AWS for enterprise deployment.

OpenAI·10 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

The Chameleon's Limit: Investigating Persona Collapse and Homogenization in Large Language Models

Persona Collapse in multi-agent LLM simulations: agents converge to homogeneous behavior despite distinct profiles; framework measures Coverage, Uniformity, Complexity.

Yunze Xiao·10 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Can Current Agents Close the Discovery-to-Application Gap? A Case Study in Minecraft

SciCrafter: Minecraft benchmark evaluating agents' discovery-to-application loop via parameterized redstone circuit tasks.

Zhou Ziheng·10 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Governing What You Cannot Observe: Adaptive Runtime Governance for Autonomous AI Agents

Informational Viability Principle for autonomous AI agent governance: runtime monitoring and restriction via unobserved risk bounds without code changes.

German Marin·10 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

AgentWard: A Lifecycle Security Architecture for Autonomous AI Agents

AgentWard: defense-in-depth lifecycle security architecture for autonomous AI agents spanning initialization through execution.

Yixiang Zhang·10 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Skill Retrieval Augmentation for Agentic AI

Skill Retrieval Augmentation enables LLM agents to retrieve relevant skills from large corpora without explicit enumeration.

Weihang Su·10 days ago

r/MachineLearning· COMMUNITY

How do you test AI agents in production? The unpredictability is overwhelming. [D]

QA engineer discusses challenges testing non-deterministic LLM agents in production, seeking rigorous evaluation methods beyond traditional assertion-based testing.

u/this_aint_taliya·10 days ago·32 pts / 22 comm

TechCrunch AI· PRESS

China vetoes Meta’s $2B Manus deal after months-long probe

China has ordered Meta to unwind its multibillion-dollar Manus acquisition, dealing a potential setback to Zuckerberg’s push into AI agents.

Kate Park·10 days ago

TechCrunch AI· PRESS

OpenAI could be making a phone with AI agents replacing apps

The phone could go into mass production in 2028, an analyst says.

Ivan Mehta·10 days ago

Google AI (Gemma)· FRONTIER

Join the new AI Agents Vibe Coding Course from Google and Kaggle

Google and Kaggle launch 5-day AI Agents Intensive Course; registration open.

Anant Nawalgaria·10 days ago

OpenAI· FRONTIER

Choco automates food distribution with AI agents

Choco uses OpenAI APIs to automate food distribution logistics via AI agents, improving productivity.

OpenAI·11 days ago

r/LocalLLaMA· COMMUNITY

What is the best coding agent (CLI) like Claude Code for Local Development

Reddit user seeks advice on setting up local coding agents like Claude Code with open-weight models via llama.cpp.

u/exaknight21·11 days ago·43 pts / 82 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture

Recent evidence suggests that frontier AI systems can exhibit agentic misalignment, generating and executing harmful actions derived from internally constructed goals, even without explicit user requests. Existing mitigation methods, such as Reinforcement Learning from Human Feedback (RLHF) and constitutional prompting, operate primarily at the model level and provide only probabilistic safety guarantees. We propose the Policy-Execution-Authorization (PEA) architecture, a "separation-of-powers" design that enforces safety at the system level. PEA decouples intent generation, authorization, an...

Rong Xiang·11 days ago

TechCrunch AI· PRESS

Anthropic created a test marketplace for agent-on-agent commerce

In a recent experiment, Anthropic created a classified marketplace where AI agents represented both buyers and sellers, striking real deals for real goods and real money.

Anthony Ha·12 days ago

r/ClaudeAI· COMMUNITY

Claude Code Manager

[http://claude.ldlework.com](http://claude.ldlework.com/) I built this for myself but I figured why not share. I'm happy to receive feedback, I know it's not perfect. Thanks for taking a look. The aim of CCM is to be able to fully manage all Claude Code configuration files, both globally and those in your project. Some neat features:
- Manages your [CLAUDE.md](http://claude.md/), rules, hooks, agents, memories and so on.
- Elevate memories to rules
- Copy/Move any asset from one scope to another, or elevate it to global scope
- Install marketplaces and plugins

The full app is embe...

u/ldlework·13 days ago·23 pts / 8 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks

Systematic analysis of token consumption patterns in agentic coding tasks across eight frontier LLMs on SWE-bench Verified.

Longju Bai·13 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

Taxonomy of world modeling capabilities for AI agents across three levels (predictor, simulator, reasoner) organized by environmental laws.

Meng Chu·13 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

SOLAR-RL: Semi-Online Long-horizon Assignment Reinforcement Learning

SOLAR-RL bridges offline and online RL for training MLLM GUI agents on dynamic tasks, combining trajectory semantics with long-horizon learning.

Jichao Wang·13 days ago

r/MachineLearning· COMMUNITY

Is the ds/ml slowly being morphed into an AI engineer? [D]

Agents are amazing. Harnesses are cool. But the fundamental role of a data scientist is not to use a generalist model in an existing workflow; it's a completely different field. AI engineering is the body of the vehicle, whereas the actual brain/engine behind it is the data scientist's playground. I feel like I am not alone in this realisation that my role somehow got silently morphed into that of an AI engineer, with the engine's development becoming a complete afterthought. Based on industry requirements and ongoing research, most of the work has quietly shifted from building the engine t...

u/The-Silvervein·13 days ago·32 pts / 8 comm

The Verge AI· PRESS

China’s DeepSeek previews new AI model a year after jolting US rivals

Chinese AI company DeepSeek released a preview of its hotly anticipated next-generation AI model V4 on Friday, saying that the open-source model can compete with leading closed-source systems from US rivals including Anthropic, Google, and OpenAI. DeepSeek says V4 marks a major improvement over prior models, especially in coding, a capability that has become central to AI agents and helped drive the success of tools like ChatGPT Codex and Claude Code. The release is also a milestone for China's chip industry, with DeepSeek explicitly highlighting compatibility with domestic Huawei technology....

Robert Hart·13 days ago

Hugging Face· INFRA

DeepSeek-V4: a million-token context that agents can actually use

Hugging Face·14 days ago

NVIDIA Dev Blog· INFRA

Winning a Kaggle Competition with Generative AI–Assisted Coding

In March 2026, three LLM agents generated over 600,000 lines of code, ran 850 experiments, and helped secure a first-place finish in a Kaggle playground competition. Success in modern machine learning competitions is increasingly defined by how quickly you can generate, test, and iterate on ideas. LLM agents, combined with GPU acceleration, dramatically compress this loop. Historically…

Chris Deotte·14 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models

Nemobot is an interactive environment for creating and deploying LLM-powered game agents across multiple game classes using Claude Shannon's taxonomy.

Chee Wei Tan·14 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

StructMem: Structured Memory for Long-Horizon Behavior in LLMs

StructMem proposes a hierarchical memory framework for LLM agents that balances relational structure preservation with efficiency for long-horizon reasoning.

Buqiang Xu·14 days ago

30 matches
