The Archive
Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.
SafetyKit scales risk agents with OpenAI’s most capable models
SafetyKit uses GPT-5 for content moderation and compliance enforcement, with improved accuracy over legacy systems.
Scaling accounting capacity with OpenAI
Basis built AI agents using o3, o3-pro, GPT-4.1, and GPT-5, delivering 30% time savings for accounting firms.
Our framework for developing safe and trustworthy agents
Anthropic publishes framework for developing safe and trustworthy autonomous agents with specified governance principles.
Resolving digital threats 100x faster with OpenAI
Outtake uses GPT-4.1 and OpenAI o3 agents to detect security threats 100x faster.
Model ML is helping financial firms rebuild with AI from the ground up
Model ML CEO discusses AI-native infrastructure and autonomous agents for financial services transformation.
No-code personal agents, powered by GPT-4.1 and Realtime API
Genspark built a $36M ARR no-code agent product in 45 days using GPT-4.1 and the OpenAI Realtime API.
Devstral
Devstral: Mistral AI's open-source model optimized for autonomous coding agents and software development.
BrowseComp: a benchmark for browsing agents
OpenAI introduces BrowseComp benchmark for evaluating web browsing agent capabilities.
PaperBench: Evaluating AI’s Ability to Replicate AI Research
PaperBench: new benchmark measuring AI agents' ability to replicate state-of-the-art research papers.
Moving from intent-based bots to proactive AI agents
OpenAI shifts from intent-based bots to a proactive AI agent architecture.
Automating 90% of finance and legal work with agents
Hebbia's AI platform claims to automate 90% of finance and legal work using OpenAI models.
Introducing next-generation audio models in the API
OpenAI released advanced text-to-speech and speech-to-text APIs with customizable voice instructions for voice agents.
Grok 3 Beta — The Age of Reasoning Agents
xAI unveils early preview of Grok 3, emphasizing advanced reasoning and agentic capabilities.
Google DeepMind at NeurIPS 2024
Google DeepMind presents NeurIPS 2024 research spanning adaptive agents, 3D scene generation, and LLM training safety.
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
MLE-bench: a benchmark for evaluating AI agents on machine learning engineering tasks.
Automating customer support agents
MavenAGI launches a GPT-4-powered customer service agent; Tripadvisor, ClickUp, and Rho deploy it for support automation.
Klarna's AI assistant does the work of 700 full-time agents
Klarna is using AI to revolutionize personal shopping, customer service, and employee productivity.