The Archive
Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.
Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)
Berkeley BAIR proposes StruQ and SecAlign to defend LLM apps against prompt injection via structured queries and preference optimization.
BrowseComp: a benchmark for browsing agents
OpenAI introduces BrowseComp benchmark for evaluating web browsing agent capabilities.
Evaluating RAG with LLM as a Judge
Mistral shows how to use its models as an LLM judge, with structured outputs, to evaluate RAG systems.
Anthropic Education Report: How university students use Claude
Anthropic publishes findings on how university students use Claude for learning and coursework.
Anthropic appoints Guillaume Princen as Head of EMEA and announces 100+ new roles across the region
Anthropic hires Guillaume Princen as EMEA head and announces 100+ regional positions.
Repurposing Protein Folding Models for Generation with Latent Diffusion
Berkeley BAIR's PLAID model generates proteins (sequence + 3D structure) via latent diffusion in AlphaFold2 latent space, enabling billion-scale training.
Canva enables creativity with AI
Canva CPO Cameron Adams discusses AI-powered creative tools in conversation with OpenAI.
OpenAI’s EU Economic Blueprint
OpenAI proposes EU Economic Blueprint for AI development, growth, and European sovereignty.
Introducing Anthropic's first developer conference: Code with Claude
Anthropic launches Code with Claude, its first developer-focused conference.
Taking a responsible path to AGI
Google DeepMind outlines AGI safety strategy prioritizing technical risk assessment, proactive evaluation, and industry collaboration.
Evaluating potential cybersecurity threats of advanced AI
Google DeepMind presents framework for evaluating advanced AI cybersecurity risks, helping defenders identify and prioritize defenses.
Introducing Claude for Education
Anthropic introduces Claude for Education, targeting student and educator access.
New commission to provide insight as OpenAI builds the world’s best-equipped nonprofit
OpenAI establishes commission to scale nonprofit operations with AI technology and financial resources.
PaperBench: Evaluating AI’s Ability to Replicate AI Research
OpenAI introduces PaperBench, a benchmark measuring AI agents' ability to replicate state-of-the-art AI research papers.
Our response to the UK’s copyright consultation
OpenAI recommends pro-innovation copyright policies for UK AI leadership in Europe.
New funding to build towards AGI
OpenAI raises $40B at $300B valuation to scale compute, research, and AGI development.
Anthropic Economic Index: Insights from Claude 3.7 Sonnet
Claude 3.7 Sonnet usage surges in coding, education, science; extended thinking dominates technical tasks across 630 usage categories.
Tracing the thoughts of a large language model
Circuit tracing reveals Claude's shared conceptual space for reasoning across languages before translation to output.