The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

Reinforcement Learning with Verifiable Rewards algorithms like GRPO have emerged as the dominant post-training paradigm for complex reasoning in LLMs, yet commonly suffer from policy entropy collapse during training. We conduct a first-order gradient analysis of token-level entropy dynamics under GRPO and identify a token-level credit assignment mismatch: the per-token entropy variation decomposes into the product of the trajectory-level advantage and an entropy sensitivity function over the next-token distribution, yielding an advantage-surprisal four-quadrant structure and a near-criticalit...

Haipeng Luo · 5 days ago

The Verge AI · PRESS

Can anyone look cool wearing Snap’s $2,000 glasses?

Snap CEO Evan Spiegel wearing the Snap Specs. They’re not the worst on him, but bold fashion rarely makes for mainstream success. | Screenshot: CNBC

Yesterday, Snap debuted its new $2,195 Specs glasses. In an interview with CNBC, Snap CEO Evan Spiegel described the Specs as something the company had been working on for more than 12 years, an attempt to "bring computing into the world" and "make it more human." He positioned them as a device to help people stay more connected to the world around them instead of looking down at their phones. People, he said, are tired of screens. While Spiegel ...

Victoria Song · 5 days ago

STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

Can anyone look cool wearing Snap’s $2,000 glasses?

A Human-in-the-Loop Bayesian Optimization Framework for Constraint-Aware Bioprocess Development

Mechanism-Guided Selective Unlearning for RLVR-Induced Reasoning

Machine Unlearning for the XGBoost Model with Network Intrusion Datasets

The Gemini-powered Google Home Speaker arrives on June 25 for $100

RECOM: A Validity Discrimination Tradeoff in Automatic Metrics for Open Ended Reddit Question Answering

Generalised Eigenvalue Geometry of Semantic Adversarial Attacks

Forecasting what Matters: Decision-Focused RL for Controlled EV Charging with Unknown Departure Times

The More the Merrier: Combining Properties for ABox Abduction under Repair Semantics for ELbot

Learning to Annotate Delayed and False AEB Events: A Practical System for Extreme Class Imbalance and Asymmetric Label Noise

MolmoMotion: Language-guided 3D motion forecasting

The slowtech revolution is here to kill your phone addiction and rescue your attention span

AGDN: Learning to Solve Traveling Salesman Problem with Anisotropic Graph Diffusion Network

When AUC Misleads: Polarization-Aware Evaluation of Deepfake Detectors under Domain Shift

Language Models as Interfaces, Not Oracles: A Hybrid LLM-ML System for Pediatric Appendicitis

Compute Efficiency and Serial Runtime Tradeoffs for Stochastic Momentum Methods

Hardware- and Vision-in-the-Loop Validation of Deep Monocular Pose Estimation for Autonomous Maritime UAV Flight

A Clinician-Centered Pipeline for Annotation and Evaluation in Ultrasound AI Studies

User as Engram: Internalizing Per-User Memory as Local Parametric Edits

Dango: A Strictly L1-Only Large Language Model for Studying Second Language Acquisition

Beyond Safe Data: Pretraining-Stage Alignment with Regular Safety Reflection

Essential Subspace Merging for Multi-Task Learning

The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL

New research shows how AMIE, our medical AI, could help manage health conditions.

Collecting robot training data is dirty, unglamorous work. Some AI labs are already paying XDOF to do it.

IndicContextEval: A Benchmark for Evaluating Context Utilisation in Audio Large Language Models Across 8 Indic Languages

AdsMind: A Physics-Grounded Multi-Agent System for Self-Correcting Discovery of Adsorption Configurations on Heterogeneous Catalyst Surfaces

Complementary Attention Head Pruning for Efficient Transformers

OpenAnt: LLM-Powered Vulnerability Discovery Through Code Decomposition, Adversarial Verification, and Dynamic Testing