The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

Predicting Future Behaviors in Reasoning Models Enables Better Steering

Deployed large reasoning models (LRMs) often behave unexpectedly. Test-time steering controls LRM outputs by intervening on their hidden representations, but it can degrade output quality. We argue that prior steering work implicitly relies on internal features that detect behavior in already generated text. We show that these detection features are poor predictors of future behavioral outcomes, and thus not the natural intervention target. Instead, we train activation probes to predict future behavior likelihoods from intermediate reasoning steps. These probes predict the most likely behavio...

Evgenii Kortukov · 14 days ago

Algorithmic and Minimax Complexities in Kernel Bandits

Piper: A Programmable Distributed Training System

Multi-Faceted Interactivity Alignment in Full-Duplex Speech Models

Flaws in the LLM Automation Narrative

ReasonAlloc: Hierarchical Decoding-Time KV Cache Budget Allocation for Reasoning Models

COGENT: Continuous Graph Emulators with Neural Ordinary Differential Equations for Long-Term Physical Forecasting

Itô maps for any-step SDEs

ABC-Bench: An Agentic Bio-Capabilities Benchmark for Biosecurity

Efficiently Learning Drifting Halfspaces with Massart Noise

OncoTraj: a public benchmark for longitudinal resistance prediction in EGFR-mutant non-small-cell lung cancer on osimertinib

Data assimilation for subsurface flow using latent diffusion model parameterization: performance of ensemble-Kalman and Monte Carlo techniques

First-Order Trajectory Matching: Fast Ensemble Predictions of Chaotic, Turbulent, Stochastic Systems

Robust Regression of General ReLUs with Queries

Provenance-Grounded Gating and Adaptive Recovery in Synthetic Post-Training Data Curation

DMT: Demographic Conditioning, Morphology-Enhanced Transformer for Cuffless Blood Pressure Estimation from PPG Signals

Overcoming Rank Collapse in Feedback Alignment

Monte Carlo Pass Search: Using Trajectory Generation for 3D Counterfactual Pass Evaluation in Football

TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

Data-Driven Dynamic Assortment in Online Platforms: Learning about Two Sides

Towards Autonomous Accelerator Design: FPGA Accelerator Generation with SECDA

Designed by Journalists, but Is It for Readers? Rethinking AI Disclosures and Transparency in News

Multimodal Brain Tumour Classification Using Feature Fusion

FADA: Accessible fetal ultrasound interpretation and annotation with a selectively distilled unified vision-language model

PhantomBench: Benchmarking the Non-existential Threat of Language Models

Limitations of Learning Tanh Neural Networks with Finite Precision

Claude Fable 5 and Claude Mythos 5

Anthropic’s Claude Fable is a version of Mythos the public can access today

Anthropic releases its first Mythos-class model Claude Fable

Do Transformers Actually Help Intrusion Detection? A Temporal Sequence Evaluation on CIC-IDS2017