The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling

Reinforcement learning (RL) has become a key component in modern large language models, yet the rollout stage remains the key bottleneck in RL training pipelines. Although Multi-Token Prediction (MTP) offers a natural solution to accelerate rollouts through speculative decoding, many studies have observed that MTP acceptance rates degrade significantly during RL training, leading to limited speedup performance. To address this bottleneck, we present Bebop, a systematic study of MTP in LLM post-training, and offer practical recipes to integrate MTP into large-scale RL pipelines. First, we reve...

Yucheng Li·13 days ago

Ambient Diffusion Policy: Imitation Learning from Suboptimal Data in Robotics

On Subquadratic Architectures: From Applications to Principles

Latent World Recovery for Multimodal Learning with Missing Modalities

Anatomy of Post-Training: Using Interpretability to Characterize Data and Shape the Learning Signal

CHORUS: Decentralized Multi-Embodiment Collaboration with One VLA Policy

Nonslop: A Gamified Experiment in Human-AI Collaborative Writing

Google won’t just admit it’s feeding YouTube creators to its music AI

Nobody needs AI to search the Internet, court says in ruling against Google

Atlas H&E-TME: Scalable AI-Based Tissue Profiling at Expert Pathologist-Level Accuracy

Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

ALIGNBEAM: Inference-Time Alignment Transfer via Cross-Vocabulary Logit Mixing

‘AI-pilled’ firms spend $7,500 per employee each month on AI

Adjoint Method versus Physics-Informed Neural Networks in PDE-Constrained Inverse Problems

Fourier Features Let Agents Learn High Precision Policies with Imitation Learning

Measuring Semantic Progress in Multi-turn Dialogue via Information Gain

PROJECTMEM: A Local-First, Event-Sourced Memory and Judgment Layer for AI Coding Agents

A Five-Plane Reference Architecture for Runtime Governance of Production AI Agents

Harness In-Context Operator Learning with Chain of Operators

Microsoft restricts Claude Fable for employees over data retention concerns

Natural-Language Temporal Grounding in Hour-Long Videos is a Search Problem: A Benchmark and Empirical Decomposition

Learning What to Say to Your VLA: Mostly Harmless Vision Language Action Model Steering

Findings of the MAGMaR 2026 Shared Task

Measuring Epistemic Resilience of LLMs Under Misleading Medical Context

The Standard Interpretable Model: A general theory of interpretable machine learning to deductively design interpretable methods using Lagrangian mechanics

DiffusionGemma: 4x faster text generation

SpikeDecoder: Realizing the GPT Architecture with Spiking Neural Networks

PianoKontext: Expressive Performance Rendering from Deadpan Context

CCKS: Consensus-based Communication and Knowledge Sharing

Holding the FP8 Quality Ceiling at 8-Bit Weights and Activations: INT8 and GGUF Post-Training Quantization of Ideogram 4.0 for Consumer GPUs