The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

Claude OpenAI Anthropic Gemini Mistral Cursor

r/Anthropic· COMMUNITY

Was Opus 4.5 really the best as people claim to be?

User asks why Claude Opus 4.5 is perceived as superior to Opus 4.7.

u/ApocalypseBS·2 months ago·39 pts / 33 comm

r/LocalLLaMA· COMMUNITY

Layman's comparison on Qwen3.6 35b-a3b and Gemma4 26b-a4b-it

User compares Gemma 4 26B and Qwen 3.6 35B performance on a consumer GPU, with anecdotal quality assessments.

u/LocalAI_Amateur·2 months ago·230 pts / 66 comm

Simon Willison· ANALYST

llm-openrouter 0.6

llm-openrouter 0.6 adds a model refresh command that bypasses cache expiry, giving access to newly available models such as Kimi K2.6.

Simon Willison·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

MathNet: 30.7K multilingual Olympiad problems from 47 countries, benchmarking math reasoning in LLMs and retrieval in embedding models.

Shaden Alshammari·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Sessa: Selective State Space Attention

Sessa: a selective state-space attention mechanism that preserves token influence better than Transformers in long-context sequence modeling.

Liubomyr Horbatko·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Bounded Ratio Reinforcement Learning

BRRL: bounded-ratio RL framework bridging trust-region theory and PPO clipped objectives with monotonic improvement guarantees.

Yunke Ao·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Agentic Forecasting using Sequential Bayesian Updating of Linguistic Beliefs

BLF: agentic forecaster using Bayesian linguistic belief states and hierarchical aggregation, SOTA on ForecastBench.

Kevin Murphy·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

When Can LLMs Learn to Reason with Weak Supervision?

Empirical study: LLMs generalize under weak supervision for reasoning via RLVR when training reaches reward saturation.

Salman Rahman·2 months ago

Ars Technica AI· PRESS

Robot runner handily beats humans in half-marathon, setting new record

A humanoid robot's record half-marathon run shows China's speed in robotics.

Jeremy Hsu·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale

Platonic Representation Hypothesis tested at scale: cross-modal alignment degrades as evaluation grows from 1K to millions of samples, contradicting the claimed convergence.

A. Sophia Koepke·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

A multimodal and temporal foundation model for virtual patient representations at healthcare system scale

Apollo: multimodal temporal foundation model trained on 25B clinical records from 7.2M patients across 28 modalities and 12 specialties.

Andrew Zhang·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Revisiting Active Sequential Prediction-Powered Mean Estimation

Active sequential prediction-powered mean estimation: exploring mixing parameters for query-probability selection under model supervision.

Maria-Eleni Sfyraki·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering

LPSR: inference-time error correction via residual stream monitoring and KV-cache steering, no fine-tuning required.

Manan Gupta·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Benchmarking System Dynamics AI Assistants: Cloud Versus Local LLMs on CLD Extraction and Discussion

System Dynamics benchmarks (CLD and Discussion leaderboards): cloud LLMs score 77–89% vs. 77% for the best local model on causal loop diagram (CLD) extraction.

Terry Leitch·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Dual Alignment Between Language Model Layers and Human Sentence Processing

Study investigates which internal LLM layers best model human cognitive effort in syntactic ambiguity processing, extending prior work on early-layer surprisal.

Tatsuki Kuribayashi·2 months ago

r/OpenAI· COMMUNITY

Kimi K2.6 vs. GPT-5.4 (xhigh) - When will the new OpenAI model be released? This Thursday?

Reddit speculation about OpenAI model release timing with no substantive information or sources.

u/Prestigiouspite·2 months ago·74 pts / 15 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

ConforNets: Latents-Based Conformational Control in OpenFold3

ConforNets perturbs latents in OpenFold3 to reliably generate alternate protein conformations beyond the single dominant structure.

Minji Lee·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

GSQ: Highly-Accurate Low-Precision Scalar Quantization for LLMs via Gumbel-Softmax Sampling

GSQ applies Gumbel-Softmax sampling to scalar quantization, achieving high accuracy below 4 bits per parameter without the complexity of vector quantization for LLM deployment.

Alireza Dadgarnia·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

A Note on TurboQuant and the Earlier DRIVE/EDEN Line of Work

Technical clarification that TurboQuant's MSE variant is a constrained special case of prior EDEN/DRIVE quantization schemes.

Ran Ben-Basat·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Physics-Informed Neural Networks for Biological 2D+t Reaction-Diffusion Systems

Extends physics-informed neural networks (PINNs) to 2D+t reaction-diffusion systems while preserving the structure of the biological differential operators.

William Lavery·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

FUSE: Ensembling Verifiers with Zero Labeled Data

FUSE method improves LLM output verification by ensembling imperfect verifiers without ground-truth labels via conditional-dependency control.

Joonhyuk Lee·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Wasserstein Distributionally Robust Risk-Sensitive Estimation via Conditional Value-at-Risk

Theoretical framework for distributionally robust risk-sensitive signal estimation using Wasserstein balls and conditional value-at-risk.

Feras Al Taha·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

ClawEnvKit: Automatic Environment Generation for Claw-Like Agents

ClawEnvKit generates diverse robotic manipulation environments from natural language via automated parsing, generation, and verification pipeline.

Xirui Li·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Duality for the Adversarial Total Variation

Duality-based characterization of subdifferentials for nonlocal total variation in adversarial binary classification training.

Leon Bungert·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Transition-Matrix Regularization for Next Dialogue Act Prediction in Counselling Conversations

KL-regularized dialogue-act prediction incorporates corpus transition statistics, improving German counselling taxonomy classification across datasets.

Eric Rudolph·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Symbolic Synthesis for LTLf+ Obligations

Symbolic synthesis for LTLf+ obligation properties using deterministic weak automata; formal methods for temporal logic verification.

Giuseppe De Giacomo·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning

OGER framework integrates offline teacher guidance with online RL for LLM reasoning, improving exploration beyond initial latent space.

Xinyu Ma·2 months ago

r/singularity· COMMUNITY

AGI 🚀

Low-context Reddit post with emoji, likely clickbait.

u/policyweb·2 months ago·6793 pts / 222 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

IDOBE: Infectious Disease Outbreak forecasting Benchmark Ecosystem

IDOBE: benchmark ecosystem for epidemic forecasting with curated epidemiological time series; evaluates statistical and ML ensemble methods.

Aniruddha Adiga·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

LLM Safety From Within: Detecting Harmful Content with Internal Representations

SIREN detects harmful LLM outputs using internal layer representations and linear probing; a lightweight guard model that requires no modification to the underlying LLM.

Difan Jiao·2 months ago

30 stories
