The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.


Quoting Andreas Påhlsson-Notini

Andreas Påhlsson-Notini argues current AI agents inherit human flaws—lack of rigor, patience, focus—rather than transcending them.

Simon Willison·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

CoCo-SAM3: Harnessing Concept Conflict in Open-Vocabulary Semantic Segmentation

CoCo-SAM3 resolves mask overlap and semantic drift in open-vocabulary semantic segmentation by modeling concept conflicts across prompts.

Yanhui Chen·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

The signal is the ceiling: Measurement limits of LLM-predicted experience ratings from open-ended survey text

Empirical study on prompt design vs. model selection for predicting fan experience ratings from survey text using GPT variants.

Andrew Hong·2 months ago

r/singularity· COMMUNITY

Introducing Deep Research and Deep Research Max

Google announces Deep Research and Deep Research Max features for enhanced information synthesis.

u/ShreckAndDonkey123·2 months ago·146 pts / 25 comm

r/LocalLLaMA· COMMUNITY

235M param LLM from scratch on a single RTX 5080

Plasma 1.0: 235M-param LLaMA-style model trained from scratch on a single RTX 5080 GPU.

u/ExcellentTip9926·2 months ago·63 pts / 10 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Micro Language Models Enable Instant Responses

Introduces micro language models (8M-30M params) that generate the first tokens on-device while a cloud model completes the response, masking latency.

Wen Cheng·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Safety-Critical Contextual Control via Online Riemannian Optimization with World Models

Proposes Penalized Predictive Control framework using online Riemannian optimization for safety-critical control with black-box world models.

Tongxin Li·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models

SafetyALFRED benchmark evaluates multimodal LLMs (Qwen, Gemma, Gemini) on hazard recognition and mitigation in embodied kitchen environments.

Josue Torres-Fonseca·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Towards Streaming Target Speaker Extraction via Chunk-wise Interleaved Splicing of Autoregressive Language Model

Introduces Chunk-wise Interleaved Splicing paradigm enabling autoregressive models for real-time streaming target speaker extraction.

Shuhai Peng·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Time Series Augmented Generation for Financial Applications

Proposes TSAG framework and benchmark for evaluating LLM reasoning on financial time-series analysis with task delegation to computation modules.

Anton Kolonin·2 months ago

The Verge AI· PRESS

Ordering with the Starbucks ChatGPT app was a true coffee nightmare

Venti iced coffee, light skim milk. That's what I get at Starbucks. It is what I have gotten at Starbucks every time I've been to Starbucks for as long as I can remember, other than a brief love affair with the caffe misto a few years ago. In person, my brain barely needs to activate to say the words aloud; in the app, it's four taps and I'm ready to go. My first time ordering Starbucks through its new ChatGPT integration, which launched last week, was comparatively a complete mess. Getting started is easy enough,...

David Pierce·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

SAGE: Training-Free Semantic Evidence Composition for Edge-Cloud Inference under Hard Uplink Budgets

SAGE method for edge-cloud inference selects semantic content beyond attention importance under hard uplink bit-budget constraints.

Inhyeok Choi·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

The "Small World of Words" German Free-Association Norms

Release of German free-association norms dataset for 5,877 cue words as part of multilingual Small World of Words project.

Samuel Aeschbach·2 months ago

r/LocalLLaMA· COMMUNITY

Differences Between Kimi K2.5 and Kimi K2.6 on MineBench

Benchmark comparison of Kimi K2.5 vs K2.6 on MineBench; K2.6 shows inconsistent but high-ceiling performance at $2.35 cost.

u/ENT_Alam·2 months ago·176 pts / 22 comm

r/singularity· COMMUNITY

Differences Between Kimi K2.5 and Kimi K2.6 on MineBench

Kimi K2.6 vs K2.5 benchmark on MineBench shows cost-effective performance gains at $2.35 total.

u/ENT_Alam·2 months ago·130 pts / 25 comm

TechCrunch AI· PRESS

AI Dungeon maker Latitude unveils Voyage, a platform for creating AI-powered RPGs

Latitude's new AI-native platform, Voyage, aims to help gamers create their very own role-playing game.

Lauren Forristal·2 months ago

r/OpenAI· COMMUNITY

New image gen 2 is incredible at coloring

User reports Image Gen 2 demonstrates advanced reasoning in color composition, generating cinematic palettes and tonal coherence across panels.

u/Gold_Palpitation8982·2 months ago·180 pts / 20 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

AblateCell: A Reproduce-then-Ablate Agent for Virtual Cell Repositories

AblateCell agent reproduces baselines and systematically ablates components in AI virtual cell repositories for biological research.

Xue Xia·2 months ago

Simon Willison· ANALYST

scosman/pelicans_riding_bicycles

Simon Willison comments on synthetic data injection into training sets via absurdist example (pelicans on bicycles).

Simon Willison·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Cross-Model Consistency of AI-Generated Exercise Prescriptions: A Repeated Generation Study Across Three Large Language Models

Comparative study of output consistency across GPT-4.1, Claude Sonnet 4.6, Gemini 2.5 Flash for exercise prescriptions with safety analysis.

Kihyuk Lee·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

RoLegalGEC: Legal Domain Grammatical Error Detection and Correction Dataset for Romanian

Romanian legal domain GEC dataset addresses grammatical error correction in low-resource language niche; narrow domain applicability.

Mircea Timpuriu·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

An Efficient Black-Box Reduction from Online Learning to Multicalibration, and a New Route to $Φ$-Regret Minimization

Theoretical reduction from online learning to multicalibration via EVI solvers; advances in regret minimization analysis.

Gabriele Farina·2 months ago

r/LocalLLaMA· COMMUNITY

Gemma 4 Vision

User documents that Gemma 4's default vision budget (280 tokens) is too low; Variable Image Resolution mode requires manual tuning.

u/seamonn·2 months ago·199 pts / 48 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

A Bolu: A Structured Dataset for the Computational Analysis of Sardinian Improvisational Poetry

Corpus of Sardinian extemporaneous poetry for NLP; minority language resource with limited frontier AI relevance.

Silvio Calderaro·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Impact of large language models on peer review opinions from a fine-grained perspective: Evidence from top conference proceedings in AI

Empirical study: LLMs measurably alter peer review opinions on clarity and originality at top AI conferences.

Wenqing Wu·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression

Framework for long-horizon terminal agents using observational context compression; reduces quadratic token cost via self-evolving compression.

Jincheng Ren·2 months ago

TechCrunch AI· PRESS

Bond, a new social media platform, wants to use AI to help you kick your doomscrolling habit

Bond wants you to get off the couch and get back into the real world, its creator says. The new platform's AI system is designed to motivate users to do things away from the app.

Lucas Ropek·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Lyapunov-Certified Direct Switching Theory for Q-Learning

Lyapunov stability analysis of constant-stepsize Q-learning via switched systems; theoretical convergence bounds.

Donghwan Lee·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Multi-modal Reasoning with LLMs for Visual Semantic Arithmetic

RL post-training for visual semantic arithmetic in multimodal LLMs; extends relational reasoning from text to images.

Chuou Xu·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Diagnosable ColBERT: Debugging Late-Interaction Retrieval Models Using a Learned Latent Space as Reference

Interpretability method for ColBERT retrieval via learned latent space reference; targets biomedical/clinical ranking diagnostics.

François Remy·2 months ago

30 stories
