The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

Claude OpenAI Anthropic Gemini Mistral Cursor

Learning to Hear Hesitation: Continual Learning for Disfluency-Aware ASR

Despite advances in large-scale Automatic Speech Recognition (ASR), disfluent speech remains challenging, as state-of-the-art systems are often optimized to omit disfluencies, leading to information loss and hallucinations. Prior work has focused on verbatim transcription and the integration of disfluency markers, but adapting models on limited datasets can lead to catastrophic forgetting of general-domain knowledge. We address this gap by leveraging continual learning (CL) with explicit disfluency tokens. We first introduce these tokens into a pretrained ASR model to establish stable token m...

Henri-Leon Kordt·11 days ago

Ars Technica AI· PRESS

Pokémon Go players unwittingly contributed to tech with military drone uses

The repurposing of Pokémon Go data for AI training continues to draw scrutiny.

Jeremy Hsu ·11 days ago

OpenAI· FRONTIER

New OpenAI Academy courses for the next era of work

OpenAI introduces three Academy courses that help people build practical AI skills, create repeatable workflows, and apply agents in everyday work.

OpenAI·11 days ago

The Verge AI· PRESS

Siri won’t be your AI girlfriend

‘Listen, that's not what I'm here for, right?' | Image: Apple Our early testing has already shown that Siri AI knows when to shut up, and that's very much by design. In an interview with Mostly Human, Craig Federighi said Apple's new Siri won't act all sycophantic like chatbots made by OpenAI, Google, and others. "As you may know, if you use many of the existing chatbots, they're really focused on engagement to a large degree," said Federighi who is responsible for software at Apple. "And sycophancy, right? They kind of want to pull you in. They might encourage you to reveal things about your...

Thomas Ricker·11 days ago

Latent Space· ANALYST

[AINews] Loopcraft: The Art of Stacking Loops

a quiet day lets us highlight a great concept from Peter Steinberger, Boris Cherny, and Andrej Karpathy

Latent Space·11 days ago

TechCrunch AI· PRESS

Cheaper, faster, and culturally aware, Avataar’s video AI is built for India’s scale

Avataar AI's distilled video model is priced at $0.005 for every second of generation

Ivan Mehta·11 days ago

TechCrunch AI· PRESS

Theker just raised $85M to build the factory robot that doesn’t specialize in anything

Unlike humanoid robots designed around a fixed form — think Boston Dynamics — Theker's machines are built to be reconfigured.

Anna Heim·11 days ago

TechCrunch AI· PRESS

Jeff Bezos’s Prometheus raises $12B to build an ‘artificial general engineer’ for the physical world

The new round values the physical AI startup that aims to automate heavy engineering and drug design at $41 billion.

Marina Temkin·11 days ago

OpenAI· FRONTIER

How Preply combines AI and human tutors to personalize learning

Preply uses OpenAI to launch AI-generated lesson summaries, providing personalised feedback and language learning exercises.

OpenAI·11 days ago

Cohere· FRONTIER

North by Cohere: Now Widely Available Agentic AI Platform

North enables enterprises that prioritize data security to deploy AI agents and automations at scale within their own infrastructure.

Cohere·11 days ago

Simon Willison· ANALYST

Claude Fable is relentlessly proactive

After two days of experience with Claude Fable 5 I think the best way to describe it is relentlessly proactive . It knows a whole lot of tricks and it will deploy pretty much any of them to get to its goal. I'll illustrate this with an example. I was hacking on Datasette Agent today when I noticed a glitch: a horizontal scrollbar that shouldn't be there in the jump menu chat prompt. I snapped this screenshot: Then I started a fresh claude session in my datasette-agent checkout, dragged in the screenshot and told it: Look at dependencies to help figure out why there is a horizontal scrollbar h...

Simon Willison·11 days ago

TechCrunch AI· PRESS

SpaceX officially prices shares at $135 in the largest IPO ever

Wits its official share pricing announcement, SpaceX's IPO has begun.

Tim Fernholz·11 days ago

Google AI (Gemma)· FRONTIER

Our new community investments in Virginia support local jobs and expand energy affordability.

We’re helping build the state’s next-generation workforce and investing in energy programs.

Google AI (Gemma)·11 days ago

TechCrunch AI· PRESS

SpaceX SPV investors won’t know their true holdings until post-IPO lock-ups lift

After SpaceX makes its public debut, lower-tier SPV investors face hidden fees, lengthy payout delays, and the risk of outright fraud.

Marina Temkin·12 days ago

NVIDIA Dev Blog· INFRA

One-Click Multi-Tenant Security with NVIDIA Quantum InfiniBand

NVIDIA Quantum InfiniBand now offers intent-based security profiles in Unified Fabric Manager (UFM) that enable multi-tenant fabric security in a single... NVIDIA Quantum InfiniBand now offers intent-based security profiles in Unified Fabric Manager (UFM) that enable multi-tenant fabric security in a single click. NVIDIA Quantum InfiniBand supports three profiles: General, Bare Metal Cloud, and Secured Bare Metal Cloud. Network administrators can now auto-configure: This cuts deployment time to minutes from hours or days… Source

David Slama·12 days ago

Anthropic· FRONTIER

DXC will integrate Claude into the systems banks, airlines, and other regulated industries rely on

We’re announcing a multi-year global alliance with DXC Technology, one of the world’s largest IT services companies.

Anthropic·12 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dynamic, requiring agents to continually align their knowledge, skills, and behavior with changing environments and updated task conditions. To address this gap, we introduce EvoArena, a benchmark suite that models environment changes as sequences of progressive updates across terminal, software, and social domains. We further propose EvoMem, a patch-based memory paradigm that records memory evolutio...

Jundong Xu·12 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning

Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited for complex reasoning tasks: a semantically similar problem may demand an entirely different solution strategy, while a superficially different problem may share the same underlying reasoning pattern. We propose Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT), a post-training framework that teaches language models to reason by analogy. RA-RFT uses gold-relevance distillation to train...

Zilin Xiao·12 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Mana: Dexterous Manipulation of Articulated Tools

Articulated tool manipulation remains a major challenge in dexterous robotics due to the need to coordinate internal degrees of freedom and contact-rich interactions. While prior work has largely focused on rigid objects, articulated tool use remains underexplored because of its physical complexity and the difficulty of learning functional grasping and manipulation policies. We present Mana (Manipulation Animator), a general sim-to-real framework that reinterprets dexterous manipulation as an animation problem. Inspired by computer animation, Mana employs a coarse-to-fine pipeline that transf...

Zhao-Heng Yin·12 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Spatial reasoning, the ability to determine where objects are, how they relate, and how they move in 3D, remains a fundamental challenge for vision-language models (VLMs). Tool-augmented agents attempt to address this by augmenting VLMs with specialist perception modules, yet their effectiveness is bounded by the action interface through which those tools are invoked. In this work, we study how the design of this interface shapes the agent's capacity for open-ended spatial reasoning. Existing spatial agents either employ single-pass code execution, which commits to a full analysis strategy be...

Seokju Cho·12 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Understanding Truncated Positional Encodings for Graph Neural Networks

Positional encodings (PEs) enhance the power of graph neural networks (GNNs), both theoretically and empirically. Two of the most popular families of PEs - spectral (e.g., Laplacian eigenspaces, effective resistance) and walk-based (polynomials of the adjacency matrix) - are theoretically equivalent in expressive power, with expressivity between the 1-WL and 3-WL tests. However, this equivalence assumes the GNN uses the "complete" version of these PEs, which requires $O(n^3)$ time and space complexity. Instead, practitioners commonly use truncated variants of these encodings, such as the firs...

James Flora·12 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Automated reproducibility assessments in the social and behavioral sciences using large language models

Reproducibility in the social and behavioral sciences is typically evaluated by independent researchers who reanalyze the original data to assess whether the published findings can be recovered. However, such approaches are resource-intensive and difficult to scale. Here, we show that large language models (LLMs) can automate reproducibility assessments. Using N=76 published studies with predefined claims from the behavioral and social sciences, we compare LLM-generated analysis with the original findings and human reanalysis. For 7 studies, the LLM could not produce a viable effect size esti...

Tobias Holtdirk·12 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Agents-K1: Towards Agent-native Knowledge Orchestration

Current LLM-based research agents have advanced through agent orchestration, yet largely overlook scientific knowledge orchestration. Existing works often reduce papers to abstracts, surface mentions, and flat \texttt{cites} edges, omitting key entities, claims, evidence, mechanisms, and method lineages essential for scientific reasoning. To this end, we introduce \textbf{Agents-K1}, an end-to-end knowledge orchestration pipeline that converts raw documents into agent-native scientific knowledge graphs. Agents-K1 integrates three components under a unifying theoretical foundation: a multimoda...

Zongsheng Cao·12 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Influcoder: Distilling Decoders' Gradient Influence Rankings into an Encoder for Data Attribution

With the growth of LLMs' (Large Language Models) capabilities, there has been an increasing push to curate high quality datasets by filtering samples in the training data. In general, Data Attribution (DA) methods aim to estimate how individual samples in a training dataset can precondition a model to generate certain outputs. As an example, one might be interested in which samples in the data could be the source of toxic behavior after training the LLM. Many methods quantify this conditioning through the paradigm of influence functions. While methods of this family are effective in its funct...

Dimitri Kachler·12 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

HyperTool: Beyond Step-Wise Tool Calls for Tool-Augmented Agents

Tool-augmented LLM agents commonly rely on step-wise atomic tool calls, where each invocation, observation, and value transfer is exposed in the main reasoning trace. This creates an \emph{execution-granularity mismatch}: locally deterministic tool workflows are unfolded into repeated model-visible decisions, consuming context and forcing the model to manage low-level dataflow in the trace. We introduce \textbf{HyperTool}, a unified executable MCP-style tool interface that changes the model-visible unit of tool execution. A model invokes HyperTool with a code block that can call existing tool...

Yaxin Du·12 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

LLM-based agents have shown increasing potential in automating scientific discovery. Given an optimizable metric and an execution environment, they can propose, validate, and iterate scientific solutions, and have produced results that outperform human-designed approaches. As model capabilities continue to improve, we argue that the bottleneck for autonomous scientific discovery is shifting from prescribing agent workflows to designing agent environments: the resources, constraints, and interfaces that shape agent behavior. We frame this as environment engineering: building environments that ...

Amy Xin·12 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Before You Think: System 0, AI-Mediated Cognition and Cognitive Colonization

This paper examines three recent frameworks for understanding the cognitive and epistemic consequences of artificial intelligence: Tri-System Theory, Thinkframes, and System 0. It argues that while the first two capture important dimensions of AI's influence on individual reasoning and collective epistemic practices, System 0 occupies a theoretically distinctive position that neither can fully replicate. The paper introduces the concept of cognitive colonization, according to which AI systems can embed external interests within the architecture of the self in ways that are difficult for users...

Marianna Bergamaschi Ganapini·12 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Dense Supervision, Sparse Updates: On the Sparsity and Geometry of On-Policy Distillation

On-policy distillation (\textsc{OPD}) has recently become a prominent post-training recipe as it combines two desirable ingredients: on-policy student trajectories and dense teacher supervision, yet how this hybrid changes a model's parameters remains unclear. Across several language and vision-language model pairs and use cases, our analysis yields two main findings. On sparsity, \textsc{OPD}-style updates are small and coordinate-sparse. They are distributed across layers and are usually FFN-heavy. This sparse structure is operationally useful: training only the discovered subnetwork recove...

Guo Yu·12 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Operadic consistency: a label-free signal for compositional reasoning failures in LLMs

Detecting LLM reasoning failures at inference time without ground-truth labels has motivated a wide range of confidence baselines, including self-consistency, semantic entropy, and P(True), built on within-question sampling and self-evaluation. Operad theory, the formalism for systems built by iterated substitution, suggests a complementary diagnostic: a model's direct answer to a compositional query should agree with the answer it produces by composing a stated decomposition of the same query. We instantiate this idea as operadic consistency (OC), a per-question signal. Across twelve instruc...

Nathaniel Bottman·12 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation

We introduce SkMTEB, the first comprehensive MTEB-style text embedding benchmark for Slovak, a low-resource West Slavic language, comprising 31 datasets across 7 task types -- nearly 4$\times$ the depth of existing multilingual benchmark coverage for Slovak. Our evaluation of 31 embedding models reveals that large instruction-tuned multilingual models achieve the strongest performance, while existing Slovak-specific models trained for NLU tasks transfer poorly to embedding tasks. To address the need for efficient, locally-deployable Slovak embeddings, we develop \texttt{e5-sk-small} (45M para...

Marek Šuppa·12 days ago

← Front Page30 stories

← Newer Older →