The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

Claude OpenAI Anthropic Gemini Mistral Cursor

Introducing Claude Opus 4.8

An upgrade to our Opus class of models, with stronger performance across coding, agentic tasks, and professional work, and the consistency to handle long-running work.

Anthropic·29 days ago

The Verge AI· PRESS

Claude’s new model is more ‘honest’ when it messes up

Anthropic is releasing Claude Opus 4.8 on Thursday, and the company is touting the model's "honesty." According to Anthropic, it trains "all [its] models to be honest - for instance, to avoid making claims that they can't support." But it notes that "a general problem with AI models is that they sometimes jump to conclusions, confidently presenting their work as making progress despite thin evidence." The AI lab claims that early testers have found that Opus 4.8 "is more likely to flag uncertainties about its work and less likely to make unsupported claims." In the company's evaluations, Opus...

Jay Peters·29 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Anti Mode-Collapse in Mean-Field Transformer via Auxiliary Variables

We use a mean-field-based transformer model to theoretically investigate how auxiliary variables, such as positional encoding, prevent mode collapse of self-attention mechanisms. The use of mean-field transformers to analyze the properties of self-attention mechanisms has garnered significant attention in recent years due to their ability to comprehensively analyze token interactions. However, analysis of this simple model suggests that mode collapse, where token distributions degenerate to a single point, occurs during long inferences (i.e., many layers), indicating a discrepancy with realit...

Masaaki Imaizumi·29 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Unifying Temporal and Structural Credit Assignment in LLM-Based Multi-Agent Prompt Optimization

While Multi-Agent Systems (MAS) empower Large Language Models to tackle complex reasoning tasks through collaborative interaction, optimizing their dynamics remains a formidable challenge due to the discrete, non-differentiable nature of the computation graph and the sparsity of global supervisory signals. Existing black-box optimizers struggle to attribute trajectory-level failure to specific local components, resulting in inefficient, high-variance exploration. We argue that tractable MAS optimization needs structural inductive biases to disentangle error signals. We propose temporal and st...

Wenwu Li·29 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

BORA: Bridging Offline Reinforcement Learning and Online Residual Adaptation for Real-World Dexterous VLA Models

Vision-Language-Action (VLA) models have emerged as a promising paradigm for grounding visual-language understanding into real-world robotic manipulation. However, dexterous manipulation remains challenging for VLA policies due to high-dimensional hand control and compounding execution errors, which makes real-world RL post-training essential for bridging the gap between visually grounded action generation and physically reliable dexterous execution. However, high-dimensional dexterous exploration often triggers temporal inconsistency, sample inefficiency and hardware risks in the real world....

Zhongxi Chen·29 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

ExDBSCAN: Explaining DBSCAN with Counterfactual Reasoning -- Additional Material

Clustering is an unsupervised technique for grouping data points by similarity. While explainability methods exist for supervised machine learning, they are not directly applicable to clustering, making it challenging to understand cluster assignments. This interpretability gap is particularly evident in the popular density-based method DBSCAN, which assigns points as inliers (cluster members in dense regions) or outliers (noise points in sparse regions). DBSCAN does not provide insight into why a particular point receives its assignment or whether its assignment is robust to small changes in...

Pernille Matthews·29 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

TriSearch: Learning to Optimize Triangulations via Bistellar Flips

We introduce TriSearch, a reinforcement learning framework for optimizing objectives over triangulations of a polytope via bistellar flips. The key idea is a circuit-supported subtriangulation action representation: feasible flips are encoded by their supporting circuit and realized local subtriangulation, enabling a learned policy to rank them using local geometric and combinatorial features. This yields a dimension-agnostic interface and enables efficient traversal of the flip graph without explicit enumeration of the full triangulation space. Instantiated in 3D and 4D, TriSearch generalize...

Yiran Wang·29 days ago

r/singularity· COMMUNITY

Introducing Claude Opus 4.8

Link: anthropic.com

u/ThinkOfaNameOK·29 days ago·128 pts / 29 comm

r/Anthropic· COMMUNITY

Introducing Claude Opus 4.8

We’re upgrading Claude Opus to a new version: Claude Opus 4.8. It builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today for the same price. In Claude Code, you can hand off a feature, a migration, or a bug sweep and let it follow the work through while you focus on what’s next. Also launching today: * Fast mode for Opus 4.8 (research preview). Same model at roughly 2.5x the speed, now three times cheaper than before. * Dynamic workflows in Claude Code (research preview). Claude ...

u/ClaudeOfficial·29 days ago·141 pts / 37 comm·+ covered by others

arXiv (cs.AI/CL/LG)· ACADEMIA

When Should Models Change Their Minds? Contextual Belief Management in Large Language Models

Long-horizon interactions require language models to manage accumulating information: when to update their state, when to preserve their state, and what to ignore. We study this challenge as Contextual Belief Management (CBM): maintaining a predicted belief state aligned with formal evidence while isolating task-irrelevant noise. To make CBM measurable, we introduce BeliefTrack, a closed-world benchmark spanning Rule Discovery and Circuit Diagnosis, where a finite belief space and symbolic verifiers enable exact turn-level evaluation. BeliefTrack diagnoses three failures: Failed Stay...

Haoming Xu·29 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

MarginGate: Sparse Margin-Triggered Verification for Batch-Invariant LLM Inference

Temperature-zero BF16 LLM inference is often treated as reproducible, yet the same request can emit different tokens when decoded alone or inside a larger batch. Existing fixes use batch-invariant operators or LLM-42's per-token verification, incurring cost even when most steps are stable. We ask whether verification can be applied exclusively to flipped tokens. Across five models, batch-induced token flips are sparse on the flip-rate benchmarks: on MATH500, Llama-3.1-8B flips on 0.48% of synchronous decode steps, and all tested models stay within the 0.3-1.3% range on MATH500, GSM8K, and ...

Kexin Chu·29 days ago

r/ClaudeAI· COMMUNITY

Did anyone else get a usage reset today?

I was at 88% last night and stayed up until 4pm to optimize my agents so I can work during the weekend. But after waking up, my usage is all 0 now. I checked in the app and on the web, all showing zero. Did the AI God grant me a wish? Edit: wow, Opus 4.8 is here, the AI God really granted us all a wish

u/Flimsy_Visual_9560·29 days ago·21 pts / 31 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

GRUFF: LLM Pronoun Fidelity, Reasoning, and Biases in German

Third-person singular pronouns have long been used to study stereotypical biases in language models and to test their abilities to reason about reference. More recently, the interplay between reasoning and bias has been investigated with the task of pronoun fidelity, which assesses models' abilities to correctly reuse a previously-specified pronoun for a discourse entity, independent of other potentially distracting discourse entities mentioned in between. However, such research focuses on English, which is a language with limited grammatical gender and almost no gender agreement. In this pap...

Fabian Mewes·29 days ago

r/ClaudeAI· COMMUNITY

Limits reset

Opus 4.8 is live

u/nobatus513·29 days ago·24 pts / 11 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Faithful Embeddings of Irregular and Asynchronous Data for Online Log-NCDEs

Continuous-time models are a natural choice for irregular and asynchronous data. A central design choice is how to embed discrete observations into continuous time. Interpolation- and imputation-based embeddings reconstruct a continuous observation path, making the model sensitive to the choice of reconstruction. We show that this reconstruction step is unnecessary; under mild conditions, compact-set universality on the model input space transfers to the data space whenever the embedding from data to input is continuous and injective. Guided by this result, and building on the rectilinear con...

Benjamin Walker·29 days ago

r/ClaudeAI· COMMUNITY

Opus 4.8 in the newest CC v2.1.154

https://preview.redd.it/ijwlm2f2pw3h1.png?width=2536&format=png&auto=webp&s=9ed960f06a4f3f077d05a8557059e5534b2d1ab5 It looks like the new CC release will have Opus 4.8 1M, to be released anytime! I wonder if it is based off of Mythos?

u/PerceptionOld8565·29 days ago·22 pts / 28 comm

r/LocalLLaMA· COMMUNITY

LiquidAI/LFM2.5-8B-A1B · Hugging Face

Looks like you can run it on any potato (A1B)! https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF From LiquidAI: LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning. * On-device personal assistant: Designed to power real-life applications, chaining tool calls, and following complex instructions on all devices. * Compressed performance: Competitive with much larger dense and MoE models on instruction following and agen...

u/jacek2023·29 days ago·49 pts / 15 comm

The Verge AI· PRESS

A $2,000 AI-generated film will make its debut at Tribeca

Next month's Tribeca Festival will include the premiere of an AI-generated film: Dreams of Violets. The 75-minute film is a fictional dramatization of the Iranian government's mass killing of protestors in January, with the people and images fully created by AI, as reported earlier by The Hollywood Reporter. Dreams of Violets cost $2,000 to make and is "based on journalistic reports, photographs, and eyewitness accounts," according to a press release. It was created by Ash and Pooya Koosha, two brothers who left Iran in 2009. Pooya co-founded Fountain 0, the company behind the film, while Ash...

Emma Roth·29 days ago

The Verge AI· PRESS

YouTube takes baby steps to being a real podcast app

New features coming to YouTube could make it better for listening to podcasts, rolling out to Premium subscribers starting today on Android and coming later to iOS. A new "on-the-go mode" shifts YouTube into an audio-first layout, with larger, simplified playback buttons, a still image in place of the video, and a timeline showing video chapters. YouTube says you can turn on this new mode in a video's settings - a pop up will also appear if YouTube detects you're moving around while watching a video. If you like to speed up your podcasts to get through episodes faster, YouTube's new auto spee...

Stevie Bonifield·29 days ago

TechCrunch AI· PRESS

How long is Anthropic’s lease with SpaceX? Opinions vary.

Elon Musk is publicly reframing xAI’s massive Anthropic compute deal as short-term and cancellable, despite SpaceX’s own S-1 filing describing payments through May 2029.

Russell Brandom·29 days ago

TechCrunch AI· PRESS

Sesame, the conversational AI startup from Oculus founders, launches its iOS app

Sesame’s new iOS app brings its conversational AI agents to the public, offering more natural back-and-forth interactions designed to feel less like traditional chatbots and more like talking to a person.

Sarah Perez·29 days ago

r/ClaudeAI· COMMUNITY

I spent $340 on AI subscriptions last month. Wrote down what I actually used each one for. It was depressing.

Going through the credit card statement, here's what I had active: Claude Pro (40), ChatGPT Plus (20), Cursor (20), Perplexity Pro (20), Notion AI (10), Granola (20), ElevenLabs Starter (5), Midjourney Basic (10), Gamma Pro (10), Beautiful.ai (12), Otter Pro (17), Loom Business (15), Zapier Pro (30), Make Core (10), Tactiq Pro (8), Descript Creator (15), Reclaim.ai Pro (8), Motion (19), Superhuman (30), one I can't remember the name of (10), some ai-something for Instagram captions (11). Then I sat down and wrote next to each one the last time I'd actually used it. Not opened it, used it for...

u/OneSeaworthiness2676·29 days ago·20 pts / 31 comm

Google AI (Gemma)· FRONTIER

Catch up on 12 major I/O 2026 moments

Here are 12 of the biggest Google I/O 2026 keynote moments, including news about Gemini Omni, Gemini 3.5 Flash and more.

Zahra Thompson·29 days ago

r/ClaudeAI· COMMUNITY

We might be getting opus 4.8 today

u/Independent-Wind4462·29 days ago·194 pts / 84 comm·+ covered by others

The Verge AI· PRESS

These new iOS 27 renders hint at Siri’s big redesign

Apple's long-awaited Siri overhaul, expected to arrive in iOS 27, might look a lot like ChatGPT with a splash of Liquid Glass. Renders from Bloomberg offer a preview of iOS 27, including the new app and chat interface for Siri. The renders are "based on information viewed by Bloomberg and people with knowledge of [Apple's] plans," and could differ from Apple's final designs, which Bloomberg's Mark Gurman says Apple will reveal at WWDC in June. The images show a new pill-shaped Siri chat bubble popping out of the Dynamic Island with a drop-down menu containing options for Ask, Siri, and ChatGP...

Stevie Bonifield·29 days ago·+ covered by others

r/ClaudeAI· COMMUNITY

8 months of using AI for cooking and meal planning. what works, what doesn't, what's surprisingly weird.

Niche use case but I cook a lot and I've been trying to use AI tools for it consistently. Honest writeup. Works: Asking for substitutions when I'm missing an ingredient. Reliable. Tells me what to swap and why. Scaling recipes up or down with non-trivial math (recipe serves 4, I need 7 servings, what are the new quantities). Faster than I'd do it myself. Cleaning up a recipe from a website where the actual instructions are buried under 4,000 words of SEO content. Paste the URL or text, get just the recipe. Worth it for this alone. Building shopping lists from a week of planned recipes. C...

u/Practical-Garden-541·29 days ago·20 pts / 14 comm

TechCrunch AI· PRESS

RSI is the new AGI — and it’s just as hard to pin down

A new crop of AI labs are focused on recursive self-improvement — but the goal is proving elusive.

Russell Brandom·29 days ago

TechCrunch AI· PRESS

At TechCrunch Disrupt 2026: Databricks’ co-founder on what kills enterprise AI deals

Enterprise AI is entering a different phase now, one where enterprises are no longer evaluating whether AI is exciting. They are evaluating whether it is safe to deploy broadly.

TechCrunch Events·29 days ago

TechCrunch AI· PRESS

YouTube adds new podcast features, including an AI recommendation tool and ‘Auto speed’

The update signals YouTube's ongoing efforts to compete with other platforms for podcast audiences.

Aisha Malik·29 days ago

r/LocalLLaMA· COMMUNITY

Reachy Mini goes fully local!

Hi! Andi from Hugging Face here! My team has been working over the last few months on creating a super smooth local experience for conversations with Reachy Mini, see the video! We hope people can extend this into tons of different cool use-cases. We wrote a blog explaining how to set this up, and how to modify it for tons of different use cases. Even if you don't have a Reachy Mini, you can use this as a roadmap for amazing voice agents: https://huggingface.co/blog/local-reachy-mini-conversation Hope you enjoy it!

u/futterneid·29 days ago·70 pts / 18 comm

30 stories