Introducing Claude Opus 4.8
An upgrade to our Opus class of models, with stronger performance across coding, agentic tasks, and professional work, and the consistency to handle long-running work.
Anthropic is releasing Claude Opus 4.8 on Thursday, and the company is touting the model's "honesty." According to Anthropic, it trains "all [its] models to be honest - for instance, to avoid making claims that they can't support." But it notes that "a general problem with AI models is that they sometimes jump to conclusions, confidently presenting their work as making progress despite thin evidence." The AI lab claims that early testers have found that Opus 4.8 "is more likely to flag uncertainties about its work and less likely to make unsupported claims." In the company's evaluations, Opus...
We use a mean-field-based transformer model to theoretically investigate how auxiliary variables, such as positional encoding, prevent mode collapse of self-attention mechanisms. The use of mean-field transformers to analyze the properties of self-attention mechanisms has garnered significant attention in recent years due to their ability to comprehensively analyze token interactions. However, analysis of this simple model suggests that mode collapse, where token distributions degenerate to a single point, occurs during long inferences (i.e., many layers), indicating a discrepancy with realit...
While Multi-Agent Systems (MAS) empower Large Language Models to tackle complex reasoning tasks through collaborative interaction, optimizing their dynamics remains a formidable challenge due to the discrete, non-differentiable nature of the computation graph and the sparsity of global supervisory signals. Existing black-box optimizers struggle to attribute trajectory-level failure to specific local components, resulting in inefficient, high-variance exploration. We argue that tractable MAS optimization needs structural inductive biases to disentangle error signals. We propose temporal and st...
Vision-Language-Action (VLA) models have emerged as a promising paradigm for grounding visual-language understanding into real-world robotic manipulation. However, dexterous manipulation remains challenging for VLA policies due to high-dimensional hand control and compounding execution errors, which makes real-world RL post-training essential for bridging the gap between visually grounded action generation and physically reliable dexterous execution. However, high-dimensional dexterous exploration often triggers temporal inconsistency, sample inefficiency and hardware risks in the real world....
Clustering is an unsupervised technique for grouping data points by similarity. While explainability methods exist for supervised machine learning, they are not directly applicable to clustering, making it challenging to understand cluster assignments. This interpretability gap is particularly evident in the popular density-based method DBSCAN, which assigns points as inliers (cluster members in dense regions) or outliers (noise points in sparse regions). DBSCAN does not provide insight into why a particular point receives its assignment or whether its assignment is robust to small changes in...
We introduce TriSearch, a reinforcement learning framework for optimizing objectives over triangulations of a polytope via bistellar flips. The key idea is a circuit-supported subtriangulation action representation: feasible flips are encoded by their supporting circuit and realized local subtriangulation, enabling a learned policy to rank them using local geometric and combinatorial features. This yields a dimension-agnostic interface and enables efficient traversal of the flip graph without explicit enumeration of the full triangulation space. Instantiated in 3D and 4D, TriSearch generalize...
We’re upgrading Claude Opus to a new version: Claude Opus 4.8. It builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today for the same price. In Claude Code, you can hand off a feature, a migration, or a bug sweep and let it follow the work through while you focus on what’s next. Also launching today: * Fast mode for Opus 4.8 (research preview). Same model at roughly 2.5x the speed, now three times cheaper than before. * Dynamic workflows in Claude Code (research preview). Claude ...
Long-horizon interactions require language models to manage accumulating information: when to update their state, when to preserve their state, and what to ignore. We study this challenge as Contextual Belief Management (CBM): maintaining a predicted belief state aligned with formal evidence while isolating task-irrelevant noise. To make CBM measurable, we introduce BeliefTrack, a closed-world benchmark spanning Rule Discovery and Circuit Diagnosis, where a finite belief space and symbolic verifiers enable exact turn-level evaluation. BeliefTrack diagnoses three failures: Failed Stay...
Temperature-zero BF16 LLM inference is often treated as reproducible, yet the same request can emit different tokens when decoded alone or inside a larger batch. Existing fixes use batch-invariant operators or LLM-42's per-token verification, incurring cost even when most steps are stable. We ask whether verification can be applied exclusively to flipped tokens. Across five models, batch-induced token flips are sparse on the flip-rate benchmarks: on MATH500, Llama-3.1-8B flips on 0.48% of synchronous decode steps, and all tested models stay within the 0.3-1.3% range on MATH500, GSM8K, and ...
I was at 88% last night and woke up until 4pm to optimize my agents so I can work during the weekend. But after waking up, my usage is all 0 now. I checked in the app and on the web, all showing zero. Did AI God grant me a wish? Edit: wow Opus 4.8 is here, AI God really granted us all a wish
Third-person singular pronouns have long been used to study stereotypical biases in language models and to test their abilities to reason about reference. More recently, the interplay between reasoning and bias has been investigated with the task of pronoun fidelity, which assesses models' abilities to correctly reuse a previously-specified pronoun for a discourse entity, independent of other potentially distracting discourse entities mentioned in between. However, such research focuses on English, which is a language with limited grammatical gender and almost no gender agreement. In this pap...
Continuous-time models are a natural choice for irregular and asynchronous data. A central design choice is how to embed discrete observations into continuous time. Interpolation- and imputation-based embeddings reconstruct a continuous observation path, making the model sensitive to the choice of reconstruction. We show that this reconstruction step is unnecessary; under mild conditions, compact-set universality on the model input space transfers to the data space whenever the embedding from data to input is continuous and injective. Guided by this result, and building on the rectilinear con...
https://preview.redd.it/ijwlm2f2pw3h1.png?width=2536&format=png&auto=webp&s=9ed960f06a4f3f077d05a8557059e5534b2d1ab5 It looks like the new CC release will have Opus 4.8 1M, to be released anytime! I wonder if it is based off of mythos?
looks like you can run it on any potato (A1B)! [https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF](https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF) from LiquidAI: LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning. * **On-device personal assistant**: Designed to power real-life applications, chaining tool calls, and following complex instructions on all devices. * **Compressed performance**: Competitive with much larger dense and MoE models on instruction following and agen...
Next month's Tribeca Festival will include the premiere of an AI-generated film: Dreams of Violets. The 75-minute film is a fictional dramatization of the Iranian government's mass killing of protestors in January, with the people and images fully created by AI, as reported earlier by The Hollywood Reporter. Dreams of Violets cost $2,000 to make and is "based on journalistic reports, photographs, and eyewitness accounts," according to a press release. It was created by Ash and Pooya Koosha, two brothers who left Iran in 2009. Pooya co-founded Fountain 0, the company behind the film, while Ash...
New features coming to YouTube could make it better for listening to podcasts, rolling out to Premium subscribers starting today on Android and coming later to iOS. A new "on-the-go mode" shifts YouTube into an audio-first layout, with larger, simplified playback buttons, a still image in place of the video, and a timeline showing video chapters. YouTube says you can turn on this new mode in a video's settings - a pop-up will also appear if YouTube detects you're moving around while watching a video. If you like to speed up your podcasts to get through episodes faster, YouTube's new auto spee...
Elon Musk is publicly reframing xAI’s massive Anthropic compute deal as short-term and cancellable, despite SpaceX’s own S-1 filing describing payments through May 2029.
Sesame’s new iOS app brings its conversational AI agents to the public, offering more natural back-and-forth interactions designed to feel less like traditional chatbots and more like talking to a person.
Going through the credit card statement, here's what I had active: Claude Pro (40), ChatGPT Plus (20), Cursor (20), Perplexity Pro (20), Notion AI (10), Granola (20), ElevenLabs Starter (5), Midjourney Basic (10), Gamma Pro (10), Beautiful.ai (12), Otter Pro (17), Loom Business (15), Zapier Pro (30), Make Core (10), Tactiq Pro (8), Descript Creator (15), Reclaim.ai Pro (8), Motion (19), Superhuman (30), one i can't remember the name of (10), some ai-something for instagram captions (11) Then I sat down and wrote next to each one the last time I'd actually used it. Not opened it, used it for...
Here are 12 of the biggest Google I/O 2026 keynote moments, including news about Gemini Omni, Gemini 3.5 Flash and more.
Apple's long-awaited Siri overhaul, expected to arrive in iOS 27, might look a lot like ChatGPT with a splash of Liquid Glass. Renders from Bloomberg offer a preview of iOS 27, including the new app and chat interface for Siri. The renders are "based on information viewed by Bloomberg and people with knowledge of [Apple's] plans," and could differ from Apple's final designs, which Bloomberg's Mark Gurman says Apple will reveal at WWDC in June. The images show a new pill-shaped Siri chat bubble popping out of the Dynamic Island with a drop-down menu containing options for Ask, Siri, and ChatGP...
Niche use case but I cook a lot and I've been trying to use AI tools for it consistently. Honest writeup. Works: Asking for substitutions when I'm missing an ingredient. Reliable. Tells me what to swap and why. Scaling recipes up or down with non-trivial math (recipe serves 4, I need 7 servings, what are the new quantities). Faster than I'd do it myself. Cleaning up a recipe from a website where the actual instructions are buried under 4,000 words of SEO content. Paste the URL or text, get just the recipe. Worth it for this alone. Building shopping lists from a week of planned recipes. C...
A new crop of AI labs are focused on recursive self-improvement — but the goal is proving elusive.
Enterprise AI is entering a different phase now, one where enterprises are no longer evaluating whether AI is exciting. They are evaluating whether it is safe to deploy broadly.
The update signals YouTube's ongoing efforts to compete with other platforms for podcast audiences.
Hi! Andi from Hugging Face here! My team has been working over the last few months on creating a super smooth local experience for conversations with Reachy Mini, see the video! We hope people can extend this into tons of different cool use-cases. We wrote a blog explaining how to set this up, and how to modify it for tons of different use cases. Even if you don't have a Reachy Mini, you can use this as a roadmap for amazing voice agents: [https://huggingface.co/blog/local-reachy-mini-conversation](https://huggingface.co/blog/local-reachy-mini-conversation) Hope you enjoy it!