The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.


Mapping SQLite result columns back to their source `table.column`

It would be neat if arbitrary SQL queries in Datasette could be rendered with additional information based on which columns from which tables were included in the results. To build that, we would need to be able to look at a SQL query like `select users.name, orders.total from users join orders on orders.user_id = users.id` and programmatically identify the `table.column` for each result - navigating not just joins but also more complex syntax like CTEs. I decided to set Claude Code (Opus 4.8, since Fable is currently ban...

Simon Willison·9 days ago
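
For readers who want a feel for the problem in the excerpt above, here is a minimal Python sketch. It is not the approach taken in the linked post. It assumes the select list contains only explicitly qualified `table.column` references, and the helper name `naive_column_sources` is hypothetical. Aliases, expressions, subqueries, and CTEs all resolve to `None`, which is exactly the hard part the post sets out to solve.

```python
import re
import sqlite3


def naive_column_sources(conn, sql):
    """Return one entry per select-list item: 'table.column' or None.

    Hypothetical helper for illustration only. It understands nothing
    beyond a flat select list of explicitly qualified column names.
    """
    # Grab the text between SELECT and the first FROM.
    match = re.match(r"\s*select\s+(.*?)\s+from\s", sql, re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("expected a simple SELECT ... FROM query")

    sources = []
    for item in match.group(1).split(","):
        qualified = re.fullmatch(r"(\w+)\.(\w+)", item.strip())
        if qualified is None:
            # Expressions, bare columns, and aliases are out of scope here.
            sources.append(None)
            continue
        table, column = qualified.groups()
        # PRAGMA table_info yields one row per column; index 1 is the name.
        # An alias such as "u" is not a real table, so it yields no rows.
        known = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        sources.append(f"{table}.{column}" if column in known else None)
    return sources


conn = sqlite3.connect(":memory:")
conn.execute("create table users (id integer primary key, name text)")
conn.execute("create table orders (id integer primary key, user_id integer, total real)")

query = (
    "select users.name, orders.total "
    "from users join orders on orders.user_id = users.id"
)
print(naive_column_sources(conn, query))
```

The schema check through `PRAGMA table_info` is what keeps the sketch from reporting a source for a table alias or a misspelled column. Anything more realistic needs a real SQL parser or SQLite's own column metadata.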

The Verge AI· PRESS

Amazon security research reportedly led to the White House’s Anthropic Fable ban

According to the Wall Street Journal, the export control directive that led to Anthropic cutting off access to Fable 5 and Mythos 5 was triggered in part by cybersecurity research from Amazon and conversations between CEO Andy Jassy and the White House. According to the report, the paper from Amazon claims that, through a series of prompts, it was able to get Fable 5 to serve up information that could be used in cyberattacks. Amazon has yet to respond to a request for comment. Shortly after Jassy shared the company's findings with the government, it made the call to block its use by foreign n...

Terrence O’Brien·9 days ago

TechCrunch AI· PRESS

KPMG pulls report on AI usage due to apparent hallucinations

Once again, AI proves to be an unreliable source of information about AI.

Anthony Ha·9 days ago

TechCrunch AI· PRESS

Amazon CEO reportedly raised Anthropic model concerns before government crackdown

Amazon CEO Andy Jassy may have been the source of security concerns that led Anthropic to cut off worldwide access to two models on Friday.

Anthony Ha·9 days ago

TechCrunch AI· PRESS

OpenAI faces investigation from state attorneys general

It's not clear which states are involved, but they're asking about everything from OpenAI's ad policies to its handling of health data.

Anthony Ha·9 days ago

The Verge AI· PRESS

My yard is dying, so I made an app for that

When I returned to my computer five minutes after giving Gemini a lengthy prompt, I had two things: a functional app in a preview window, and a message about a bug. "~ Channel is unrecoverably broken and will be disposed!" Sounded bad! But right below it was a button to fix the bug. Pretty weird that I just instructed a computer to build a whole app for me with a single prompt, but it needed me to click a button to fix a bug. I did anyway, and in 233 seconds Gemini reported back that it had succeeded, using words like "blockages" and "race conditions." I didn't understand a bit of it. It was ...

Allison Johnson·10 days ago

The Verge AI· PRESS

Apple’s new AI photo editing tools mostly work, for better and worse

iPhone owners are getting real, native AI photo editing for the first time. The most popular camera in the world just got its first set of serious AI photo editing features, and I don't think any of us are ready. As far as AI photo editing goes, the new features in iOS 27 are pretty tame compared to what you can do on, say, Google's Pixel phones. But for the iPhone, they represent a tipping point in what the native photos app allows you to do to your photos. I mean memories. I mean, I don't know anymore. These new features are part of the iOS 27 developer beta right now, so bear in mind that ...

Allison Johnson·10 days ago

The Verge AI· PRESS

The future of Hollywood isn’t feeding prompts into vanilla gen AI models

For all the noise that's been made about how generative AI is poised to revolutionize the filmmaking industry, there haven't really been any projects created with the technology that felt like the sort of entertainment people would pay to see. Most AI firms' video models are still only capable of churning out short bursts of visually inconsistent footage. And some of Hollywood's biggest AI partnerships have suddenly evaporated in ways that make it seem like stu...

Charles Pulliam-Moore·10 days ago

Latent Space· ANALYST

[AINews] Fable and Mythos officially too dangerous to release

We are in the strangest timeline.

Latent Space·10 days ago

TechCrunch AI· PRESS

Andrew Yang thinks the next big startup opportunity is lowering the cost of living

Andrew Yang made a list of everything Americans overpay for — housing, food, wireless — and thinks the next startup gold rush is giving that money back.

Rebecca Bellan·10 days ago

Ars Technica AI· PRESS

Anthropic shuts down Fable, Mythos models following Trump admin directive

Commerce dept. worries that a Fable 5 "jailbreak" could be a national security threat.

Kyle Orland·10 days ago

TechCrunch AI· PRESS

Anthropic’s safety warnings may have just backfired — the government has pulled the plug on its most powerful AI

Anthropic isn't hiding its frustration. "We disagree that the finding of a narrow potential jailbreak should be cause for recalling a commercial model deployed to hundreds of millions of people," the company wrote in a blog post.

Connie Loizos·10 days ago

Simon Willison· ANALYST

Statement on the US government directive to suspend access to Fable 5 and Mythos 5

Well this is nuts: The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Anthropic models will not be affected. We received the directive from the government toda...

Simon Willison·10 days ago

Simon Willison· ANALYST

OpenAI WebRTC Audio Session, now with document context

I built the first version of this tool in December 2024 to try out the then-new OpenAI WebRTC API for interacting with their realtime audio models. Last month OpenAI introduced a brand new model to that API called GPT‑Realtime‑2, which they promoted as "our first voice model with GPT‑5‑class reasoning" - with a Sep 30, 2024 knowledge cut-off. I've been waiting for that model to show up in the ChatGPT iPhone app but it still hasn't, so I revisited my old playground. You can now pick the better model, and you can also paste in a big chunk ...

Simon Willison·10 days ago

Anthropic· FRONTIER

Statement on the US government directive to suspend access to Fable 5 and Mythos 5

The US government has issued an export control directive to suspend all access to Fable 5 and Mythos 5.

Anthropic·10 days ago

TechCrunch AI· PRESS

Meta’s months-old AI unit is a soul-crushing gulag, say the engineers stuck inside it

A new report suggests the unit, which employs 6,500 people, is on the verge of revolt.

Connie Loizos·10 days ago

Ars Technica AI· PRESS

SpaceX is now a public company valued for its AI potential, so what comes next?

As of today, SpaceX is owned by investors who will want to see it make money.

Eric Berger·10 days ago

Anthropic· FRONTIER

Results from the first Anthropic Public Record

Anthropic·10 days ago

NVIDIA Dev Blog· INFRA

NVIDIA Achieves Leading Agentic Coding Performance on First Agentic AI Benchmark

AI agents have fundamentally changed the complexity of inference workloads. Until now, the industry has struggled to define a standard for measuring how inference systems perform under these conditions. Artificial Analysis AgentPerf (AA-AgentPerf) offers the industry’s first multi-vendor open benchmarks profiling trajectories that are representative of real-world AI agent coding tasks.

Eduardo Alvarez·10 days ago

Ars Technica AI· PRESS

Here's what Jeff Bezos' new startup Prometheus will do

It isn't the only startup tackling physical AI, but it's one of the best-funded.

Samuel Axon·10 days ago

Simon Willison· ANALYST

Quoting Andrew Singleton

Jenny owns a crematorium. John’s propane company gives her a $20 billion investment in return for 5 percent of her operation. Jenny throws $10 billion into the incinerator, then pays John $10 billion to buy propane to burn that money to ashes. John reports that his AI investments have generated $10 billion in revenue this quarter and that he owns 5 percent of a $100 billion business. A reporter from Forbes is assigned to profile John and Jenny, and over the course of his research, he becomes embroiled in a passionate but confusing three-way love affair with them, which eventually turns into a...

Simon Willison·10 days ago

Ars Technica AI· PRESS

Ukraine's one-time test used fully autonomous drones to kill Russian soldiers

Full autonomy is rare, but Ukraine is installing AI modules on drones and robots.

Jeremy Hsu·10 days ago

Anthropic· FRONTIER

TCS and Anthropic partner to bring Claude to regulated industries

We’re announcing a partnership with Tata Consultancy Services (TCS). TCS will provide Claude to 50,000 of its own employees across 56 countries; build Claude-powered products for clients in financial services, healthcare, the public sector, and other regulated industries; and join the Claude Partner Network.

Anthropic·10 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Gaze Heads: How VLMs Look at What They Describe

How a vision-language model internally solves the task of describing an image is far from obvious. We find that the model develops a specific mechanism for this: a small set of attention heads in its language-model backbone, which we call gaze heads, whose attention tracks the image region the model is currently describing. We find them with a simple correlation score from a few forward passes, using comic strips as a controlled testbed where narrative order is laid out spatially. These gaze heads do not just track the image tokens being described: redirecting their attention to a chosen regi...

Rohit Gandikota·10 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

ClinHallu: A Benchmark for Diagnosing Stage-Wise Hallucinations in Medical MLLM Reasoning

Building trustworthy medical multimodal large language models (MLLMs) is critical for reliable clinical decision support. Existing medical hallucination benchmarks mainly focus on data collection, but often ignore where hallucinations originate within the reasoning process. We find that hallucination sources vary across samples: errors may arise from visual misrecognition, incorrect medical knowledge recall, or flawed reasoning integration. To enable source-level hallucination diagnosis, we introduce ClinHallu, a benchmark for stage-wise hallucination diagnosis in medical MLLM reasoning. Clin...

Sicheng Yang·10 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Persona-Pruner: Sculpting Lightweight Models for Role-Playing

Language Models (LMs) have shown remarkable potential as role-playing chatbots, delivering consistent, stylized interactions when given a specification of a character or user persona. However, applying these capabilities to real-world applications (e.g., ecosystems with numerous NPCs interacting simultaneously) exposes a critical inefficiency due to the excessive computational cost. In this paper, we question the necessity of dedicating a full, generalist model to a single persona, hypothesizing that a specific character identity relies on only a fraction of the model's total capacity. We obs...

Jinsu Kim·10 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

AdaSR: Adaptive Streaming Reasoning with Hierarchical Relative Policy Optimization

Large reasoning models typically follow a read-then-think paradigm: they observe the complete input, reason over a static context, and then produce the answer. Yet many real-world scenarios are inherently dynamic, such as audio and video streams, where information arrives continuously and models must reason, update, and respond under partial observations. Recent streaming reasoning methods allow models to think while reading, but they largely rely on supervised imitation of pre-constructed trajectories, which limits their flexibility. In this paper, we propose AdaSR, an adaptive stre...

Junlong Tong·10 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Learning Coordinated Preference for Multi-Objective Multi-Agent Reinforcement Learning

Cooperative multi-objective multi-agent reinforcement learning (MOMARL) models team decision making under multiple, potentially conflicting objectives. In this setting, conflicts arise not only across objectives but also across agents with different observations, roles, and contributions. We propose Preference Coordinated Multi-agent Policy Optimization (PCMA), which learns coordinated agent-specific preferences to enable complementary trade-offs among agents. Theoretically, we formulate cooperative MOMARL as a team-optimal game and show that, under suitable conditions, preference diversity c...

Pengxin Wang·10 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

CORA: Analyzing and bridging thinking-answer gap in Multimodal RLVR via Consistency-Oriented Reasoning Alignment

Reinforcement learning with verifiable rewards (RLVR) has successfully elicited the reasoning capabilities of large language models, motivating its extension to multimodal scenarios. Existing methods primarily focus on improving the visual coverage of reasoning traces and mitigating visual hallucinations, but underestimate the semantic inconsistency between the reasoning process and the final answer. In this paper, we delve into thinking-answer inconsistency in RLVR for large vision-language models (LVLMs), showing thorough analyses of rollouts collected throughout Group Relative Policy Optim...

Jiayue Cao·10 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

A Complexity Measure for Active Learning in Multi-group Mean Estimation

We study a max-risk objective for active learning in a multi-group mean estimation $d$-armed bandit: a learner adaptively allocates a budget of $T$ samples across $d$ groups to minimize the worst-case uncertainty index $\max_{k\in[d]} \sigma_k^2/n_k$, where $\sigma_k$ is the standard deviation of the distribution of arm $k$, and $n_k$ is the number of times arm $k$ is sampled. We develop a local minimax framework and prove the first general lower bound for this objective, valid for any finite-variance hypothesis class. The bound separates difficulty into three orthogonal factors: a budget...

Abdellah Aznag·10 days ago

30 stories