The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.


Extreme Harness Engineering for Token Billionaires: 1M LOC, 1B toks/day, 0% human code, 0% human review — Ryan Lopopolo, OpenAI Frontier & Symphony

We shed light on OpenAI's first Dark Factory for the first time.

Latent Space·2 months ago

Anthropic’s New TPU Deal, Anthropic’s Computing Crunch, The Anthropic-Google Alliance

Google supplies TPUs to Anthropic to resolve compute bottleneck, consolidating strategic alliance in frontier model race.

Ben Thompson·2 months ago

Latent Space· ANALYST

[AINews] Gemma 4 crosses 2 million downloads

a quiet day lets us give due respect to the enormously successful Gemma 4 launch

Latent Space·2 months ago

Anthropic· FRONTIER

Anthropic expands partnership with Google and Broadcom for multiple gigawatts of next-generation compute

Anthropic expands compute partnership with Google and Broadcom for multiple gigawatts of next-generation infrastructure.

Anthropic·2 months ago

Import AI· ANALYST

Import AI 452: Scaling laws for cyberwar; rising tides of AI automation; and a puzzle over GDP forecasting

Import AI 452 reports scaling laws for cyber warfare, AI automation trends, and GDP forecasting anomalies.

Jack Clark·2 months ago

OpenAI· FRONTIER

Announcing the OpenAI Safety Fellowship

OpenAI announces Safety Fellowship pilot program to fund independent AI safety and alignment research and develop next-generation researchers.

OpenAI·2 months ago

OpenAI· FRONTIER

Industrial policy for the Intelligence Age

OpenAI proposes people-first industrial policy for AI era focused on expanding opportunity, distributing prosperity, building resilient institutions.

OpenAI·2 months ago

Meta AI· FRONTIER

How Alta Daily Uses Meta’s Segment Anything to Reimagine the Digital Closet

Alta Daily case study deploying Meta's Segment Anything for e-commerce wardrobe application.

Meta AI·2 months ago

Latent Space· ANALYST

[AINews] Good Friday

a quiet day.

Latent Space·3 months ago

Latent Space· ANALYST

Marc Andreessen introspects on The Death of the Browser, Pi + OpenClaw, and Why "This Time Is Different"

The legend needs no intro... if you pardon our pun

Latent Space·3 months ago

Latent Space· ANALYST

[AINews] Gemma 4: The best small Multimodal Open Models, dramatically better than Gemma 3 in every way

A welcome update from Google!

Latent Space·3 months ago

NVIDIA Dev Blog· INFRA

Accelerating Vision AI Pipelines with Batch Mode VC-6 and NVIDIA Nsight

In vision AI systems, model throughput continues to improve. The surrounding pipeline stages must keep pace, including decode, preprocessing, and GPU scheduling. In the previous post, Build High-Performance Vision AI Pipelines with NVIDIA CUDA-Accelerated VC-6, this was described as the data-to-tensor gap, a performance mismatch between AI pipeline stages. The SMPTE VC-6 (ST 2117-1) codec…

Andreas Kieslinger·3 months ago

Latent Space· ANALYST

Moonlake: Causal World Models should be Multimodal, Interactive, and Efficient — with Chris Manning and Fan-yun Sun

We cap out our World Models coverage with one of the most exciting new approaches - long running, multiplayer, interactive world models built with agents bootstrapped from game engines!

Latent Space·3 months ago

NVIDIA Dev Blog· INFRA

Bringing AI Closer to the Edge and On-Device with Gemma 4

The Gemmaverse expands with the launch of the latest Gemma 4 multimodal and multilingual models, designed to scale across the full spectrum of deployments, from NVIDIA Blackwell in the data center to Jetson at the edge. These models are suited to meet the growing demand for local deployment for AI development and prototyping, secure on-prem requirements, cost efficiency, and latency-sensitive use…

Anu Srivastava·3 months ago

Google DeepMind· FRONTIER

Gemma 4: Byte for byte, the most capable open models

Google DeepMind releases Gemma 4, open-weights model optimized for reasoning and agentic workflows with improved capability-to-size ratio.

Google DeepMind·3 months ago

Google AI (Gemma)· FRONTIER

New ways to balance cost and reliability in the Gemini API

Gemini API introduces Flex and Priority inference tiers for cost/latency tradeoffs.

Lucia Loher·3 months ago

Google AI (Gemma)· FRONTIER

Create, edit and share videos at no cost in Google Vids

Google Vids adds free video generation via Lyria 3 and Veo 3.1; consumer video editing tool.

David Nachum·3 months ago

NVIDIA Dev Blog· INFRA

Achieving Single-Digit Microsecond Latency Inference for Capital Markets

In algorithmic trading, reducing response times to market events is crucial. To keep pace with high-speed electronic markets, latency-sensitive firms often use specialized hardware like FPGAs and ASICs. Yet, as markets grow more efficient, traders increasingly depend on advanced models such as deep neural networks to enhance profitability. Because implementing these complex models on low-level…

Nikolay Markovskiy·3 months ago

OpenAI· FRONTIER

OpenAI acquires TBPN

OpenAI acquires TBPN to expand dialogue with AI builders, businesses, and broader tech community through independent media support.

OpenAI·3 months ago

OpenAI· FRONTIER

Codex now offers more flexible pricing for teams

OpenAI introduces pay-as-you-go pricing model for Codex on ChatGPT Business and Enterprise to lower adoption barriers for teams.

OpenAI·3 months ago

Latent Space· ANALYST

[AINews] A quiet April Fools

a quiet day

Latent Space·3 months ago

Hugging Face· INFRA

Welcome Gemma 4: Frontier multimodal intelligence on device

Hugging Face·3 months ago

NVIDIA Dev Blog· INFRA

CUDA Tile Programming Now Available for BASIC!

Note: CUDA Tile Programming in BASIC is an April Fools’ joke, but it’s also real and actually works, demonstrating the flexibility of CUDA. CUDA 13.1 introduced CUDA Tile, a next generation tile-based GPU programming paradigm designed to make fine-grained parallelism more accessible and flexible. One of its key strengths is language openness: any programming language can target CUDA Tile…

Rob Armstrong·3 months ago

NVIDIA Dev Blog· INFRA

NVIDIA Platform Delivers Lowest Token Cost Enabled by Extreme Co-Design

Co-designed hardware, software, and models are key to delivering the highest AI factory throughput and lowest token cost. Measuring this goes far beyond peak chip specifications. Rigorous AI inference performance benchmarks are critical to understanding real-world token output, which drives AI factory revenue. MLPerf Inference v6.0 is the latest in a series of industry benchmarks that measure…

Ashraf Eassa·3 months ago

NVIDIA Dev Blog· INFRA

Accelerate Token Production in AI Factories Using Unified Services and Real-Time AI

In today’s AI factory environment, performance is not theoretical. It is economic, competitive, and existential. A 1% drop in usable GPU time can mean millions of tokens lost per hour. Minutes of congestion can cascade into hours of recovery. A rack-level power oversubscription can lead to stranded power and reduced tokens per watt, silently eroding factory output at scale. As AI factories scale…

Pradyumna Desale·3 months ago

Google AI (Gemma)· FRONTIER

We’re creating a new satellite imagery map to help protect Brazil’s forests.

Google partnered with Brazil on satellite imagery forest monitoring system; applied AI use case.

Google AI (Gemma)·3 months ago

Google AI (Gemma)· FRONTIER

The latest AI news we announced in March 2026

Recap of the AI announcements Google made in March 2026.

The Keyword Team·3 months ago

Hugging Face· INFRA

Falcon Perception

Hugging Face·3 months ago

Latent Space· ANALYST

[AINews] The Claude Code Source Leak

The accidental "open sourcing" of Claude Code brings a ton of insights.

Latent Space·3 months ago

OpenAI· FRONTIER

Gradient Labs gives every bank customer an AI account manager

Gradient Labs deploys GPT-4.1 and GPT-5.4 mini/nano agents for automated banking support with low-latency agentic workflows.

OpenAI·3 months ago

30 stories