Research & Infrastructure
The infrastructure that makes frontier AI possible: Hugging Face, NVIDIA, BAIR, and the tool chains behind the models.
How to Build In-Vehicle AI Agents with NVIDIA: From Cloud to Car
The automotive cockpit is undergoing a fundamental shift from rule-based interfaces to agentic, multimodal AI systems capable of reasoning, planning, and acting. In most vehicles on the road today, in-vehicle assistants still rely on fixed command-response patterns: interpret a phrase, trigger an action, reset. While effective for well-defined tasks, this approach doesn’t scale to modern…
Building for the Rising Complexity of Agentic Systems with Extreme Co-Design
Generative AI’s explosive first chapter was defined by humans sending requests and models responding. The agentic chapter is different. Agents don’t follow a pre-determined sequence of actions. They call tools, spawn sub-agents with different tasks and models, retain information in memory, manage their own context window, and decide for themselves when they’re finished. In doing so…
Optimize Supply Chain Decision Systems Using NVIDIA cuOpt Agent Skills
Modern supply chains operate under the constant pressures of fluctuating demand, volatile costs, constrained capacity, and interdependent decision-making. Traditionally, specialized operations research (OR) teams solved these problems by translating business questions into mathematical models. This process can take weeks and often produces fragile solutions that struggle to adapt when conditions…
Build AI-Powered Games with NVIDIA DLSS 4.5, RTX, and Unreal Engine 5
Today, game developers can begin integrating NVIDIA DLSS 4.5 with Dynamic Multi Frame Generation, Multi Frame Generation 6X, and the second-generation transformer model for NVIDIA Super Resolution. In this post, we’ll go over new technologies and resources to share with our game-developer community, including: At CES 2026, we introduced DLSS 4.5, extending its AI-driven…
Speed Up Unreal Engine NNE Inference with NVIDIA TensorRT for RTX Runtime
Neural network techniques are increasingly used in computer graphics to boost image quality, improve performance, and streamline content creation. Approaches like super resolution, denoising, and neural rendering help real-time engines work more efficiently, offering new creative possibilities while keeping performance in mind. Unreal Engine 5 (UE5) has taken several steps in this direction…
How to Build, Run, and Scale High-Quality Creator Workflows in ComfyUI
Creative and visualization teams today produce more assets, in more formats, with leaner teams. Generative AI can accelerate that work – compressing tasks that once took hours of manual effort into automated, repeatable pipelines. ComfyUI is an open-source, node-based creative tool that runs locally on NVIDIA RTX GPUs. It connects image generation, video synthesis, and language models into…
Automating GPU Kernel Translation with AI Agents: cuTile Python to cuTile.jl
NVIDIA CUDA Tile (cuTile) is a tile-based programming model that enables developers to write GPU kernels in terms of tile-level operations—loads, stores, and matrix multiply-accumulate—rather than manually coordinating threads, warps, and shared memory. cuTile.jl brings the same tile-based approach to the dynamic programming language Julia. Users can write custom GPU kernels without dropping…
Powering AI Factories with NVIDIA Enterprise Reference Architectures
The next wave of enterprise productivity is being built on AI factories. As organizations deploy agentic AI systems capable of reasoning, automation, and real-time decision-making at scale, competitive advantage increasingly depends on the infrastructure that supports them. Success requires more than raw compute. It demands a scalable, predictable foundation that can orchestrate intelligent…
Scaling Biomolecular Modeling Using Context Parallelism in NVIDIA BioNeMo
For decades, computational biology has operated under a reductionist compromise. To fit complex biological systems into the limited memory of a single GPU, researchers have had to deconstruct them into isolated fragments—single proteins or small domains. This created a context gap, where larger proteins or complexes could not be folded zero-shot due to GPU hardware memory constraints. Now…
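Context parallelism closes that gap by splitting one long sequence across GPUs instead of splitting the batch; the partitioning itself is just near-even chunking along the residue axis. A minimal sketch of that chunking, under the assumption of contiguous per-rank shards (illustrative only — BioNeMo’s actual implementation also shards the attention computation):

```python
def shard_sequence(seq_len, world_size):
    """Split a sequence of length seq_len into contiguous, near-even
    chunks, one per rank, returned as (start, end) half-open intervals."""
    base, rem = divmod(seq_len, world_size)
    bounds, start = [], 0
    for rank in range(world_size):
        # The first `rem` ranks each absorb one extra residue.
        end = start + base + (1 if rank < rem else 0)
        bounds.append((start, end))
        start = end
    return bounds

# A 4,100-residue complex spread across 4 GPUs.
shards = shard_sequence(4100, 4)
```

Each rank then holds only its slice of the activations, so peak memory per GPU scales down roughly with the number of ranks.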
NVIDIA Nemotron 3 Nano Omni Powers Multimodal Agent Reasoning in a Single Efficient Open Model
Agentic systems often reason across screens, documents, audio, video, and text within a single perception‑to‑action loop. However, they still rely on fragmented model chains—separate stacks for vision, audio, and text. This increases inference hops and orchestration complexity, driving up inference costs while weakening cross-modal context consistency. NVIDIA Nemotron 3 Nano Omni…
24/7 Simulation Loops: How Agentic AI Keeps Subsurface Engineering Moving
The subsurface industry is at a critical point in its digital evolution. For decades, unlocking reservoir potential has relied on experts performing essential and time-intensive manual workflows. As data complexity grows, the gap between machine speed and human bandwidth has become a primary bottleneck. On-demand simulation workflows are currently hampered by both manual data overhead…
Build with DeepSeek V4 Using NVIDIA Blackwell and GPU-Accelerated Endpoints
DeepSeek just launched its fourth generation of flagship models with DeepSeek-V4-Pro and DeepSeek-V4-Flash, both targeted at enabling highly efficient million-token context inference. DeepSeek-V4-Pro is the largest model in the family, with 1.6T total parameters and 49B active parameters. DeepSeek-V4-Flash is a smaller 284B-parameter model with 13B active parameters, designed for higher-speed…
Federated Learning Without the Refactoring Overhead Using NVIDIA FLARE
Federated learning (FL) is no longer a research curiosity—it’s a practical response to a hard constraint: the most valuable data is often the least movable. Regulatory boundaries, data sovereignty rules, and organizational risk tolerance routinely prevent centralized aggregation. Meanwhile, sheer data gravity makes even permitted transfers slow, expensive, and fragile at scale.
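The aggregation step at the heart of federated learning is simple: clients train locally and the server averages their weights, weighted by local dataset size. A minimal FedAvg sketch in plain NumPy (illustrative only — NVIDIA FLARE wraps this pattern in its own controller and executor APIs so existing training code needn’t be refactored around it):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client model weights by local dataset size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Two clients with different amounts of local (immovable) data.
w_a = np.array([1.0, 2.0])   # client A's weights, trained on 100 samples
w_b = np.array([3.0, 4.0])   # client B's weights, trained on 300 samples
global_w = fedavg([w_a, w_b], [100, 300])
```

Only the weight vectors cross organizational boundaries; the training data never moves.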
Winning a Kaggle Competition with Generative AI–Assisted Coding
In March 2026, three LLM agents generated over 600,000 lines of code, ran 850 experiments, and helped secure a first-place finish in a Kaggle playground competition. Success in modern machine learning competitions is increasingly defined by how quickly you can generate, test, and iterate on ideas. LLM agents, combined with GPU acceleration, dramatically compress this loop. Historically…
Simplify Sparse Deep Learning with Universal Sparse Tensor in nvmath-python
In a previous post, we introduced the Universal Sparse Tensor (UST), enabling developers to decouple a tensor’s sparsity from its memory layout for greater flexibility and performance. We’re excited to announce the integration of the UST into nvmath-python v0.9.0 to accelerate sparse scientific and deep learning applications. This post provides a walkthrough of key UST features…
Scaling the AI-Ready Data Center with NVIDIA RTX PRO 4500 Blackwell Server Edition and NVIDIA vGPU 20
AI integration is redefining mainstream enterprise applications, from productivity software like Microsoft Office to more complex design and engineering tools. This shift requires the modern data center to move beyond single-purpose silos. For developers, gaining access to dedicated GPU compute can often be a bottleneck. Virtual machines (VMs) solve part of this challenge by providing secure…
Advancing Emerging Optimizers for Accelerated LLM Training with NVIDIA Megatron
Higher-order optimization algorithms such as Shampoo have been effectively applied in neural network training for at least a decade. These methods have achieved significant success more recently when applied to leading LLMs. In particular, Muon (MomentUm Orthogonalized by Newton-Schulz) was used to train some of today’s best open source models, including Kimi K2 and GLM-5.
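Muon’s key step is approximately orthogonalizing the momentum matrix with a quintic Newton–Schulz iteration, which needs only matmuls and so runs well on GPUs. A NumPy sketch of that iteration (coefficients taken from the public Muon reference implementation; the Frobenius-norm pre-scaling and loose convergence band are properties of that scheme, not exact orthogonalization):

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5, a=3.4445, b=-4.7750, c=2.0315):
    """Drive G's singular values toward 1 (approximate semi-orthogonalization)
    via the quintic iteration X <- aX + (bA + cA^2)X with A = X X^T."""
    X = G / (np.linalg.norm(G) + 1e-7)  # scale so all singular values are <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T                          # iterate on the short-fat orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 8))          # stand-in for a momentum matrix
O = newton_schulz_orthogonalize(G)
```

After a few steps the singular values land in a band around 1, which is all Muon needs: the update direction is equalized across the matrix’s singular directions without an exact (and slower) SVD.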
Maximizing Memory Efficiency to Run Bigger Models on NVIDIA Jetson
The boom in open source generative AI models is pushing beyond data centers into machines operating in the physical world. Developers are eager to deploy these models at the edge, enabling physical AI agents and autonomous robots to automate heavy-duty tasks. A key challenge is efficiently running multi-billion-parameter models on edge devices with limited memory. With ongoing constraints on…
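The first-order arithmetic behind fitting bigger models on a memory-constrained device is just parameters × bytes per weight. A back-of-envelope estimator (illustrative; it ignores KV cache, activations, and runtime overhead, which also matter on Jetson-class devices):

```python
def model_memory_gib(params_billions, bits_per_weight):
    """Approximate weight memory in GiB: parameters x bytes per weight."""
    return params_billions * 1e9 * bits_per_weight / 8 / 2**30

fp16 = model_memory_gib(8, 16)   # an 8B-parameter model in FP16
int4 = model_memory_gib(8, 4)    # the same model quantized to 4-bit
```

The ~4x drop from FP16 to 4-bit weights is what turns a model that overflows a small edge device into one that fits with room left for the KV cache.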
Run High-Throughput Reinforcement Learning Training with End-to-End FP8 Precision
As LLMs transition from simple text generation to complex reasoning, reinforcement learning (RL) plays a central role. Algorithms like Group Relative Policy Optimization (GRPO) power this transition, enabling reasoning-grade models to continuously improve through iterative feedback. Unlike standard supervised fine-tuning, RL training loops are bifurcated into two distinct, high-intensity phases: a…
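GRPO’s defining trick is replacing a learned value baseline with group statistics: sample several responses per prompt and normalize each reward against its own group. A minimal sketch of that advantage computation (illustrative; production RL stacks add ratio clipping, KL regularization, and the FP8 plumbing this post is about):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: z-score each sampled response's reward
    against the mean and std of its own group (no learned critic needed)."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Rewards for four responses sampled from the same prompt.
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the baseline comes from the group itself, the rollout (generation) phase and the training phase are the only two heavyweight stages, which is exactly where end-to-end FP8 pays off.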
Mitigating Indirect AGENTS.md Injection Attacks in Agentic Environments
AI tools are significantly accelerating software development and changing how developers work with code. These tools serve as real-time copilots, automating repetitive tasks, executing tasks, writing documentation, and more. OpenAI Codex, for example, is a coding agent designed to assist developers through tasks like code generation, debugging, and automated pull request (PR) creation.
Gradient-based Planning for World Models at Longer Horizons
Berkeley BAIR proposes GRASP, a gradient-based planner for learned world models that enables longer-horizon planning via differentiable dynamics.
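The core mechanism is that with differentiable dynamics, an entire action sequence can be optimized by gradient descent through the rollout. A toy NumPy sketch with known linear dynamics (GRASP itself plans through a learned world model over long horizons; this only shows the gradient-through-rollout idea, with the gradient written out by hand):

```python
import numpy as np

def plan(x0, goal, horizon=5, iters=200, lr=0.05):
    """Gradient-based planning through known differentiable dynamics
    x_{t+1} = x_t + a_t, minimizing the terminal cost ||x_T - goal||^2
    over the whole action sequence."""
    actions = np.zeros(horizon)
    for _ in range(iters):
        x_T = x0 + actions.sum()      # roll out the (linear) dynamics
        grad = 2.0 * (x_T - goal)     # d cost / d a_t is identical for every step
        actions -= lr * grad * np.ones(horizon)
    return actions

a = plan(x0=0.0, goal=1.0)
```

With a learned neural world model the analytic gradient is replaced by autodiff through the rollout, and the challenge GRASP addresses is keeping those gradients useful as the horizon grows.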
Full-Stack Optimizations for Agentic Inference with NVIDIA Dynamo
Coding agents are starting to write production code at scale. Stripe’s agents generate 1,300+ PRs per week. Ramp attributes 30% of merged PRs to agents. Spotify reports 650+ agent-generated PRs per month. Tools like Claude Code and Codex make hundreds of API calls per coding session, each carrying the full conversation history. Behind every one of these workflows is an inference stack under…
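Carrying the full history on every call means prompt-processing cost grows quadratically over a session unless the shared prefix is cached. A sketch of that arithmetic (illustrative token accounting only; Dynamo’s actual optimizations span KV-cache management, routing, and disaggregated serving):

```python
def prefill_tokens(turn_lens, prefix_cached):
    """Total prompt tokens processed across a multi-turn agent session.
    Without prefix caching, every call re-processes the whole history;
    with it, only the new suffix is prefilled."""
    total, history = 0, 0
    for n in turn_lens:
        history += n
        total += n if prefix_cached else history
    return total

turns = [100] * 10                                      # ten 100-token turns
no_cache = prefill_tokens(turns, prefix_cached=False)   # 100 * (1+2+...+10)
cached = prefill_tokens(turns, prefix_cached=True)      # 100 * 10
```

At hundreds of calls per session, the gap between those two curves is most of the inference bill.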
Build a More Secure, Always-On Local AI Agent with OpenClaw and NVIDIA NemoClaw
Agents are evolving from question-and-answer systems into long-running autonomous assistants that read files, call APIs, and drive multi-step workflows…
Accelerate Clean, Modular, Nuclear Reactor Design with AI Physics
The development of socially acceptable nuclear reactors requires that they are safe, clean, efficient, economical, and sustainable. Meeting these requirements calls for new approaches, driving growing interest in Small Modular Reactors (SMRs) and in Generation IV designs. SMRs aim to improve project economics by standardising designs and shifting construction to controlled manufacturing…
How to Build Vision AI Pipelines Using NVIDIA DeepStream Coding Agents
Developing real-time vision AI applications presents a significant challenge for developers, often demanding intricate data pipelines, countless lines of code, and lengthy development cycles. NVIDIA DeepStream 9 removes these development barriers using coding agents, such as Claude Code or Cursor, to help you easily create deployable, optimized code that brings your vision AI applications to…
Building Custom Atomistic Simulation Workflows for Chemistry and Materials Science with NVIDIA ALCHEMI Toolkit
For decades, computational chemistry has faced a tug-of-war between accuracy and speed. Ab initio methods like density functional theory (DFT) provide high fidelity but are computationally expensive, limiting researchers to systems of a few hundred atoms. Conversely, classical force fields are fast but often lack the chemical accuracy required for complex bond-breaking or transition-state analysis.
NVIDIA NVbandwidth: Your Essential Tool for Measuring GPU Interconnect and Memory Performance
When you’re writing CUDA applications, one of the most important things you need to focus on to write great code is data transfer performance. This applies to both single-GPU and multi-GPU systems alike. One of the tools you can use to understand the memory characteristics of your GPU system is NVIDIA NVbandwidth. In this blog post, we’ll explore what NVbandwidth is, how it works…
NVIDIA Ising Introduces AI-Powered Workflows to Build Fault-Tolerant Quantum Systems
NVIDIA Ising is the world’s first family of open AI models for building quantum processors, launching with two model domains: Ising Calibration and Ising Decoding. Both target the fundamental challenge in quantum computing—qubits are inherently noisy. The best quantum processors make an error roughly once in every thousand operations. To become useful accelerators for scientific and…
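Decoding is, at heart, inferring which error occurred from noisy measurement data, which is why it maps well to learned models. The classical baseline makes the task concrete: a majority-vote decoder for the n-qubit repetition code (illustrative only; Ising Decoding targets far richer codes and noise models than this):

```python
def decode_repetition(bits):
    """Majority-vote decoder for an n-qubit repetition code:
    recover the encoded logical bit from a noisy measured codeword."""
    return int(sum(bits) > len(bits) / 2)

# A single bit-flip error on a logical |1> codeword is still corrected.
decoded = decode_repetition([1, 0, 1])
```

Real fault-tolerant codes produce syndrome patterns too complex for simple voting, which is where AI-based decoders come in.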
MiniMax M2.7 Advances Scalable Agentic Workflows on NVIDIA Platforms for Complex AI Applications
The release of MiniMax M2.7 adds enhancements to the popular MiniMax M2.5 model, built for agentic harnesses and other complex use cases in fields such as reasoning, ML research workflows, software engineering, and office work. The open weights release of MiniMax M2.7 is now available through NVIDIA and across the open source inference ecosystem. The MiniMax M2 series is a sparse mixture-of…
Running Large-Scale GPU Workloads on Kubernetes with Slurm
Slurm is an open source cluster management and job scheduling system for Linux. It manages job scheduling for over 65% of TOP500 systems. Most organizations running large-scale AI training have years of investment in Slurm job scripts, fair-share policies, and accounting workflows. The challenge is getting Slurm scheduling capabilities onto Kubernetes—the standard platform for managing GPU…