Research & Infrastructure
The infrastructure that makes frontier AI possible: Hugging Face, NVIDIA, BAIR, and the tool chains behind the models.
How to Build In-Vehicle AI Agents with NVIDIA: From Cloud to Car
The automotive cockpit is undergoing a fundamental shift from rule-based interfaces to agentic, multimodal AI systems capable of reasoning, planning, and acting. In most vehicles on the road today, in-vehicle assistants still rely on fixed command-response patterns: interpret a phrase, trigger an action, reset. While effective for well-defined tasks, this approach doesn’t scale to modern…
Building for the Rising Complexity of Agentic Systems with Extreme Co-Design
Generative AI’s explosive first chapter was defined by humans sending requests and models responding. The agentic chapter is different. Agents don’t follow a pre-determined sequence of actions. They call tools, spawn sub-agents with different tasks and models, retain information in memory, manage their own context window, and decide for themselves when they’re finished. In doing so…
Optimize Supply Chain Decision Systems Using NVIDIA cuOpt Agent Skills
Modern supply chains operate under the constant pressures of fluctuating demand, volatile costs, constrained capacity, and interdependent decision-making. Traditionally, specialized operations research (OR) teams solved these problems by translating business questions into mathematical models. This process can take weeks and often produces fragile solutions that struggle to adapt when conditions…
Build AI-Powered Games with NVIDIA DLSS 4.5, RTX, and Unreal Engine 5
Today, game developers can begin integrating NVIDIA DLSS 4.5 with Dynamic Multi Frame Generation, Multi Frame Generation 6X, and the second-generation transformer model for NVIDIA Super Resolution. In this post, we’ll go over new technologies and resources to share with our game-developer community, including: At CES 2026, we introduced DLSS 4.5, extending its AI-driven…
Speed Up Unreal Engine NNE Inference with NVIDIA TensorRT for RTX Runtime
Neural network techniques are increasingly used in computer graphics to boost image quality, improve performance, and streamline content creation. Approaches like super resolution, denoising, and neural rendering help real-time engines work more efficiently, offering new creative possibilities while keeping performance in mind. Unreal Engine 5 (UE5) has taken several steps in this direction…
How to Build, Run, and Scale High-Quality Creator Workflows in ComfyUI
Creative and visualization teams today produce more assets, in more formats, with leaner teams. Generative AI can accelerate that work – compressing tasks that once took hours of manual effort into automated, repeatable pipelines. ComfyUI is an open-source, node-based creative tool that runs locally on NVIDIA RTX GPUs. It connects image generation, video synthesis, and language models into…
Automating GPU Kernel Translation with AI Agents: cuTile Python to cuTile.jl
NVIDIA CUDA Tile (cuTile) is a tile-based programming model that enables developers to write GPU kernels in terms of tile-level operations—loads, stores, and matrix multiply-accumulate—rather than manually coordinating threads, warps, and shared memory. cuTile.jl brings the same tile-based approach to the dynamic programming language Julia. Users can write custom GPU kernels without dropping…
Powering AI Factories with NVIDIA Enterprise Reference Architectures
The next wave of enterprise productivity is being built on AI factories. As organizations deploy agentic AI systems capable of reasoning, automation, and real-time decision-making at scale, competitive advantage increasingly depends on the infrastructure that supports them. Success requires more than raw compute. It demands a scalable, predictable foundation that can orchestrate intelligent…
Scaling Biomolecular Modeling Using Context Parallelism in NVIDIA BioNeMo
For decades, computational biology has operated under a reductionist compromise. To fit complex biological systems into the limited memory of a single GPU, researchers have had to deconstruct them into isolated fragments—single proteins or small domains. This created a context gap, where larger proteins or complexes could not be folded zero-shot due to GPU hardware memory constraints. Now…
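Context parallelism closes that gap by splitting one long sequence across GPUs instead of splitting the batch; the partitioning itself is just near-even chunking along the residue axis. A minimal sketch of that chunking, under the assumption of contiguous per-rank shards (illustrative only — BioNeMo’s actual implementation also shards the attention computation):

```python
def shard_sequence(seq_len, world_size):
    """Split a sequence of length seq_len into contiguous, near-even
    chunks, one per rank, returned as (start, end) half-open intervals."""
    base, rem = divmod(seq_len, world_size)
    bounds, start = [], 0
    for rank in range(world_size):
        # The first `rem` ranks each absorb one extra residue.
        end = start + base + (1 if rank < rem else 0)
        bounds.append((start, end))
        start = end
    return bounds

# A 4,100-residue complex spread across 4 GPUs.
shards = shard_sequence(4100, 4)
```

Each rank then holds only its slice of the activations, so peak memory per GPU scales down roughly with the number of ranks.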
NVIDIA Nemotron 3 Nano Omni Powers Multimodal Agent Reasoning in a Single Efficient Open Model
Agentic systems often reason across screens, documents, audio, video, and text within a single perception‑to‑action loop. However, they still rely on fragmented model chains—separate stacks for vision, audio, and text. This increases inference hops and orchestration complexity, driving up inference costs while weakening cross-modal context consistency. NVIDIA Nemotron 3 Nano Omni…
24/7 Simulation Loops: How Agentic AI Keeps Subsurface Engineering Moving
The subsurface industry is at a critical point in its digital evolution. For decades, unlocking reservoir potential has relied on experts performing essential and time-intensive manual workflows. As data complexity grows, the gap between machine speed and human bandwidth has become a primary bottleneck. On-demand simulation workflows are currently hampered by both manual data overhead…
Build with DeepSeek V4 Using NVIDIA Blackwell and GPU-Accelerated Endpoints
DeepSeek just launched its fourth generation of flagship models with DeepSeek-V4-Pro and DeepSeek-V4-Flash, both targeted at enabling highly efficient million-token context inference. DeepSeek-V4-Pro is the largest model in the family, with 1.6T total parameters and 49B active parameters. DeepSeek-V4-Flash is a smaller 284B-parameter model with 13B active parameters, designed for higher-speed…
Federated Learning Without the Refactoring Overhead Using NVIDIA FLARE
Federated learning (FL) is no longer a research curiosity—it’s a practical response to a hard constraint: the most valuable data is often the least movable. Regulatory boundaries, data sovereignty rules, and organizational risk tolerance routinely prevent centralized aggregation. Meanwhile, sheer data gravity makes even permitted transfers slow, expensive, and fragile at scale.
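The aggregation step at the heart of federated learning is simple: clients train locally and the server averages their weights, weighted by local dataset size. A minimal FedAvg sketch in plain NumPy (illustrative only — NVIDIA FLARE wraps this pattern in its own controller and executor APIs so existing training code needn’t be refactored around it):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client model weights by local dataset size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Two clients with different amounts of local (immovable) data.
w_a = np.array([1.0, 2.0])   # client A's weights, trained on 100 samples
w_b = np.array([3.0, 4.0])   # client B's weights, trained on 300 samples
global_w = fedavg([w_a, w_b], [100, 300])
```

Only the weight vectors cross organizational boundaries; the training data never moves.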
Winning a Kaggle Competition with Generative AI–Assisted Coding
In March 2026, three LLM agents generated over 600,000 lines of code, ran 850 experiments, and helped secure a first-place finish in a Kaggle playground competition. Success in modern machine learning competitions is increasingly defined by how quickly you can generate, test, and iterate on ideas. LLM agents, combined with GPU acceleration, dramatically compress this loop. Historically…
Simplify Sparse Deep Learning with Universal Sparse Tensor in nvmath-python
In a previous post, we introduced the Universal Sparse Tensor (UST), enabling developers to decouple a tensor’s sparsity from its memory layout for greater flexibility and performance. We’re excited to announce the integration of the UST into nvmath-python v0.9.0 to accelerate sparse scientific and deep learning applications. This post provides a walkthrough of key UST features…
Scaling the AI-Ready Data Center with NVIDIA RTX PRO 4500 Blackwell Server Edition and NVIDIA vGPU 20
AI integration is redefining mainstream enterprise applications, from productivity software like Microsoft Office to more complex design and engineering tools. This shift requires the modern data center to move beyond single-purpose silos. For developers, gaining access to dedicated GPU compute can often be a bottleneck. Virtual machines (VMs) solve part of this challenge by providing secure…
Advancing Emerging Optimizers for Accelerated LLM Training with NVIDIA Megatron
Higher-order optimization algorithms such as Shampoo have been effectively applied in neural network training for at least a decade. These methods have achieved significant success more recently when applied to leading LLMs. In particular, Muon (MomentUm Orthogonalized by Newton-Schulz) was used to train some of today’s best open source models, including Kimi K2 and GLM-5.
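Muon’s key step is approximately orthogonalizing the momentum matrix with a quintic Newton–Schulz iteration, which needs only matmuls and so runs well on GPUs. A NumPy sketch of that iteration (coefficients taken from the public Muon reference implementation; the Frobenius-norm pre-scaling and loose convergence band are properties of that scheme, not exact orthogonalization):

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5, a=3.4445, b=-4.7750, c=2.0315):
    """Drive G's singular values toward 1 (approximate semi-orthogonalization)
    via the quintic iteration X <- aX + (bA + cA^2)X with A = X X^T."""
    X = G / (np.linalg.norm(G) + 1e-7)  # scale so all singular values are <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T                          # iterate on the short-fat orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 8))          # stand-in for a momentum matrix
O = newton_schulz_orthogonalize(G)
```

After a few steps the singular values land in a band around 1, which is all Muon needs: the update direction is equalized across the matrix’s singular directions without an exact (and slower) SVD.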
Maximizing Memory Efficiency to Run Bigger Models on NVIDIA Jetson
The boom in open source generative AI models is pushing beyond data centers into machines operating in the physical world. Developers are eager to deploy these models at the edge, enabling physical AI agents and autonomous robots to automate heavy-duty tasks. A key challenge is efficiently running multi-billion-parameter models on edge devices with limited memory. With ongoing constraints on…
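The first-order arithmetic behind fitting bigger models on a memory-constrained device is just parameters × bytes per weight. A back-of-envelope estimator (illustrative; it ignores KV cache, activations, and runtime overhead, which also matter on Jetson-class devices):

```python
def model_memory_gib(params_billions, bits_per_weight):
    """Approximate weight memory in GiB: parameters x bytes per weight."""
    return params_billions * 1e9 * bits_per_weight / 8 / 2**30

fp16 = model_memory_gib(8, 16)   # an 8B-parameter model in FP16
int4 = model_memory_gib(8, 4)    # the same model quantized to 4-bit
```

The ~4x drop from FP16 to 4-bit weights is what turns a model that overflows a small edge device into one that fits with room left for the KV cache.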
Run High-Throughput Reinforcement Learning Training with End-to-End FP8 Precision
As LLMs transition from simple text generation to complex reasoning, reinforcement learning (RL) plays a central role. Algorithms like Group Relative Policy Optimization (GRPO) power this transition, enabling reasoning-grade models to continuously improve through iterative feedback. Unlike standard supervised fine-tuning, RL training loops are bifurcated into two distinct, high-intensity phases: a…
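GRPO’s defining trick is replacing a learned value baseline with group statistics: sample several responses per prompt and normalize each reward against its own group. A minimal sketch of that advantage computation (illustrative; production RL stacks add ratio clipping, KL regularization, and the FP8 plumbing this post is about):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: z-score each sampled response's reward
    against the mean and std of its own group (no learned critic needed)."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Rewards for four responses sampled from the same prompt.
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the baseline comes from the group itself, the rollout (generation) phase and the training phase are the only two heavyweight stages, which is exactly where end-to-end FP8 pays off.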
Mitigating Indirect AGENTS.md Injection Attacks in Agentic Environments
AI tools are significantly accelerating software development and changing how developers work with code. These tools serve as real-time copilots, automating repetitive tasks, executing tasks, writing documentation, and more. OpenAI Codex, for example, is a coding agent designed to assist developers through tasks like code generation, debugging, and automated pull request (PR) creation.
Gradient-based Planning for World Models at Longer Horizons
Berkeley BAIR proposes GRASP, a gradient-based planner for learned world models that enables longer-horizon planning via differentiable dynamics.
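The core mechanism is that with differentiable dynamics, an entire action sequence can be optimized by gradient descent through the rollout. A toy NumPy sketch with known linear dynamics (GRASP itself plans through a learned world model over long horizons; this only shows the gradient-through-rollout idea, with the gradient written out by hand):

```python
import numpy as np

def plan(x0, goal, horizon=5, iters=200, lr=0.05):
    """Gradient-based planning through known differentiable dynamics
    x_{t+1} = x_t + a_t, minimizing the terminal cost ||x_T - goal||^2
    over the whole action sequence."""
    actions = np.zeros(horizon)
    for _ in range(iters):
        x_T = x0 + actions.sum()      # roll out the (linear) dynamics
        grad = 2.0 * (x_T - goal)     # d cost / d a_t is identical for every step
        actions -= lr * grad * np.ones(horizon)
    return actions

a = plan(x0=0.0, goal=1.0)
```

With a learned neural world model the analytic gradient is replaced by autodiff through the rollout, and the challenge GRASP addresses is keeping those gradients useful as the horizon grows.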
Full-Stack Optimizations for Agentic Inference with NVIDIA Dynamo
Coding agents are starting to write production code at scale. Stripe’s agents generate 1,300+ PRs per week. Ramp attributes 30% of merged PRs to agents. Spotify reports 650+ agent-generated PRs per month. Tools like Claude Code and Codex make hundreds of API calls per coding session, each carrying the full conversation history. Behind every one of these workflows is an inference stack under…
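Carrying the full history on every call means prompt-processing cost grows quadratically over a session unless the shared prefix is cached. A sketch of that arithmetic (illustrative token accounting only; Dynamo’s actual optimizations span KV-cache management, routing, and disaggregated serving):

```python
def prefill_tokens(turn_lens, prefix_cached):
    """Total prompt tokens processed across a multi-turn agent session.
    Without prefix caching, every call re-processes the whole history;
    with it, only the new suffix is prefilled."""
    total, history = 0, 0
    for n in turn_lens:
        history += n
        total += n if prefix_cached else history
    return total

turns = [100] * 10                                      # ten 100-token turns
no_cache = prefill_tokens(turns, prefix_cached=False)   # 100 * (1+2+...+10)
cached = prefill_tokens(turns, prefix_cached=True)      # 100 * 10
```

At hundreds of calls per session, the gap between those two curves is most of the inference bill.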
Build a More Secure, Always-On Local AI Agent with OpenClaw and NVIDIA NemoClaw
Agents are evolving from question-and-answer systems into long-running autonomous assistants that read files, call APIs, and drive multi-step workflows…
Accelerate Clean, Modular, Nuclear Reactor Design with AI Physics
The development of socially acceptable nuclear reactors requires that they are safe, clean, efficient, economical, and sustainable. Meeting these requirements calls for new approaches, driving growing interest in Small Modular Reactors (SMRs) and in Generation IV designs. SMRs aim to improve project economics by standardising designs and shifting construction to controlled manufacturing…
How to Build Vision AI Pipelines Using NVIDIA DeepStream Coding Agents
Developing real-time vision AI applications presents a significant challenge for developers, often demanding intricate data pipelines, countless lines of code, and lengthy development cycles. NVIDIA DeepStream 9 removes these development barriers using coding agents, such as Claude Code or Cursor, to help you easily create deployable, optimized code that brings your vision AI applications to…
Building Custom Atomistic Simulation Workflows for Chemistry and Materials Science with NVIDIA ALCHEMI Toolkit
For decades, computational chemistry has faced a tug-of-war between accuracy and speed. Ab initio methods like density functional theory (DFT) provide high fidelity but are computationally expensive, limiting researchers to systems of a few hundred atoms. Conversely, classical force fields are fast but often lack the chemical accuracy required for complex bond-breaking or transition-state analysis.
NVIDIA NVbandwidth: Your Essential Tool for Measuring GPU Interconnect and Memory Performance
When you’re writing CUDA applications, one of the most important things you need to focus on to write great code is data transfer performance. This applies to both single-GPU and multi-GPU systems alike. One of the tools you can use to understand the memory characteristics of your GPU system is NVIDIA NVbandwidth. In this blog post, we’ll explore what NVbandwidth is, how it works…
NVIDIA Ising Introduces AI-Powered Workflows to Build Fault-Tolerant Quantum Systems
NVIDIA Ising is the world’s first family of open AI models for building quantum processors, launching with two model domains: Ising Calibration and Ising Decoding. Both target the fundamental challenge in quantum computing—qubits are inherently noisy. The best quantum processors make an error roughly once in every thousand operations. To become useful accelerators for scientific and…
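Decoding is, at heart, inferring which error occurred from noisy measurement data, which is why it maps well to learned models. The classical baseline makes the task concrete: a majority-vote decoder for the n-qubit repetition code (illustrative only; Ising Decoding targets far richer codes and noise models than this):

```python
def decode_repetition(bits):
    """Majority-vote decoder for an n-qubit repetition code:
    recover the encoded logical bit from a noisy measured codeword."""
    return int(sum(bits) > len(bits) / 2)

# A single bit-flip error on a logical |1> codeword is still corrected.
decoded = decode_repetition([1, 0, 1])
```

Real fault-tolerant codes produce syndrome patterns too complex for simple voting, which is where AI-based decoders come in.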
MiniMax M2.7 Advances Scalable Agentic Workflows on NVIDIA Platforms for Complex AI Applications
The release of MiniMax M2.7 adds enhancements to the popular MiniMax M2.5 model, built for agentic harnesses and other complex use cases in fields such as reasoning, ML research workflows, software engineering, and office work. The open weights release of MiniMax M2.7 is now available through NVIDIA and across the open source inference ecosystem. The MiniMax M2 series is a sparse mixture-of…
Running Large-Scale GPU Workloads on Kubernetes with Slurm
Slurm is an open source cluster management and job scheduling system for Linux. It manages job scheduling for over 65% of TOP500 systems. Most organizations running large-scale AI training have years of investment in Slurm job scripts, fair-share policies, and accounting workflows. The challenge is getting Slurm scheduling capabilities onto Kubernetes—the standard platform for managing GPU…