Vol. I · No. 63SUN, JUN 21, 2026
Section · Research

Research & Infrastructure

The infrastructure that makes frontier AI possible: Hugging Face, NVIDIA, BAIR, and the tool chains behind the models.

Building AI Agents for AR Glasses and XR Devices with NVIDIA XR AI

Developers building for AR glasses and wearable devices face an infrastructure gap. The hardware is ready, but creating AI experiences requires integrating live... Developers building for AR glasses and wearable devices face an infrastructure gap. The hardware is ready, but creating AI experiences requires integrating live camera and microphone streams, multimodal AI models, enterprise data, tool use, deployment infrastructure, and device-specific runtimes. NVIDIA XR AI is designed to address this challenge by providing a reusable foundation for… Source

·

Build Your Own Transaction Foundation Model for Financial Intelligence

Every swipe, transfer, and payment on a modern financial network encodes a pattern of human behavior. Transaction data is one of the richest signals an... Every swipe, transfer, and payment on a modern financial network encodes a pattern of human behavior. Transaction data is one of the richest signals an enterprise owns. Yet most production use cases for such tabular data still depend on hand-engineered features and rule sets that are brittle, expensive to maintain, and blind to the sequential structure inside a customer history. Source

·

Build On-Device AI Companions with the NVIDIA ACE Game Agent SDK and Unreal Engine 5 Plugins

NVIDIA RTX technologies are deeply integrated into Unreal Engine 5 through the NVIDIA RTX Branch of Unreal Engine and the NVIDIA DLSS Unreal Engine plugin. This... NVIDIA RTX technologies are deeply integrated into Unreal Engine 5 through the NVIDIA RTX Branch of Unreal Engine and the NVIDIA DLSS Unreal Engine plugin. This provides developers with direct access to advanced rendering, frame generation, and ray-traced lighting. NVIDIA is expanding this integration with new tools for building on-device AI characters and gameplay, as announced at Unreal Fest… Source

·

How to Optimize Transformer-Based Models for Low-Precision Training

Transformer architectures are the backbone of many modern large language and generative AI models. As these models grow in size, training runs consume more GPU... Transformer architectures are the backbone of many modern large language and generative AI models. As these models grow in size, training runs consume more GPU hours and more engineering iteration time. Accelerating transformers is therefore not just a performance optimization, but directly affects how quickly teams can experiment and how large a model they can afford to train. Source

·

NVIDIA Blackwell Tops MLPerf Training 6.0 with Industry-Leading Scale and Performance

NVIDIA delivered a clean sweep in MLPerf Training v6.0, the latest edition of industry-standard AI training benchmarks developed by the MLCommons consortium.... NVIDIA delivered a clean sweep in MLPerf Training v6.0, the latest edition of industry-standard AI training benchmarks developed by the MLCommons consortium. NVIDIA achieved the fastest time to train at scale, and also delivered the highest performance when normalized on a per-accelerator basis on every benchmark. It was also the only platform to submit on every test. Source

·

Fine-Tuning Biological Foundation Models with LoRA Using NVIDIA BioNeMo Recipes

Foundation models are reshaping computational biology. Pretrained on massive corpora of protein or genomic sequences, models such as ESM2 (a protein language... Foundation models are reshaping computational biology. Pretrained on massive corpora of protein or genomic sequences, models such as ESM2 (a protein language model) and Evo 2 (a DNA language model) capture statistical regularities of biological sequences. These transfer well to a wide range of downstream tasks, including structure prediction, variant effect, and functional annotation. Source

·

Boosting MoE Training Throughput with Advanced Fusion Kernels

Mixture-of-experts (MoE) models have quickly become a foundational component of modern, large-scale AI systems. They are widely adopted because they enable... Mixture-of-experts (MoE) models have quickly become a foundational component of modern, large-scale AI systems. They are widely adopted because they enable substantially larger model capacity while activating only a subset of parameters for each token, offering an unparalleled approach for scaling performance within a practical compute budget. As model scales continue to grow… Source

·

Pretrained to Imagine, Fine-Tuned to Act: The Rise of World-Action Models

Quick glossary for readers new to VLA/WAM terminology VLA Vision-Language-Action model: a robot policy that starts from a pretrained VLM backbone and adapts it... Quick glossary for readers new to VLA/WAM terminology VLA Vision-Language-Action model: a robot policy that starts from a pretrained VLM backbone and adapts it to generate actions from visual observations and language instructions. Large-scale VLM pretraining is a core part of the recipe. See Pi-0 and GR00T N1. WAM World-Action Model: a policy that starts from a pretrained world-model or video… Source

·

NVIDIA Achieves Leading Agentic Coding Performance on First Agentic AI Benchmark

AI agents have fundamentally changed the complexity of inference workloads. Until now, the industry has struggled to define a standard for measuring how... AI agents have fundamentally changed the complexity of inference workloads. Until now, the industry has struggled to define a standard for measuring how inference systems perform under these conditions. Artificial Analysis AgentPerf (AA-AgentPerf) offers the industry’s first multi-vendor open benchmarks profiling trajectories that are representative of real-world AI agent coding tasks. Source

·

Deploy Long-Context Reasoning and Agentic Workflows with MiniMax M3 on NVIDIA Accelerated Infrastructure

As enterprise AI adoption scales, developers are increasingly forced to stitch together fragmented pipelines—separate models for text, vision, and... As enterprise AI adoption scales, developers are increasingly forced to stitch together fragmented pipelines—separate models for text, vision, and code—leading to added complexity, higher costs, and slower iteration. MiniMax M3—available on NVIDIA accelerated infrastructure including NVIDIA Blackwell—changes this by enabling a single multimodal system capable of long-context reasoning… Source

·

One-Click Multi-Tenant Security with  NVIDIA Quantum InfiniBand

NVIDIA Quantum InfiniBand now offers intent-based security profiles in Unified Fabric Manager (UFM) that enable multi-tenant fabric security in a single... NVIDIA Quantum InfiniBand now offers intent-based security profiles in Unified Fabric Manager (UFM) that enable multi-tenant fabric security in a single click. NVIDIA Quantum InfiniBand supports three profiles: General, Bare Metal Cloud, and Secured Bare Metal Cloud. Network administrators can now auto-configure: This cuts deployment time to minutes from hours or days… Source

·

Run DiffusionGemma on NVIDIA for Developer-Ready, High-Throughput Text Generation

Developers building real-time AI—such as chat assistants, copilots, and agentic workflows—are often constrained by token-by-token generation speed. This... Developers building real-time AI—such as chat assistants, copilots, and agentic workflows—are often constrained by token-by-token generation speed. This limits responsiveness, increases serving costs, and makes fluid, interactive experiences difficult to achieve. DiffusionGemma, created by Google DeepMind and optimized to run efficiently across NVIDIA platforms, introduces a new approach to… Source

·

Designing Production-Ready Battery Energy Storage Systems for AI Factories

AI factories are changing what data-center infrastructure must do. Unlike traditional data centers, AI factories are built to manufacture intelligence at scale.... AI factories are changing what data-center infrastructure must do. Unlike traditional data centers, AI factories are built to manufacture intelligence at scale. They run power-dense training and inference workloads, increasingly support agentic and reasoning models, and must deliver predictable performance even as compute demand shifts rapidly. In this environment… Source

·

Delivering Lifecycle Control for AI Infrastructure at Scale with NVIDIA DGX Spark Enterprise Manageability

As AI infrastructure scales, enterprise expectations for operational maturity are increasing. Organizations expect these systems to be provisionable,... As AI infrastructure scales, enterprise expectations for operational maturity are increasing. Organizations expect these systems to be provisionable, observable, secure, and manageable at scale—the same standard applied to all critical infrastructure. The moment an AI system moves from development into enterprise deployment, that operational foundation is essential. NVIDIA DGX Spark and… Source

·

Model Quantization: Turn FP8 Checkpoints into High-Performance Inference Engines with NVIDIA TensorRT

Converting a quantized checkpoint into an NVIDIA TensorRT engine bridges the gap between model optimization and production deployment, enabling faster... Converting a quantized checkpoint into an NVIDIA TensorRT engine bridges the gap between model optimization and production deployment, enabling faster inference, higher throughput, and more efficient GPU utilization at scale. In a previous post, we produced a high-quality FP8-quantized Contrastive Language-Image Pretraining (CLIP) checkpoint with NVIDIA TensorRT Model Optimizer. Source

·

Accelerating Federated Learning Research with AI Agents and NVIDIA FLARE Auto-FL

Federated learning (FL) research often begins with a deceptively simple question: What should we try next? A new aggregation rule, a FedProx coefficient, a... Federated learning (FL) research often begins with a deceptively simple question: What should we try next? A new aggregation rule, a FedProx coefficient, a server optimizer setting, a SCAFFOLD variant, or a model architecture tweak may all look promising before an experiment starts. After the run finishes, the harder questions begin: Did the change actually improve the metric? Source

·

Evaluate Clinical ASR Models Faster with Agent Skills and NVIDIA Nemotron Speech

Training a speech AI model to correctly recognize or synthesize clinical terminology is surprisingly difficult. Drug names like Acetaminophen, Amlodipine,... Training a speech AI model to correctly recognize or synthesize clinical terminology is surprisingly difficult. Drug names like Acetaminophen, Amlodipine, Cefazolin, and Biktarvy are not part of everyday vocabulary. Procedure names, anatomy terms, and specialty-specific diagnoses introduce the same problem in a different form. Off-the-shelf speech systems can sound fluent and still miss the words… Source

·

Train Models Faster with JAX and MaxText Using NVFP4 on NVIDIA Blackwell

Pre-training frontier LLMs comes down to throughput. When training spans trillions of tokens across thousands of accelerators, every percentage point of step... Pre-training frontier LLMs comes down to throughput. When training spans trillions of tokens across thousands of accelerators, every percentage point of step time can add up to days of training and substantial compute costs. Numerical precision is one of the highest-leverage knobs available, but low- bit mixed-precision pretraining is hard to get right. To address this… Source

·

NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents

Single-turn chatbots are evolving into long-running agents that can reason, maintain context, use tools, and run efficiently across many turns to complete... Single-turn chatbots are evolving into long-running agents that can reason, maintain context, use tools, and run efficiently across many turns to complete complex workflows. However, these multi-agent workflows cause token counts to grow quickly. Agents plan, call tools, invoke sub-agents, receive information, and then pass history, outputs, and reasoning steps back into the model… Source

·
50 stories