Vol. I · No. 61FRI, JUN 19, 2026
Archive

The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

Optimizing Communication for Mixture-of-Experts Training with Hybrid Expert Parallel

In LLM training, Expert Parallel (EP) communication for hyperscale mixture-of-experts (MoE) models is challenging. EP communication is essentially all-to-all,... In LLM training, Expert Parallel (EP) communication for hyperscale mixture-of-experts (MoE) models is challenging. EP communication is essentially all-to-all, but due to its dynamics and sparseness (only topk experts per AI token instead of all experts), it’s challenging to implement and optimize. This post details an efficient MoE EP communication solution, Hybrid-EP, and its use in the… Source

·

xAI joins SpaceX

SpaceX acquires xAI, consolidating AI development with rocket/hardware infrastructure.

·

Introducing the Codex app

OpenAI releases Codex app for macOS enabling parallel multi-agent coding workflows with long-running task support.

·

Advancing GPU Programming with the CUDA Tile IR Backend for OpenAI Triton

NVIDIA CUDA Tile is a GPU-based programming model that targets portability for NVIDIA Tensor Cores, unlocking peak GPU performance. One of the great things... NVIDIA CUDA Tile is a GPU-based programming model that targets portability for NVIDIA Tensor Cores, unlocking peak GPU performance. One of the great things about CUDA Tile is that you can build your own DSL on top of it. This post shares the work NVIDIA is doing to integrate CUDA Tile as a backend for OpenAI Triton, an open source Python DSL designed to write DL kernels for GPUs. Source

·

Establishing a Scalable Sparse Ecosystem with the Universal Sparse Tensor

Sparse tensors are vectors, matrices, and higher-dimensional generalizations with many zeros. They are crucial in various fields such as scientific computing,... Sparse tensors are vectors, matrices, and higher-dimensional generalizations with many zeros. They are crucial in various fields such as scientific computing, signal processing, and deep learning due to their efficiency in storage, computation, and power. Despite their benefits, handling sparse tensors manually or through existing libraries is often cumbersome, error-prone, nonportable… Source

·

Practical Security Guidance for Sandboxing Agentic Workflows and Managing Execution Risk

AI coding agents enable developers to work faster by streamlining tasks and driving automated, test-driven development. However, they also introduce a... AI coding agents enable developers to work faster by streamlining tasks and driving automated, test-driven development. However, they also introduce a significant, often overlooked, attack surface by running tools from the command line with the same permissions and entitlements as the user, making them computer use agents, with all the risks those entail. The primary threat to these tools is… Source

·

Ensuring Balanced GPU Allocation in Kubernetes Clusters with Time-Based Fairshare

NVIDIA Run:ai v2.24 introduces time-based fairshare, a new scheduling mode that brings fair-share scheduling with time awareness for over-quota resources to... NVIDIA Run:ai v2.24 introduces time-based fairshare, a new scheduling mode that brings fair-share scheduling with time awareness for over-quota resources to Kubernetes clusters. This capability, built on the open source KAI Scheduler that powers NVIDIA Run:ai, addresses a long-standing challenge in shared GPU infrastructure. Consider two teams with equal priority sharing a cluster. Source

·

Speeding Up Variable-Length Training with Dynamic Context Parallelism and NVIDIA Megatron Core

This post introduces Dynamic Context Parallelism (Dynamic-CP), a scheduling approach in NVIDIA Megatron Core used for LLM post-training or DiT pre-training. It... This post introduces Dynamic Context Parallelism (Dynamic-CP), a scheduling approach in NVIDIA Megatron Core used for LLM post-training or DiT pre-training. It dynamically selects the CP size per microbatch to efficiently handle variable-length sequences, achieving up to 1.48x speedup on real-world datasets. In large-scale model training, an often-overlooked bottleneck arises from the… Source

·

Updating Classifier Evasion for Vision Language Models

Advances in AI architectures have unlocked multimodal functionality, enabling transformer models to process multiple forms of data in the same context. For... Advances in AI architectures have unlocked multimodal functionality, enabling transformer models to process multiple forms of data in the same context. For instance, vision language models (VLMs) can generate output from combined image and text input, enabling developers to build systems that interpret graphs, process camera feeds, or operate with traditionally human interfaces like desktop… Source

·

Grok Imagine API

Grok Imagine API offers video generation with stated advances in quality, cost, latency.

·

Accelerating Diffusion Models with an Open, Plug-and-Play Offering

Recent advances in large-scale diffusion models have revolutionized generative AI across multiple domains, from image synthesis to audio generation, 3D asset... Recent advances in large-scale diffusion models have revolutionized generative AI across multiple domains, from image synthesis to audio generation, 3D asset creation, molecular design, and beyond. These models have demonstrated unprecedented capabilities in producing high-quality, diverse outputs across various conditional generation tasks. Despite these successes… Source

·
30 stories