Taming Outlier Tokens in Diffusion Transformers
Study identifies outlier tokens in Diffusion Transformers that attract disproportionate attention in image generation, affecting both encoder and denoiser layers.
Fresh arrivals from arXiv cs.AI, cs.CL, and cs.LG. The raw research feed.
Research shows pretrained language models implicitly distinguish grammaticality from string probability through internal representations, despite surface statistics.
Grok AI model discovered five new mathematical inequalities and bounds in convex geometry and combinatorics, verified by human authors.
Mathematical analysis refuting Carbery's triangle inequality conjecture for Lp spaces with counterexample and sharp bounds on exponent.
LongSeeker proposes Context-ReAct paradigm for elastic context management in long-horizon search agents, maintaining trajectory at variable detail levels.
Theoretical analysis establishes sharp capacity thresholds for linear associative memory, showing d²∼n log n scaling for top-1 retrieval via phase transition.
Method estimates expected outputs of wide random MLPs without sampling by propagating activation distributions via cumulants and Hermite expansions.
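The core trick in that line, propagating distribution statistics through a nonlinearity analytically instead of sampling, can be shown in its simplest (Gaussian, second-order) form. This is a minimal sketch of the idea, not the paper's cumulant/Hermite machinery: for a Gaussian input to a ReLU, the output mean and variance have closed forms.

```python
import math
import random

def relu_mean_var(mu, var):
    """Closed-form mean and variance of relu(x) for x ~ N(mu, var).

    The simplest instance of moment propagation: push distribution
    statistics through the nonlinearity analytically, no sampling.
    """
    sigma = math.sqrt(var)
    z = mu / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # Phi(z)
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # phi(z)
    mean = mu * cdf + sigma * pdf
    second = (mu * mu + var) * cdf + mu * sigma * pdf        # E[relu(x)^2]
    return mean, second - mean * mean

# Sanity check against a Monte Carlo estimate.
random.seed(0)
mu, var = 0.3, 1.5
samples = [max(0.0, random.gauss(mu, math.sqrt(var))) for _ in range(200_000)]
mc_mean = sum(samples) / len(samples)
cf_mean, cf_var = relu_mean_var(mu, var)
print(abs(cf_mean - mc_mean) < 0.01)  # closed form agrees with sampling
```

Chaining such per-layer rules across a deep network, with higher cumulants tracked via Hermite expansions, is what lets the paper's method estimate expected outputs of wide random MLPs without ever sampling weights.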
Theoretical framework explains transformers' in-context learning on nonlinear regression by showing attention mechanisms construct polynomial and spline bases.
MRI-Eval benchmark with 1365 items assesses LLM performance on MRI physics and GE scanner operations with tiered difficulty and diagnostic conditions.
Q2RL algorithm extracts Q-functions from behavior cloning for efficient offline-to-online robot learning, preventing the policy collapse caused by distribution mismatch.
Design Conductor 2.0 autonomous agent builds hardware accelerators (TurboQuant) in 80 hours using frontier April 2026 models, demonstrating 80x capability scaling over prior work.
First-token confidence (phi_first) from single greedy decode detects LLM hallucinations as effectively as multi-sample semantic self-consistency with lower computational cost.
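The phi_first signal is cheap enough to sketch in a few lines. Assuming you can read out the model's logit vector at the first decoding step (the interface below is hypothetical; how you obtain the logits depends on your LLM stack), the detector is just the max softmax probability:

```python
import math

def first_token_confidence(logits):
    """phi_first: max softmax probability at the first decoding step.

    `logits` is the logit vector for the first generated token.
    A low value flags a likely hallucination; no extra samples,
    no semantic clustering, one greedy decode.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # numerically stable softmax
    return max(exps) / sum(exps)

# Peaked logits -> high confidence; near-flat logits -> low confidence.
confident = first_token_confidence([8.0, 0.5, 0.2, 0.1])
uncertain = first_token_confidence([1.1, 1.0, 0.9, 1.05])
print(confident > 0.9, uncertain < 0.5)  # True True
```

The contrast with multi-sample semantic self-consistency is the cost: that baseline needs several full generations plus a similarity model, while this needs only the logits you already computed.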
Geometry-Aware State Space Model applies hyperbolic geometry to whole-slide histopathology image analysis via Multiple Instance Learning, improving patch aggregation for gigapixel resolution.
SemEval-2026 Task 9 system fine-tunes Gemma 3 (12B/27B) per-language with LoRA and GPT-4o-mini synthetic data augmentation for 22-language polarization detection.
Aes3D proposes aesthetic assessment framework for 3D Gaussian Splatting, addressing composition and visual appeal evaluation beyond reconstruction fidelity.
Sparse autoencoders reveal PatchTST uses non-superposed, task-specific representations for time-series forecasting, explaining competitiveness against simple linear models.
Comprehensive study of learned image compression design choices balancing perceptual quality and runtime, introducing novel techniques for practical human-visual-system-optimized codecs.
Case study of high-school/undergraduate students using AI tools for financial forecasting research, highlighting human-AI co-mentorship acceleration of learning outcomes.
Coding agent with executable Python world models, verification, and simplicity-bias refactoring solves 25 public ARC-AGI-3 games without task-specific logic.
Koopman operator theory applied to LLM embeddings as dynamical system enables low-cost black-box hallucination detection without sampling or external retrieval.
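The Koopman view treats the sequence of hidden states as a trajectory of a dynamical system and asks how well a single linear operator explains it. A minimal sketch, assuming you can export a (T, d) array of per-token embeddings (a hypothetical interface; the residual-as-signal choice is an illustration, not necessarily the paper's exact score):

```python
import numpy as np

def koopman_residual(states):
    """Fit a linear operator A with x_{t+1} ~= A x_t over a trajectory
    of hidden states and return the relative prediction residual.

    A large residual means the embedding trajectory is poorly described
    by one linear dynamical system -- usable as a black-box signal,
    with no sampling and no external retrieval.
    """
    X, Y = states[:-1], states[1:]
    # Least-squares Koopman estimate: A_T = argmin ||X @ A_T - Y||_F
    A_T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ A_T
    return float(np.linalg.norm(resid) / np.linalg.norm(Y))

# Toy check: a near-linear trajectory fits far better than i.i.d. noise.
rng = np.random.default_rng(0)
A = 0.4 * rng.normal(size=(4, 4))
traj = [rng.normal(size=4)]
for _ in range(29):
    traj.append(A @ traj[-1] + 1e-3 * rng.normal(size=4))
noisy = rng.normal(size=(30, 4))
print(koopman_residual(np.array(traj)) < koopman_residual(noisy))  # True
```

Thresholding such a residual per generation is what makes the approach low-cost: one forward pass yields the states, and the fit is a single least-squares solve.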
T-LVMOGP framework scales Multi-Output Gaussian Processes to high-dimensional outputs via transformed latent variables.
CausalFlow-T applies DAG-constrained normalizing flows and LLM-driven imputation for treatment effect estimation in incomplete EHR data.
Data-driven anomaly detection flags unusual patient-management actions in EHR systems to reduce clinical errors.
Adaptive policy selection method improves offline-to-online RL by combining off-policy and online evaluation under interaction budgets.
Multi-view evidential reasoning framework for mental health prediction from text with calibrated uncertainty estimation.
SHAP-based feature selection and hybrid boosting classify driving behaviors from multimodal physiological signals (EEG, EMG, GSR).
Wasserstein Gradient Flow analysis characterizes Generative Modeling via Drifting (GMD) as fixed-point optimization in probability measure space.
Analysis of LLM jailbreak vulnerability without structured prompts reveals robustness gaps in current safety defenses.
Manifold steering interventions causally link neural activation geometry to model behavior via structured representation space.
Finite-width signal propagation analysis shows when infinite-width approximation breaks down in long linear recurrences.
Proposes Prefix Sampling to optimize RL training efficiency by maintaining a 50% pass rate, the regime that maximizes reward signal and entropy in agentic tasks like SWE-bench.
LineRides framework enables bicycle robot to learn complex stunts via line-guided RL without demonstrations, using spatial guidelines and sparse keyframe constraints.
Framework for materials science dataset construction balancing targeted property optimization against preservation of untargeted outcomes via diversity-aware selection.
Introduces Concept Field method to detect hallucination and measure novelty in LLM outputs by modeling semantic drift in text corpora using sentence embeddings.
Unified theoretical framework for distributional regret bounds in bandits and episodic RL, with UCBVI-style algorithm achieving gap-independent guarantees.
Memini: associative memory system with multi-timescale dynamics for continual knowledge updating in deployed LLMs without explicit management.
Bayesian framework for active view selection in 3D reconstruction using posterior inference over implicit surfaces.
Doubly sparse regularization exploiting Gaussian graphical model structure for high-dimensional regression.
Driver-WM: latent world model for predicting driver reactions during L2/L3 automation transitions using in-cabin behavioral dynamics.
Think-aloud traces improve automated cognitive model discovery beyond behavior-only constraints in risky decision-making tasks.
Automated pipeline for auditing unexpected behavioral side-effects of LLM interventions through contrastive multi-token generation analysis.
Gated multimodal model combining EPC tabular data and assessor text to predict building energy efficiency scores.
ORDERED: variance reduction for unsupervised domain adaptation via optimal data reordering during training.
Imitation learning for stabilizing Vlasov-Poisson plasma control using sparse macroscopic diagnostics with stability guarantees.
Psychometric analysis of 50 LLMs identifies phenomenal experience as the primary variance axis via a "Pinocchio" dimension.
Vision-based mmWave beam management system for V2X vehicular connectivity using camera sensing and closed-loop learning.
Theoretical proof that long-context models cannot simultaneously optimize efficiency, compactness, and recall, a fundamental trade-off affecting both Transformers and SSMs.
Fully convolutional neural network for chemical-mechanical polishing modeling in IC manufacturing using white light interferometry.
Systematic review of jailbreak attack and defense methods for LLMs with critique of narrow evaluation metrics like attack success rate.
Adaptive deep learning framework for angle-of-arrival based outdoor localization in 5G/6G networks with flexible training strategies.