Kindergarten-grade nouns
Every story tagged with this topic, ordered by date.
Reddit user reports Claude Opus struggles to distinguish corpus-frequency word obscurity from human recognition familiarity.
SubQ claims 12M context window in marketing but production model capped at 1M; benchmark results show significant performance drop vs. research variant and competitors.
Alex Lupsasca (OpenAI) details how GPT-5.x generated novel theoretical physics and quantum gravity results.
PALACE: kernel method for certified point-cloud/graph classification with adaptive landmarks and cover-theoretic guarantees.
OpenSeeker-v2: SFT on informative trajectories achieves frontier LLM search agent capabilities without full RL pipeline.
HeadsUp: scalable feed-forward 3D Gaussian head reconstruction from multi-view captures using UV-parameterized representation.
CDS (Conditional Diffusion Sampling): combines parallel tempering and diffusion for sampling from unnormalized multimodal distributions.
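CDS itself isn't spelled out in this entry; for orientation, here is a minimal NumPy sketch of the parallel-tempering half alone, run on a toy bimodal density. The target, temperatures, and step sizes are all illustrative choices, not the paper's.

```python
# Minimal parallel-tempering (replica-exchange) sketch on a toy
# unnormalized bimodal density; illustrative only, not CDS.
import numpy as np

rng = np.random.default_rng(0)

def log_prob(x):
    # Unnormalized bimodal target: mixture of two Gaussians at -4 and +4.
    return np.logaddexp(-0.5 * (x - 4.0) ** 2, -0.5 * (x + 4.0) ** 2)

betas = np.array([1.0, 0.5, 0.25, 0.1])   # inverse temperatures, hottest (smallest) last
chains = rng.normal(size=len(betas))       # one current sample per replica
samples = []

for step in range(20000):
    # Within-temperature random-walk Metropolis update for each replica.
    proposals = chains + rng.normal(scale=1.0, size=len(betas))
    log_accept = betas * (log_prob(proposals) - log_prob(chains))
    accept = np.log(rng.uniform(size=len(betas))) < log_accept
    chains = np.where(accept, proposals, chains)

    # Swap move between a random adjacent temperature pair.
    i = rng.integers(len(betas) - 1)
    log_swap = (betas[i] - betas[i + 1]) * (log_prob(chains[i + 1]) - log_prob(chains[i]))
    if np.log(rng.uniform()) < log_swap:
        chains[i], chains[i + 1] = chains[i + 1], chains[i]

    samples.append(chains[0])              # the beta=1 replica targets the true density

samples = np.array(samples[2000:])         # drop burn-in
print(f"mode balance: {np.mean(samples > 0):.2f} (ideal ~0.50)")
```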
Medical imaging: mixed-precision training for 3D brain tumor segmentation to improve early identification.
Experience-RAG Skill introduces agent-oriented retrieval orchestration layer that learns task-specific retrieval strategies via experience memory.
Framework automates multi-agent system composition through intent-to-execution workflow and agent recommendation, replacing manual orchestration.
Flow Sampling framework uses diffusion models to sample from unnormalized densities via denoising conditional processes without data.
Hallucination detection method bridges implicit neural uncertainty and explicit self-judgments via label constraint modeling for improved reliability.
Active learning for quantum chemistry via pretrained MLIP latent space acquisition signals; domain-specific chemistry application.
Transformer architecture variant lets later layers selectively access early-layer activations via learned mixing coefficients, enabling memory-efficient recovery of low-level features.
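The paper's exact wiring isn't given in this entry; one plausible minimal reading, sketched below in PyTorch with hypothetical names, is a softmax-weighted mix over cached early-layer activations.

```python
import torch
import torch.nn as nn

class EarlyLayerMixer(nn.Module):
    """Hypothetical module: mixes cached early-layer hidden states with
    learned softmax weights so a later block can recover low-level features."""
    def __init__(self, num_early_layers):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(num_early_layers))  # mixing logits

    def forward(self, early_states):
        # early_states: list of (batch, seq, d_model) tensors from early layers.
        w = torch.softmax(self.alpha, dim=0)
        return sum(wi * h for wi, h in zip(w, early_states))

mixer = EarlyLayerMixer(num_early_layers=4)
states = [torch.randn(2, 8, 64) for _ in range(4)]
print(mixer(states).shape)  # torch.Size([2, 8, 64])
```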
Study finds LMs can iteratively refine conceptual definitions through counterexample generation, but accept invalid counterexamples at 2× the human acceptance rate.
RCT of 356 clinicians shows atomic fact-checking (decomposing LLM recommendations into verifiable claims) increases trust from 27% to 67% vs. traditional explainability methods.
Task vector arithmetic on BEATs encoders composes 661-species bioacoustic classifier without data sharing; task vectors near-orthogonal, geometry aligns with acoustic niche hypothesis.
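Task-vector arithmetic itself is compact enough to sketch. The toy below substitutes random matrices for BEATs encoder weights (every name and shape is illustrative) and checks the near-orthogonality the entry highlights.

```python
# Sketch of task-vector arithmetic: a task vector is the fine-tuned weights
# minus the base weights; composition adds scaled task vectors onto the base.
import numpy as np

def task_vector(base, finetuned):
    """tau_i = theta_i - theta_0, computed per parameter tensor."""
    return {k: finetuned[k] - base[k] for k in base}

def compose(base, task_vectors, scale=1.0):
    """theta = theta_0 + scale * sum_i tau_i (no training data needed)."""
    merged = {k: v.copy() for k, v in base.items()}
    for tau in task_vectors:
        for k in merged:
            merged[k] += scale * tau[k]
    return merged

# Toy demo: random tensors standing in for real encoder checkpoints.
rng = np.random.default_rng(0)
base = {"w": rng.normal(size=(4, 4))}
ft_a = {"w": base["w"] + 0.1 * rng.normal(size=(4, 4))}   # species set A
ft_b = {"w": base["w"] + 0.1 * rng.normal(size=(4, 4))}   # species set B

taus = [task_vector(base, ft_a), task_vector(base, ft_b)]
merged = compose(base, taus, scale=0.5)

# Near-orthogonality check mirroring the entry's geometric observation.
a, b = taus[0]["w"].ravel(), taus[1]["w"].ravel()
cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"cosine between task vectors: {cos:+.3f}")
```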
Framework shows popular activation steering methods misalign with prompt steering mechanics; proposes distilling prompt behavior into interpretable models to close performance gap.
Gauge-invariant GNN architecture for Abelian lattice gauge theories using Wilson loop representations; application to condensed matter and quantum systems.
Argues frontier AI failures in open-ended tasks (scientific assistance, agents, personalization) stem from objective ambiguity rather than capability gaps; proposes contextual multi-objective optimization.
Process-aware pipeline for continuous predictive monitoring of clinical pathways using prefix-based representations on COVID-19 ICU admission prediction.
Google demonstrates 3× LLM inference speedup on TPUs using diffusion-style speculative decoding technique.
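The diffusion-style drafting is the novelty and isn't reproduced here; this sketch shows only the standard speculative-decoding accept/reject rule that such methods build on, with fixed toy categorical distributions standing in for the draft and target models.

```python
# Classic speculative-decoding accept/reject rule (Leviathan et al. style);
# the toy "draft" and "target" below are fixed categorical distributions.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 5
p = np.array([0.40, 0.30, 0.15, 0.10, 0.05])  # target model probabilities
q = np.array([0.25, 0.25, 0.20, 0.15, 0.15])  # cheaper draft probabilities

def speculative_token():
    x = rng.choice(VOCAB, p=q)                 # draft proposes a token
    if rng.uniform() < min(1.0, p[x] / q[x]):  # accept with prob min(1, p/q)
        return x
    residual = np.maximum(p - q, 0.0)          # else resample from (p - q)+
    return rng.choice(VOCAB, p=residual / residual.sum())

# The rule is lossless: outputs are distributed exactly as the target p.
draws = np.array([speculative_token() for _ in range(100_000)])
empirical = np.bincount(draws, minlength=VOCAB) / len(draws)
print(np.round(empirical, 3), "vs target", p)
```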
Proposes improved empirical fixation density estimation methods beyond fixed-bandwidth Gaussian KDE for saliency benchmarking and per-image model evaluation.
DMGD proposes training-free dataset distillation using diffusion models with semantic-distribution matching guidance.
Study compares 2D spatiotemporal convolutions vs. concatenated 1D convolutions for EEG signal classification with CNNs.
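A hedged PyTorch stand-in for the two front-ends being compared: a joint 2D convolution over the (electrodes × time) plane versus a factorized temporal-then-spatial pair, one common reading of "concatenated 1D convolutions". Kernel sizes and channel counts are illustrative.

```python
import torch
import torch.nn as nn

B, C, T = 8, 32, 512                 # batch, EEG electrodes, time samples
x = torch.randn(B, 1, C, T)          # treat the montage as a 1-channel image

# (a) joint spatiotemporal 2D convolution over electrodes and time at once
conv2d = nn.Conv2d(1, 16, kernel_size=(C, 25), padding=(0, 12))
y2d = conv2d(x)                      # -> (B, 16, 1, T)

# (b) factorized: temporal conv first, then a spatial conv across electrodes
temporal = nn.Conv2d(1, 16, kernel_size=(1, 25), padding=(0, 12))
spatial = nn.Conv2d(16, 16, kernel_size=(C, 1), groups=16)  # depthwise
y1d = spatial(temporal(x))           # -> (B, 16, 1, T)

def count(m):
    return sum(p.numel() for p in m.parameters())

print("joint 2D params:   ", count(conv2d))
print("factorized params: ", count(temporal) + count(spatial))
print("output shapes:", tuple(y2d.shape), tuple(y1d.shape))
```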
EvoLM enables self-improvement in language models using co-evolved discriminative rubrics without external reward supervision.
MEAZO: memory-efficient adaptive zeroth-order optimizer for LLM fine-tuning, outperforms ZO-Adam with scalar-only tracking.
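MEAZO's scalar-only state tracking isn't detailed here; for context, this is the generic two-point zeroth-order estimator (the MeZO-style baseline such optimizers refine), which replaces backpropagation with two forward evaluations along a random direction.

```python
# Generic two-point zeroth-order (SPSA-style) update; stand-in objective,
# not MEAZO itself.
import numpy as np

rng = np.random.default_rng(0)

def loss(theta):
    # Toy objective: quadratic bowl centered at 3.
    return np.sum((theta - 3.0) ** 2)

theta = np.zeros(10)
eps, lr = 1e-3, 0.05

for step in range(500):
    z = rng.standard_normal(theta.shape)          # random probe direction
    # Two forward evaluations replace backprop entirely.
    proj_grad = (loss(theta + eps * z) - loss(theta - eps * z)) / (2 * eps)
    theta -= lr * proj_grad * z                   # update along the probe

print(f"final loss: {loss(theta):.4f}")           # approaches 0
```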
Distributionally robust continual learning method for CLIP models using dynamic per-class loss reweighting with small memory buffers.
Vision language models quantify semantic richness of personal visual environments to predict mental health outcomes from 2674 participant photos.
TraceLift: planner-executor framework trains LLM reasoning traces on executor-grounded rewards, not just final-answer correctness.
MCJudgeBench: benchmark for constraint-level evaluation of LLM judges in multi-constraint instruction following with per-constraint gold labels.
Mathematical framework for dependability of distributed collaborative intelligence systems where locally correct decisions compose into unsafe global behaviors.
Complex-valued gradient descent for symbolic regression enables discovery of equations with singularities and domain constraints like division and logarithms.
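One plausible reading of the mechanism, illustrated with np.log: evaluating candidate expressions over the complex numbers keeps the objective finite where real-valued evaluation yields NaNs, so search can traverse invalid regions while a penalty on the imaginary part pushes solutions back onto the real domain. The example is illustrative, not the paper's code.

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 8)   # grid avoiding 0

# Real-valued evaluation: log of negative inputs is NaN, killing gradients.
with np.errstate(invalid="ignore", divide="ignore"):
    real_eval = np.log(x)

# Complex-valued evaluation: log(-a) = log(a) + i*pi stays finite, so
# optimization can move parameters through out-of-domain regions.
complex_eval = np.log(x.astype(complex))

loss_real = real_eval**2                                       # NaNs propagate
loss_cplx = complex_eval.real**2 + np.abs(complex_eval.imag)   # finite, penalized

print("NaNs (real path):   ", int(np.isnan(loss_real).sum()))
print("NaNs (complex path):", int(np.isnan(loss_cplx).sum()))
```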
Randomized algorithm approximates total variation distance between mixtures of product distributions with polynomial-time complexity bounds.
Theoretical characterization of Bayes-consistency for learning with general metric losses in the realizable setting.
Conformal Predictive Self-Calibration framework for multimodal learning handles modality imbalance and noisy corruption via predictive uncertainty.
PhD student reports 4% accuracy gap when reproducing computer vision paper baseline; raises reproducibility concerns common in published ML research.
OpenAI releases GPT-5.5 Instant system card detailing model capabilities, limitations, and safety properties.
Simon Willison's April 2026 newsletter covers Opus 4.7, GPT-5.5 price increases, Claude Mythos, LLM security research, and ChatGPT Images 2.0.
FastDMS achieves 6.4× KV-cache compression on Llama 3.2 1B via learned token eviction, matching vLLM performance with lower memory overhead.
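FastDMS's learned eviction policy isn't public in this entry; the sketch below shows the generic heavy-hitter style of eviction such methods compete with: score each cached token by accumulated attention mass and keep a fixed budget. All sizes are illustrative.

```python
# Heavy-hitter-style KV-cache eviction sketch; not FastDMS's learned policy.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d, budget = 64, 16, 10        # cached tokens, head dim, kept tokens

K = rng.normal(size=(seq_len, d))      # cached keys for one attention head
V = rng.normal(size=(seq_len, d))      # cached values
attn_history = rng.dirichlet(np.ones(seq_len), size=32)  # past attention rows

# Accumulated attention mass per cached token serves as the eviction score.
scores = attn_history.sum(axis=0)
keep = np.sort(np.argsort(scores)[-budget:])  # indices of heavy hitters

K_small, V_small = K[keep], V[keep]
print("kept cache shapes:", K_small.shape, V_small.shape)
print(f"kept {budget}/{seq_len} tokens -> {seq_len / budget:.1f}x compression")
```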
Jack Clark (Anthropic co-founder) estimates 30% probability AI research automation by end-2027, 60%+ by end-2028, citing rapid progress from coding to ML systems research.
SHAP-based framework decomposes RL algorithm and hyperparameter contributions to generalization gaps in robotic control tasks.
Layer-wise peeling framework monitors transformer training dynamics by locally optimizing each layer against intermediate representations.
JACTUS unifies parameter-efficient fine-tuning and model compression into single joint optimization framework.
Statistical approach improves Monte Carlo estimation of Shapley values and semivalues for model explainability.
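For reference, a sketch of the permutation-sampling baseline this line of work improves on, run on a toy weighted-voting game (all numbers illustrative): each sampled player ordering credits every player with its marginal contribution.

```python
# Permutation-sampling Monte Carlo estimator for Shapley values of a set
# function v, checked against the exact enumeration on a tiny game.
import itertools
import math
import numpy as np

rng = np.random.default_rng(0)
weights = np.array([4.0, 2.0, 1.0, 1.0])
QUOTA = 5.0
n = len(weights)

def v(coalition):
    # Toy characteristic function: 1 if the coalition's weight meets the quota.
    return float(sum(weights[i] for i in coalition) >= QUOTA)

def add_marginals(perm, phi):
    # Walk one player ordering, crediting each player's marginal contribution.
    prev, coalition = 0.0, set()
    for i in perm:
        coalition.add(i)
        curr = v(coalition)
        phi[i] += curr - prev
        prev = curr

def shapley_mc(num_perms=20_000):
    phi = np.zeros(n)
    for _ in range(num_perms):
        add_marginals(rng.permutation(n), phi)
    return phi / num_perms

def shapley_exact():
    phi = np.zeros(n)
    for perm in itertools.permutations(range(n)):
        add_marginals(perm, phi)
    return phi / math.factorial(n)

print("MC estimate:", np.round(shapley_mc(), 3))
print("exact:      ", np.round(shapley_exact(), 3))
```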
SCPRM process reward model mitigates risk compensation bias in knowledge graph reasoning by enforcing schema constraints.
Framework applies reinforcement learning to multi-agent LLM systems via orchestration traces capturing spawning, delegation, and communication.
FunFuzz evolutionary fuzzing framework uses LLMs with multi-island search and feedback-driven prompt adaptation for structured input generation.
Static analysis framework for recursive SHACL shape definitions to decide constraint document implication.
Conditional VAE with latent mixture scheduling enables fine-grained topological control in graph generation for drug discovery.