The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.


Bounding the Black Box: A Statistical Certification Framework for AI Risk Regulation

Framework quantifies acceptable risk thresholds for high-risk AI systems under EU AI Act, NIST, and Council of Europe regulations.

Natan Levy·21 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Beyond Expected Information Gain: Stable Bayesian Optimal Experimental Design with Integral Probability Metrics and Plug-and-Play Extensions

Bayesian Optimal Experimental Design framework using integral probability metrics replaces KL divergence for stable information gain estimation.

Di Wu·21 days ago

r/OpenAI· COMMUNITY

If Bible characters had Instagram

Speculative social media parody post unrelated to AI technology, models, research, or industry developments.

u/AskSquibbDoOwl·21 days ago·75 pts / 10 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

TraceScope: Interactive URL Triage via Decoupled Checklist Adjudication

TraceScope sandboxed agent triage system navigates interactive phishing pages (checkboxes, delayed rendering) for forensic URL classification.

Haolin Zhang·21 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Modulating Cross-Modal Convergence with Single-Stimulus, Intra-Modal Dispersion

Study shows neural network representational convergence across architectures and modalities using Procrustes analysis, linking to brain alignment.

Eghbal A. Hosseini·21 days ago

r/ClaudeAI· COMMUNITY

My Claude trying to find out who its competitors are

Anecdotal observation of Claude responding to competitor AI usage in chat.

u/Typical-Counter-5389·21 days ago·108 pts / 36 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

GFlowState: Visualizing the Training of Generative Flow Networks Beyond the Reward

GFlowState visual analytics system interprets Generative Flow Networks training dynamics for molecular and material discovery applications.

Florian Holeczek·21 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Alignment has a Fantasia Problem

Paper identifies 'Fantasia interactions' where users engage AI systems with incomplete goals, proposing realignment of alignment research beyond prompt-intent matching.

Nathanael Jo·21 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

On the algebra of Koopman eigenfunctions and on some of their infinities

Mathematical framework for computing Koopman operator eigenfunctions in reversible dynamical systems via polynomial construction.

Zahra Monfared·21 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows

Tool Attention framework reduces Model Context Protocol overhead (10k-60k tokens) via dynamic gating and lazy schema loading for LLM agent scaling.

Anuj Sadani·21 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Divide-then-Diagnose: Weaving Clinician-Inspired Contexts for Ultra-Long Capsule Endoscopy Videos

Introduces diagnosis-driven capsule endoscopy video summarization task with clinician-inspired context extraction for sparse medical event detection.

Bowen Liu·21 days ago

r/singularity· COMMUNITY

Happy smarter base model day

Vague celebratory post lacking specific claims or identifiable model announcement.

u/Glittering-Neck-2505·21 days ago·114 pts / 55 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Probably Approximately Consensus: On the Learning Theory of Finding Common Ground

Learning-theoretic framework for consensus elicitation on deliberation platforms using opinion space embeddings and hypothesis interval maximization.

Carter Blair·21 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Quotient-Space Diffusion Models

Quotient-Space Diffusion Models framework formalizes generative modeling with symmetries, applied to 3D molecular structure generation.

Yixian Xu·21 days ago

TechCrunch AI· PRESS

Era raises $11M to build a software platform for AI gadgets

Era thinks that we will see many form factors of AI hardware, including glasses, rings, and pendants.

Ivan Mehta·21 days ago

r/LocalLLaMA· COMMUNITY

US gov memo on “adversarial distillation” - are we heading toward tighter controls on open models?

US OSTP memo warns of adversarial distillation attacks on proprietary models; raises questions about regulatory impact on open-weight development.

u/MLExpert000·21 days ago·272 pts / 314 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

SyMTRS: Benchmark Multi-Task Synthetic Dataset for Depth, Domain Adaptation and Super-Resolution in Aerial Imagery

SyMTRS synthetic dataset for aerial imagery tasks: depth estimation, domain adaptation, super-resolution with multi-scale paired data.

Safouane El Ghazouali·21 days ago

r/Anthropic· COMMUNITY

A group of users leaked Anthropic's AI model Mythos by reportedly guessing where it was located

The AI model that Anthropic billed as too dangerous to release has reportedly been accessed by an unauthorized third party, and the incident raises concerns about the future of cybersecurity. The Mythos model was reportedly accessed by a handful of users in a private Discord chat on the day it was announced publicly, Bloomberg reported. Earlier this month, the group was able to access the program in part because one of the members of the group is a third-party contractor for Anthropic, according to Bloomberg. Using this access, the group was able to guess where the model was located based ...

u/fortune·21 days ago·14 pts / 3 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

An effective variant of the Hartigan $k$-means algorithm

Minor algorithmic variant of Hartigan k-means improves clustering by 2-5% over the standard method, with larger gains at higher dimension and k.

François Clément·21 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems

DiffMAS framework enables joint optimization of latent communication protocols in multi-agent LLM systems via differentiable training.

Ye Yu·21 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Inferring High-Level Events from Timestamped Data: Complexity and Medical Applications

Logic-based temporal event detection system infers high-level medical events from timestamped clinical data and background rules.

Yvon K. Awuklu·21 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Compliance Moral Hazard and the Backfiring Mandate

Mechanism design framework addresses decentralized risk analytics and compliance moral hazard in banking AML networks via temporal value assignment.

Jian Ni·21 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

SemEval-2026 Task 4: Narrative Story Similarity and Narrative Representation Learning

SemEval-2026 Task 4 introduces NSNRL benchmark for narrative similarity classification and embedding representation evaluation on 1000+ story triples.

Hans Ole Hatzel·21 days ago

r/Anthropic· COMMUNITY

Unusable after token usage nerfs

Moving somewhere else next billing cycle. Two hours of coding on Max and I'm capped. Whatever you changed, change it back or say something. Silent nerfs to a paid product are a bad look.

u/DependentAioli48·21 days ago·10 pts / 8 comm

r/ClaudeAI· COMMUNITY

Claude Status Update : Elevated errors on Claude Opus 4.7 on 2026-04-23T15:29:04.000Z

Automatic status alert: Opus 4.7 experienced elevated error rates on 2026-04-23.

u/ClaudeAI-mod-bot·21 days ago·24 pts / 15 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Who Defines "Best"? Towards Interactive, User-Defined Evaluation of LLM Leaderboards

LLM leaderboards are widely used to compare models and guide deployment decisions. However, leaderboard rankings are shaped by evaluation priorities set by benchmark designers, rather than by the diverse goals and constraints of actual users and organizations. A single aggregate score often obscures how models behave across different prompt types and compositions. In this work, we conduct an in-depth analysis of the dataset used in the LMArena (formerly Chatbot Arena) benchmark and investigate this evaluation challenge by designing an interactive visualization interface as a design probe. Our...

Minji Jung·21 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Misinformation Span Detection in Videos via Audio Transcripts

Online misinformation is one of the most challenging issues lately, yielding severe consequences, including political polarization, attacks on democracy, and public health risks. Misinformation manifests in any platform with a large user base, including online social networks and messaging apps. It permeates all media and content forms, including images, text, audio, and video. Distinctly, video-based misinformation represents a multifaceted challenge for fact-checkers, given the ease with which individuals can record and upload videos on various video-sharing platforms. Previous research eff...

Breno Matos·21 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

AUDITA: A New Dataset to Audit Humans vs. AI Skill at Audio QA

Existing audio question answering benchmarks largely emphasize sound event classification or caption-grounded queries, often enabling models to succeed through shortcut strategies, short-duration cues, lexical priors, dataset-specific biases, or even bypassing audio via metadata and captions rather than genuine reasoning. Thus, we present AUDITA (Audio Understanding from Diverse Internet Trivia Authors), a large-scale, real-world benchmark to rigorously evaluate audio reasoning beyond surface-level acoustic recognition. AUDITA comprises carefully curated, human-authored trivia questions ground...

Tasnim Kabir·21 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

PrismaDV: Automated Task-Aware Data Unit Test Generation

Data is a central resource for modern enterprises, and data validation is essential for ensuring the reliability of downstream applications. However, existing automated data unit testing frameworks are largely task-agnostic: they validate datasets without considering the semantics and requirements of the code that consumes the data. We present PrismaDV, a compound AI system that analyzes downstream task code together with dataset profiles to identify data access patterns, infer implicit data assumptions, and generate task-aware executable data unit tests. To further adapt the data unit tests ...

Hao Chen·21 days ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Thinking with Reasoning Skills: Fewer Tokens, More Accuracy

Reasoning LLMs often spend substantial tokens on long intermediate reasoning traces (e.g., chain-of-thought) when solving new problems. We propose to summarize and store reusable reasoning skills distilled from extensive deliberation and trial-and-error exploration, and to retrieve these skills at inference time to guide future reasoning. Unlike the prevailing "reasoning from scratch" paradigm, our approach first recalls relevant skills for each query, helping the model avoid redundant detours and focus on effective solution paths. We evaluate our method on coding and mathematical reason...

Guangxiang Zhao·21 days ago

← Front Page · 30 stories
