The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.


Exploiting LLM-as-a-Judge Disposition on Free Text Legal QA via Prompt Optimization

Prompt optimization for LLM-as-a-Judge evaluation on legal QA; tests transfer across Qwen3-32B and DeepSeek-V3 judges on LEXam benchmark.

Mohamed Hesham Elganayni·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Supplement Generation Training for Enhancing Agentic Task Performance

Supplement Generation Training trains smaller LLMs to generate adaptive task-specific prompts for larger models, reducing post-training costs.

Young Min Cho·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Tokenised Flow Matching for Hierarchical Simulation Based Inference

Likelihood factorization for hierarchical SBI reduces simulator costs by training neural surrogates per-site instead of multi-site sampling.

Giovanni Charles·2 months ago

r/ClaudeAI· COMMUNITY

PSA: Anthropic bans organizations without warning

Anthropic suspended ~110 users at agricultural tech company without warning; users report lack of transparency in account enforcement.

u/ur_frnd_the_footnote·2 months ago·2178 pts / 281 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling

COMPASS framework uses adaptive semantic sampling with language-specific PEFT adapters to mitigate negative cross-lingual interference in multilingual LLMs.

Noah Flynn·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

ONOTE: Benchmarking Omnimodal Notation Processing for Expert-level Music Intelligence

ONOTE benchmark for omnimodal music notation processing across auditory, visual, symbolic domains; addresses Western notation bias and LLM judge hallucinations.

Menghe Ma·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Learning to Evolve: A Self-Improving Framework for Multi-Agent Systems via Textual Parameter Graph Optimization

Textual Parameter Graph Optimization (TPGO) enables multi-agent systems to self-improve via structural parameter evolution, moving beyond flat prompt tuning.

Shan He·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Participatory provenance as representational auditing for AI-mediated public consultation

Framework for auditing whether AI-synthesized summaries of public consultation faithfully represent source populations using optimal transport and causal inference.

Sachit Mahajan·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Generative Flow Networks for Model Adaptation in Digital Twins of Natural Systems

GFlowNet-based approach for model adaptation in digital twins of evolving natural systems with partial observations and mechanistic simulators.

Pascal Archambault·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

QuanForge: A Mutation Testing Framework for Quantum Neural Networks

QuanForge mutation testing framework for Quantum Neural Networks addressing stochastic factors and quantum measurement randomness.

Minqi Shao·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Auto-ART: Structured Literature Synthesis and Automated Adversarial Robustness Testing

Auto-ART: structured literature synthesis of the adversarial robustness field (2020-2026), plus an open-source framework with 50+ attacks, 28 defenses, and a Robustness Diagnostic Index.

Abhijit Talluri·2 months ago

Hugging Face· INFRA

Gemma 4 VLA Demo on Jetson Orin Nano Super

Hugging Face·2 months ago

r/singularity· COMMUNITY

Uber blows through its IT budget for AI for 2026 and it's only April citing rising costs of Claude Code

Uber exhausts 2026 AI budget by April due to rising Claude Code inference costs.

u/kernelangus420·2 months ago·526 pts / 62 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Storm Surge Modeling, Bias Correction, Graph Neural Networks, Graph Convolution Networks

StormNet applies graph neural networks to bias-correct storm surge forecasts from ADCIRC models.

Noujoud Nader·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

MGDA-Decoupled: Geometry-Aware Multi-Objective Optimisation for DPO-based LLM Alignment

MGDA-Decoupled balances multiple alignment objectives in DPO-based LLM training via geometry-aware optimization.

Andor Vári-Kakas·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Variance Is Not Importance: Structural Analysis of Transformer Compressibility Across Model Scales

Empirical study of 40+ transformer compression experiments on GPT-2 and Mistral 7B reveals variance-importance decoupling.

Samuel Salfati·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Intersectional Fairness in Large Language Models

Systematic fairness evaluation across six LLMs on intersectional demographic biases using benchmark datasets.

Chaima Boufaied·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Improving clinical interpretability of linear neuroimaging models through feature whitening

Feature whitening improves interpretability of linear neuroimaging models for brain biomarker discovery.

Sara Petiton·2 months ago

r/singularity· COMMUNITY

GPT Image 2 is the first image ai that’s blown my mind (prompted for a screenshot from a combined GTA 6-Cyberpunk 2077 game)

User subjective impressions of GPT Image 2 output quality combining GTA 6 and Cyberpunk 2077 aesthetics.

u/LoonieMoony·2 months ago·445 pts / 70 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

A Field Guide to Decision Making

Framework for high-consequence decision-making augmented by machine intelligence and agentic metadata stewardship.

Richard B. Arthur·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

ORPHEAS: A Cross-Lingual Greek-English Embedding Model for Retrieval-Augmented Generation

ORPHEAS is a specialized bilingual Greek-English embedding model for cross-lingual RAG applications.

Ioannis E. Livieris·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

The Expense of Seeing: Attaining Trustworthy Multimodal Reasoning Within the Monolithic Paradigm

Critical analysis of trustworthiness in Vision-Language Models, exposing functional blindness and language prior exploitation.

Karan Goyal·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

GRPO-VPS: Enhancing Group Relative Policy Optimization with Verifiable Process Supervision for Effective Reasoning

GRPO-VPS enhances reasoning via verifiable process supervision and belief-probing for improved credit assignment.

Jingyi Wang·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Cooperative Profiles Predict Multi-Agent LLM Team Performance in AI for Science Workflows

Benchmark of 35 open-weight LLMs shows behavioral economics games predict multi-agent team coordination in AI science workflows.

Shivani Kumar·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Large Language Models Outperform Humans in Fraud Detection and Resistance to Motivated Investor Pressure

Preregistered study of 7 LLMs finds they resist motivated investor pressure in fraud detection, contrary to prediction, across 3,360 conversations.

Nattavudh Powdthavee·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

CHORUS: An Agentic Framework for Generating Realistic Deliberation Data

CHORUS: agentic framework using LLM-powered personas with behavioral consistency to generate large-scale deliberation datasets for online discourse analysis.

A. Koursaris·2 months ago

OpenAI· FRONTIER

Making ChatGPT better for clinicians

OpenAI offers free ChatGPT access to verified U.S. clinicians for care, documentation, and research use.

OpenAI·2 months ago

r/ClaudeAI· COMMUNITY

Swapped to 4.7 and embarrassed myself at work

User reports submitting buggy test code generated by Opus 4.7 without reviewing it, causing a failed PR and workplace embarrassment.

u/BlakeR-·2 months ago·568 pts / 100 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

A weighted angle distance on strings

Multi-scale metric on strings using n-gram angle distances with exponential weights, proven metric properties and linear-time algorithm.

Grant Molnar·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning

Occupancy Reward Shaping uses optimal transport on world models to extract temporal geometry for credit assignment in offline goal-conditioned RL.

Aravind Venugopal·2 months ago

30 stories
