The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps

Inference-time hallucination detection in SpeechLLMs via attention metrics (AUDIORATIO, AUDIOCONSISTENCY); avoids gold-standard requirement.

Jonas Waldendorf·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

EgoSelf: From Memory to Personalized Egocentric Assistant

EgoSelf system: graph-based interaction memory for personalized egocentric assistants; addresses long-term user context integration.

Yanshuo Wang·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Structure-guided molecular design with contrastive 3D protein-ligand learning

SE(3)-equivariant transformer with contrastive learning for protein-ligand structure encoding in drug discovery.

Carles Navarro·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Detecting Data Contamination in Large Language Models

Membership inference attacks benchmark against SOTA LLMs to detect training data contamination via black-box methods.

Juliusz Janicki·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Separating Geometry from Probability in the Analysis of Generalization

Theoretical framework decoupling geometry from probability assumptions in classical generalization analysis.

Maxim Raginsky·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Enhancing Construction Worker Safety in Extreme Heat: A Machine Learning Approach Utilizing Wearable Technology for Predictive Health Analytics

LSTM and attention-based models for heat stress prediction in construction workers using wearable sensor data.

Syed Sajid Ullah·2 months ago

r/Anthropic· COMMUNITY

Please don't take Opus 4.6 and Extended thinking away. 4.7 is absolutely useless.

User reports Claude Opus 4.7 performs worse than 4.6 on file search and factual accuracy tasks.

u/fairyflossmagpie·2 months ago·223 pts / 62 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment

Multi-agent LLM framework addresses actor-observer cognitive bias via dialectical alignment in self-reflection loops.

Bobo Li·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Emotion-Cause Pair Extraction in Conversations via Semantic Decoupling and Graph Alignment

Graph-based semantic decoupling for emotion-cause pair extraction in dialogue conversations.

Tianxiang Ma·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

DT2IT-MRM: Debiased Preference Construction and Iterative Training for Multimodal Reward Modeling

Debiased preference curation and iterative training for multimodal reward models aligning MLLMs with human preferences.

Zhihong Zhang·2 months ago

TechCrunch AI· PRESS

YouTube expands its AI likeness detection technology to celebrities

YouTube is expanding its AI likeness detection tool to celebrities, giving talent and their reps a way to find and remove deepfakes.

Sarah Perez·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Mesh Memory Protocol: Semantic Infrastructure for Multi-Agent LLM Systems

Mesh Memory Protocol enables cross-session cognitive collaboration and state sharing for multi-agent LLM teams.

Hongwei Xu·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Integrating Anomaly Detection into Agentic AI for Proactive Risk Management in Human Activity

Agentic AI with anomaly detection for proactive fall risk management and elderly activity monitoring.

Farbod Zorriassatine·2 months ago

Google DeepMind· FRONTIER

Partnering with industry leaders to accelerate AI transformation

Google DeepMind announces partnerships with global consultancies to commercialize frontier AI for enterprise adoption.

Google DeepMind·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps

Cyber Defense Benchmark evaluates LLM-agent threat hunting on 106 real Windows attack scenarios with MITRE ATT&CK coverage.

Alankrit Chona·2 months ago

r/ClaudeAI· COMMUNITY

tested 9 models with and without agent skills. Haiku 4.5 with a skill beat baseline Opus 4.7.

Benchmark: Haiku 4.5 with agent skill (84.3%) outperforms baseline Opus 4.7 (80.5%) across 880 evals at 5x lower cost.

u/jorkim_32·2 months ago·97 pts / 37 comm

Google AI (Gemma)· FRONTIER

3 new ways Ads Advisor is making Google Ads safer and faster

Google Ads Advisor adds three agentic safety features for policy compliance and account optimization.

Priya Baliga·2 months ago

r/ClaudeAI· COMMUNITY

Finally no more [Pasted text #1 +23 lines] - now you can see what you pasted fully

Claude Code UX improvement: double-paste expands truncated pasted content, fixing long-standing issue.

u/somerussianbear·2 months ago·93 pts / 11 comm

r/OpenAI· COMMUNITY

OpenAI is teasing the Image V2 model.

OpenAI teases Image V2 model ahead of public release.

u/lil_curry_verse·2 months ago·89 pts / 30 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training

EVPO addresses critic noise in sparse-reward LLM RL by casting baseline selection as Kalman filtering, challenging PPO/GRPO design tradeoffs.

Chengjun Pan·2 months ago

r/singularity· COMMUNITY

OpenAI teases gpt-image 2? Livestream at 12pm PT

GPT-Image-2 implements self-review iteration mechanism; ~11 min generation time for multi-pass refinement.

u/d1ez3·2 months ago·111 pts / 24 comm

arXiv (cs.AI/CL/LG)· ACADEMIA

Deep Supervised Contrastive Learning of Pitch Contours for Robust Pitch Accent Classification in Seoul Korean

Dual-Glob applies supervised contrastive learning to pitch accent classification in Seoul Korean speech; narrow linguistic domain.

Hyunjung Joo·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Fairness Audits of Institutional Risk Models in Deployed ML Pipelines

Replica-based fairness audit of deployed Early Warning System at Centennial College reveals gender/age/residency disparities across ML pipeline.

Kelly McConvey·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

A neural operator framework for data-driven discovery of stability and receptivity in physical systems

Neural operator framework discovers stability and receptivity in complex systems from data alone, without governing equations.

Chengyun Wang·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

LePREC: Reasoning as Classification over Structured Factors for Assessing Relevance of Legal Issues

LePREC dataset (769 Malaysian Contract Act cases) reveals LLM limitations in legal issue identification despite diverse generation capability.

Fanyu Wang·2 months ago

r/Anthropic· COMMUNITY

in case you don't know why Claude models keep getting worse after the 4.7 release: Anthropic lets OpenClaw be used again. the model changes mainly to make it able handle this traffics with lowest costs at the expense of code users.

Reddit speculation that Claude 4.7 performance degradation stems from cost optimization and OpenClaw re-enablement; unverified claim.

u/Aggravating_Bad4639·2 months ago·10 pts / 14 comm

The Verge AI· PRESS

John Ternus’s first big problem is AI

Less than a year ago, Apple made headlines for a lack of AI announcements at its annual WWDC event. Ten months later, the company has announced that hardware executive John Ternus will succeed longtime CEO Tim Cook as chief executive - and the official release doesn't mention AI once. Ternus, currently Apple's SVP of hardware engineering, will take over as CEO on September 1st, after Cook's decade and a half in the role. Ternus is a 25-year veteran of the company and the first Apple CEO in about 30 years to come from the hardware sector. According to Apple, he's led hardware engineering work ...

Hayden Field·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Do LLMs Game Formalization? Evaluating Faithfulness in Logical Reasoning

GPT-5 and DeepSeek-R1 exploit formalization-faithfulness gap in Lean 4 proofs despite valid logical reasoning; evaluates on FOLIO and Multi-LogiEval.

Kyuhee Kim·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents

Four-axis alignment framework (factual, reasoning, compliance, regulatory) for evaluating long-horizon enterprise AI agents in loan/claims/clinical domains.

Vasundra Srininvasan·2 months ago

arXiv (cs.AI/CL/LG)· ACADEMIA

ZC-Swish: Stabilizing Deep BN-Free Networks for Edge and Micro-Batch Applications

ZC-Swish activation stabilizes BN-free deep networks in micro-batch and federated regimes by centering activations to prevent mean-shift.

Suvinava Basak·2 months ago

30 stories