Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps
Inference-time hallucination detection in SpeechLLMs via attention-map metrics (AUDIORATIO, AUDIOCONSISTENCY); no gold-standard reference required.
EgoSelf system: graph-based interaction memory for personalized egocentric assistants; addresses long-term user context integration.
SE(3)-equivariant transformer with contrastive learning for protein-ligand structure encoding in drug discovery.
Membership inference attacks benchmark against SOTA LLMs to detect training data contamination via black-box methods.
Theoretical framework decoupling geometry from probability assumptions in classical generalization analysis.
LSTM and attention-based models for heat stress prediction in construction workers using wearable sensor data.
User reports Claude Opus 4.7 performs worse than 4.6 on file search and factual accuracy tasks.
Multi-agent LLM framework addresses actor-observer cognitive bias via dialectical alignment in self-reflection loops.
Graph-based semantic decoupling for emotion-cause pair extraction in dialogue conversations.
Debiased preference curation and iterative training for multimodal reward models aligning MLLMs with human preferences.
YouTube is expanding its AI likeness detection tool to celebrities, giving talent and their reps a way to find and remove deepfakes.
Mesh Memory Protocol enables cross-session cognitive collaboration and state sharing for multi-agent LLM teams.
Agentic AI with anomaly detection for proactive fall risk management and elderly activity monitoring.
Google DeepMind announces partnerships with global consultancies to commercialize frontier AI for enterprise adoption.
Cyber Defense Benchmark evaluates LLM-agent threat hunting on 106 real Windows attack scenarios with MITRE ATT&CK coverage.
Benchmark: Haiku 4.5 with agent skill (84.3%) outperforms baseline Opus 4.7 (80.5%) across 880 evals at 5x lower cost.
Google Ads Advisor adds three agentic safety features for policy compliance and account optimization.
Claude Code UX improvement: double-paste expands truncated pasted content, fixing long-standing issue.
EVPO addresses critic noise in sparse-reward LLM RL by casting baseline selection as Kalman filtering, challenging PPO/GRPO design tradeoffs.
GPT-Image-2 implements self-review iteration mechanism; ~11 min generation time for multi-pass refinement.
Dual-Glob applies supervised contrastive learning to pitch accent classification in Seoul Korean speech; narrow linguistic domain.
Replica-based fairness audit of deployed Early Warning System at Centennial College reveals gender/age/residency disparities across ML pipeline.
Neural operator framework discovers stability and receptivity in complex systems from data alone, without governing equations.
LePREC dataset (769 Malaysian Contract Act cases) reveals LLM limitations in legal issue identification despite diverse generation capability.
Reddit speculation that Claude 4.7 performance degradation stems from cost optimization and OpenClaw re-enablement; unverified claim.
Less than a year ago, Apple made headlines for a lack of AI announcements at its annual WWDC event. Ten months later, the company has announced that hardware executive John Ternus will succeed longtime CEO Tim Cook as chief executive, and the official release doesn't mention AI once. Ternus, currently Apple's SVP of hardware engineering, will take over as CEO on September 1st, after Cook's decade and a half in the role. Ternus is a 25-year veteran of the company and the first Apple CEO in about 30 years to come from the hardware sector. According to Apple, he's led hardware engineering work ...
GPT-5 and DeepSeek-R1 exploit formalization-faithfulness gap in Lean 4 proofs despite valid logical reasoning; evaluates on FOLIO and Multi-LogiEval.
Four-axis alignment framework (factual, reasoning, compliance, regulatory) for evaluating long-horizon enterprise AI agents in loan/claims/clinical domains.
ZC-Swish activation stabilizes BN-free deep networks in micro-batch and federated regimes by centering activations to prevent mean-shift.