Evaluation of Automatic Speech Recognition Using Generative Large Language Models
LLMs evaluated as semantic judges of ASR output achieve 92-94% agreement with human raters, outperforming traditional WER metrics.
Analysis demonstrates that the choice of fine-tuning parameter subspace fundamentally alters both the definition and the outcomes of the continual learning problem.
Theoretical result: multicalibration requires Θ(ε⁻³) sample complexity for achieving Expected Calibration Error bounds.
MathDuels benchmark uses self-play with LLMs as both problem authors and solvers to differentiate model capabilities beyond static benchmarks.
HalluScope benchmark reveals prompt-induced hallucinations in LVLMs stem primarily from language dominance over vision grounding.
Agentic architecture automates conversion of research questions to reproducible scientific workflows via LLM semantic translation and domain expert Skills.
User asks whether local models (32-128GB RAM) deliver genuine productivity or remain hobbyist tools.
Survey frames LoRA through signal processing lens, unifying architectural choices and optimization techniques for parameter-efficient fine-tuning.
Scale-adaptive diffusion framework enables joint spatiotemporal video super-resolution across variable upscaling factors and frame rates.
GiVA uses gradient-informed initialization to improve vector-based adaptation efficiency, matching LoRA training times with extreme parameter efficiency.
Computational framework analyzes Brazilian parliamentary discourse using stylometric analysis and topic modeling on legislative speech.
Nemobot is an interactive environment for creating and deploying LLM-powered game agents across multiple game classes using Claude Shannon's taxonomy.
Study incorporates environmental and visual predictors into motor insurance claim-frequency models using zone-level geographic data.
Multi-stage deep learning framework with warm-start optimization addresses Unit Commitment problem for grid scheduling with renewable integration.
EVENT5Ws is a large manually-annotated open-domain event extraction dataset covering diverse event types for improved algorithm development.
TingIS is an enterprise-scale system for real-time risk event discovery from noisy customer incident data in cloud-native services.
Multimodal text-graph approach leverages LLMs for open-domain event extraction without predefined event types using document-level context.
Claude introduces segmented rate limits: a separate quota for Claude Design, a daily cap on routine runs (0/15), and faster reset cycles; suggests potential billing/product changes.
RedirectQA dataset uses Wikipedia redirects to study LLM memorization of factual knowledge via entity surface form variation.
Study examines image authenticity risks from generative AI hallucinations in camera image signal processors (ISPs) at capture-time.
User observes that the API rate-limit reset time shifted from Thursday to Saturday; speculates this signals a new model launch.
Study evaluates whether LLMs encode relational context in moral decisions using Whistleblower's Dilemma across crime severity and social closeness dimensions.
Machine learning model identifies mechanistic reasoning moments in student STEM team conversations via interpretable probability scoring.
Reddit user speculates that AI latency will decrease over time, comparing current Claude usage to dial-up internet speeds.
Figure AI shows progress on scaling production and deployment of its Figure 03 humanoid robot.
ReaPER+ replay buffer optimization addresses quantum circuit learning under hardware noise via annealed TD error prioritization.
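ReaPER+'s exact schedule isn't given in the summary; as a minimal sketch under assumptions, annealed TD-error prioritization in a replay buffer could look like the following (the buffer API, capacity handling, and the linear alpha schedule are all illustrative choices, not the paper's method):

```python
import numpy as np

class AnnealedPrioritizedBuffer:
    """Replay buffer that samples transitions proportional to |TD error|^alpha,
    annealing alpha toward 0 so sampling gradually becomes uniform."""

    def __init__(self, capacity, alpha_start=0.7, alpha_end=0.0, anneal_steps=10_000):
        self.capacity = capacity
        self.data, self.td_errors = [], []
        self.alpha_start, self.alpha_end = alpha_start, alpha_end
        self.anneal_steps = anneal_steps
        self.step = 0

    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:          # drop the oldest entry when full
            self.data.pop(0)
            self.td_errors.pop(0)
        self.data.append(transition)
        self.td_errors.append(abs(td_error) + 1e-6)  # small floor avoids zero probability

    def _alpha(self):
        # Linear anneal from alpha_start to alpha_end over anneal_steps sampling calls.
        frac = min(self.step / self.anneal_steps, 1.0)
        return self.alpha_start + frac * (self.alpha_end - self.alpha_start)

    def sample(self, batch_size, rng=None):
        rng = rng or np.random.default_rng()
        self.step += 1
        p = np.asarray(self.td_errors) ** self._alpha()
        p /= p.sum()
        idx = rng.choice(len(self.data), size=batch_size, p=p)
        return [self.data[i] for i in idx]
```

With `alpha_end=0.0`, late-training sampling is uniform, which avoids over-fitting to transitions whose TD errors were inflated by hardware noise rather than genuine learning signal.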
LLM sampling strategies for reasoning vs. final output differ across languages; decoupling the two temperatures affects both quality and determinism.
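The summary doesn't specify the decoupling scheme; a minimal sketch of per-phase temperature decoupling with plain softmax sampling follows (the two-phase token split, the temperature values, and the `step_fn` stand-in for a model forward pass are all assumptions for illustration):

```python
import numpy as np

def sample_token(logits, temperature, rng):
    """Sample one token id from logits at the given temperature;
    temperature near 0 falls back to greedy (deterministic) argmax decoding."""
    if temperature <= 1e-6:
        return int(np.argmax(logits))
    z = logits / temperature
    z = z - z.max()                   # shift for numerical stability
    p = np.exp(z)
    p /= p.sum()
    return int(rng.choice(len(logits), p=p))

def decoupled_decode(step_fn, n_reason, n_answer, t_reason=1.0, t_answer=0.0, seed=0):
    """Decode with a higher temperature during the reasoning phase (diverse
    exploration) and a lower one for the final answer (more deterministic).
    step_fn(tokens) -> logits is a hypothetical stand-in for the model."""
    rng = np.random.default_rng(seed)
    tokens = []
    for i in range(n_reason + n_answer):
        temp = t_reason if i < n_reason else t_answer
        tokens.append(sample_token(step_fn(tokens), temp, rng))
    return tokens
```

Setting `t_answer=0.0` makes the answer span reproducible across runs while the reasoning span stays stochastic, which is one way the quality/determinism trade-off described above can be exercised.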
Transient Turn Injection attack exploits stateless moderation in commercial and open-source LLMs via multi-turn adversarial distribution.