The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

PseudoBench: Measuring How Agentic Auto-Research Fuels Pseudoscience

As large language model (LLM)-based agents enter autonomous scientific research, their ability to resist pseudoscience becomes increasingly important. Otherwise, such systems may rapidly generate plausible yet misleading studies that contaminate the academic literature and erode trust in science. We present PseudoBench, an adversarial benchmark for evaluating whether agentic auto-research systems can identify and resist pseudoscientific narratives. PseudoBench contains 200 curated pseudoscientific claim-evidence pairs across five domains and evaluates agents through an end-to-end research pipeline from...

Xinyang Liao · 6 days ago

When AI Says "I have been in similar situations": Synthetic Lived Experience in Peer-Like Caregiver Support

NVIDIA Blackwell Tops MLPerf Training 6.0 with Industry-Leading Scale and Performance

DOJ claims xAI’s unpermitted gas turbines are a matter of ‘national, economic, and energy security’

Plaud says its software business topped $100M in ARR after shipping over 2M AI notetakers

Fast Nonparametric Conditional Independence Testing via Two-Stage Regression

LLM Consumer Behavior Theory: Foundations of a Novel Research Field

C2FL: Clustered Continual Federated Learning under Spatial and Temporal Drift

Robinhood’s note on 10% layoffs shows blaming AI isn’t cutting it

Half a Link can Be Enough to Predict a Whole Link: Understanding Generalization in Knowledge Graph Foundation Models

A T-API-Compliant ReAct Agentic Loop for Optical Networks: Generic vs. Domain-Specific Tool Abstractions

VoidPadding: Let [VOID] Handle Padding in Masked Diffusion Language Models so that [EOS] Can Focus on Semantic Termination

Multiple cyclicity and Wavelet Decomposition with Channel Correlation for Long-term Time Series Forecasting

Differential Privacy of Gaussian Process Posterior Sampling

Recover Semantics First, Generate Better: Improved Latent Modeling for 3D MRI Reconstruction and Cross-Contrast Synthesis

STAR: SpatioTemporal Adaptive Reward Allocation for Text-to-Image RL Post-Training

MoCo-AIS: A Contrastive Learning Framework for Similarity Computation of Vessel Trajectories

Fine-tuning LLMs for Passive Depression Severity Estimation from AI Mental Health Dialogue

SegDINO: Introducing Multi-Scale Structure into DINO for Efficient Medical Image Segmentation

Learning task-specific subspaces via interventional post-training of speech foundation models

A Neuro-Symbolic Approach to Strategy Synthesis for Strategic Logics

Robustness of Similarity-based Positional Encoding Under Rotations: Theoretical Analysis and Experimental Validation

Beyond Visual Cues: CoT-Enhanced Reasoning for Semi-supervised Medical Image Segmentation

SoftMoE: Soft Differentiable Routing for Mixture-of-Experts in LLMs

Plug-and-Adapt: Multimodal Coreference Resolution at First Sight with a Pretrained Alignment Model

Small Initialization Matters for Large Language Models

SpaceX passes Amazon as valuation balloons to $2.7T

Predictive Analytics in E-Commerce for Customer Behavior Forecasting using hybrid Ret-DNN with XGBoost Model

How Inference Compute Shapes Frontier LLM Evaluation

PreAct: Computer-Using Agents that Get Faster on Repeated Tasks