DualFact+: A Multimodal Fact Verification Framework for Procedural Video Understanding
DualFact multimodal framework separates factual verification in procedural video captioning into conceptual and contextual facts.
Analysis of Perspective API shutdown exposes structural dependence of NLP/LLM evaluation on single proprietary toxicity measurement tool.
Marco-MoE open-weight multilingual sparse MoE models with 5% parameter activation and best-in-class performance-to-compute ratio.
Dictionary learning method for Kernel EDMD approximation of nonlinear dynamical systems via Koopman operators.
RealMat-BaG benchmark for semiconductor bandgap prediction under experimental conditions using GNNs; addresses domain generalization challenges.
SnapGuard detects prompt injection attacks on screenshot-based web agents using lightweight multimodal methods instead of large VLMs.
Semantic Gateway framework applies formal validation and zero-trust security to LLM-orchestrated enterprise APIs using Model Context Protocol.
RL-based study of tactile and proximity sensor properties for humanoid collision avoidance on H1-2 robot using dodgeball task.
Qwen 3.6 27B quantization benchmark: BF16 (69.78% avg accuracy, 54GB RAM) vs Q4_K_M (66.54%, 15GB) vs Q8_0 across HumanEval, HellaSwag, BFCL.
Theoretical analysis of expressiveness tradeoffs between converging, output-converging, and halting variants of Recurrent Graph Neural Networks.
SignSGD improvements via small-batch convergence analysis and hybrid dithering strategy for 1-bit gradient compression.
PullMD MCP server extracts article content for Claude Code, reducing token waste on HTML parsing by filtering boilerplate.
Medoid prototype alignment framework for cross-plant ICS intrusion detection without labeled data for unseen attacks.
Neuro-symbolic PPO extension transfers logical policy specs from easier tasks to improve sample efficiency in sparse-reward RL.
Otter is also releasing a new Windows app that can capture meeting notes without joining the meeting.
Wall-clock analysis shows canonical logit/feature-based knowledge distillation outperforms recent methods for semantic segmentation.
Opinion piece: MBSE models used with LLMs function as prompts rather than knowledge bases; co-design methodology needed.
Cross-cultural survey of 4,641 users finds LLM emotional support adoption varies 20–59% across seven countries.
Automated adversarial collaboration framework using LLM agents and program synthesis to adjudicate competing cognitive science theories.
OpenAI terminates exclusive partnership with Microsoft, reshaping enterprise AI licensing and competitive dynamics in foundation model distribution.
Hybrid ML and Answer Set Programming framework for phishing detection with post-hoc belief revision layer.
Dyna-SAuR algorithm for safe reinforcement learning combining uncertainty-aware dynamics models with scalable safety filters.
Google has signed a classified deal that allows the US Department of Defense to use its AI models for "any lawful government purpose," The Information reports. The agreement was reported less than a day after Google employees demanded CEO Sundar Pichai block the Pentagon from using its AI amid concerns that it would be used in "inhumane or extremely harmful ways." If the agreement is confirmed, it would place Google alongside OpenAI and xAI, which have also made classified AI deals with the US government. Anthropic was also among that list until it was blacklisted by the Pentagon for refusing...
Empirical study showing LLMs lack reliable architectural reasoning for networked systems design despite architectural complexity.
EvoTSC genetic programming method for evolving lightweight feature learning models for time series classification.
SymphonyGen hierarchical framework for 3D orchestral music generation with cascading decoder architecture.
Last August, some of the best cybersecurity teams in the business gathered in Las Vegas to demonstrate the strength of their AI bug-finding systems at DARPA's Artificial Intelligence Cyber Challenge (AIxCC). The tools had scanned 54 million lines of actual software code that DARPA had injected with artificial flaws. The teams were capable enough to identify most of the artificial bugs, but their automated tools went beyond that - they found more than a dozen bugs that DARPA hadn't inserted at all. Even before the security earthquake that Anthropic delivered this month with Claude Mythos - the...
DeepSeek vision model announced or coming soon, per a social media post by Xiaokang Chen.
Behavioral task sampling method improving zero-shot offline RL generalization via learned task vector distributions.