The Archive

Search the full wire by company, model, lab, or keyword. Every story we have ever aggregated.

Knowledge Reutilization in Meta-Reinforcement Learning

Meta-reinforcement learning enables fast adaptation by extracting shared structure from related tasks, but existing end-to-end methods often couple task inference with embodiment-specific control. This coupling can obscure non-parametric task semantics, reduce sample efficiency, and limit cross-agent reuse. We propose a meta-knowledge reutilization framework that learns task-level knowledge on a dynamics-simplified agent and transfers it to heterogeneous agents. The framework uses a Bayesian non-parametric prior to organize latent task modes and a high-level policy to generate task-level magn...

Yuan Meng · 6 days ago

Towards Understanding and Measuring COGNITIVE ATROPHY in LLM Behaviour

Unintended Effects of Geographic Conditioning in Large Language Models

Embedded Machine Learning for Microcontroller-Class Edge Devices: Data, Feature, Evaluation, and Deployment Pipelines

Structural Role Injection in Handlebars-Templated LLM Prompts: Triple-Brace Interpolation, Delimiter Family, and the Limits of HTML Auto-Escaping

First Proof Second Batch

Ternary Mamba: Grouped Quantization-Aware Training of W1.58A16 State Space Models

datasette-tailscale 0.1a0

Leaked financial docs show OpenAI is losing billions of dollars a year

Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning

Querying an astronomical database using large language models: the ALeRCE text-to-SQL system

Deep Reinforcement Learning for Minimum Zero-Forcing Sets

OmniPlan: An Adaptive Framework for Timely and Near-Optimal Network Planning Optimization

Quoting Georgi Gerganov

HistoRAG: Embedding Historical Methodology in Retrieval-Augmented Generation Through Critical Technical Practice

Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding

IsabeLLM: Automated Theorem Proving Applied to Formally Verifying Consensus

How to Optimize Transformer-Based Models for Low-Precision Training

S4oP: Operator-level Pruning of Structured State Space Models for Resource-Constrained Devices

EAGG: Embodiment-Aligned Grasp Generation via Geometry-Aware Graph Conditioning

From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning

Securing the future of AI agents

Edge Flow: A Tractable and Predictive Continuous-Time Model for Gradient Descent at the Edge of Stability

A Unified Framework for Context-Aware and Relation-Aware Graph Retrieval-Augmented Generation

Tensor-based second-order causal discovery

Volterra Generative Models

Agentic AI-based Framework for Mitigating Premature Diagnostic Handoff and Silent Hallucination in Healthcare Applications

NoiseTilt: Noise-Tilted Reverse Kernels for Diffusion Reward Alignment

When LLMs Analyze Scars: From Images to Clinically-Meaningful Features

Security and Privacy Prompts in the Wild: What Users Ask LLMs and How LLMs Respond