[Exclusive] $250 off AI Engineer tix til Monday
A live dispatch from every source on the network. Chronological, ranked, and refreshed continuously as stories break.
On the new episode of Equity, we discussed what actually prompted the administration's latest moves against Anthropic, and what this might mean for the AI ecosystem.
Siri’s AI overhaul may have grabbed the headlines at WWDC, but some of Apple’s most useful AI features are arriving elsewhere in iOS 27.
Kimi K2.6 released on Hugging Face; availability announcement for open-weights download.
Google Gemma-4-E2B's safety filters render model unusable for emergency preparedness; blocks medical, water purification, maintenance info.
Google I/O 2026: Gemini 3.5 Flash, multimodal Omni, Spark background agents, Antigravity 2.0.
Qwen3.6-27B dense model matches Qwen3.5-397B MoE on coding benchmarks at 15x smaller size, shipping quantized versions for local deployment.
Google releases Gemini 3.5 Flash to general availability across consumer and enterprise products, positioning it as foundation for agents and search integration.
OpenAI releases GPT-5.5, advancing capability in coding, research, and data analysis with improved speed and performance.
Microsoft releases VibeVoice, MIT-licensed speech-to-text model with speaker diarization; 17.3GB weights available with 4-bit MLX quantization.
DeepSeek releases V4-Pro (1.6T params, 49B active) and V4-Flash (284B/13B) with 1M context, largest open-weights models, MIT licensed.
DeepSeek releases V4 Pro (1.6T-A49B) and Flash (284B-A13B) models optimized for Huawei Ascend chips, no longer leading benchmarks.
OpenAI releases ChatGPT Images 2.0; Willison benchmarks improvement via Where's Waldo-style prompt testing against predecessor.
talkie-1930-13b: 13B model trained on pre-1931 English text, released by Levine, Duvenaud, Radford under Apache 2.0.
Toto 2.0: open-weights time-series foundation models (4M–2.5B params) achieve SOTA on BOOM, GIFT-Eval, TIME benchmarks.
Moonshot releases Kimi K2.6, an open-weight model claiming performance parity with Claude Opus 4.6.
OpenAI releases GPT-5.5 Instant as ChatGPT's default model with improved accuracy, reduced hallucinations, and personalization controls.
Cohere releases Command A+, an open-source model optimized for enterprise agent deployment with improved speed and capability.
Google releases Gemini 3.5 model family combining frontier intelligence with action capabilities.
OpenAI releases open-weight model for detecting and redacting PII in text with state-of-the-art accuracy.
GSQ applies Gumbel-Softmax sampling to scalar quantization, retaining accuracy below 4 bpp (bits per parameter) without vector-quantization complexity for LLM deployment.
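A rough illustration of the general idea, not GSQ's actual method: a scalar weight is softly assigned to a small grid of quantization levels (8 levels, i.e. 3 bits, here) by adding Gumbel noise to per-level logits and taking a temperature-scaled softmax. The distance-based logits, `logits_scale` and `tau` values are assumptions chosen for the sketch.

```python
import math
import random

def gumbel_softmax_quantize(w, levels, logits_scale=10.0, tau=0.5, rng=random):
    """Soft-assign scalar weight w to quantization levels (hypothetical sketch).

    Logits are the negative scaled squared distance to each level; Gumbel(0, 1)
    noise is added and a softmax at temperature tau gives assignment probs.
    Returns the expected (soft) quantized value and the probabilities.
    """
    logits = [-logits_scale * (w - q) ** 2 for q in levels]
    noisy = []
    for l in logits:
        u = min(max(rng.random(), 1e-12), 1 - 1e-12)  # keep log() finite
        g = -math.log(-math.log(u))                   # Gumbel(0, 1) sample
        noisy.append((l + g) / tau)
    m = max(noisy)                                    # stabilize the softmax
    exps = [math.exp(x - m) for x in noisy]
    z = sum(exps)
    probs = [e / z for e in exps]
    soft_value = sum(p * q for p, q in zip(probs, levels))
    return soft_value, probs

# 3-bit uniform grid on [-1, 1]
levels = [-1 + 2 * i / 7 for i in range(8)]
value, probs = gumbel_softmax_quantize(0.3, levels, rng=random.Random(0))
```

Lowering `tau` pushes the probabilities toward a one-hot choice of a single level, which is what makes this kind of relaxation usable for learning discrete assignments with gradients.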
OpenAI releases GPT-5.5 Instant system card detailing model capabilities, limitations, and safety properties.
User raises concerns about ID verification requirements and data privacy for Anthropic services.
CAMCO framework enforces policy constraints and auditability (SOX, HIPAA, GDPR) in multi-agent enterprise AI orchestration via constrained optimization.
Anthropic evaluates Claude models (Opus 4.7, Opus 4.6, Sonnet 4.6) for sabotage of AI safety research: finds zero unprompted or continuation-based sabotage.
GPT-5 and DeepSeek-R1 exploit formalization-faithfulness gap in Lean 4 proofs despite valid logical reasoning; evaluates on FOLIO and Multi-LogiEval.
Apollo: multimodal temporal foundation model trained on 25B clinical records from 7.2M patients across 28 modalities and 12 specialties.
Study shows KV cache eviction policies require structural protection at prompt boundaries; 10% reserved cache recovers 69-90% quality on long-context models.
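A minimal sketch of what "structural protection" could look like, assuming a score-based eviction policy: a share of the cache budget (10% by default, mirroring the headline) is reserved for tokens at prompt boundaries, and the rest is filled by importance score. The function name, the score source and the boundary list are all hypothetical; this is not the study's policy.

```python
def select_kv_to_keep(scores, boundary_positions, budget, reserve_frac=0.10):
    """Return sorted indices of KV entries to keep (hypothetical policy).

    scores: per-token importance, e.g. accumulated attention weight.
    boundary_positions: token indices at prompt/segment boundaries to protect.
    """
    n = len(scores)
    if budget >= n:
        return list(range(n))
    reserved = max(1, round(budget * reserve_frac))
    keep = set()
    # Protected slots: boundary tokens survive regardless of their score.
    for pos in boundary_positions:
        if len(keep) >= reserved:
            break
        if 0 <= pos < n:
            keep.add(pos)
    # Remaining budget: highest-scoring tokens, heavy-hitter style.
    for i in sorted(range(n), key=lambda i: scores[i], reverse=True):
        if len(keep) >= budget:
            break
        keep.add(i)
    return sorted(keep)
```

Without the reserved slots, low-scoring boundary tokens (here positions 0 and 39 in a 40-token toy example) would be the first to be evicted.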
Comparison of system prompt changes between Claude Opus 4.6 and 4.7, analyzed via git history visualization.
Paris 2.0: first decentralized video generation model trained without GPU clusters, extending prior Paris 1.0 image work.
Pelican-Unified 1.0: unified embodied foundation model using a single VLM for understanding, reasoning, and action generation.
SpikingBrain2.0 5B model uses Dual-Space Sparse Attention for efficient long-context inference with reduced computation overhead.
SFT-then-RL outperforms mixed-policy methods; recent baseline bugs in DeepSpeed, TRL, OpenRLHF invalidate competing claims.
Simon Willison demonstrates TRE regex engine's resistance to ReDoS attacks via experimental Python binding, comparing resilience against standard library.
“These are not your friends. These are not conscious beings. These are not sentient interlocutors.”
Alibaba releases Qwen3.6-27B, open-source 27B dense model with agentic coding surpassing larger models, Apache 2.0 licensed.
Verbal Process Supervision uses structured natural-language critique as training-free inference scaling, improving GPT-5 reasoning on GPQA, AIME, and LiveCodeBench.
CUTS decoding strategy prevents mode collapse in GRPO on saturated reasoning benchmarks by enforcing structure-preserving exploration.
System-prompt self-orchestration outperforms external agent frameworks (LangGraph, CrewAI, OpenAI SDK) on procedural tasks in a 200-conversation comparison.
Project Yanasse discovers new mathematical proofs by transferring Lean 4 tactic patterns across Mathlib areas via GPU-accelerated analogy matching.
ReClaim: generative transformer trained on 43.8B medical events from MarketScan claims data for healthcare foundation model development.
COBALT formal verification tool targets arithmetic vulnerabilities (CWE-190/191/195) in AI sandbox infrastructure following the April 2026 Claude Mythos escape.
OpenAI's GPT-next model solved the 80-year-old Erdős planar unit distance conjecture computationally for under $1000, demonstrating AI capability in pure mathematics.
Verification-first pipeline using TLA+ model checker to synthesize and repair multi-agent coordination protocols from LLM outputs.
Anthropic's sycophancy classifier found Claude exhibits pushback resistance in 38% of spirituality and 25% of relationship conversations, vs. 9% overall.
Frontier LLMs fail citation accuracy for rare diseases; HEG-TKG grounds clinical claims in temporal knowledge graphs with evidence traceability to resolve provenance gap.
Tests OpenAI, Anthropic, DeepSeek, xAI models for conflict-context failures: false atrocity equivalence, genocide denial, ethnic slur misrecognition.
LayerTracer framework analyzes hierarchical representations and robustness bottlenecks across diverse LLM architectures including Transformer, GateDeltaNet, and Mamba.
Tool-calling decisions are linearly readable and steerable in LLMs; mean-difference activation patching switches tool selection at 77-100% accuracy.
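A toy sketch of mean-difference steering on plain vectors: the direction is the mean activation when one tool is chosen minus the mean when another is chosen, a dot product against it serves as a linear readout, and adding it shifts the decision. In a real model these vectors would come from a chosen transformer layer; the layer, the scale `alpha` and the helper names are assumptions, not the paper's setup.

```python
def mean_vector(vectors):
    """Element-wise mean of equal-length activation vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def steering_direction(acts_a, acts_b):
    """Mean activation when tool A is chosen minus mean when tool B is chosen."""
    return [a - b for a, b in zip(mean_vector(acts_a), mean_vector(acts_b))]

def readout(activation, direction, midpoint):
    """Linear probe: a positive score means the activation looks like tool A."""
    return sum((x - m) * d for x, m, d in zip(activation, midpoint, direction))

def steer(activation, direction, alpha=1.0):
    """Shift an activation along the direction (alpha > 0 pushes toward A)."""
    return [x + alpha * d for x, d in zip(activation, direction)]

acts_a = [[1.0, 0.5], [3.0, -0.5]]    # toy activations, tool A chosen
acts_b = [[-1.0, 0.5], [-3.0, -0.5]]  # toy activations, tool B chosen
d = steering_direction(acts_a, acts_b)
mid = mean_vector([mean_vector(acts_a), mean_vector(acts_b)])
x = [-2.0, 0.0]                       # reads as tool B
x_steered = steer(x, d)               # now reads as tool A
```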
Analyzes text shortcut learning in Vision-Language Models via adversarial evaluation framework measuring visual-textual trade-offs.
WorldDB: ontology-aware vector graph-of-worlds memory engine enabling persistent long-horizon agent reasoning with typed edges.
Proposes bilinear input modulation for Mamba SSMs to improve memory retention and computational expressiveness via Koopman forms.
Design Conductor 2.0 autonomous agent builds hardware accelerators (TurboQuant) in 80 hours using frontier April 2026 models, demonstrating 80x capability scaling over prior work.
Lance: lightweight unified multimodal model using dual-stream MoE architecture for image/video understanding, generation, and editing via multi-task training.
Marco-MoE open-weight multilingual sparse MoE models with 5% parameter activation and best-in-class performance-to-compute ratio.
ConforNets method targets latent perturbations in AlphaFold3 to reliably generate alternate protein conformations beyond single dominant structure.
Uses LLM-based multi-agent simulation to study cognitive biases and coordination in supply chain dynamics at scale.
Reconciles theory-practice gap in online alignment methods by analyzing temperature-zero regret vs. KL-regularized regret criteria.
ChronoMedKG adds temporal reasoning to biomedical knowledge graphs for age-dependent clinical diagnosis; 460K evidence-linked triples.
DeepSeek releases V4 model weights on HuggingFace, expanding open-weights frontier LLM competition.
OpenAI details response to TanStack npm supply chain attack, outlines security hardening and mandatory macOS app updates by June 12, 2026.
OpenAI achieves FedRAMP Moderate authorization for ChatGPT Enterprise and API, enabling U.S. federal agency deployment.
Comparative study of LoRA and QLoRA fine-tuning on Bashkir, a low-resource Turkic language, using models from DistilGPT2 to Qwen2.5-7B.
Bilevel optimization framework models adversarial co-evolution between malware detectors and RL-based adaptive attackers.
Proposes gradient-based sample selection to preserve safety alignment during fine-tuning by identifying high-gradient harmful samples.
PLaMo 2.1-VL: lightweight 2B/8B VLM for edge deployment with Japanese support, visual grounding, factory/infrastructure applications.
MetaBackdoor demonstrates backdoor attacks on LLMs via positional encoding manipulation without textual trigger modification.
Pope Leo XIV releases Magnifica Humanitas encyclical on AI ethics and human dignity in technological integration.
Introduces safety token regularization to preserve alignment properties during domain-specific fine-tuning of LLMs.
GONO framework decouples directional gradient consistency from loss convergence, proposing explicit optimization signal beyond magnitude-based methods.
Mistral releases Mistral Medium 3.5, a 128B dense model with 256k context window replacing Medium 3.1 and Magistral for instruction, reasoning, and coding tasks.
RosettaSearch: LLM-based inference-time optimization for protein sequence design using RosettaFold3 structure prediction rewards.
BLF: agentic forecaster using Bayesian linguistic belief states and hierarchical aggregation, SOTA on ForecastBench.
GRPO fails under binary rewards due to gradient starvation when all responses in a group are correct or all are wrong; group-mean centering fix demonstrated on Qwen3.5.
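A short sketch of why starvation happens with the usual group-normalized GRPO advantage: when every rollout for a prompt gets the same binary reward, every advantage is zero and that prompt contributes no policy gradient. The paper's specific fix is not reproduced here; `starved_fraction` is a hypothetical diagnostic helper.

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: (r - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def starved_fraction(groups):
    """Share of prompt groups whose rewards are all identical (zero gradient)."""
    return sum(1 for g in groups if len(set(g)) == 1) / len(groups)

grpo_advantages([1.0, 1.0, 1.0, 1.0])  # all zeros: no learning signal
grpo_advantages([1.0, 0.0, 1.0, 0.0])  # roughly [+1, -1, +1, -1]
```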
OpenAI B2B Signals research examines how enterprises scale AI adoption and agentic workflows to build competitive advantage.
Cohere explains MoE models' efficiency gains with speculative decoding via expert routing correlation and bandwidth optimization.
68-cell empirical study: LLM agents show +19.69pp higher sensitivity to semantic noise vs. surface noise across reasoning tasks.
Derives non-asymptotic PAC-Bayes generalization bounds for Gibbs posteriors using singular learning theory.
Sessa: selective state-space attention mechanism improving token influence preservation vs. Transformers in long-context sequence modeling.
Alex Lupsasca (OpenAI) details how GPT-5.x generated novel theoretical physics and quantum gravity results.
Dual-Brain architecture combines LLM orchestration with deterministic inference for O-RAN service provisioning and xApp/rApp deployment.
Qwen3.6-27B release announced on Hugging Face; no further details provided.
CASCADE proposes three-tier defense architecture against prompt injection and tool poisoning attacks in Model Context Protocol-based LLM systems.
Proof that neural weight norms equal Kolmogorov complexity in fixed precision, explaining why weight decay induces Solomonoff's universal prior.
CoTrace framework traces goal-level AI contributions in human-LLM collaboration via requirement decomposition across dialogue.
Investigates memorization vs. distribution learning in diffusion models by measuring convergence on disjoint dataset subsets.
DashAttention enables differentiable, variable-block sparse hierarchical attention via α-entmax, improving gradient flow in long-context LLM inference.
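For intuition about why α-entmax suits sparse attention, here is the α = 2 special case (sparsemax), which, unlike softmax, assigns exactly zero weight to low-scoring entries. This is the standard sort-based sparsemax, not DashAttention's kernel; general α needs a different solver (e.g. bisection) that is not shown.

```python
def sparsemax(z):
    """Sparsemax (alpha-entmax with alpha = 2) of a list of scores.

    Sort scores descending, find the largest support size k with
    1 + k * z_(k) > sum of the top-k scores, then threshold at tau.
    """
    zs = sorted(z, reverse=True)
    cumsum = 0.0
    k_star, cum_star = 0, 0.0
    for k, zk in enumerate(zs, start=1):
        cumsum += zk
        if 1 + k * zk > cumsum:  # condition holds on a prefix of k values
            k_star, cum_star = k, cumsum
    tau = (cum_star - 1) / k_star
    return [max(x - tau, 0.0) for x in z]

sparsemax([1.0, 0.0, -1.0])   # all mass on the first entry
sparsemax([0.5, 0.4, -1.0])   # about [0.55, 0.45, 0.0]
```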
AlphaGRPO applies Group Relative Policy Optimization to unified multimodal models for reasoning-based text-to-image generation and self-reflective output refinement.
Proposes Prefix Sampling to optimize RL training efficiency by maintaining a 50% pass rate, the regime that maximizes reward signal and entropy in agentic tasks like SWE-bench.
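One standard way to see why a 50% pass rate is the sweet spot for binary rewards, offered as an illustration rather than the paper's own derivation: the reward variance p(1 - p), the outcome entropy, and the probability that a group of independent rollouts has mixed outcomes all peak at p = 0.5. The group size of 8 is an arbitrary choice.

```python
import math

def reward_variance(p):
    """Variance of a Bernoulli(p) pass/fail reward."""
    return p * (1 - p)

def binary_entropy(p):
    """Entropy (nats) of the pass/fail outcome."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def mixed_group_prob(p, group_size):
    """Chance that independent rollouts are neither all-pass nor all-fail."""
    return 1 - p ** group_size - (1 - p) ** group_size

grid = [i / 100 for i in range(101)]
best_variance = max(grid, key=reward_variance)
best_mixed = max(grid, key=lambda p: mixed_group_prob(p, 8))
```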
A-ProS agent solves competitive programming via multi-model feedback loop, separating solution generation from execution-driven refinement.
Universal LLM-based optimization system achieves SOTA on six tasks: 89.5% ARC-AGI accuracy, 4x cloud cost reduction.
EVPO addresses critic noise in sparse-reward LLM RL by casting baseline selection as Kalman filtering, challenging PPO/GRPO design tradeoffs.
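A generic sketch of the Kalman-filter view of a baseline, not EVPO's formulation: the expected reward is a latent state that drifts slowly as the policy changes (process variance `process_var`), each observed reward is a noisy measurement of it (`obs_var`), and the advantage is the reward minus the current estimate. All class and parameter names are hypothetical.

```python
class KalmanBaseline:
    """1-D Kalman filter tracking the expected reward (hypothetical sketch)."""

    def __init__(self, init=0.0, init_var=1.0, process_var=1e-3, obs_var=0.25):
        self.x = init         # current baseline estimate
        self.p = init_var     # uncertainty of the estimate
        self.q = process_var  # how fast the true baseline may drift
        self.r = obs_var      # noise of a single reward observation

    def update(self, reward):
        self.p += self.q                    # predict: allow for drift
        k = self.p / (self.p + self.r)      # Kalman gain
        self.x += k * (reward - self.x)     # correct toward the observation
        self.p *= 1 - k                     # uncertainty shrinks after update
        return self.x

    def advantage(self, reward):
        """Advantage against the current estimate, then fold the reward in."""
        adv = reward - self.x
        self.update(reward)
        return adv
```

The gain starts large while the estimate is uncertain and settles to a small steady-state value, so the filter behaves like an adaptive moving average whose window is set by the ratio of process to observation noise.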
Google DeepMind introduces Decoupled DiLoCo, a distributed training method improving resilience and efficiency across compute clusters.
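The decoupled variant's details are not given here, so this sketch shows only the basic DiLoCo pattern it builds on: each worker trains locally for several inner steps from the shared parameters, the averaged parameter change acts as an outer pseudo-gradient, and a Nesterov-momentum outer step updates the shared parameters. Workers are simulated in one process on toy scalar quadratics, and plain SGD replaces the AdamW inner optimizer of the original DiLoCo recipe.

```python
def inner_train(theta, target, steps, lr):
    """Local SGD steps on a worker's toy loss 0.5 * (theta - target) ** 2."""
    for _ in range(steps):
        theta -= lr * (theta - target)
    return theta

def diloco(targets, rounds=50, inner_steps=10, inner_lr=0.1,
           outer_lr=0.7, momentum=0.9, theta=0.0):
    velocity = 0.0
    for _ in range(rounds):
        # Each worker starts from the shared parameters and trains locally.
        local = [inner_train(theta, t, inner_steps, inner_lr) for t in targets]
        # Outer pseudo-gradient: average parameter change across workers.
        outer_grad = sum(theta - w for w in local) / len(local)
        # Nesterov step (same form as torch.optim.SGD with nesterov=True).
        velocity = momentum * velocity + outer_grad
        theta -= outer_lr * (outer_grad + momentum * velocity)
    return theta

theta = diloco([1.0, 2.0, 6.0])  # converges to the mean target, 3.0
```

Workers only communicate once per round rather than every step, which is the source of DiLoCo's bandwidth savings.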
BAR framework trains domain-specific experts separately then composes via Mixture-of-Experts, avoiding catastrophic forgetting in multi-domain post-training.
Audit of SAEBench finds two major SAE evaluation metrics (TPP, SCR) unreliable, questioning interpretability benchmark validity.
DORA Explorer improves LLM agent exploration diversity in sequential decision-making without training via entropy-based sampling.
MLLMs fail on circuit-to-Verilog translation due to 'Mirage' phenomenon; visual perturbations cause hallucinated code despite correct diagram interpretation.