DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories
DialToM: Benchmark testing LLM Theory of Mind via dialogue trajectory forecasting from mental-state profiles, separating reasoning from correlation.
MedSkillAudit: Domain-specific audit framework for medical research agent skills assessing scientific integrity, reproducibility, and safety.
Chinese startup markets $3 AI device generating interactive holograms of deceased people from photos and voice.
Google DeepMind introduces Decoupled DiLoCo, a distributed training method improving resilience and efficiency across compute clusters.
Artificial intelligence is moving quickly in the enterprise, from experimentation to everyday use. Organizations are deploying copilots, agents, and predictive systems across finance, supply chains, human resources, and customer operations. By the end of 2025, half of companies used AI in at least three business functions, according to a recent survey. But as AI becomes…
Commentary on Apple's John Ternus appointment and its implications for hardware-AI strategy, with tangential reference to SpaceX-Cursor partnership.
OpenAI releases workspace agents for ChatGPT to automate workflows and integrate enterprise tools with cloud-based execution.
OpenAI optimizes agentic workflows via WebSockets and connection-scoped caching in Responses API, reducing latency and API overhead.
Anthropic's Mythos AI model, a powerful cybersecurity tool that the company said could be dangerous in the wrong hands, has been accessed by a "small group of unauthorized users," Bloomberg reports. An unnamed member of the group, identified only as "a third-party contractor for Anthropic," told the publication that members of a private online forum got into Mythos via a mix of tactics, utilizing the contractor's access and "commonly used internet sleuthing tools." The Claude Mythos Preview is a new general-purpose model that's capable of identifying and exploiting vulnerabilities "in every m...
Mozilla used Claude Mythos Preview to identify 271 vulnerabilities in Firefox 150, demonstrating practical AI security tooling in production browsers.
GitHub Copilot tightens Individual plan limits, pauses signups, restricts Claude Opus 4.7 to $39/month Pro+ tier citing agentic workflow compute demands.
Autistic user testimonial about using Claude Co-work for organizing 20 years of personal creative systems and documents.
Anthropic briefly moved Claude Code from $20 Pro to $100+ Max tier, then reverted; pricing confusion around feature tiers.
OpenAI launches GPT-Image-2; Cursor secures $10B contract with xAI and $60B acquisition option.
OpenAI releases open-weight model for detecting and redacting PII in text with state-of-the-art accuracy.
Meta says that it has a new internal tool that is converting mouse movements and button clicks into data that can train its AI models.
Anthropic told TechCrunch it is investigating the claims, but maintains that there is no evidence that its systems have been impacted.
Only Elon would do this before an IPO.
With an IPO looming for Elon Musk's SpaceX / xAI / X combo platter of companies, SpaceX has announced an odd arrangement to either acquire the automated programming platform Cursor for $60 billion or pay a fee of $10 billion. Buying this startup that's focused on AI coding could help xAI's tools compete with market leader Anthropic, as well as the other competitors. A report by The Information this week said Sergey Brin has directed Google's "strike team" to help its agentic AI tools catch up, while Sam Altman reportedly declared a "code red" at OpenAI last year before shutting down Sora to f...
Was looking at buying the $20 Plan today after a demonstration from a friend (and wanting to switch/try my options from Codex), but saw that Claude Code was not included. I wanted to ask if this was a temporary change, or if the Pro plan truly never had Claude Code, and I was mistaken. My friend has a Max plan, so I could just be mistaken. Thanks! Edit: Link to site: [https://claude.com/pricing](https://claude.com/pricing)
Reddit discussion: researcher reports CVPR 2026 paper reproduces their June 2025 arXiv work with identical equations but insufficient citation; seeks guidance on plagiarism.
Anthropic's unreleased Mythos model reportedly accessed by unauthorized users, raising security and access control concerns.
https://github.com/BrainBlend-AI/tesseron Just open-sourced a protocol and TypeScript SDK I built mostly *with* Claude Code. The goal: let *Claude* (or any MCP client) drive a live application (browser tab, *Electron* / *Tauri* desktop app, Node daemon, CLI) by calling typed handlers inside your code, instead of scraping the UI with *Playwright* or *Computer Use*. It's called **Tesseron**. Ships as a Claude Code plugin, so install is one command:

```
/plugin marketplace add BrainBlend-AI/tesseron
/plugin install tesseron@tesseron
```

Plugin spawns a small local MCP gateway automatically. ...
The proposed Pentagon drone investment rivals Ukraine’s entire military budget.
User reports Claude Code feature removed from Anthropic Pro plan pricing page.
CTO says new AI model is "every bit as capable" as world's best security researchers.
NeurIPS 2026 thread: researchers debate whether to submit code alongside papers given trade-offs between credibility and plagiarism risk.
IBM releases Granite 4.1-8B instruct, a long-context model with improved tool-calling and RL alignment.
When ChatGPT launched as an experimental prototype in late 2022, OpenAI’s chatbot became an everyday everything app for hundreds of millions of people. LLMs like ChatGPT were the new future: The entire tech industry was consumed by the inferno, with companies racing to spin up rival products. The ashes of the old tech world still…