The Brief: Is The Eureka Moment Obsolete?

Also: Andrej Karpathy joins Anthropic, OpenAI’s reasoning model settles an 80-year-old Erdős conjecture, and the US gov takes equity in nine quantum companies

May 24, 2026

FIELD NOTES

It seems like frontier research is being disrupted as rapidly as SaaS startups.

This week, OpenAI announced that a general reasoning model disproved an 80-year-old conjecture in geometry. Fields Medalist Tim Gowers, in a companion paper, described the result as “a milestone in AI mathematics.” In the same week, two papers landed in Nature on multi-agent science systems. DeepMind's AI co-scientist, built on Gemini, generated and ranked hypotheses for drug repurposing in acute myeloid leukemia that then held up in the lab. And FutureHouse's Robin proposed a drug-disease link no one had drawn: ripasudil, a glaucoma drug, as a candidate treatment for the leading cause of blindness in the developed world.

This is a timeline that belongs more to product launches than to frontier breakthroughs.

At first I wondered if the model’s breakthroughs were due to brute force, or a ton of human hand-holding. But it’s neither. The chain of thought is a search that diagnoses its own failures and lets each one point to the next move. That’s domain-specific judgment about where to look, the thing we usually call intuition.

This heuristic search parallels how the scientific community works, only the community is slower and spread across people. Grad students, arXiv, citations: these are a giant decentralized version of exactly this loop of propose, fail, diagnose, redirect, propagate.

Except that the model ran the whole thing in days, rather than over decades.

Reading through how the model steers itself made me wonder:

Language is thinking, and thinking is language.

It’s tempting to call chain-of-thought narration, the machine describing its thinking out loud. I almost think it’s less narration than the thinking itself. The language isn’t describing the search. The language is the search.

It turns out that language parroting, a fluent recombination of everything humans have written, is a shockingly good engine for thinking.

So if machines are about to outrun the human method for discovery, do we have to do what SaaS startups have forced their incumbents and now themselves to do, and flip our mental models?

At the (wonderful) SAIR summit last week, Gowers gave a keynote about what he calls motivated proofs. The idea is that AI shouldn’t just produce correct proofs, it should produce transparent ones, built from explicit, inspectable moves, the kind a good teacher uses. The goal is for AI to create proofs we can learn from, not just verify. Make it human-legible.

Is this where the human’s role in search is heading? To direct, but then to reverse engineer discovery?

To take the machine’s correct, alien output and reverse-engineer the understanding out of it. Find the insight. Extract the technique. Turn the answer into something that can carry to the next problem.

Maybe the eureka moment is obsolete.

Or at least the way we have traditionally romanticized it, as the moment of discovery. Maybe in the near future it lives more in the moment of understanding or distilling what was discovered.

This week, it’s very much felt like something has broken through in the frontiers of research.

Discovery is very much psychological. The frontier isn’t always an insurmountable wall of difficulty. It’s more often a wall of imagination, a seemingly impossible task that turns obvious the instant someone shows you the trick.

A friend brought up the fable of Columbus and the egg. At a dinner, Columbus hands guests a boiled egg and asks them to stand it on its end. They try, they fail, they give up. Then he taps it on the table, flattening one end just enough, and balances it successfully.

The solution is obvious, once you’ve seen it done. The barrier was never the difficulty. It was the inability to see it could be done that way at all. Many, maybe especially the people closest to the craft, will be tempted to say that doesn’t count. That it isn’t real discovery unless it arrives the way ours always has.

Regardless, from here onwards, the machine will keep tapping the egg on the table.

Thank you to T, M, S, B, J, and J for bouncing thoughts with me.

Have a great Memorial Day weekend.

Tara

THE DOWNLOAD

The biggest news to pay attention to this week

Andrej Karpathy joins Anthropic to work on pre-training

Andrej Karpathy, an OpenAI co-founder and former Tesla AI lead, announced on May 19 that he has joined Anthropic. He is reporting to pre-training lead Nick Joseph and building a team focused on using Claude to accelerate pre-training research, the stage where models acquire their core capabilities.

Why it matters: Putting one of the field's best-known researchers on AI-assisted pre-training is a bet that the next gains come from models improving how models are trained, rather than from raw compute alone. If it works, the edge shifts to whoever gets models improving their own training, and spending alone stops being enough to stay at the frontier.

Alibaba’s Qwen3.7-Max optimized a kernel for Alibaba’s own chip across a 35-hour autonomous run

Alibaba released Qwen3.7-Max, a model built for long-running agent work, and used it to write a kernel (a small program that runs one operation, e.g. a transformer’s attention step, as efficiently as possible on a given chip) for Alibaba’s own T-Head ZW-M890 accelerator. When pointed at a chip it had not seen in training, the model ran on its own for roughly 35 hours, compiling, testing, and rewriting the code across about 1,158 tool calls and 432 kernel evaluations. Alibaba reported the result ran roughly 10x faster than the standard open-source version it started from, though these are the company’s own figures and have not been independently reproduced.

Why it matters: A chip runs AI workloads only as fast as the kernels written for it, and that software layer takes scarce compiler engineers years to build for each new chip. This is a large part of the reason why Nvidia’s CUDA ecosystem is harder to replicate than the hardware, and why competing accelerators tend to fall short on software rather than specs. A model that can write a competitive kernel for an unfamiliar chip points to a cheaper path to making new silicon usable.

OpenAI’s reasoning model disproved an Erdős conjecture from 1946

OpenAI reported that a general-purpose reasoning model produced an original proof disproving the planar unit distance conjecture, a question Paul Erdős posed in 1946, by finding a new family of constructions that beats the long-assumed square-grid optimum. The result was reviewed and endorsed by mathematicians Noga Alon, Melanie Wood, and Thomas Bloom, and Fields Medalist Tim Gowers said he would recommend it. Notably, this is the same group that called out OpenAI’s overstated Erdős claim seven months ago, which Bloom had described as a dramatic misrepresentation.

Why it matters: The proof came from a general reasoning model, not a system built for mathematics, which undercuts the thesis that frontier scientific work needs purpose-built, narrowly trained tools. If general models can reach original results in the hardest reasoning domains, the case for funding deep, vertical AI-for-science solutions weakens against, say, scaling compute on the frontier labs' general models.

Nature publishes DeepMind and FutureHouse AI research agents

Nature published peer-reviewed work on two multi-agent research systems, DeepMind’s Co-Scientist and FutureHouse’s Robin. Robin identified ripasudil as a repurposing candidate for dry age-related macular degeneration, and Co-Scientist surfaced repurposable leukemia drugs and liver fibrosis targets, all still requiring preclinical validation. Edison Scientific, the commercial spinout of FutureHouse, also announced a partnership with Incyte to embed its Kosmos platform across the drug discovery and development lifecycle.

Why it matters: The gap between an academic demonstration and a commercial deployment is rapidly shortening. A method published in Nature this week is already being sold into a public pharma company's R&D stack, and I predict we will soon see frontier research methods looped into commercial R&D almost as fast as they publish.

Google centered I/O 2026 on agents, spanning consumer, developer, and science tools

At I/O 2026, Google shipped Gemini 3.5 Flash to general availability the day it was announced, making it the default worker model across Search, the Gemini app, and enterprise products, with Pro held until June. It launched Antigravity 2.0, its agent-first development platform, where it demoed agents building a working operating system in 12 hours, plus Gemini Spark, a 24/7 personal agent on cloud VMs, and Gemini Omni for video and world simulation. It also introduced Gemini for Science, a research suite, and open-sourced Science Skills, a bundle connecting agentic platforms to more than 30 life science databases including UniProt and the AlphaFold Database.

Why it matters: Google is leaning hard on the one advantage rivals can't quickly copy: distribution. By shipping the fast, cheap Flash tier first and holding Pro back, it seems to be banking on speed, reach, and reliable environments where agents actually run. Its 1 billion AI Mode users, 900 million Gemini app users, and 3 billion Android devices give Google insane reach and feedback data that no standalone startup or model can match yet.

The US Commerce Department is taking equity in nine quantum companies for $2 billion

The Commerce Department signed letters of intent for $2.013 billion in CHIPS Act incentives across nine quantum companies, taking a minority equity stake in each as a condition of the award. IBM receives $1 billion to launch Anderon, a standalone 300mm quantum wafer foundry it is matching dollar-for-dollar; GlobalFoundries receives $375 million; and D-Wave, Rigetti, Infleqtion, PsiQuantum, Quantinuum, and Atom Computing receive roughly $100 million each across every major hardware modality.

Why it matters: The US government is reusing for quantum the same ownership-plus-support template it applied to rare earths, calling quantum a “critical frontier technology”. The stake buys supply-chain security and influence in the domestic quantum ecosystem, which the government believes has “significant implications for national defense, advanced materials and biopharmaceutical discovery, financial modeling, and energy systems”.
