Getting models to learn better ≠ knowing more
Also, are diffusion models coming back? Cursor hits $500M ARR and raises $900M more; and the first neuron-based chip hits the market (hint: you have to keep it alive)
We’ve all known a student who crams for an exam. They can recall facts and figures under pressure but falter when asked to apply that knowledge to a new problem. They’ve memorized, but haven’t truly learned.
It turns out our most advanced AI models often begin their learning journey in a strikingly similar way.
I went down a bit of a rabbit hole after recently reading the paper “How much do language models memorize?”, which sought to quantify the tipping point (what researchers call “grokking”) where a model’s behavior shifts from memorization to generalization (i.e., finding underlying patterns, or what we might call actual understanding). In a nutshell: once a model has no more room to simply store examples, it is forced to learn in a more intelligent way.
So, bigger isn’t always better. Sometimes, a smaller model trained on a more focused dataset reaches that grokking phase earlier and more efficiently.
That’s what FutureHouse, a non-profit science lab funded by Eric Schmidt, found in their pursuit of a reasoning model purpose-built for chemistry, which they call ether0.
They found that while massive AI models “know a lot about chemistry,” they were “bad at actually working with molecules.”
“They struggle to count the number of oxygens in a molecule, propose implausible ring structures, and are not good at naming molecules.
We reasoned that, because modern LLMs have so much latent knowledge stored in their weights about how chemistry works, a small amount of reinforcement learning might be able to rapidly boost their performance on tasks that involve working with molecules.”
This points to a broader trend: a new paradigm of specialized, efficient reasoning… achieved not by knowing more, but by learning better.
And if this specialized approach is proving successful in a complex domain like chemistry, it is reasonable to expect we will see similar specialized ventures in other domains like math.
Like this stealth math-solving model company, Axiom, founded by Stanford Math PhD Carina Hong, which is said to be raising at a $300M valuation right off the bat.
We’re seeing glimmers of evolution on the architectural front too. Diffusion models seem to be making a comeback, adapted this time for language and code.
These diffusion-based language models (which still use a transformer backbone) process sequences holistically rather than token by token, which can lead to massive speedups and more coherent output. Early peeks at Google’s Gemini Diffusion models point to a very near future where the code for a chat app can be generated in single-digit seconds.
“Video is in real time” - @schirano
Mercury, released by Inception Labs, is so far the only commercially available diffusion LLM. But I would be surprised if it remains alone in the market for much longer.
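A rough way to see where the speedup comes from: an autoregressive decoder needs one forward pass per token, while a masked-diffusion-style decoder starts from an all-[MASK] sequence and fills many positions in parallel on each refinement step. The sketch below stubs out the model entirely (it just reveals the target sequence — no transformer, no real denoising), so it says nothing about output quality; the step-count schedule and all names are my own illustration, not Gemini Diffusion’s or Mercury’s actual algorithm.

```python
import random

# Toy contrast: one-token-per-step decoding vs. a few parallel
# "unmask a chunk" refinement steps. The model is a stub that simply
# reveals the known target - the point is the number of passes needed.

target = "the quick brown fox jumps over the lazy dog".split()

def autoregressive_decode(target):
    out, steps = [], 0
    for tok in target:          # one forward pass per token
        out.append(tok)
        steps += 1
    return out, steps

def diffusion_decode(target, num_steps=3):
    out = ["[MASK]"] * len(target)
    masked = list(range(len(target)))
    for step in range(num_steps):
        # Unmask an even share of the remaining positions each pass.
        k = max(1, len(masked) // (num_steps - step))
        for i in random.sample(masked, k):
            out[i] = target[i]
        masked = [i for i in masked if out[i] == "[MASK]"]
    return out, num_steps

ar_out, ar_steps = autoregressive_decode(target)
df_out, df_steps = diffusion_decode(target)
print(ar_steps, df_steps)  # 9 passes vs 3 for the same 9-token output
```

Real diffusion LLMs spend each of those few passes rescoring and revising low-confidence positions, which is also where the “more coherent output” claim comes from: every token gets reconsidered in the context of the whole sequence, not just what came before it.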
Have a great weekend - ttyl
AI IDE Cursor raised $900M in a new round that values it at $9.9B. It’s now touted as the fastest growing startup ever, as it surpasses $500M in ARR in just three years.
ChatGPT introduces meeting recording and connectors for Google Drive, Box, and more: OpenAI’s ChatGPT is adding new features for business users, including integrations with different cloud services, meeting recordings, and MCP connection support for connecting to tools for deep research. This allows ChatGPT to look for information across users’ own services to answer their questions, alongside features like recording and transcription of meetings.
Mistral AI’s new coding assistant takes direct aim at GitHub Copilot: The new product, called Mistral Code, bundles the company’s latest AI models with integrated development environment plugins and on-premise deployment options specifically designed for large enterprises with strict security requirements. Unlike typical software-as-a-service coding tools, Mistral Code allows companies to deploy the entire AI stack within their own infrastructure, ensuring that proprietary code never leaves corporate servers.
Hugging Face says its new robotics model is so efficient it can run on a MacBook: AI dev platform Hugging Face released an open AI model for robotics called SmolVLA, now a part of Hugging Face’s rapidly expanding effort to establish an ecosystem of low-cost robotics hardware and software. Hugging Face claims that SmolVLA is small enough to run on a single consumer GPU – or even a MacBook – and can be tested and deployed on “affordable” hardware, including the company’s own robotics systems.
Google quietly launches AI Edge Gallery, letting Android phones run AI without the cloud: Google has quietly released an experimental Android application that enables users to run sophisticated AI models directly on their smartphones without requiring an internet connection. Called AI Edge Gallery, the app allows users to download and execute AI models from the popular Hugging Face platform entirely on their devices, enabling tasks such as image analysis, text generation, coding assistance, and multi-turn conversations while keeping all data processing local.
Human brain cells on a chip for sale – World-first biocomputing platform hits the market: In a development straight out of science fiction, Australian startup Cortical Labs has released what it calls the world’s first code-deployable biological computer. The CL1, which debuted in March, fuses human brain cells on a silicon chip to process information via sub-millisecond electrical feedback loops. Each CL1 contains 800,000 lab-grown human neurons, reprogrammed from the skin or blood samples of real adult donors. The cells remain viable for up to six months, fed by a life-support system that supplies nutrients, controls temperature, filters waste, and maintains fluid balance.
Figma introduces Dev Mode MCP Server: Figma announced the beta release of the Dev Mode MCP server, which brings Figma directly into the developer workflow to help LLMs achieve design-informed code generation.
Anthropic tripled its revenue in 5 months: Anthropic has hit $3 billion in annualized revenue, marking a 200% increase in just five months, according to a report from Reuters.
Nucleus Embryo is billed as the first-ever genetic optimization software that helps parents give their children the best possible start in life – long before they’re even born.
Windsurf CEO Varun Mohan posted that Anthropic is restricting the platform’s access to its Claude models, on the heels of Windsurf’s reported acquisition by OpenAI.
Vercel, the popular web hosting / development platform, doubled revenues in the last year and crossed $200M in ARR as it looks to be the hosting platform du jour for AI companies. It pays to sell shovels in a gold rush!
North America takes the bulk of AI VC investments, despite tough political environment (TechCrunch)
Anthropic researchers warn that humans could end up being "Meat Robots" controlled by AI (Futurism / Dwarkesh Podcast)
Quantum computers are on the edge of revealing new particle physics (NewScientist)
His kids go to a Texan school where they learn only with AI tutors (X)