When Machines Start Reasoning in Language
DeepMind's AlphaEarth can map the entire planet using satellite data. Zuckerberg bets on personal superintelligence that helps people create and connect, over "productivity software"
I remember sitting in math class as a kid - I was maybe 9 or 10 - when I realized something: the hardest part of solving a problem often wasn’t doing the math. It was understanding what the question was really asking.
That moment (of realizing that reasoning starts with interpretation) came back to me while watching today’s AI systems hit a similar threshold. We’re moving from machines that solve problems to machines that interpret them.
For decades, artificial intelligence leaned in two directions. On one side were symbolic systems: rule-based software that could verify deep truths but couldn’t handle ambiguity. On the other were statistical models: neural networks trained to recognize patterns at massive scale but unable to explain their thinking.
Language was always treated as output. A user interface, a way to communicate, not a way to reason.
That is changing.
This shift became especially clear in a recent interview with the three-person OpenAI research team (Alex Wei, Sheryl Hsu, and Noam Brown) whose model achieved gold-level performance on the International Mathematical Olympiad.
“When you see the model thinking about it,” said Brown, “it will express its uncertainty or its confidence in natural language throughout the process. If it’s really confident, it’ll say 'good' a lot. And if it’s unsure, it’ll throw in a lot of question marks.”
“And it’s cool that I can follow along and see how the model is feeling about its progress, even if I can’t really tell if it’s got it correct or not.”
The model wasn’t just generating an answer. It was narrating its thought process.
Why This Matters
That moment captures a larger shift in how models are evolving.
Today’s most advanced systems, like OpenAI’s o3 and Anthropic’s Opus, or experimental ones like the one used at the IMO, explore multiple reasoning paths in parallel. They evaluate their confidence step by step. They revise their logic midway. And they often reread their own explanations to improve them.
Think. Write. Read. Reflect. Revise. This loop is a foundation of human reasoning. Now, machines are doing it too.
This new reasoning loop stems from deeper architectural changes:
Training models to improve the reasoning process, not just final outcomes
Rewarding plausibility even when the final answer can’t be verified
Using verifier agents to critique and refine outputs
Representing logic in natural language rather than symbolic formalism
Running multiple paths at once and choosing the best trajectory
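The last two ideas, sampling several reasoning paths and letting a verifier pick the best trajectory, fit in a short sketch. Again, `generate_path` and `verify_path` are hypothetical placeholders for model calls; only the control loop is the point.

```python
import random

def generate_path(problem: str, seed: int) -> str:
    """Stand-in for sampling one chain of thought from a model."""
    random.seed(seed)
    return f"path {seed} for: {problem} (score hint {random.random():.2f})"

def verify_path(path: str) -> float:
    """Stand-in for a verifier agent scoring a path's plausibility.
    Here we just read back the embedded hint; a real verifier would
    critique the reasoning step by step."""
    return float(path.rsplit("hint ", 1)[1].rstrip(")"))

def best_of_n(problem: str, n: int = 5) -> str:
    """Sample n reasoning paths in parallel and keep the highest-scoring one."""
    paths = [generate_path(problem, seed) for seed in range(n)]
    return max(paths, key=verify_path)
```

Note the division of labor: the generator only proposes, the verifier only judges. That separation is what lets a system reward plausibility even when the final answer can't be checked directly.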
This makes the systems fundamentally more collaborative. We can read their reasoning, intervene in it, and shape it. Language becomes the shared medium of thought between humans and machines.
Natural language can express logic, uncertainty, analogy, and abstraction. A model that can reason in language can operate across law, math, science, and code, all without specialized tools. Symbolic tools may still verify or refine outputs, but they are no longer required to be the main engine.
Where This Fits in the History of AI
1. Symbolic AI (1950s–1980s): Hard-coded rules, rigid logic systems. Language was output only.
2. Connectionist AI (1980s–2010s): Neural networks that learned patterns, but couldn’t reflect or explain.
3. Foundation Models (2018–2023): Language as interface. Fluent, but shallow reasoning.
4. Language Reasoning Systems (2024– ): Language becomes the medium of thought. Models plan, reflect, and revise, much like humans do.
Language-native reasoning represents a shift in how intelligence itself is structured.
These systems don’t just complete prompts. They think aloud. They critique their own logic. They iterate.
We may look back on this moment as the turning point when machines stopped mimicking us, and started reasoning with us.
AI is getting more human in how it thinks.
-TTYL
The Download—
News that mattered this week
OpenAI hits $12 billion in annualized revenue, The Information reports: OpenAI roughly doubled its revenue in the first seven months of the year, reaching $12 billion in annualized revenue. The figure implies that OpenAI is generating $1 billion a month, and the company has around 700 million weekly active users for its ChatGPT products, spanning both consumers and business customers.
Zuckerberg claims ‘superintelligence is now in sight’ as Meta lavishes billions on AI: In a memo published online, Mark Zuckerberg describes his ambitions for developing what he calls “superintelligence”. Zuckerberg wrote that Meta differs from other AI firms in aiming to bring “personal superintelligence to everyone”, while other companies focus primarily on using “superintelligence” for productivity and to automate “all valuable work”.
Figma soars 250% on IPO: Figma shares jumped 250% in their public debut after the design software maker and some of its shareholders raised $1.2 billion in an IPO, with the trading valuing the company far above the $20 billion mark it would have reached in a now-scrapped merger with Adobe Inc.
Google DeepMind says its new AI can map the entire planet with unprecedented accuracy: The system, called AlphaEarth Foundations, addresses a critical challenge: making sense of the overwhelming flood of satellite data streaming down from space. “AlphaEarth Foundations functions like a virtual satellite,” the research team writes in their paper. “It accurately and efficiently characterizes the planet’s entire terrestrial land and coastal waters by integrating huge amounts of Earth observation data into a unified digital representation.” The AI system reduces error rates by approximately 23.9% compared to existing approaches while requiring 16 times less storage space than other AI systems.
ChatGPT launches study mode to encourage ‘responsible’ academic use: ChatGPT is launching a “study mode” to encourage responsible academic use of the chatbot, amid rising cases of misuse of artificial intelligence tools at universities. The feature, which can be accessed via the chatbot’s tools button, can walk users through complex subjects in a step-by-step format akin to an unfolding academic lesson.
Positron believes it has found the secret to take on Nvidia in AI inference chips: Private chip startup Positron is positioning itself as a direct challenger to market leader Nvidia by offering dedicated, energy-efficient, memory-optimized inference chips aimed at relieving the industry’s mounting cost, power, and availability bottlenecks. Positron’s solution is Atlas, its first-generation inference accelerator built specifically to handle large transformer models. Unlike general-purpose GPUs, Atlas is optimized for the unique memory and throughput needs of modern inference tasks. The company claims Atlas delivers 3.5x better performance per dollar and up to 66% lower power usage than Nvidia’s H100, while also achieving 93% memory bandwidth utilization – far above the typical 10–30% range seen in GPUs.
Double Click—
Links to reads we found interesting
‘Subliminal learning’: Anthropic uncovers how AI fine-tuning secretly teaches bad habits (VentureBeat)
The trillion-dollar AI arms race is here (The Guardian)
Salesforce CEO Marc Benioff on why AI agents won’t lead to mass unemployment (Fortune)
Microsoft releases list of jobs most and least likely to be replaced by AI (Futurism)
How US adults are using AI, according to AP-NORC polling (Associated Press)
Ex-Alibaba CTO just made the boldest claim about AI & global power – “China is building the future of AI, not Silicon Valley.” (X)