Agents are here. Open Source model R1 outperforms.

The product team's dilemma: the pressure to keep pace with user expectations and new capabilities.

Jan 24, 2025

Agents that can perform tasks on your behalf are no longer just a promise. This week, OpenAI launched Operator, an agent that can perform tasks on the web for you, and Perplexity dropped Perplexity Assistant (only on Android for now), which can perform tasks such as writing emails and booking dinners.

It’s still early days: Operator, for instance, still needs human input to get through CAPTCHAs, and it’s unclear how these agents will handle sensitive information like payment details. But we’re well on our way to a world where agents can handle daily tasks on our behalf. Peeking under the hood, I’ve been digging into VLMs (vision language models), multimodal AI that can understand images and video and take action on them, and how they will start to shape how humans interact with technology. A good example is UI-TARS, a paper outlining automated UI interactions.

OpenAI Unveils New Agent Tool 'Operator' - The New York Times

Open source models got a firm leg up this week with R1, a reasoning model from DeepSeek that performs on par with OpenAI’s o1 on certain benchmarks. Being open source means it’s free and adaptable, and R1 is also incredibly efficient.

Last thoughts: product teams are in a dilemma. Consumer expectations around new AI capabilities are accelerating, but shipping something subpar is costly. Sonos, the formerly beloved sound system company, lost billions of dollars (nearly 40% of its value) after a poorly managed app rollout sent customers into an uproar.

Must-Know News

OpenAI introduces Operator, an agent that can use its own browser to perform tasks for you.
DeepSeek’s ‘reasoning’ model R1 beats OpenAI’s o1 on certain benchmarks: Chinese AI lab DeepSeek has released R1, an open source model that performs as well as OpenAI’s o1 on certain AI benchmarks.
OpenAI teams up with SoftBank and Oracle on $500B data center project: OpenAI says that it will team up with Japanese conglomerate SoftBank and with Oracle, among others, to build multiple data centers for AI in the U.S. The joint venture, called the Stargate Project, will begin with a large data center project in Texas and eventually expand to other states.
Perplexity launches Sonar, an API for AI search: Perplexity launched an API service called Sonar, allowing enterprises and developers to build the startup’s generative AI search tools into their own applications.
Tencent introduces ‘Hunyuan3D 2.0,’ AI that speeds up 3D design from days to seconds: Tencent has unveiled “Hunyuan3D 2.0,” an AI system that turns single images or text descriptions into detailed 3D models within seconds.
Anthropic chief says AI could surpass “almost all humans at almost everything” shortly after 2027: Anthropic CEO Dario Amodei predicted that AI models may surpass human capabilities "in almost everything" within two to three years, according to a Wall Street Journal interview at the World Economic Forum in Davos, Switzerland.
Runway’s new AI image generator Frames is here, and it looks fittingly cinematic: Runway has announced the release of Frames, its newest text-to-image generation model, and it’s winning early praise from users for producing highly cinematic visuals.
