Physical AI — the next inflection point?
This week: OpenAI inches towards agents, and more large funding rounds with PI raising $400M and Writer raising $200M
Is physical AI on the cusp of an inflection point?
In the last month alone, we’ve seen two new research breakthroughs and a few large funding rounds. It feels like physical AI (intelligence that bridges the digital and physical worlds) is showing promising progress and stepping into the spotlight.
The big news this week: Physical Intelligence (PI) is developing a general model for robotics. Their new π0 paper introduces a vision-language-action model: imagine a generalized AI that learns the way a child does, first picking up language from a large corpus of text, then mastering actions like folding or stacking that build into complex sequences. The vision is a generalist robot that can perform a wide range of tasks, rather than the specialized robots of the last generation.
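To make the idea concrete, here’s a toy PyTorch sketch of what a vision-language-action policy’s interface looks like: a camera frame and a text instruction go in, a chunk of robot actions comes out. Every module below is a hypothetical stand-in, not PI’s architecture; π0 pairs a pretrained vision-language backbone with a separate action expert, which this sketch elides.

```python
import torch
import torch.nn as nn

# Toy sketch of a vision-language-action (VLA) policy interface.
# All modules are illustrative stand-ins, not PI's actual design.
class ToyVLAPolicy(nn.Module):
    def __init__(self, vision_dim=512, text_dim=512, action_dim=7, chunk=16):
        super().__init__()
        self.vision_encoder = nn.Linear(3 * 32 * 32, vision_dim)  # stand-in for a ViT
        self.text_encoder = nn.Embedding(1000, text_dim)          # stand-in for an LLM
        self.action_head = nn.Sequential(                         # stand-in for an action expert
            nn.Linear(vision_dim + text_dim, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim * chunk),
        )
        self.action_dim, self.chunk = action_dim, chunk

    def forward(self, image, token_ids):
        v = self.vision_encoder(image.flatten(1))           # encode the camera frame
        t = self.text_encoder(token_ids).mean(dim=1)        # pool the instruction tokens
        out = self.action_head(torch.cat([v, t], dim=-1))   # fuse and predict
        # Emit a chunk of future actions rather than a single step, which is
        # how recent VLA models keep robot control smooth at high frequency.
        return out.view(-1, self.chunk, self.action_dim)

policy = ToyVLAPolicy()
actions = policy(torch.rand(1, 3, 32, 32), torch.randint(0, 1000, (1, 8)))
print(actions.shape)  # torch.Size([1, 16, 7])
```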
That said, there’s still work to be done. If you told these models to “break a leg,” they might just ask which one to break. Teaching machines nuance—especially human slang—is one of the many hurdles in physical AI.
This approach needs massive amounts of vision and action training data, which Fei-Fei Li’s well-funded World Labs may also be going after. Startups like Odyssey, built by former self-driving-car teams from Cruise and Waymo, are assembling proprietary training sets with a backpack-sized data capture system, something like a Google Street View for humans. For now, Odyssey is focused on turning that data set into generative models for film and games.
Meanwhile, Archetype is taking another approach: a foundation model that uses zero-shot or few-shot learning for physical AI. Their paper, which lays out a phenomenological approach to AI models (i.e., a nod to an experiential way of learning), proposes an encoder-decoder framework for a model that can decode any sensor signal without prior training on large task-specific datasets.
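As a rough illustration of the concept (not Archetype’s actual architecture), here’s a minimal sketch of a sensor-agnostic encoder-decoder: any 1-D sensor stream is cut into fixed-size patches, embedded into a shared latent space, and decoded into a forecast, so the same weights can ingest a signal type they were never trained on. All layer sizes and names below are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sensor-agnostic encoder-decoder: patchify any raw 1-D signal,
# encode with a transformer, decode a short forecast. Sizes are arbitrary.
class SensorEncoderDecoder(nn.Module):
    def __init__(self, patch=16, d_model=128, horizon=32):
        super().__init__()
        self.patch = patch
        self.embed = nn.Linear(patch, d_model)       # embed patches of any sensor stream
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.Linear(d_model, horizon)   # predict the next `horizon` samples

    def forward(self, signal):                       # signal: (batch, time)
        b, t = signal.shape
        patches = signal[:, : t - t % self.patch].view(b, -1, self.patch)
        h = self.encoder(self.embed(patches))        # shared latent space for all sensors
        return self.decoder(h[:, -1])                # forecast from the last latent state

model = SensorEncoderDecoder()
accel = torch.randn(2, 160)   # e.g. an accelerometer trace unseen during training
print(model(accel).shape)     # torch.Size([2, 32])
```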
Listen to NotebookLM’s podcast on Archetype’s paper
It’s early days, but the promise of a sensor-agnostic, zero-shot model is huge. The challenge? Ensuring that sensor variability doesn’t distort the true physical patterns, and that the framework holds up across an extensive range of use cases.
There are signals that we’re on the cusp of something transformative. Over the next 18 to 24 months, we could see much greater leaps in physical AI, reshaping industries from robotics to film, and even how we experience the world.
It feels like physical AI warrants a market map. On it.
Disclosure: Strange Ventures is an early backer and holds a position in Archetype AI.
The Latest This Week
OpenAI reportedly plans to launch an AI agent early next year: OpenAI is preparing to release an autonomous AI agent that can control computers and perform tasks independently, code-named “Operator.” The company plans to debut it as a research preview and developer tool in January, according to Bloomberg. This move intensifies the competition among tech giants developing AI agents: Anthropic recently introduced its “computer use” capability, while Google is reportedly preparing its own version for a December release. Perhaps in anticipation, OpenAI today released “Work with Apps,” a feature in the ChatGPT desktop app that lets the model “read” what you’re doing in coding and writing tools like VS Code, Xcode, and TextEdit.
DeepL launches DeepL Voice, real-time, text-based translations from voices and videos: DeepL has made a name for itself with online text translation it claims is more nuanced and precise than services from the likes of Google – a pitch that has catapulted the German startup to a valuation of $2 billion and more than 100,000 paying customers. Now, as the hype for AI services continues to grow, DeepL is adding another mode to the platform: audio. Users will now be able to use DeepL Voice to listen to someone speaking in one language and automatically translate it to another, in real time.
Generative AI startup Writer raises $200M at a $1.9B valuation: Writer has raised $200 million at a $1.9 billion valuation to expand its enterprise-focused generative AI platform. In October, Writer released a model, Palmyra X 004, trained almost entirely on synthetic data. Writer’s current focus is on “AI agents” that can plan and execute workflows across systems and teams, as well as customizable AI guardrails and a suite of no-code development tools.
Alibaba Cloud's Qwen has unveiled the Qwen2.5-Coder series, open-source AI coding models ranging from 0.5B to 32B parameters. The 32B excels in code generation, debugging, and supports 40+ languages, rivaling GPT-4 and Claude 3.5 Sonnet.
ByteDance announces X-Portrait 2: a new SOTA model for talking-head generation that can transfer fast head movements, minuscule expression changes, and strong personal emotions.