Physical intelligence is about to become as accessible as language intelligence. Announcements from World Labs, Tesla, and DeepMind on world models shifted the center of gravity this week.
This is absolutely your best newsletter ever. I agree with your premise - I believe that all evidence points to intelligence being an embodied phenomenon, and without a world model, AI will never achieve artificial general intelligence or superintelligence. We are visual beings; some 60% of our brain is devoted to visual processing. Humans solve problems visually, and with an instinctive understanding of the physics of the world that comes from our embodiment (watch a child learn to crawl or walk and this becomes obvious). These world models bring that understanding and the ability to imagine spaces and places that don't exist. LLMs simulate a different kind of thinking (the part of our brain that talks all the time) and are insufficient if we want to get to a more comprehensive intelligence. Thanks for the breakdown of the three approaches; this kind of thoughtful writing is why I subscribe.
Thank you for your kind words! I so agree with this sentiment:
"Humans solve problems visually, and with an instinctive understanding of the physics of the world that comes from our embodiment"
The distinction between LLMs processing patterns versus world models simulating futures is a critical insight. What strikes me most is how each approach handles the sim-to-real gap differently. Tesla's vision-only stack is elegant but fragile in edge cases, while Archetype's raw-waveform approach might discover causal patterns invisible to human perception. The real breakthrough will likely come from hybrid systems that understand both symbolic knowledge and physical causality.