The Future in 3D
It feels inevitable that within the next 10 years, the majority of media we consume will be generated: made not by cameras or traditional studios, but by algorithms capable of producing stunningly lifelike 3D environments and characters in real time.
The numbers tell a compelling story: YouTube users consume over a billion hours of content daily, and TikTok, with its 138 million active U.S. users, has overtaken Netflix as the platform of choice among audiences under 35. The appetite for user-generated content, in particular, is insatiable, and only set to grow.
While gen AI optimizes production for studios, it democratizes studio-quality content for the average creator.
This week alone, we are seeing more moves in the great race towards real-time 3D asset generation, bringing us closer to a world where most media is synthetically generated within the next decade.
Researchers from Meta and the University of Oxford developed VFusion3D, capable of generating high-quality 3D objects from single images or text descriptions. Nvidia has been rumored to have scraped millions of videos to improve its 3D models. Fei-Fei Li, the “godmother of AI” and founder of World Labs, a startup that seeks to create AI models of real-world objects and environments, reached unicorn status within four months. Runway ML, the popular video generator, is said to be in talks to raise at a $4B valuation.
For now, the race to achieve richer 3D training data (whether through computer vision or synthetic data) makes it feel like we're still in the “training wheels” phase. But it’s clear that the tools for media creation are advancing at a pace we could only dream of a few years ago.
The future of media isn’t just about watching—it’s about creating, and the possibilities are expanding faster than we ever imagined.
Enjoy the update.
Tara 🍓
Latest Releases
Gemini Live speaks like a human, taking on ChatGPT Advanced Voice Mode: Google announced Gemini Live, a new voice mode for its AI model Gemini that allows users to speak to the model in plain, conversational language. Gemini Live is designed to respond and adapt in real time.
Meta’s VFusion3D: Researchers from Meta and the University of Oxford have developed a powerful AI model capable of generating high-quality 3D objects from single images or text descriptions. Their novel approach leverages pre-trained video AI models to generate synthetic 3D data, allowing them to train a more powerful 3D generation system.
Grok-2 arrives with image generation: The new large language model (LLM) called Grok-2 from Musk’s sister company xAI has landed. Integrated within X itself, Grok-2 comes in two model sizes: Grok-2 and Grok-2 mini. Grok-2 offers state-of-the-art performance across a wide range of tasks including chat, coding, reasoning, and vision-based applications, while Grok-2 mini is a smaller, faster version optimized for efficiency, suitable for simpler text-based prompts requiring quicker responses. Grok-2 also boasts image generation capabilities, powered by a partnership with Black Forest Labs and its new and surprisingly photorealistic open-source diffusion AI model Flux.1.
Meta, Universal Music Group address AI music in new licensing agreement: Meta and Universal Music Group (UMG) announced the expansion of their multi-year music licensing agreement, which enables users to share songs from UMG’s music library across Meta’s platforms (Facebook, Instagram, Horizon, Threads and WhatsApp) without violating copyright. What’s most notable about the new agreement is that it states that the two companies are addressing “unauthorized AI-generated content.”
NEA led a $100M round into Fei-Fei Li’s new AI startup, now valued at over $1B: World Labs, a stealthy startup founded by renowned Stanford University AI professor Fei-Fei Li, has raised two rounds of financing two months apart, according to multiple reports. The latest financing was led by NEA and valued the company at over $1 billion. What Li is working on is particularly difficult: the company aims to create AI models that can accurately estimate the three-dimensional physicality of real-world objects and environments, enabling detailed digital replicas without the need for extensive data collection.
Anysphere, a GitHub Copilot rival, has raised $60M Series A at $400M valuation from a16z, Thrive, sources say: Anysphere, a two-year-old startup that’s developed an AI-powered coding assistant called Cursor, has raised over $60 million in a Series A financing at a $400 million post-money valuation, two sources familiar with the deal told TechCrunch.
New AI model can listen while speaking: Researchers introduced the Listening-while-Speaking Language Model (LSLM), which can listen and speak simultaneously by combining a token-based text-to-speech (TTS) model with a streaming audio encoder.