Are AI companies ignoring the golden egg?

Jun 21, 2024

💭 The Insight

LLMs are a commodity. Feedback is the feature where value accrues.

I enjoyed Benedict Evans’s deep dive into Apple Intelligence and his analysis of how Apple is approaching AI with a strategy that’s different from the other tech players. He points out that Apple’s approach is not to give the user LLM-powered features at disconnected points across the experience. For instance, it doesn't (yet) let the user ask GPT anything through a blank, open-ended input. The AI features are all tightly bound to a specific context and intention.

And that’s by design.

This echoes several conversations I've had this week around feedback (hat-tip to John Zimmerman and JP) and how feedback is the feature that most AI companies should be optimizing around, but aren’t.

Is the feedback and personalization layer where the value in AI lies? If LLMs are a commodity (see, e.g., this leaked Google memo about how most foundation models are trained on the same data anyway), does the lasting value and power of AI lie in personalization, gleaned from how it learns from the user?

For now, in most AI products, including OpenAI’s, feedback feels like an afterthought. You give a thumbs up or thumbs down to the generated output, and that's really about it. But there is far more data about user intention that could be captured and used to make the models and the experience better.

Evans points out that while OpenAI is seemingly being given “free” distribution to a few hundred million Apple users, it is also being treated as an interchangeable plugin. That hints that personalization is the AI experience we should be designing for. Maybe that's where lasting value accrues.

🔍 The Review

We are launching a new section next week that reviews AI products, experiences, and developer tools. Interested in being reviewed? Submit an application: tinyurl.com/thestrangereview

📰 The Latest

Anthropic launched its newest model, called Claude 3.5 Sonnet, which it says can equal or better OpenAI’s GPT-4o or Google’s Gemini across a wide variety of tasks. The new model is already available to Claude users on the web and on iOS, and Anthropic is making it available to developers as well.

OpenAI co-founder Ilya Sutskever announces his new AI startup, Safe Superintelligence: OpenAI co-founder Ilya Sutskever, who left the AI startup last month, introduced his new AI company, which he’s calling Safe Superintelligence, or SSI. “I am starting a new company,” Sutskever wrote on X. “We will pursue safe superintelligence in a straight shot, with one focus, one goal, and one product.” Sutskever was OpenAI’s chief scientist and co-led the company’s Superalignment team with Jan Leike, who also left in May to join rival AI firm Anthropic. OpenAI’s Superalignment team was focused on steering and controlling AI systems but was dissolved shortly after Sutskever and Leike announced their departures. Sutskever will continue to focus on safety at his new startup.

Meta unveils five AI models for multi-modal processing, music generation, and more: Meta has unveiled five major new AI models and research, including multi-modal systems that can process both text and images, next-gen language models, music generation, AI speech detection, and efforts to improve diversity in AI systems. The releases come from Meta’s Fundamental AI Research (FAIR) team which has focused on advancing AI through open research and collaboration for over a decade. On the creative side, Meta’s JASCO allows generating music clips from text while affording more control by accepting inputs like chords and beats. Meta also released AudioSeal, which it claims is the first audio watermarking system designed to detect AI-generated speech. It can pinpoint the specific segments generated by AI within larger audio clips up to 485x faster than previous methods.

Google DeepMind’s new AI tool uses video pixels and text prompts to generate soundtracks: Google DeepMind has taken the wraps off of a new AI tool for generating video soundtracks. In addition to using a text prompt to generate audio, DeepMind’s tool also takes into account the contents of the video. By combining the two, DeepMind says users can use the tool to create scenes with “a drama score, realistic sound effects or dialogue that matches the characters and tone of a video.” Even though users can include a text prompt, DeepMind says it’s optional. Users also don’t need to meticulously match up the generated audio with the appropriate scenes. According to DeepMind, the tool can also generate an “unlimited” number of soundtracks for videos, allowing users to come up with an endless stream of audio options.

ElevenLabs unveils open-source creator tool for adding sound effects to videos: Weeks after AI voice startup ElevenLabs launched its Sound Effects text-to-sound AI offering, the company is releasing an open-source tool to showcase its potential. In “about 15 seconds,” this application enables creators to generate sound effect samples for their videos, analyzing the imported clip and providing multiple options. While developers can access the app’s code on GitHub, ElevenLabs has published a website for the public to try out its Sound Effects API.

Snap previews its real-time image model that can generate AR experiences: At the Augmented World Expo, Snap teased an early version of its real-time, on-device image diffusion model that can generate vivid AR experiences. The company also unveiled generative AI tools for AR creators. Snap co-founder and CTO Bobby Murphy said onstage that the model is small enough to run on a smartphone and fast enough to re-render frames in real time, guided by a text prompt. Snapchat users will start to see Lenses with this generative model in the coming months, and Snap plans to bring it to creators by the end of the year.

TikTok introduces Symphony Digital Avatars, letting companies ‘breathe life’ into their branded content: TikTok is introducing new generative AI tools for businesses and agencies as part of its Symphony suite of AI-powered ad solutions. The first is Digital Avatars, which brands can use to converse with followers and “breathe life” into their content marketing efforts. Businesses can choose between two types of Digital Avatars: Stock or Custom. The former are pre-built avatars created using paid actors licensed for commercial use. With the latter, businesses can craft their own avatar to represent a creator or brand spokesperson with multi-language abilities from 30 different languages. In addition, TikTok is rolling out Symphony AI Dubbing, a global translation tool that will allow creators and brands to seamlessly generate content in multiple languages.

Runway introduces Gen-3 Alpha, its new base model for video generation: Gen-3 Alpha can create highly detailed videos with complex scene changes, a wide range of cinematic choices, and detailed art directions. Gen-3 Alpha is the first of an upcoming series of models trained by Runway on a new infrastructure built for large-scale multimodal training, and the company says it represents a significant step towards its goal of building General World Models.

WPP unveils AI-powered Production Studio, enabling advertisers to unlock exponentially more content for marketing campaigns: WPP, one of the “Big Four” creative holding companies in the world, announced the launch of Production Studio, an AI-enabled, end-to-end production application developed using NVIDIA Omniverse, that streamlines and automates the creation of text, images and video, transforming content creation for advertisers and marketers. Production Studio directly addresses the challenges advertisers continue to face in producing brand-compliant and product-accurate content at scale. The innovative solution combines advanced AI technology and human creativity with proprietary 3D workflows enabled by Universal Scene Description (OpenUSD) to generate hyper-realistic and accurate content at an unprecedented volume.

Universal Music artists get access to AI voice cloning tool via UMG’s new deal with SoundLabs: Universal Music Group (UMG) announced a new partnership with SoundLabs, an AI technology company focused on ‘ethically’ trained tools for music creators. The collaboration will equip UMG’s artists and producers with AI technology through SoundLabs’ MicDrop, an AI vocal plug-in. MicDrop, set to launch this summer, is a real-time vocal plug-in that allows artists to create high-fidelity vocal models using their own voice data. Unlike some AI tools, artists retain complete ownership and creative control over their models. MicDrop is compatible with all major digital audio workstations and comes in AU, VST3, and AAX formats.

Feedback and comments are welcome. Reach out to thereview@strangevc.com
