Can AI Really Think?
A grade school math problem might have just clarified some of AI’s current limits.
Researchers from Apple gave some of the world's most advanced AI models a basic math problem: Oliver picked 44 kiwis on Friday and 58 on Saturday. On Sunday, he picked double his Friday amount. How many kiwis does Oliver have in total?
The AIs solved it perfectly. As expected.
Then they added one irrelevant detail: "but five of them were a bit smaller than average."
The result? The models' performance plummeted. They started subtracting the five smaller kiwis from the total, even though the size of the kiwis had nothing to do with counting them. It's a mistake no human child learning math would make.
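To make the arithmetic concrete, here is a minimal sketch using only the numbers quoted in the kiwi problem. The variable names are ours, chosen for illustration; this is not code from the paper.

```python
# Figures taken from the kiwi problem as quoted in this piece.
friday = 44
saturday = 58
sunday = 2 * friday           # "double his Friday amount"
smaller_than_average = 5      # the irrelevant detail: size does not change the count

# Correct reasoning: every kiwi counts, whatever its size.
correct_total = friday + saturday + sunday

# The mistake described above: subtracting the smaller kiwis from the total.
mistaken_total = correct_total - smaller_than_average

print(correct_total)   # 190
print(mistaken_total)  # 185
```

The two totals differ only by the number that should never have entered the calculation.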
This seemingly simple experiment, detailed in a new paper titled "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models," offers benchmark-backed evidence about the current state of AI: what looked like the beginnings of intelligent reasoning might actually be closer to sophisticated pattern matching.
The researchers created variations of grade-school math problems and found that state-of-the-art AI models, including OpenAI's GPT-4o and o1 models, showed significant performance drops when presented with slightly modified versions of questions they could previously solve.
Even more telling, when irrelevant information was added to problems, the models often tried to use that information in their calculations, suggesting they weren't truly understanding the mathematical concepts but rather following learned patterns.
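For a feel of what "variations" and "irrelevant information" can mean in practice, here is a rough sketch of a templated problem generator. Everything in it is an assumption made for illustration: the template wording, the name list, the number ranges, and the `make_variant` helper are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical template modeled on the kiwi problem; not the paper's actual template.
TEMPLATE = (
    "{name} picked {fri} kiwis on Friday and {sat} on Saturday. "
    "On Sunday, {name} picked double the Friday amount{distractor}. "
    "How many kiwis does {name} have in total?"
)


def make_variant(rng, add_distractor=False):
    """Return (question, answer) with a randomized name and numbers."""
    name = rng.choice(["Oliver", "Maya", "Sam"])
    fri = rng.randint(10, 60)
    sat = rng.randint(10, 60)
    distractor = ""
    if add_distractor:
        # An irrelevant clause: it mentions a number but does not affect the count.
        k = rng.randint(2, 9)
        distractor = f", but {k} of them were a bit smaller than average"
    question = TEMPLATE.format(name=name, fri=fri, sat=sat, distractor=distractor)
    answer = fri + sat + 2 * fri  # ground truth ignores the distractor
    return question, answer


if __name__ == "__main__":
    rng = random.Random(0)
    for flag in (False, True):
        question, answer = make_variant(rng, add_distractor=flag)
        print(question, "->", answer)
```

The point of the design is that the ground-truth answer is computed from the same numbers whether or not the distractor is present, so any drop in a model's accuracy on the distractor versions can be attributed to the extra clause.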
This distinction between knowing how to do something versus understanding why you're doing it becomes particularly relevant as AI agents gain more agency in the real world.
Anthropic’s new Computer Use feature allows Claude to control your computer screen and take actions based on a prompt, such as sending emails or filling out forms at your command. (h/t: Mckay)
However, despite its impressive capabilities, there are still significant limitations. For now, Claude works from static screenshots of the screen rather than a live view, and its ability to perform dynamic tasks remains constrained.
The security implications are also glaring. For instance, computer-using AI agents can be vulnerable to prompt injection attacks, where a simple webpage could potentially trick the AI into downloading and launching unwanted files. (h/t: Simon Willison)
Sophisticated pattern matching or not, AI experiences can feel pretty magical. Several of the product releases from the past few weeks are good examples.
Take Perplexity Finance: it’s making stock market research feel, dare we say, delightful, with its data visualization feature. And it’s about to get even better. A new partnership with Crunchbase means users will soon get access to exclusive private company data, including hard-to-find financials, firmographics, and more.
Source: @AravSrinivas
In the creative world, AI is feeling less like a toy and more like a tool.
Midjourney’s Edit tool now lets you refine and edit images with ease while keeping the base images consistent. And with EverArt’s Campaign feature, you can take one style and generate 500 unique images in the same aesthetic and form factor: a whole photoshoot from your laptop.
Source: Midjourney
Source: EverArt
So while AI might not quite be able to “think” on its own yet, it’s already radically changing how we create and collaborate. And it’s so much fun.
We are cohosting a Video AI developer meetup with VideoDB next Tuesday, Oct. 29, in San Francisco! We’ll have interactive demos and technical discussions on cutting-edge topics like multimodal search, video AI training, generative video, and the future of video storytelling.
RSVP here.
Have a demo you’d like to share? Reach out to anup@videodb.io
The Latest
OpenAI researchers develop new model that speeds up media generation by 50X: A pair of researchers at OpenAI has published a paper describing a new type of continuous-time consistency model (sCM). It generates multimedia, including images, video, and audio, 50 times faster than traditional diffusion models, producing images in roughly a tenth of a second compared with more than 5 seconds for regular diffusion.
AI video startup Genmo launches Mochi 1, an open source rival to Runway, Kling, and others: Genmo, an AI company focused on video generation, has announced the release of a research preview for Mochi 1, a new open-source model for generating high-quality videos from text prompts.
Runway releases new AI facial expression motion capture feature: Runway announced a new feature, “Act-One,” that lets users record video of themselves or actors with any video camera, even the one on a smartphone, and then transfers the subject’s facial expressions to an AI-generated character with uncanny accuracy.
AI startup Ideogram launches infinite Canvas for manipulating, combining generated images: Users can spread newly generated images out, compare them to older generations, resize and reorder them at will, and even combine multiple AI-generated images into one new composite.
Perplexity is reportedly looking to fundraise at an $8B valuation: AI search engine Perplexity is in fundraising talks and hopes to raise around $500 million at an $8 billion valuation, according to The Wall Street Journal. The WSJ reports that the company currently receives about 15 million queries a day and brings in around $50 million in annualized revenue.
Anthropic’s latest AI update can use a computer on its own: Anthropic’s latest Claude 3.5 Sonnet AI model has a new feature in public beta that can control a computer by looking at a screen, moving a cursor, clicking buttons, and typing text. The new feature, called “computer use,” is available on the API, allowing developers to direct Claude to work on a computer like a human does. Anthropic does caution that computer use is still experimental and can be “cumbersome and error-prone.”
Former OpenAI CTO Mira Murati is reportedly fundraising for a new AI startup: Mira Murati, the OpenAI CTO who announced her departure last month, is raising VC funding for a new AI startup, according to Reuters. This startup will reportedly focus on building AI products based on proprietary models and could raise more than $100 million in this round.
Midjourney plans to let anyone on the web edit images with AI: Midjourney is planning to release an upgraded web tool that’ll let users edit any uploaded images from the web using Midjourney’s generative AI. The upgraded tool, which Midjourney CEO David Holz said will be released “early next week,” will also allow users to retexture objects in images to “repaint” their colors and details according to captions.
Canva has a shiny new text-to-image generator: Canva has added a bunch of new AI features to its web-based design platform, including updates for generating text and video effects and a more powerful text-to-image app. The latter is called “Dream Lab” – a new image generator tool born from Canva’s acquisition of generative AI startup Leonardo.ai earlier this year.