Meta drops Llama 3, the best-performing open source LLM out there.

AI agents, and moar LLMs like the multimodal Reka Core, which can interpret video and images.

🌊 The Long View

AI agents, which can autonomously perform actions on behalf of the user based on instructions or prompts, will play a large part in AI workflows. Take, for example, the goal of writing an article: as in this example by @ashpreetbedi, one could deploy multiple “newsroom” AI agents, from a Researcher (finds the relevant news sources) to a Writer (drafts the article) to an Editor (reviews the work).

An agent workflow takes an iterative approach, which makes it more likely to succeed: the goal is broken into more achievable subtasks, with built-in moments for goal alignment along the way. Common patterns include the following (a minimal code sketch follows the list):

  • Reflection/Review: The LLM examines its own work to come up with ways to improve it.

  • Suite of tools: The LLM is given tools such as web search, code execution, or any other function to help it gather information, take action, or process data.

  • Planning: The LLM comes up with, and executes, a multistep plan to achieve a goal (for example, writing an outline for an essay, then doing online research, then writing a draft, and so on).

  • Multi-agent collaboration: Multiple AI agents work together, splitting up tasks and discussing and debating ideas, to come up with better solutions than a single agent would.
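
To make these patterns concrete, here is a minimal sketch of the “newsroom” pipeline in Python. This is not the actual implementation from the example above: it uses OpenAI’s chat completions API as a stand-in LLM, and the role prompts, model choice, and topic are all illustrative.

```python
# Minimal multi-agent "newsroom" sketch: Researcher -> Writer -> Editor.
# Stand-in LLM: OpenAI's chat completions API (any chat model would do).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def agent(role_prompt: str, task: str) -> str:
    """One agent = one role-specific system prompt plus a task."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # illustrative model choice
        messages=[
            {"role": "system", "content": role_prompt},
            {"role": "user", "content": task},
        ],
    )
    return response.choices[0].message.content

topic = "open source LLMs after the Llama 3 release"

# Planning and tool use would slot in here (e.g. a web-search call feeding
# the Researcher); this sketch keeps everything inside the LLM for brevity.
research = agent("You are a Researcher. List key facts and sources.", topic)
draft = agent("You are a Writer. Draft a short article.", research)

# Reflection/Review: the Editor critiques, and the Writer revises once.
critique = agent("You are an Editor. Critique this draft concisely.", draft)
final = agent("You are a Writer. Revise the draft using this critique.",
              f"DRAFT:\n{draft}\n\nCRITIQUE:\n{critique}")
print(final)
```

Each hand-off is a goal-alignment moment; in a fuller agent framework, the revise loop would run until the Editor signs off rather than exactly once.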

My add: understanding human intention and evaluation sit as bookends on the agent workflow. There are AX (agent experience?) moments to design for that can improve the iterative workflow and re-align decision pathways by capturing human (or agent) feedback in a frictionless way.
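
As a purely hypothetical sketch of one such AX moment: a lightweight checkpoint between iterations, where a single keystroke approves the draft and anything else becomes re-alignment feedback. The `checkpoint` function is my own invention; `agent` is the helper from the sketch above.

```python
# Hypothetical AX checkpoint: frictionless human feedback between iterations.
def checkpoint(draft: str) -> str:
    """Show the draft; empty input approves, anything else re-aligns."""
    print(draft)
    note = input("Press Enter to approve, or type a correction: ").strip()
    if note:
        # Feed the human note back in as a re-alignment step.
        return agent("You are a Writer. Revise per this feedback.",
                     f"DRAFT:\n{draft}\n\nFEEDBACK:\n{note}")
    return draft
```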

🔥 Latest News

1.

Meta releases Llama 3, its open source large language model, which is regarded as one of the highest-performing open LLMs available right now. Open source models are critical to the AI ecosystem because they give developers and founders a way to integrate them, build apps on top, and fine-tune their own models, all without licensing fees.
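
As a taste of the “build on top for free” point, here is a minimal sketch of running the 8B instruct variant locally with Hugging Face transformers. It assumes you have accepted Meta’s license for the gated meta-llama/Meta-Llama-3-8B-Instruct repo and have hardware for roughly 16 GB of weights.

```python
# Minimal sketch: run Llama 3 8B Instruct locally via Hugging Face transformers.
from transformers import pipeline

generate = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # gated: accept Meta's license first
    device_map="auto",  # put weights on GPU if one is available
)
print(generate("What makes open-weights models useful?",
               max_new_tokens=80)[0]["generated_text"])
```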

2. 

Reka launches Reka Core, its multimodal language model to rival GPT-4 and Claude 3 Opus. Reka is a San Francisco-based AI startup founded by researchers from DeepMind, Google and Meta. A multimodal LLM is one that can interpret images and video alongside language: interpreting a visual chart, for instance, or describing an image.

Despite being trained in under a year, Reka Core matches or beats the performance of top models from leading, deep-pocketed players in the AI space, including OpenAI, Google and Anthropic. Check out the demo video, in which it visually transcribes the trailer for the sci-fi series Three Body Problem.
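
For a feel of what a multimodal query looks like in code, here is a sketch using OpenAI’s vision-capable chat API as a stand-in (Reka Core’s own API shape isn’t covered here, and the image URL is a placeholder):

```python
# Sketch of a multimodal query: a text question about an image.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.chat.completions.create(
    model="gpt-4-turbo",  # a vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What does this chart show?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/chart.png"}},  # placeholder
        ],
    }],
)
print(response.choices[0].message.content)
```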

3.

Adobe Premiere Pro is getting generative AI video tools, and hopefully OpenAI’s Sora. Adobe is working on a generative AI video model for its Firefly family that will bring new tools to its Premiere Pro video editing platform. These new Firefly tools, alongside some proposed third-party integrations with Runway, Pika Labs, and OpenAI’s Sora models, will allow Premiere Pro users to generate video, add or remove objects using text prompts (just like Photoshop’s Generative Fill feature), and extend the length of video clips.

4. 

Microsoft drops VASA-1, a new research paper demonstrating how to generate a live video avatar from a still image and a speech audio clip. Similar to the EMO paper Alibaba released a few months ago, the moving avatars look incredibly realistic, although critics say the hair (either unmoving or moving awkwardly) is a dead giveaway. One thing’s for sure: generative video is moving quickly, and we will see many more technical leaps in the coming months.

5. 

a16z-backed Rewind pivots to build an AI-powered pendant that records your conversations. The company has rebranded to “Limitless” and is now offering an AI-powered meeting suite alongside the hardware pendant. Founder Dan Siroker posted on Wednesday that the startup has already received more than 10,000 preorders for the product.