Meta’s AI chief says world models are key to ‘human-level AI’ – but it could be 10 years away

Do today’s AI models really remember, think, plan, and reason, just like a human brain can? Some AI labs would like you to believe they do, but according to Meta’s chief AI scientist Yann LeCun, the answer is no. However, he thinks we might get there in a decade or so by pursuing a new approach called a “world model.”
Earlier this year, OpenAI released a new feature it calls “memory” that allows ChatGPT to “remember” your conversations. The startup’s latest generation of models, o1, displays the word “thinking” while generating an output, and OpenAI says the same models are capable of “complex reasoning.”
All of that sounds like we’re very close to AGI. However, during a recent speech at the Hudson Forum, LeCun undercut AI optimists such as xAI founder Elon Musk and Google DeepMind co-founder Shane Legg, who suggest that human-level AI is just around the corner.
“We need machines that understand the world; [machines] able to remember things, with intuition, with logic, things that can think and plan at the same level as a human,” said LeCun during the speech. “Despite what you may have heard from some very enthusiastic people, current AI systems cannot do this.”
LeCun says today’s large language models, like those powering ChatGPT and Meta AI, are far from “human-level AI.” Humanity could be “years to decades” away from achieving such a thing, he later said. (That doesn’t stop his boss, Mark Zuckerberg, from asking him when AGI will happen, though.)
The reason why is straightforward: those LLMs work by predicting the next token (usually a few letters or a short word), and today’s image/video models predict the next pixel. In other words, language models are one-dimensional predictors, and image/video AI models are two-dimensional predictors. These models have become very good at prediction within their respective dimensions, but they don’t really understand the three-dimensional world.
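The one-dimensional point can be made concrete with a toy next-token predictor. This is only an illustrative sketch: real LLMs use neural networks trained over vast token vocabularies, not bigram counts over a six-word corpus, but the underlying objective is the same — predict what comes next in a sequence.

```python
from collections import Counter, defaultdict

# Toy 1-D autoregressive model: learn which token tends to follow which.
# The model only ever learns "what comes next"; it has no notion of the
# three-dimensional world the words describe.
corpus = "the cat sat on the mat".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(token):
    """Return the most frequent follower of `token` in the training data."""
    followers = counts[token]
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("sat"))  # prints "on"
```

Scaled up by many orders of magnitude, this next-element objective is what LeCun argues cannot, on its own, produce an understanding of physical reality.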
Because of this, modern AI systems fail at simple tasks most humans can do. LeCun notes how humans learn to clear a dinner table by age 10, and to drive a car by age 17 — and learn both in a matter of hours. But even the world’s most advanced AI systems today, built on thousands or millions of hours of data, can’t reliably operate in the physical world.
To take on more complex tasks, LeCun suggests we need to build three-dimensional models that can perceive the world around us, centered on a new type of AI architecture: world models.
“A world model is your mental model of how the world behaves,” he explains. “You can imagine a sequence of actions that you can take, and your model of the world will allow you to predict what the result of that sequence of actions will be in the world.”
You likely already use a world model in your head. For example, imagine looking at a messy bedroom and wanting to make it clean. You can imagine how picking up all the clothes and putting them away would do the trick. You don’t need to try several methods, or learn how to clean a room first. Your brain observes the three-dimensional space and creates a plan of action to achieve your goal on the first try. That planning ability is the secret sauce that world models promise.
Part of the benefit here is that world models can take in significantly more data than LLMs. That also makes them computationally intensive, which is why cloud providers are racing to partner with AI companies.
World models are a big idea that several AI labs are now chasing, and the term is quickly becoming the next buzzword to attract venture funding. A group of highly regarded AI researchers, including Fei-Fei Li and Justin Johnson, recently raised $230 million for their startup, World Labs. The “Godmother of AI” and her team are convinced world models will unlock significantly smarter AI systems. OpenAI also describes its unreleased Sora video generator as a world model, but hasn’t gone into specifics.
LeCun outlined an idea for using world models to create human-level AI in a 2022 paper on “objective-driven AI,” though he notes the concept is more than 60 years old. In short, a base representation of the world (such as video of a dirty room) and memory are fed into a world model. Then the world model predicts what the world will look like based on that information. You then give the world model objectives, including an altered state of the world you’d like to achieve (such as a clean room), as well as guardrails to ensure the model doesn’t harm humans while achieving the objective (don’t kill anyone in the process of cleaning my room, please). The world model then finds a sequence of actions to achieve these objectives.
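The loop described above — represent the world, predict how actions change it, and pick the action sequence that best reaches the objective without violating guardrails — can be sketched as a toy planner. Everything here is invented for illustration: the one-dimensional “messiness” state, the hand-coded action effects, and the guardrail are stand-ins for what would, in LeCun’s proposal, be learned components.

```python
from itertools import product

# Illustrative action set: each action changes a 1-D "messiness" score.
# In a real world model these effects would be learned, not hand-coded.
ACTIONS = {"pick_up_clothes": -4, "vacuum": -2, "nap": +1}

def world_model(state, action):
    """Predict the next world state given the current state and an action."""
    return max(0, state + ACTIONS[action])

def plan(state, goal, guardrail, horizon=2):
    """Search over action sequences, simulate each with the world model,
    and return the sequence whose predicted end state lands closest to
    the goal without ever violating the guardrail."""
    best, best_cost = None, float("inf")
    for seq in product(ACTIONS, repeat=horizon):
        s, ok = state, True
        for a in seq:
            s = world_model(s, a)
            if not guardrail(s, a):
                ok = False
                break
        if ok and abs(s - goal) < best_cost:
            best, best_cost = seq, abs(s - goal)
    return best

# Objective: a clean room (messiness 0). Guardrail: never "nap" mid-task --
# a toy stand-in for "don't do harmful things while pursuing the objective."
actions = plan(state=6, goal=0, guardrail=lambda s, a: a != "nap")
```

The key structural point this sketch preserves is that planning happens inside the model’s imagination: candidate action sequences are evaluated against predicted outcomes, not tried in the real world.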
Meta’s long-term AI research lab, FAIR (Fundamental AI Research), is actively working on building objective-driven AI and world models, according to LeCun. FAIR used to work on AI for Meta’s upcoming products, but LeCun says the lab has shifted in recent years to focus purely on long-term AI research. LeCun says FAIR isn’t even working on LLMs these days.
World models are an intriguing idea, but LeCun says we haven’t made much progress on making these systems a reality. There are many very hard problems to solve between where we are today and that vision, and he says it’s certainly more complicated than it sounds.
“It’s going to take years before we can get everything here to work, if not a decade,” said LeCun. “Mark Zuckerberg keeps asking me how long it’s going to take.”