What are ‘world models’ for AI, and why are they important?
World models, also known as world simulators, are being hailed by some as the next big thing in AI.
AI pioneer Fei-Fei Li’s World Labs raised $230 million to build “large-scale models of the world,” and DeepMind hired one of the creators of OpenAI’s video generator, Sora, to work on “world simulations.” (Sora was released on Monday; here are some first impressions.)
But what the heck are these things, anyway?
World models draw inspiration from the mental models of the world that humans develop naturally. Our brains take in abstract representations from our senses and form them into a more concrete understanding of the world around us, producing what we called “models” long before AI embraced the term. The predictions our brains make based on these models influence how we perceive the world.
A paper by AI researchers David Ha and Jürgen Schmidhuber gives the example of a baseball bat. Hitters have milliseconds to decide how to swing their bat – shorter than the time it takes for visual signals to reach the brain. The reason they can hit a 100-mile-per-hour fastball is because they can automatically predict where the ball is going to go, Ha and Schmidhuber said.
“In trained athletes, all of this happens unconsciously,” the research duo wrote. “Their muscles move the bat at the right time and place according to the predictions of their internal models. They can quickly act on their predictions of the future without the need to consciously roll out possible future scenarios to form a plan.”
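A toy sketch can make the idea concrete: an agent acts on its internal model's forecast instead of waiting for fresh observations to arrive. Everything below is hypothetical illustration (the function names, the constant-velocity "model," and the numbers are not from the paper); it simply shows a decision made from a prediction rather than a direct measurement.

```python
def predict_position(pos: float, vel: float, dt: float) -> float:
    """Internal model: a simple constant-velocity forecast of where
    the ball will be after dt seconds."""
    return pos + vel * dt

def should_swing(ball_pos: float, ball_vel: float, plate_pos: float,
                 reaction_delay: float, tolerance: float = 0.5) -> bool:
    """Swing now if the model predicts the ball will be within
    `tolerance` meters of the plate once our reaction delay elapses."""
    predicted = predict_position(ball_pos, ball_vel, reaction_delay)
    return abs(predicted - plate_pos) <= tolerance

# Ball 20 m away, approaching at 45 m/s (~100 mph). By the time visual
# feedback could drive a reaction (~0.44 s here), the ball is at the
# plate -- so the decision must come from the prediction, not the eyes.
print(should_swing(ball_pos=20.0, ball_vel=-45.0,
                   plate_pos=0.0, reaction_delay=0.44))
```

The point is not the physics but the structure: the observation is stale by the time the action matters, so the internal model fills the gap.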
It is these abstract reasoning aspects of world models that some believe are prerequisites for human-level intelligence.
Modeling the world
Although the concept has been around for decades, world models have gained traction recently, in part because of their promising applications in AI video generation.
Most, if not all, AI-generated videos veer into uncanny valley territory. Watch them long enough and something strange will happen, such as limbs twisting and merging into each other.
While a generative model trained on years of video may accurately predict that a basketball bounces, it has no real idea why – just as language models don’t truly understand the concepts behind words and phrases. But a world model that grasps why the basketball bounces the way it does will be better at showing it do just that.
To enable this type of understanding, world models are trained on a range of data, including images, audio, videos, and text, with the aim of creating internal representations of how the world works and the ability to reason about the consequences of actions.
“The viewer expects the world they are watching to behave in the same way as in reality,” said Alex Mashrabov, Snap’s former director of generative AI and CEO of Higgsfield, which builds video-generation models. “When a feather drops with the weight of an anvil or a bowling ball shoots up hundreds of feet in the air, it’s jarring and takes the viewer out of the moment. With a robust world model, instead of the creator specifying how each object is expected to move – which is boring, tedious, and a waste of time – the model will understand this.”
But better video generation is only the tip of the iceberg for world models. Researchers including Meta chief AI scientist Yann LeCun say the models could one day be used for sophisticated forecasting and planning in both digital and physical domains.
In a talk earlier this year, LeCun explained how a world model could help achieve a desired goal through reasoning. A model with a basic representation of a “world” (e.g., a video of a dirty room), given a goal (a clean room), could come up with a sequence of actions to achieve that goal (deploy vacuums to sweep, wash the dishes, empty the trash) – not because it has observed that pattern before, but because it knows at a deeper level how to go from dirty to clean.
“We need machines that understand the world; [machines] that can remember things, have intuition, have common sense – things that can reason and plan at the same level as humans,” said LeCun. “Contrary to what you may have heard from some of the more enthusiastic people, current AI systems cannot do this.”
Although LeCun estimates that we are at least a decade away from the world models he envisions, today’s world models show promise as simulations of basic physics.
OpenAI notes in a blog post that Sora, which it considers to be a world model, can simulate actions like a painter leaving brush strokes on a canvas. Models like Sora – and Sora itself – can also plausibly simulate video games. For example, Sora can render a Minecraft-like UI and game world.
Future world models may be able to generate 3D worlds on demand for games, virtual imaging, and more, World Labs founder Justin Johnson said in an episode of the a16z podcast.
“We already have the ability to create virtual, interactive worlds, but it costs hundreds and hundreds of millions of dollars and a ton of development time,” Johnson said. “[World models] will not only allow you to find a photo or clip, but a fully simulated, dynamic, and interactive 3D world.”
High barriers
Although the concept is attractive, many technical challenges stand in the way.
Training and running world models requires massive computing power, even compared to the amount currently used by generative models. While some of the latest language models can run on a modern smartphone, Sora (arguably an early world model) would require thousands of GPUs to train and run, especially if its use becomes commonplace.
World models, like all AI models, also hallucinate – and internalize biases in their training data. A world model trained largely on videos of sunny weather in European cities might struggle to comprehend or depict Korean cities in snowy conditions, for example, or simply get them wrong.
A general lack of training data threatens to exacerbate these problems, said Mashrabov.
“We’ve seen models that are really limited to generating people of a certain type or race,” he said. “The training data for a world model must be broad enough to cover a diverse set of scenarios, but also specific enough that the AI can deeply understand the nuances of those scenarios.”
In a recent post, Runway CEO Cristóbal Valenzuela said that data and engineering issues prevent today’s models from accurately capturing the behavior of the world’s inhabitants (e.g. humans and animals). “Models will need to generate consistent maps of the environment,” he said, “and the ability to navigate and interact in those environments.”
If all the major hurdles are overcome, however, Mashrabov believes that world models could bridge AI and the real world – leading to breakthroughs not only in virtual world generation but in robotics and AI decision-making.
They could also enable far more capable robots.
Robots today are limited in what they can do because they have no awareness of the world around them (or of their own bodies). World models could give them that awareness, Mashrabov says – at least to a point.
“With an advanced world model, AI can develop a personal understanding of any situation it’s put in,” he said, “and then start brainstorming potential solutions.”