DeepMind’s Genie 2 can generate interactive worlds similar to video games
DeepMind, Google’s AI research organization, has unveiled a model that can generate an “infinite” variety of playable 3D worlds.
Called Genie 2, the model is the successor to DeepMind’s Genie, released earlier this year. It can generate a real-time scene from a single image and a text description (e.g. “A cute robot with a personality in the forest”). In this respect, it is similar to models under development by Fei-Fei Li’s company, World Labs, and the Israeli startup Decart.
DeepMind says Genie 2 can generate “a wide variety of rich 3D worlds,” including worlds where users can perform actions such as jumping and swimming using a mouse or keyboard. Trained on videos, the model is able to simulate object interactions, animations, lighting, physics, reflections, and the behavior of “NPCs.”
Most of Genie 2’s simulations look like AAA video games, and the reason may be that the model’s training data includes playthroughs of popular games. But DeepMind, like many AI labs, won’t reveal many details about its data-gathering methods, possibly for competitive reasons.
One wonders about the IP implications. DeepMind, as a subsidiary of Google, has unlimited access to YouTube, and Google has previously stated that its ToS gives it permission to use YouTube videos for model training. But does Genie 2 create unauthorized copies of the games it “viewed”? That’s for the courts to decide, I guess.
Genie 2 can generate consistent worlds with different perspectives, such as first-person and isometric views, for up to a minute, with most lasting 10-20 seconds.
“Genie 2 intelligently responds to actions taken by pressing keys on the keyboard, identifying the character and moving them accordingly,” DeepMind explained in a blog post. “For example, our model [can] find that the arrow keys should move the robot and not the trees or the clouds.”
Many models like Genie 2 (world models, if you will) can simulate games and 3D environments, but with artifacts, inconsistencies, and other tricky problems. For example, Decart’s Minecraft simulator, Oasis, has a low resolution and quickly “forgets” the layout of levels.
Genie 2, however, can remember parts of a simulated scene that are not in view and render them accurately when they become visible again, DeepMind said. (World Labs’ models can do this too.)
Now, games created with Genie 2 wouldn’t be much fun. Having your progress wiped every minute would drive anyone up the wall. So DeepMind is positioning the model as more of a research and creative tool: a way to prototype “interactive experiences” and test AI agents.
“Thanks to Genie 2’s off-the-shelf capabilities, concept art and graphics can be transformed into fully interactive environments,” DeepMind wrote. “And by using Genie 2 to quickly create rich and diverse environments for AI agents, our researchers can generate test tasks that the agents have not seen during training.”
DeepMind says that while Genie 2 is still in its early stages, the lab believes it will be an important part of developing the AI agents of the future.