SIMA 2: A Generalist Embodied Agent for Virtual Worlds
- URL: http://arxiv.org/abs/2512.04797v1
- Date: Thu, 04 Dec 2025 13:46:11 GMT
- Title: SIMA 2: A Generalist Embodied Agent for Virtual Worlds
- Authors: SIMA team, Adrian Bolton, Alexander Lerchner, Alexandra Cordell, Alexandre Moufarek, Andrew Bolt, Andrew Lampinen, Anna Mitenkova, Arne Olav Hallingstad, Bojan Vujatovic, Bonnie Li, Cong Lu, Daan Wierstra, Daniel P. Sawyer, Daniel Slater, David Reichert, Davide Vercelli, Demis Hassabis, Drew A. Hudson, Duncan Williams, Ed Hirst, Fabio Pardo, Felix Hill, Frederic Besse, Hannah Openshaw, Harris Chan, Hubert Soyer, Jane X. Wang, Jeff Clune, John Agapiou, John Reid, Joseph Marino, Junkyung Kim, Karol Gregor, Kaustubh Sridhar, Kay McKinney, Laura Kampis, Lei M. Zhang, Loic Matthey, Luyu Wang, Maria Abi Raad, Maria Loks-Thompson, Martin Engelcke, Matija Kecman, Matthew Jackson, Maxime Gazeau, Ollie Purkiss, Oscar Knagg, Peter Stys, Piermaria Mendolicchio, Raia Hadsell, Rosemary Ke, Ryan Faulkner, Sarah Chakera, Satinder Singh Baveja, Shane Legg, Sheleem Kashem, Tayfun Terzi, Thomas Keck, Tim Harley, Tim Scholtes, Tyson Roberts, Volodymyr Mnih, Yulan Liu, Zhengdong Wang, Zoubin Ghahramani
- Abstract summary: We introduce SIMA 2, a generalist embodied agent that understands and acts in a wide variety of 3D virtual worlds. Built upon a Gemini foundation model, SIMA 2 represents a significant step toward active, goal-directed interaction.
- Score: 87.15489342016714
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce SIMA 2, a generalist embodied agent that understands and acts in a wide variety of 3D virtual worlds. Built upon a Gemini foundation model, SIMA 2 represents a significant step toward active, goal-directed interaction within an embodied environment. Unlike prior work (e.g., SIMA 1) limited to simple language commands, SIMA 2 acts as an interactive partner, capable of reasoning about high-level goals, conversing with the user, and handling complex instructions given through language and images. Across a diverse portfolio of games, SIMA 2 substantially closes the gap with human performance and demonstrates robust generalization to previously unseen environments, all while retaining the base model's core reasoning capabilities. Furthermore, we demonstrate a capacity for open-ended self-improvement: by leveraging Gemini to generate tasks and provide rewards, SIMA 2 can autonomously learn new skills from scratch in a new environment. This work validates a path toward creating versatile and continuously learning agents for both virtual and, eventually, physical worlds.
Related papers
- TongSIM: A General Platform for Simulating Intelligent Machines [59.27575233453533]
Embodied intelligence focuses on training agents within realistic simulated environments. TongSIM is a high-fidelity, general-purpose platform for training and evaluating embodied agents.
arXiv Detail & Related papers (2025-12-23T10:00:43Z)
- SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds [31.504258822495768]
We introduce SimWorld, a new simulator built on Unreal Engine 5 designed for developing and evaluating AI agents. SimWorld offers realistic, open-ended world simulation, including accurate physical and social dynamics and language-driven procedural environment generation. We demonstrate SimWorld by deploying LLM agents on long-horizon multi-agent delivery tasks involving strategic cooperation and competition.
arXiv Detail & Related papers (2025-11-30T20:58:13Z)
- Dyna-Mind: Learning to Simulate from Experience for Better AI Agents [62.21219817256246]
We argue that current AI agents need "vicarious trial and error" - the capacity to mentally simulate alternative futures before acting. We introduce Dyna-Mind, a two-stage training framework that explicitly teaches (V)LM agents to integrate such simulation into their reasoning.
arXiv Detail & Related papers (2025-10-10T17:30:18Z)
- Scaling Instructable Agents Across Many Simulated Worlds [70.97268311053328]
Our goal is to develop an agent that can accomplish anything a human can do in any simulated 3D environment.
Our approach focuses on language-driven generality while imposing minimal assumptions.
Our agents interact with environments in real-time using a generic, human-like interface.
arXiv Detail & Related papers (2024-03-13T17:50:32Z)
- Tachikuma: Understading Complex Interactions with Multi-Character and Novel Objects by Large Language Models [67.20964015591262]
We introduce a benchmark named Tachikuma, comprising a Multiple character and novel Object based interaction Estimation task and a supporting dataset.
The dataset captures log data from real-time communications during gameplay, providing diverse, grounded, and complex interactions for further explorations.
We present a simple prompting baseline and evaluate its performance, demonstrating its effectiveness in enhancing interaction understanding.
arXiv Detail & Related papers (2023-07-24T07:40:59Z)
- BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments [70.18430114842094]
We introduce BEHAVIOR, a benchmark for embodied AI with 100 activities in simulation.
These activities are designed to be realistic, diverse, and complex.
We include 500 human demonstrations in virtual reality (VR) to serve as the human ground truth.
arXiv Detail & Related papers (2021-08-06T23:36:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.