Model as a Game: On Numerical and Spatial Consistency for Generative Games
- URL: http://arxiv.org/abs/2503.21172v1
- Date: Thu, 27 Mar 2025 05:46:15 GMT
- Title: Model as a Game: On Numerical and Spatial Consistency for Generative Games
- Authors: Jingye Chen, Yuzhong Zhao, Yupan Huang, Lei Cui, Li Dong, Tengchao Lv, Qifeng Chen, Furu Wei,
- Abstract summary: We revisit the paradigm of generative games to explore what truly constitutes a Model as a Game (MaaG) with a well-developed mechanism.<n>Based on the DiT architecture, we design two specialized modules: (1) a numerical module that integrates a LogicNet to determine event triggers, with calculations processed externally as conditions for image generation; and (2) a spatial module that maintains a map of explored areas, retrieving location-specific information during generation and linking new observations to ensure continuity.
- Score: 117.36098212829766
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in generative models have significantly impacted game generation. However, despite producing high-quality graphics and adequately receiving player input, existing models often fail to maintain fundamental game properties such as numerical and spatial consistency. Numerical consistency ensures gameplay mechanics correctly reflect score changes and other quantitative elements, while spatial consistency prevents jarring scene transitions, providing seamless player experiences. In this paper, we revisit the paradigm of generative games to explore what truly constitutes a Model as a Game (MaaG) with a well-developed mechanism. We begin with an empirical study on ``Traveler'', a 2D game created by an LLM featuring minimalist rules yet challenging generative models in maintaining consistency. Based on the DiT architecture, we design two specialized modules: (1) a numerical module that integrates a LogicNet to determine event triggers, with calculations processed externally as conditions for image generation; and (2) a spatial module that maintains a map of explored areas, retrieving location-specific information during generation and linking new observations to ensure continuity. Experiments across three games demonstrate that our integrated modules significantly enhance performance on consistency metrics compared to baselines, while incurring minimal time overhead during inference.
Related papers
- PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model [23.768571323272152]
PartRM is a novel 4D reconstruction framework that simultaneously models appearance, geometry, and part-level motion from multi-view images of a static object.<n>We introduce the PartDrag-4D dataset, providing multi-view observations of part-level dynamics across over 20,000 states.<n> Experimental results show that PartRM establishes a new state-of-the-art in part-level motion learning and can be applied in manipulation tasks in robotics.
arXiv Detail & Related papers (2025-03-25T17:59:58Z) - TacticExpert: Spatial-Temporal Graph Language Model for Basketball Tactics [0.0]
Basketball tactic modeling needs to efficiently extract complex spatial-temporal dependencies from historical data.<n>Existing state-of-the-art (SOTA) models, primarily based on graph neural networks (GNNs), encounter difficulties in capturing long-term, long-distance, and fine-grained interactions.
arXiv Detail & Related papers (2025-03-13T08:27:24Z) - GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency [50.11520458252128]
Existing 3D affordance learning methods struggle with generalization and robustness due to limited annotated data.
We propose GEAL, a novel framework designed to enhance the generalization and robustness of 3D affordance learning by leveraging large-scale pre-trained 2D models.
GEAL consistently outperforms existing methods across seen and novel object categories, as well as corrupted data.
arXiv Detail & Related papers (2024-12-12T17:59:03Z) - Unbounded: A Generative Infinite Game of Character Life Simulation [68.37260000219479]
We introduce the concept of a generative infinite game, a video game that transcends the traditional boundaries of finite, hard-coded systems by using generative models.
We leverage recent advances in generative AI to create Unbounded: a game of character life simulation that is fully encapsulated in generative models.
arXiv Detail & Related papers (2024-10-24T17:59:31Z) - Using Game Play to Investigate Multimodal and Conversational Grounding in Large Multimodal Models [14.878276985702685]
In this paper, we bring a recently developed evaluation paradigm from text models to multimodal models.<n>We define games that challenge a model's capability to represent a situation from visual information and align such representations through dialogue.<n>We find that the largest closed models perform rather well on the games that we define, while even the best open-weight models struggle with them.
arXiv Detail & Related papers (2024-06-20T06:56:19Z) - Sports-Traj: A Unified Trajectory Generation Model for Multi-Agent Movement in Sports [53.637837706712794]
We propose a Unified Trajectory Generation model, UniTraj, that processes arbitrary trajectories as masked inputs.<n>Specifically, we introduce a Ghost Spatial Masking (GSM) module, embedded within a Transformer encoder, for spatial feature extraction.<n>We benchmark three practical sports datasets, Basketball-U, Football-U, and Soccer-U, for evaluation.
arXiv Detail & Related papers (2024-05-27T22:15:23Z) - Benchmarking Spatial Relationships in Text-to-Image Generation [102.62422723894232]
We investigate the ability of text-to-image models to generate correct spatial relationships among objects.
We present VISOR, an evaluation metric that captures how accurately the spatial relationship described in text is generated in the image.
Our experiments reveal a surprising finding that, although state-of-the-art T2I models exhibit high image quality, they are severely limited in their ability to generate multiple objects or the specified spatial relations between them.
arXiv Detail & Related papers (2022-12-20T06:03:51Z) - Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for
Temporal Sentence Grounding [61.57847727651068]
Temporal sentence grounding aims to localize a target segment in an untrimmed video semantically according to a given sentence query.
Most previous works focus on learning frame-level features of each whole frame in the entire video, and directly match them with the textual information.
We propose a novel Motion- and Appearance-guided 3D Semantic Reasoning Network (MA3SRN), which incorporates optical-flow-guided motion-aware, detection-based appearance-aware, and 3D-aware object-level features.
arXiv Detail & Related papers (2022-03-06T13:57:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.