LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control
- URL: http://arxiv.org/abs/2406.16038v1
- Date: Sun, 23 Jun 2024 07:26:13 GMT
- Title: LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control
- Authors: Delin Qu, Qizhi Chen, Pingrui Zhang, Xianqiang Gao, Bin Zhao, Dong Wang, Xuelong Li
- Abstract summary: We extend interactive object reconstruction from the single-object level to the complex scene level.
We propose LiveScene, the first scene-level language-embedded interactive neural radiance field.
LiveScene efficiently reconstructs and controls multiple interactive objects in complex scenes.
- Score: 45.1230495980299
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper aims to advance physical-world interactive scene reconstruction by extending interactive object reconstruction from the single-object level to the complex scene level. To this end, we first construct one simulated and one real scene-level physical interaction dataset, together containing 28 scenes with multiple interactive objects per scene. Furthermore, to accurately model the interactive motions of multiple objects in complex scenes, we propose LiveScene, the first scene-level language-embedded interactive neural radiance field, which efficiently reconstructs and controls multiple interactive objects in complex scenes. LiveScene introduces an efficient factorization that decomposes the interactive scene into multiple local deformable fields to reconstruct individual interactive objects separately, achieving the first accurate and independent control of multiple interactive objects in a complex scene. Moreover, we introduce an interaction-aware language embedding method that generates varying language embeddings to localize individual interactive objects under different interactive states, enabling arbitrary control of interactive objects using natural language. Finally, we evaluate LiveScene on the constructed datasets OminiSim and InterReal with various simulated and real-world complex scenes. Extensive experimental results demonstrate that the proposed approach achieves state-of-the-art novel view synthesis and language grounding performance, surpassing existing methods by +9.89, +1.30, and +1.99 PSNR on the CoNeRF Synthetic, OminiSim #challenging, and InterReal #challenging datasets, respectively, and by +65.12 mIOU on OminiSim. Project page: https://livescenes.github.io.
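The abstract names two concrete mechanisms: a factorization of the scene into per-object local deformable fields, and language embeddings conditioned on the interaction state. The following is a minimal conceptual sketch of both ideas in PyTorch; every class name, dimension, and routing choice here is a hypothetical illustration, not LiveScene's actual implementation.

```python
import torch
import torch.nn as nn

# Conceptual sketch only; names and shapes are hypothetical, not LiveScene's code.

class LocalDeformableField(nn.Module):
    """One small deformation MLP per interactive object.

    Maps a 3D sample point plus a scalar interaction state (e.g. how far
    a drawer is pulled out) to a displacement; the displaced point would
    then be used to query a shared static radiance field.
    """

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # xyz displacement
        )

    def forward(self, xyz: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # xyz: (N, 3); state: (1, 1) interaction value, here in [0, 1]
        return self.mlp(torch.cat([xyz, state.expand(xyz.shape[0], 1)], dim=-1))


class InteractionAwareEmbedding(nn.Module):
    """Modulates a fixed per-object text feature by the interaction state,
    so a query like "the open drawer" matches only when the state agrees."""

    def __init__(self, text_dim: int = 512):
        super().__init__()
        self.modulator = nn.Linear(1, text_dim)

    def forward(self, text_emb: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        return text_emb * torch.sigmoid(self.modulator(state))


# Each interactive object owns one local field (e.g. routed by bounding box);
# points outside every box would be rendered by the static field unchanged.
fields = nn.ModuleList([LocalDeformableField() for _ in range(3)])
xyz = torch.rand(1024, 3)                 # ray samples inside object 0's region
state = torch.tensor([[0.7]])             # object 0 is 70% "open"
deformed = xyz + fields[0](xyz, state)    # query the static radiance field here

text_emb = torch.randn(1, 512)            # stand-in for a CLIP-style text feature
grounded = InteractionAwareEmbedding()(text_emb, state)
```

The appeal of such a factorization, as the abstract describes it, is that each object's deformation field can be driven independently, which is what makes separate control of multiple interactive objects in one scene possible.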
Related papers
- HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects [86.86284624825356]
HIMO is a dataset of full-body humans interacting with multiple objects.
HIMO contains 3.3K 4D HOI sequences and 4.08M 3D HOI frames.
arXiv Detail & Related papers (2024-07-17T07:47:34Z) - Generating Human Interaction Motions in Scenes with Text Control [66.74298145999909]
We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models.
Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model.
To facilitate training, we embed annotated navigation and interaction motions within scenes.
arXiv Detail & Related papers (2024-04-16T16:04:38Z) - Natural-language-driven Simulation Benchmark and Copilot for Efficient Production of Object Interactions in Virtual Road Scenes [8.303084278117861]
We advocate natural-language-driven (NLD) simulation to efficiently produce interactions between multiple objects in virtual road scenes.
We collect the Language-to-Interaction(L2I) benchmark dataset with 120,000 natural-language descriptions of object interactions in 6 common types of road topologies.
As a methodology contribution, we design SimCopilot to translate the interaction descriptions to the renderable code.
arXiv Detail & Related papers (2023-12-07T02:55:46Z) - ASSIST: Interactive Scene Nodes for Scalable and Realistic Indoor Simulation [17.34617771579733]
We present ASSIST, an object-wise neural radiance field as a panoptic representation for compositional and realistic simulation.
A novel scene node data structure stores the information of each object in a unified fashion, allowing online interaction in both intra- and cross-scene settings (see the sketch after this list).
arXiv Detail & Related papers (2023-11-10T17:56:43Z) - Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter [14.489086924126253]
This work focuses on the task of referring grasp synthesis, which predicts a grasp pose for an object referred through natural language in cluttered scenes.
Existing approaches often employ multi-stage pipelines that first segment the referred object and then propose a suitable grasp, and they are evaluated on private datasets or simulators that do not capture the complexity of natural indoor scenes.
We propose a novel end-to-end model (CROG) that leverages the visual grounding capabilities of CLIP to learn grasp synthesis directly from image-text pairs.
arXiv Detail & Related papers (2023-11-09T22:55:10Z) - Unified Human-Scene Interaction via Prompted Chain-of-Contacts [61.87652569413429]
Human-Scene Interaction (HSI) is a vital component of fields like embodied AI and virtual reality.
This paper presents a unified HSI framework, UniHSI, which supports unified control of diverse interactions through language commands.
arXiv Detail & Related papers (2023-09-14T17:59:49Z) - Synthesizing Physical Character-Scene Interactions [64.26035523518846]
Realistic character animation requires synthesizing interactions between virtual characters and their surroundings.
We present a system that uses adversarial imitation learning and reinforcement learning to train physically-simulated characters.
Our approach takes physics-based character motion generation a step closer to broad applicability.
arXiv Detail & Related papers (2023-02-02T05:21:32Z) - Compositional Human-Scene Interaction Synthesis with Semantic Control [16.93177243590465]
We aim to synthesize humans interacting with a given 3D scene controlled by high-level semantic specifications.
We design a novel transformer-based generative model, in which the articulated 3D human body surface points and 3D objects are jointly encoded.
Inspired by the compositional nature of interactions, whereby humans can simultaneously interact with multiple objects, we define interaction semantics as the composition of varying numbers of atomic action-object pairs.
arXiv Detail & Related papers (2022-07-26T11:37:44Z) - iGibson, a Simulation Environment for Interactive Tasks in Large Realistic Scenes [54.04456391489063]
iGibson is a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes.
Our environment contains fifteen fully interactive home-sized scenes populated with rigid and articulated objects.
iGibson's features enable the generalization of navigation agents, and the human-iGibson interface and integrated motion planners facilitate efficient imitation learning of simple human-demonstrated behaviors.
arXiv Detail & Related papers (2020-12-05T02:14:17Z)
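The scene-node idea from the ASSIST entry above lends itself to a compact illustration. The following is a hypothetical Python sketch, not ASSIST's actual data structure or API; it only shows the stated design of storing each object's information in one unified node so it can be edited within a scene or reused across scenes.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an object-wise "scene node"; not ASSIST's API.

@dataclass
class SceneNode:
    """One object's rendering and interaction info, stored in a unified fashion."""
    name: str
    pose: list                                            # 4x4 world-from-object transform
    radiance_params: dict = field(default_factory=dict)   # per-object field weights/ids
    label: str = ""                                       # panoptic/semantic label


@dataclass
class Scene:
    nodes: dict = field(default_factory=dict)

    def add(self, node: SceneNode) -> None:
        self.nodes[node.name] = node              # intra-scene: edit nodes in place

    def transplant(self, name: str, other: "Scene") -> None:
        other.add(self.nodes.pop(name))           # cross-scene: move a node over


kitchen, bedroom = Scene(), Scene()
identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
kitchen.add(SceneNode("mug", pose=identity, label="mug"))
kitchen.transplant("mug", bedroom)                # reuse the object in a new scene
```

Keeping everything an object needs inside a single node is what makes both kinds of interaction cheap: intra-scene edits touch one node, and cross-scene reuse is just moving a node between containers.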