Learning Continuous Environment Fields via Implicit Functions
- URL: http://arxiv.org/abs/2111.13997v1
- Date: Sat, 27 Nov 2021 22:36:58 GMT
- Title: Learning Continuous Environment Fields via Implicit Functions
- Authors: Xueting Li, Shalini De Mello, Xiaolong Wang, Ming-Hsuan Yang, Jan
Kautz, Sifei Liu
- Abstract summary: We propose a novel scene representation that encodes reaching distance -- the distance from any position in the scene to a goal along a feasible trajectory.
We demonstrate that this environment field representation can directly guide the dynamic behaviors of agents in 2D mazes or 3D indoor scenes.
- Score: 144.4913852552954
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel scene representation that encodes reaching distance -- the
distance from any position in the scene to a goal along a feasible
trajectory. We demonstrate that this environment field representation can
directly guide the dynamic behaviors of agents in 2D mazes or 3D indoor scenes.
Our environment field is a continuous representation and learned via a neural
implicit function using discretely sampled training data. We showcase its
application for agent navigation in 2D mazes, and human trajectory prediction
in 3D indoor environments. To produce physically plausible and natural
trajectories for humans, we additionally learn a generative model that predicts
regions where humans commonly appear, and enforce the environment field to be
defined within such regions. Extensive experiments demonstrate that the
proposed method can generate both feasible and plausible trajectories
efficiently and accurately.
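As a rough illustration of the idea in the abstract, the sketch below parameterizes a reaching-distance field with a small MLP and navigates by greedily stepping against the field's gradient. The layer sizes, the Softplus output, and the gradient-following loop are illustrative assumptions, not the paper's exact architecture, and the field is assumed to have been fit to discretely sampled reaching distances beforehand:

```python
import torch
import torch.nn as nn

class EnvironmentField(nn.Module):
    """Implicit function f(position, goal) -> reaching distance.

    A minimal stand-in for the paper's neural implicit function; the
    hidden sizes and absence of a coordinate encoding are assumptions.
    """
    def __init__(self, dim=2, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # reaching distance >= 0
        )

    def forward(self, pos, goal):
        return self.net(torch.cat([pos, goal], dim=-1)).squeeze(-1)

def navigate(field, start, goal, step=0.05, iters=200, tol=0.1):
    """Greedy navigation: repeatedly move against the gradient of the
    (pre-trained) reaching-distance field until close to the goal."""
    pos = start.clone().requires_grad_(True)
    path = [pos.detach().clone()]
    for _ in range(iters):
        dist = field(pos, goal)
        if dist.item() < tol:
            break
        (grad,) = torch.autograd.grad(dist, pos)
        with torch.no_grad():
            pos -= step * grad / (grad.norm() + 1e-8)
        path.append(pos.detach().clone())
    return torch.stack(path)
```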
Related papers
- Volumetric Environment Representation for Vision-Language Navigation [66.04379819772764]
Vision-language navigation (VLN) requires an agent to navigate through a 3D environment based on visual observations and natural language instructions.
We introduce a Volumetric Environment Representation (VER), which voxelizes the physical world into structured 3D cells.
VER predicts 3D occupancy, 3D room layout, and 3D bounding boxes jointly.
arXiv Detail & Related papers (2024-03-21T06:14:46Z)
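As a rough illustration of the voxelization step described in the VER abstract above, the following sketch buckets a point cloud into structured 3D cells; the grid resolution and axis-aligned bounds are illustrative assumptions:

```python
import numpy as np

def voxelize(points, bounds_min, bounds_max, resolution=32):
    """Bucket an (N, 3) point cloud into a dense boolean occupancy grid --
    the kind of structured 3D cells the VER abstract describes.
    Resolution and bounds are assumptions, not VER's actual settings."""
    pts = np.asarray(points, dtype=np.float32)
    bmin = np.asarray(bounds_min, dtype=np.float32)
    bmax = np.asarray(bounds_max, dtype=np.float32)
    # Normalize into [0, 1) per axis, then scale to integer cell indices.
    idx = ((pts - bmin) / (bmax - bmin) * resolution).astype(np.int64)
    idx = np.clip(idx, 0, resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid
```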
- CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting [15.392692128626809]
We propose CARFF, a method for predicting future 3D scenes given past observations.
We employ a two-stage training of Pose-Conditional-VAE and NeRF to learn 3D representations.
We demonstrate the utility of our method in scenarios using the CARLA driving simulator.
arXiv Detail & Related papers (2024-01-31T18:56:09Z)
- WildScenes: A Benchmark for 2D and 3D Semantic Segmentation in Large-scale Natural Environments [34.24004079703609]
We introduce WildScenes, a bi-modal benchmark dataset consisting of multiple large-scale traversals in natural environments.
The data is trajectory-centric with accurate localization and globally aligned point clouds.
We introduce benchmarks on 2D and 3D semantic segmentation and evaluate a variety of recent deep-learning techniques.
arXiv Detail & Related papers (2023-12-23T22:27:40Z)
- Visual Affordance Prediction for Guiding Robot Exploration [56.17795036091848]
We develop an approach for learning visual affordances for guiding robot exploration.
We use a Transformer-based model to learn a conditional distribution in the latent embedding space of a VQ-VAE.
We show how the trained affordance model can be used for guiding exploration by acting as a goal-sampling distribution, during visual goal-conditioned policy learning in robotic manipulation.
arXiv Detail & Related papers (2023-05-28T17:53:09Z)
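The affordance-prediction abstract above describes a Transformer that models a conditional distribution over VQ-VAE latents and is sampled as a goal distribution. The sketch below shows one way such an autoregressive prior over codebook indices could look; the vocabulary size, sequence length, encoder depth, and BOS-token scheme are all illustrative assumptions:

```python
import torch
import torch.nn as nn

class AffordancePrior(nn.Module):
    """Autoregressive prior over VQ-VAE codebook indices.

    A minimal stand-in for the Transformer-based conditional model in
    the abstract; all hyperparameters here are assumptions.
    """
    def __init__(self, codebook_size=512, seq_len=16, dim=128):
        super().__init__()
        self.embed = nn.Embedding(codebook_size + 1, dim)  # +1: BOS token
        self.pos = nn.Parameter(torch.zeros(seq_len + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, codebook_size)
        self.seq_len, self.bos = seq_len, codebook_size

    @torch.no_grad()
    def sample_goal_codes(self, batch=1):
        """Sample latent codes; decoding them with the VQ-VAE decoder
        would yield candidate goals for exploration."""
        tokens = torch.full((batch, 1), self.bos, dtype=torch.long)
        for _ in range(self.seq_len):
            n = tokens.size(1)
            x = self.embed(tokens) + self.pos[:n]
            # Causal mask so each position attends only to its past.
            mask = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
            h = self.encoder(x, mask=mask)
            nxt = torch.multinomial(self.head(h[:, -1]).softmax(-1), 1)
            tokens = torch.cat([tokens, nxt], dim=1)
        return tokens[:, 1:]  # drop BOS
```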
- Synthesizing Diverse Human Motions in 3D Indoor Scenes [16.948649870341782]
We present a novel method for populating 3D indoor scenes with virtual humans that can navigate in the environment and interact with objects in a realistic manner.
Existing approaches rely on training sequences that contain captured human motions and the 3D scenes they interact with.
We propose a reinforcement learning-based approach that enables virtual humans to navigate in 3D scenes and interact with objects realistically and autonomously.
arXiv Detail & Related papers (2023-05-21T09:22:24Z)
- Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion [83.88829943619656]
We introduce a method for generating realistic pedestrian trajectories and full-body animations that can be controlled to meet user-defined goals.
Our guided diffusion model allows users to constrain trajectories through target waypoints, speed, and specified social groups.
We propose utilizing the value function learned during RL training of the animation controller to guide diffusion to produce trajectories better suited for particular scenarios.
arXiv Detail & Related papers (2023-04-04T15:46:42Z)
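The Trace and Pace abstract above describes steering a trajectory diffusion model with a value function learned during RL training. A minimal sketch of one such guidance step, in the style of classifier guidance, is below; the denoiser/value-function interfaces and the additive update are assumptions, not the paper's exact formulation:

```python
import torch

def value_guided_step(denoiser, value_fn, x_t, t, guide_scale=1.0):
    """One reverse-diffusion step steered by a learned value function.

    After the denoiser proposes a cleaner trajectory, nudge it along the
    gradient of the value so sampled trajectories score higher. Both
    `denoiser` and `value_fn` are placeholder differentiable modules.
    """
    x_t = x_t.detach().requires_grad_(True)
    mean = denoiser(x_t, t)            # predicted mean of x_{t-1}
    value = value_fn(mean).sum()       # higher = better-suited trajectory
    (grad,) = torch.autograd.grad(value, x_t)
    return (mean + guide_scale * grad).detach()
```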
- Neural Poisson: Indicator Functions for Neural Fields [25.41908065938424]
Implicit neural fields that generate signed distance field (SDF) representations of 3D shapes have shown remarkable progress.
We introduce a new paradigm for neural field representations of 3D scenes.
We show that our approach demonstrates state-of-the-art reconstruction performance on both synthetic and real scanned 3D scene data.
arXiv Detail & Related papers (2022-11-25T17:28:22Z)
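To make the contrast in the Neural Poisson abstract above concrete: an indicator field classifies points as inside/outside rather than regressing a signed metric distance. The sketch below is an illustrative assumption about what such a field could look like, not the paper's architecture:

```python
import torch
import torch.nn as nn

class IndicatorField(nn.Module):
    """MLP mapping a 3D point to an indicator value in [0, 1]
    (inside vs. outside a shape), in contrast to an SDF, which
    regresses a signed distance. Layer sizes are assumptions."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, xyz):
        return self.net(xyz).squeeze(-1)

# The surface is the 0.5 level set of the indicator, extractable e.g.
# by running marching cubes over a dense grid of queried points.
```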
- Pose2Room: Understanding 3D Scenes from Human Activities [35.702234343672565]
With wearable IMU sensors, one can estimate human poses without requiring visual input.
We show that P2R-Net can effectively learn multi-modal distributions of likely objects for human motions.
arXiv Detail & Related papers (2021-12-01T20:54:36Z)
- Environment Predictive Coding for Embodied Agents [92.31905063609082]
We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents.
Our experiments on the photorealistic 3D environments of Gibson and Matterport3D show that our method outperforms the state-of-the-art on challenging tasks with only a limited budget of experience.
arXiv Detail & Related papers (2021-02-03T23:43:16Z)
- Long-term Human Motion Prediction with Scene Context [60.096118270451974]
We propose a novel three-stage framework for predicting human motion.
Our method first samples multiple human motion goals, then plans 3D human paths towards each goal, and finally predicts 3D human pose sequences following each path.
arXiv Detail & Related papers (2020-07-07T17:59:53Z)
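The last abstract above lays out a three-stage pipeline (sample goals, plan paths, predict poses). A skeleton of that control flow is sketched below; all three callables are hypothetical placeholders for learned modules, and their interfaces are assumptions:

```python
def predict_motion(scene, history, goal_sampler, path_planner, pose_net, k=5):
    """Skeleton of the three-stage framework in the abstract:
    (1) sample multiple motion goals, (2) plan a 3D path toward each
    goal, (3) predict a 3D pose sequence following each path."""
    predictions = []
    for goal in goal_sampler(scene, history, k):    # stage 1
        path = path_planner(scene, history, goal)   # stage 2
        poses = pose_net(history, path)             # stage 3
        predictions.append(poses)
    return predictions
```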
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.