EgoNav: Egocentric Scene-aware Human Trajectory Prediction
- URL: http://arxiv.org/abs/2403.19026v3
- Date: Wed, 7 Aug 2024 05:54:04 GMT
- Title: EgoNav: Egocentric Scene-aware Human Trajectory Prediction
- Authors: Weizhuo Wang, C. Karen Liu, Monroe Kennedy III
- Abstract summary: Wearable collaborative robots stand to assist human wearers who need fall prevention assistance or wear exoskeletons.
Such a robot needs to constantly adapt to the surrounding scene based on egocentric vision and predict the ego motion of the wearer.
In this work, we leveraged body-mounted cameras and sensors to anticipate the trajectory of human wearers through complex surroundings.
- Score: 15.346096596482857
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Wearable collaborative robots stand to assist human wearers who need fall prevention assistance or wear exoskeletons. Such a robot needs to be able to constantly adapt to the surrounding scene based on egocentric vision, and predict the ego motion of the wearer. In this work, we leveraged body-mounted cameras and sensors to anticipate the trajectory of human wearers through complex surroundings. To facilitate research in ego-motion prediction, we have collected a comprehensive walking scene navigation dataset centered on the user's perspective. We then present a method to predict human motion conditioned on the surrounding static scene. Our method leverages a diffusion model to produce a distribution of potential future trajectories, taking into account the user's observation of the environment. To that end, we introduce a compact representation to encode the user's visual memory of the surroundings, as well as an efficient sample-generating technique to speed up real-time inference of a diffusion model. We ablate our model and compare it to baselines, and results show that our model outperforms existing methods on key metrics of collision avoidance and trajectory mode coverage.
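For intuition, the sketch below shows the general pattern in PyTorch: a denoiser conditioned on a compact scene embedding, and a few-step sampler that draws several candidate future trajectories from noise. All module names, dimensions, and the specific sampling schedule are assumptions for illustration, not the authors' implementation.
```python
# Minimal sketch (not the authors' code) of scene-conditioned trajectory diffusion.
import torch
import torch.nn as nn

H = 10          # future horizon (number of waypoints), illustrative
SCENE_DIM = 128 # size of the compact visual-memory embedding, illustrative

class SceneConditionedDenoiser(nn.Module):
    """Predicts the noise added to a future trajectory, given the noisy
    trajectory, the diffusion timestep, and a scene embedding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(H * 2 + SCENE_DIM + 1, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, H * 2),
        )

    def forward(self, noisy_traj, t, scene_emb):
        x = torch.cat([noisy_traj.flatten(1), scene_emb, t.float().unsqueeze(1)], dim=1)
        return self.net(x).view(-1, H, 2)

@torch.no_grad()
def sample_trajectories(denoiser, scene_emb, n_samples=8, n_steps=8):
    """Few-step deterministic (DDIM-style) sampling; the small step count only
    stands in for the paper's faster sampling scheme, whose details differ."""
    betas = torch.linspace(1e-4, 0.02, n_steps)
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)
    scene = scene_emb.expand(n_samples, -1)
    x = torch.randn(n_samples, H, 2)                      # start from pure noise
    for t in reversed(range(n_steps)):
        t_batch = torch.full((n_samples,), t)
        eps = denoiser(x, t_batch, scene)                 # predicted noise
        x0 = (x - (1 - alphas_bar[t]).sqrt() * eps) / alphas_bar[t].sqrt()
        x = x0 if t == 0 else (
            alphas_bar[t - 1].sqrt() * x0 + (1 - alphas_bar[t - 1]).sqrt() * eps)
    return x  # (n_samples, H, 2) candidate future waypoints

# usage: several plausible futures for one scene observation
samples = sample_trajectories(SceneConditionedDenoiser(), torch.randn(1, SCENE_DIM))
```
Drawing multiple samples per observation is what yields a distribution over futures, which is why metrics such as trajectory mode coverage apply.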
Related papers
- MADiff: Motion-Aware Mamba Diffusion Models for Hand Trajectory Prediction on Egocentric Videos [27.766405152248055]
Hand trajectory prediction plays a vital role in comprehending human motion patterns.
However, capturing high-level human intentions consistent with reasonable temporal causality is challenging when only egocentric videos are available.
We propose a novel hand trajectory prediction method dubbed MADiff, which forecasts future hand waypoints with diffusion models.
arXiv Detail & Related papers (2024-09-04T12:06:33Z)
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z)
- Social-Transmotion: Promptable Human Trajectory Prediction [65.80068316170613]
Social-Transmotion is a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior.
Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY.
arXiv Detail & Related papers (2023-12-26T18:56:49Z)
- Robots That Can See: Leveraging Human Pose for Trajectory Prediction [30.919756497223343]
We present a Transformer based architecture to predict human future trajectories in human-centric environments.
The resulting model captures the inherent uncertainty for future human trajectory prediction.
We identify new agents with limited historical data as a major contributor to error and demonstrate the complementary nature of 3D skeletal poses in reducing prediction error.
arXiv Detail & Related papers (2023-09-29T13:02:56Z)
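A hedged sketch of the pattern described in the entry above: past positions and 3D skeletal poses become tokens in a Transformer encoder, and the output parameterizes a per-step Gaussian as one simple way to expose predictive uncertainty. Names and sizes are illustrative; the paper's actual architecture and output head may differ.
```python
# Hypothetical sketch: fuse trajectory history with 3D pose tokens, predict a
# Gaussian per future step. Not the paper's code.
import torch
import torch.nn as nn

class PoseAwareTrajectoryPredictor(nn.Module):
    def __init__(self, hist_len=8, horizon=12, n_joints=17, d_model=128):
        super().__init__()
        self.traj_embed = nn.Linear(2, d_model)              # one token per past xy position
        self.pose_embed = nn.Linear(n_joints * 3, d_model)   # one token per past 3D pose
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)
        self.head = nn.Linear(d_model, horizon * 4)          # mean (2) + log-std (2) per step
        self.horizon = horizon

    def forward(self, past_xy, past_pose):
        # past_xy: (B, hist_len, 2), past_pose: (B, hist_len, n_joints, 3)
        tokens = torch.cat([
            self.traj_embed(past_xy),
            self.pose_embed(past_pose.flatten(2)),
        ], dim=1)                                            # (B, 2*hist_len, d_model)
        enc = self.encoder(tokens).mean(dim=1)               # pooled sequence feature
        out = self.head(enc).view(-1, self.horizon, 4)
        mean, log_std = out[..., :2], out[..., 2:]
        return mean, log_std                                 # Gaussian per future step

mean, log_std = PoseAwareTrajectoryPredictor()(torch.randn(4, 8, 2), torch.randn(4, 8, 17, 3))
```
The pose tokens are what supply signal for agents with little or no trajectory history, which the entry identifies as a major error source.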
- Visual Affordance Prediction for Guiding Robot Exploration [56.17795036091848]
We develop an approach for learning visual affordances for guiding robot exploration.
We use a Transformer-based model to learn a conditional distribution in the latent embedding space of a VQ-VAE.
We show how the trained affordance model can be used for guiding exploration by acting as a goal-sampling distribution during visual goal-conditioned policy learning in robotic manipulation.
arXiv Detail & Related papers (2023-05-28T17:53:09Z)
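The sketch below illustrates, with hypothetical names and dimensions rather than the paper's code, how a Transformer prior over VQ-VAE codebook indices can serve as a goal-sampling distribution: condition on a feature of the current observation (a 512-dimensional vector is assumed here), sample latent codes autoregressively, then decode them with a separately trained VQ-VAE decoder.
```python
# Illustrative sketch of an autoregressive prior over VQ-VAE codes used for
# goal sampling. Codebook size, sequence length, and widths are assumptions.
import torch
import torch.nn as nn

VOCAB, SEQ_LEN, D = 512, 64, 256   # codebook size, latent grid tokens, model width

class AffordancePrior(nn.Module):
    def __init__(self):
        super().__init__()
        self.code_embed = nn.Embedding(VOCAB + 1, D)    # +1 for a start token
        self.cond_proj = nn.Linear(512, D)              # context image feature -> one token
        layer = nn.TransformerEncoderLayer(D, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.logits = nn.Linear(D, VOCAB)

    @torch.no_grad()
    def sample_goal_codes(self, context_feat, temperature=1.0):
        tokens = torch.full((1, 1), VOCAB)                        # start token id
        cond = self.cond_proj(context_feat).unsqueeze(1)          # (1, 1, D) conditioning token
        for _ in range(SEQ_LEN):
            x = torch.cat([cond, self.code_embed(tokens)], dim=1)
            logits = self.logits(self.backbone(x)[:, -1]) / temperature
            nxt = torch.multinomial(logits.softmax(-1), 1)        # sample the next code
            tokens = torch.cat([tokens, nxt], dim=1)
        return tokens[:, 1:]                                      # (1, SEQ_LEN) codebook indices

# The sampled indices would be looked up in the VQ-VAE codebook and decoded into
# a goal image, which then conditions the exploration policy.
codes = AffordancePrior().sample_goal_codes(torch.randn(1, 512))
```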
- COPILOT: Human-Environment Collision Prediction and Localization from Egocentric Videos [62.34712951567793]
The ability to forecast human-environment collisions from egocentric observations is vital to enable collision avoidance in applications such as VR, AR, and wearable assistive robotics.
We introduce the challenging problem of predicting collisions in diverse environments from multi-view egocentric videos captured from body-mounted cameras.
We propose a transformer-based model called COPILOT to perform collision prediction and localization simultaneously.
arXiv Detail & Related papers (2022-10-04T17:49:23Z)
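A minimal illustration of joint prediction and localization with a shared encoder and two heads; a single per-frame feature stream stands in here for the multi-view egocentric input, and all shapes and names are assumptions rather than the COPILOT architecture.
```python
# Hypothetical two-head collision model: shared Transformer encoder,
# a binary collision head, and a spatial heatmap head. Not the paper's code.
import torch
import torch.nn as nn

class CollisionPredictor(nn.Module):
    def __init__(self, d_model=256, heatmap_hw=(32, 32)):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.collision_head = nn.Linear(d_model, 1)                            # will a collision occur?
        self.heatmap_head = nn.Linear(d_model, heatmap_hw[0] * heatmap_hw[1])  # where in the view?
        self.heatmap_hw = heatmap_hw

    def forward(self, frame_feats):
        # frame_feats: (B, n_frames, d_model) per-frame egocentric video features
        pooled = self.encoder(frame_feats).mean(dim=1)
        collision_logit = self.collision_head(pooled).squeeze(-1)        # (B,)
        heatmap = self.heatmap_head(pooled).view(-1, *self.heatmap_hw)   # (B, 32, 32)
        return collision_logit, heatmap

logit, heatmap = CollisionPredictor()(torch.randn(2, 16, 256))
```
Sharing the encoder lets the localization target act as an auxiliary signal for the binary prediction, which is the usual motivation for solving both tasks simultaneously.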
- GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further increases the diversity of motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
arXiv Detail & Related papers (2022-04-20T13:17:39Z)
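The bidirectional gaze-motion exchange can be pictured as cross-attention in both directions within one block, as in the hypothetical fragment below (not the published GIMO implementation).
```python
# Sketch: motion tokens attend to gaze tokens and vice versa in the same block.
import torch
import torch.nn as nn

class BidirectionalGazeMotionBlock(nn.Module):
    def __init__(self, d_model=128, nhead=4):
        super().__init__()
        self.motion_from_gaze = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.gaze_from_motion = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.norm_m = nn.LayerNorm(d_model)
        self.norm_g = nn.LayerNorm(d_model)

    def forward(self, motion_tokens, gaze_tokens):
        # motion queries gaze, and gaze queries motion, so information flows both ways
        m_upd, _ = self.motion_from_gaze(motion_tokens, gaze_tokens, gaze_tokens)
        g_upd, _ = self.gaze_from_motion(gaze_tokens, motion_tokens, motion_tokens)
        return self.norm_m(motion_tokens + m_upd), self.norm_g(gaze_tokens + g_upd)

motion, gaze = BidirectionalGazeMotionBlock()(torch.randn(2, 30, 128), torch.randn(2, 30, 128))
```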
- Egocentric Human Trajectory Forecasting with a Wearable Camera and Multi-Modal Fusion [24.149925005674145]
We address the problem of forecasting the trajectory of an egocentric camera wearer (ego-person) in crowded spaces.
The trajectory forecasting ability learned from the data of different camera wearers can be transferred to assist visually impaired people in navigation.
A Transformer-based encoder-decoder neural network model, integrated with a novel cascaded cross-attention mechanism, has been designed to predict the future trajectory of the camera wearer.
arXiv Detail & Related papers (2021-11-01T14:58:05Z)
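One way to read "cascaded cross-attention" is that decoder queries attend to one modality and the result becomes the query for the next modality, rather than fusing everything at once; the fragment below sketches that reading with illustrative module names and an assumed ordering of modalities.
```python
# Hypothetical cascaded cross-attention decoder step. Not the paper's code.
import torch
import torch.nn as nn

class CascadedCrossAttention(nn.Module):
    def __init__(self, d_model=128, nhead=4):
        super().__init__()
        self.attend_traj = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.attend_vision = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.out = nn.Linear(d_model, 2)   # one xy waypoint per decoder query

    def forward(self, queries, traj_feats, vision_feats):
        # queries: (B, horizon, d) learned future-step queries
        x, _ = self.attend_traj(queries, traj_feats, traj_feats)      # stage 1: past motion
        x, _ = self.attend_vision(x, vision_feats, vision_feats)      # stage 2: egocentric vision
        return self.out(x)                                            # (B, horizon, 2)

pred = CascadedCrossAttention()(torch.randn(2, 12, 128), torch.randn(2, 8, 128), torch.randn(2, 49, 128))
```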
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
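A minimal, hypothetical graph-attention step over body-joint features in which a per-joint visibility indicator masks out unobserved joints, loosely following the description above (not the TRiPOD code).
```python
# Sketch: attention over body joints, with invisible joints masked out.
import torch
import torch.nn as nn

class VisibilityAwareJointAttention(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.q = nn.Linear(d, d)
        self.k = nn.Linear(d, d)
        self.v = nn.Linear(d, d)

    def forward(self, joint_feats, visible):
        # joint_feats: (B, J, d); visible: (B, J), 1 = joint observed this frame
        # (assumes at least one visible joint per sample)
        scores = self.q(joint_feats) @ self.k(joint_feats).transpose(1, 2)        # (B, J, J)
        scores = scores / joint_feats.shape[-1] ** 0.5
        scores = scores.masked_fill(visible.unsqueeze(1) == 0, float("-inf"))     # ignore unobserved joints
        return torch.softmax(scores, dim=-1) @ self.v(joint_feats)

feats = VisibilityAwareJointAttention()(torch.randn(2, 17, 64), torch.ones(2, 17))
```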