COPILOT: Human-Environment Collision Prediction and Localization from
Egocentric Videos
- URL: http://arxiv.org/abs/2210.01781v2
- Date: Sun, 26 Mar 2023 05:27:31 GMT
- Title: COPILOT: Human-Environment Collision Prediction and Localization from
Egocentric Videos
- Authors: Boxiao Pan, Bokui Shen, Davis Rempe, Despoina Paschalidou, Kaichun Mo,
Yanchao Yang, Leonidas J. Guibas
- Abstract summary: The ability to forecast human-environment collisions from egocentric observations is vital to enable collision avoidance in applications such as VR, AR, and wearable assistive robotics.
We introduce the challenging problem of predicting collisions in diverse environments from multi-view egocentric videos captured from body-mounted cameras.
We propose a transformer-based model called COPILOT to perform collision prediction and localization simultaneously.
- Score: 62.34712951567793
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to forecast human-environment collisions from egocentric
observations is vital to enable collision avoidance in applications such as VR,
AR, and wearable assistive robotics. In this work, we introduce the challenging
problem of predicting collisions in diverse environments from multi-view
egocentric videos captured from body-mounted cameras. Solving this problem
requires a generalizable perception system that can classify which human body
joints will collide and estimate a collision region heatmap to localize
collisions in the environment. To achieve this, we propose a transformer-based
model called COPILOT to perform collision prediction and localization
simultaneously, which accumulates information across multi-view inputs through
a novel 4D space-time-viewpoint attention mechanism. To train our model and
enable future research on this task, we develop a synthetic data generation
framework that produces egocentric videos of virtual humans moving and
colliding within diverse 3D environments. This framework is then used to
establish a large-scale dataset consisting of 8.6M egocentric RGBD frames.
Extensive experiments show that COPILOT generalizes to unseen synthetic as well
as real-world scenes. We further demonstrate COPILOT outputs are useful for
downstream collision avoidance through simple closed-loop control. Please visit
our project webpage at https://sites.google.com/stanford.edu/copilot.
Related papers
- Human-Aware 3D Scene Generation with Spatially-constrained Diffusion Models [16.259040755335885]
Previous auto-regression-based 3D scene generation methods have struggled to accurately capture the joint distribution of multiple objects and input humans.
We introduce two spatial collision guidance mechanisms: human-object collision avoidance and object-room boundary constraints.
Our framework can generate more natural and plausible 3D scenes with precise human-scene interactions.
arXiv Detail & Related papers (2024-06-26T08:18:39Z)
- EgoNav: Egocentric Scene-aware Human Trajectory Prediction [15.346096596482857]
Wearable collaborative robots stand to assist human wearers who need fall prevention assistance or wear exoskeletons.
Such a robot needs to be able to constantly adapt to the surrounding scene based on egocentric vision, and predict the ego motion of the wearer.
In this work, we leveraged body-mounted cameras and sensors to anticipate the trajectory of human wearers through complex surroundings.
arXiv Detail & Related papers (2024-03-27T21:43:12Z)
- NeuPAN: Direct Point Robot Navigation with End-to-End Model-based Learning [67.53972459080437]
This paper presents NeuPAN: a real-time, highly-accurate, robot-agnostic, and environment-invariant robot navigation solution.
Leveraging a tightly-coupled perception-locomotion framework, NeuPAN has two key innovations compared to existing approaches.
We evaluate NeuPAN on a car-like robot, a wheel-legged robot, and a passenger autonomous vehicle, in both simulated and real-world environments.
arXiv Detail & Related papers (2024-03-11T15:44:38Z)
- Robots That Can See: Leveraging Human Pose for Trajectory Prediction [30.919756497223343]
We present a Transformer-based architecture to predict human future trajectories in human-centric environments.
The resulting model captures the inherent uncertainty for future human trajectory prediction.
We identify new agents with limited historical data as a major contributor to error and demonstrate the complementary nature of 3D skeletal poses in reducing prediction error.
arXiv Detail & Related papers (2023-09-29T13:02:56Z)
- CabiNet: Scaling Neural Collision Detection for Object Rearrangement with Procedural Scene Generation [54.68738348071891]
We first generate over 650K cluttered scenes - orders of magnitude more than prior work - in diverse everyday environments.
We render synthetic partial point clouds from this data and use it to train our CabiNet model architecture.
CabiNet is a collision model that accepts object and scene point clouds, captured from a single-view depth observation.
arXiv Detail & Related papers (2023-04-18T21:09:55Z)
- GLAMR: Global Occlusion-Aware Human Mesh Recovery with Dynamic Cameras [99.07219478953982]
We present an approach for 3D global human mesh recovery from monocular videos recorded with dynamic cameras.
We first propose a deep generative motion infiller, which autoregressively infills the body motions of occluded humans based on visible motions.
In contrast to prior work, our approach reconstructs human meshes in consistent global coordinates even with dynamic cameras.
arXiv Detail & Related papers (2021-12-02T18:59:54Z)
- Egocentric Human Trajectory Forecasting with a Wearable Camera and Multi-Modal Fusion [24.149925005674145]
We address the problem of forecasting the trajectory of an egocentric camera wearer (ego-person) in crowded spaces.
The trajectory forecasting ability learned from the data of different camera wearers can be transferred to assist visually impaired people in navigation.
A Transformer-based encoder-decoder neural network model, integrated with a novel cascaded cross-attention mechanism, has been designed to predict the future trajectory of the camera wearer.
arXiv Detail & Related papers (2021-11-01T14:58:05Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art methods specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- Object Rearrangement Using Learned Implicit Collision Functions [61.90305371998561]
We propose a learned collision model that accepts scene and query object point clouds and predicts collisions for 6DOF object poses within the scene.
We leverage the learned collision model as part of a model predictive path integral (MPPI) policy in a tabletop rearrangement task.
The learned model outperforms both traditional pipelines and learned ablations by 9.8% in accuracy on a dataset of simulated collision queries.
arXiv Detail & Related papers (2020-11-21T05:36:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.