T2FPV: Constructing High-Fidelity First-Person View Datasets From
Real-World Pedestrian Trajectories
- URL: http://arxiv.org/abs/2209.11294v1
- Date: Thu, 22 Sep 2022 20:14:43 GMT
- Title: T2FPV: Constructing High-Fidelity First-Person View Datasets From
Real-World Pedestrian Trajectories
- Authors: Benjamin Stoler, Meghdeep Jana, Soonmin Hwang, Jean Oh
- Abstract summary: We present T2FPV, a method for constructing high-fidelity first-person view datasets given a real-world, top-down trajectory dataset.
We showcase our approach on the ETH/UCY pedestrian dataset to generate the egocentric visual data of all interacting pedestrians.
- Score: 9.44806128120871
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Predicting pedestrian motion is essential for developing socially-aware
robots that interact in a crowded environment. While the natural visual
perspective for a social interaction setting is an egocentric view, the
majority of existing work in trajectory prediction has been investigated purely
in the top-down trajectory space. To support first-person view trajectory
prediction research, we present T2FPV, a method for constructing high-fidelity
first-person view datasets given a real-world, top-down trajectory dataset; we
showcase our approach on the ETH/UCY pedestrian dataset to generate the
egocentric visual data of all interacting pedestrians. We report that the
bird's-eye view assumption used in the original ETH/UCY dataset, i.e., an agent
can observe everyone in the scene with perfect information, does not hold in
the first-person views; only a fraction of agents are fully visible during each
20-timestep scene used commonly in existing work. We evaluate existing
trajectory prediction approaches under varying levels of realistic perception
-- displacement errors suffer a 356% increase compared to the top-down, perfect
information setting. To promote research in first-person view trajectory
prediction, we release our T2FPV-ETH dataset and software tools.
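As a rough illustration of the perception gap the abstract describes, the sketch below (hypothetical helper names and a simple pinhole-style field-of-view test; not the released T2FPV-ETH tooling) shows how top-down positions and headings could be used to decide which neighbouring agents are visible from an egocentric viewpoint, and how average displacement error (ADE) can be computed to compare predictions under perfect versus limited perception.

```python
import numpy as np

def visible_neighbors(positions, headings, agent_idx, fov_deg=90.0, max_range=20.0):
    """Return indices of agents inside a horizontal field-of-view cone around
    agent_idx, given top-down (x, y) positions and per-agent headings in
    radians. Hypothetical helper, not the authors' released pipeline."""
    ego = positions[agent_idx]
    heading = headings[agent_idx]
    visible = []
    for j, p in enumerate(positions):
        if j == agent_idx:
            continue
        offset = p - ego
        dist = np.linalg.norm(offset)
        if dist == 0.0 or dist > max_range:
            continue
        bearing = np.arctan2(offset[1], offset[0])
        # smallest signed angle between the neighbour's bearing and the ego heading
        rel = (bearing - heading + np.pi) % (2 * np.pi) - np.pi
        if abs(rel) <= np.deg2rad(fov_deg) / 2:
            visible.append(j)
    return visible

def ade(pred, gt):
    """Average displacement error over a predicted trajectory of shape (T, 2)."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

# Toy usage: three agents at one timestep; agent 0 faces along +x.
positions = np.array([[0.0, 0.0], [3.0, 0.5], [-2.0, 0.0]])
headings = np.array([0.0, np.pi, 0.0])
print(visible_neighbors(positions, headings, agent_idx=0))  # -> [1]
```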
Related papers
- HEADS-UP: Head-Mounted Egocentric Dataset for Trajectory Prediction in Blind Assistance Systems [47.37573198723305]
HEADS-UP is the first egocentric dataset collected from head-mounted cameras.
We propose a semi-local trajectory prediction approach to assess collision risks between blind individuals and pedestrians.
arXiv Detail & Related papers (2024-09-30T14:26:09Z)
- VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions [10.748597086208145]
In this work, we propose a novel method that also incorporates visual input from surround-view cameras.
Our method achieves a latency of 53 ms, making it feasible for real-time processing.
Our experiments show that both the visual inputs and the textual descriptions contribute to improvements in trajectory prediction performance.
arXiv Detail & Related papers (2024-07-17T06:39:52Z)
- Vectorized Representation Dreamer (VRD): Dreaming-Assisted Multi-Agent Motion-Forecasting [2.2020053359163305]
We introduce VRD, a vectorized world model-inspired approach to the multi-agent motion forecasting problem.
Our method combines a traditional open-loop training regime with a novel dreamed closed-loop training pipeline.
Our model achieves state-of-the-art performance on the single prediction miss rate metric.
arXiv Detail & Related papers (2024-06-20T15:34:17Z)
- UnO: Unsupervised Occupancy Fields for Perception and Forecasting [33.205064287409094]
Supervised approaches leverage annotated object labels to learn a model of the world.
We learn to perceive and forecast a continuous 4D occupancy field with self-supervision from LiDAR data.
This unsupervised world model can be easily and effectively transferred to downstream tasks.
arXiv Detail & Related papers (2024-06-12T23:22:23Z)
- Social-Transmotion: Promptable Human Trajectory Prediction [65.80068316170613]
Social-Transmotion is a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior.
Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY.
arXiv Detail & Related papers (2023-12-26T18:56:49Z)
- JRDB-Traj: A Dataset and Benchmark for Trajectory Forecasting in Crowds [79.00975648564483]
Trajectory forecasting models, employed in fields such as robotics, autonomous vehicles, and navigation, face challenges in real-world scenarios.
This dataset provides comprehensive data, including the locations of all agents, scene images, and point clouds, all from the robot's perspective.
The objective is to predict the future positions of agents relative to the robot using raw sensory input data.
arXiv Detail & Related papers (2023-11-05T18:59:31Z)
- Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment [71.16699226211504]
We propose to learn fine-grained action features that are invariant to the viewpoints by aligning egocentric and exocentric videos in time.
To this end, we propose AE2, a self-supervised embedding approach with two key designs.
For evaluation, we establish a benchmark for fine-grained video understanding in the ego-exo context.
arXiv Detail & Related papers (2023-06-08T19:54:08Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction [24.855059537779294]
We present a novel end-to-end two-stage network: the Spatio-Temporal-Interactive Network (STINet).
In addition to 3D geometry of pedestrians, we model temporal information for each of the pedestrians.
Our method predicts both current and past locations in the first stage, so that each pedestrian can be linked across frames.
arXiv Detail & Related papers (2020-05-08T18:43:01Z)
- Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction [57.56466850377598]
Reasoning over visual data is a desirable capability for robotics and vision-based applications.
In this paper, we present a graph-based framework to uncover relationships among different objects in the scene for reasoning about pedestrian intent.
Pedestrian intent, defined as the future action of crossing or not-crossing the street, is a very crucial piece of information for autonomous vehicles.
arXiv Detail & Related papers (2020-02-20T18:50:44Z)