Tracking People by Predicting 3D Appearance, Location & Pose
- URL: http://arxiv.org/abs/2112.04477v1
- Date: Wed, 8 Dec 2021 18:57:15 GMT
- Title: Tracking People by Predicting 3D Appearance, Location & Pose
- Authors: Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik
- Abstract summary: We first lift people to 3D from a single frame in a robust way.
As we track a person, we collect 3D observations over time in a tracklet representation.
We use these models to predict the future state of the tracklet.
- Score: 78.97070307547283
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present an approach for tracking people in monocular
videos by predicting their future 3D representations. To achieve this, we
first lift people to 3D from a single frame in a robust way. This lifting
includes information about the person's 3D pose, their location in 3D space,
and their 3D appearance. As we track a person, we collect 3D observations
over time in a tracklet representation. Given the 3D nature of our
observations, we build temporal models for each of these attributes. We use
these models to predict the future state of the tracklet, including 3D
location, 3D appearance, and 3D pose. For a future frame, we compute the
similarity between the predicted state of a tracklet and the single-frame
observations in a probabilistic manner. Association is solved with simple
Hungarian matching, and the matches are used to update the respective
tracklets. We evaluate our approach on various benchmarks and report
state-of-the-art results.
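To make the prediction-and-association loop concrete, here is a minimal Python sketch (NumPy/SciPy). It substitutes a constant-velocity extrapolator for the paper's learned temporal models and a Gaussian likelihood on 3D location for its probabilistic similarity over location, appearance, and pose; the function names and the `sigma` and `gate` parameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def predict_location(tracklet_locs, horizon=1):
    """Constant-velocity extrapolation of a tracklet's 3D location.

    `tracklet_locs` is a (T, 3) array of past positions. The paper fits a
    separate temporal model per attribute (location, appearance, pose);
    a linear motion model stands in for all of them here.
    """
    if len(tracklet_locs) < 2:
        return tracklet_locs[-1]
    velocity = tracklet_locs[-1] - tracklet_locs[-2]
    return tracklet_locs[-1] + horizon * velocity

def associate(predicted, detections, sigma=1.0, gate=5.0):
    """Match predicted tracklet states to single-frame 3D detections.

    The cost is the negative log of an isotropic Gaussian likelihood
    (up to a constant); association is plain Hungarian matching, and
    matches above the gate are rejected so they can spawn new tracklets.
    """
    cost = np.zeros((len(predicted), len(detections)))
    for i, p in enumerate(predicted):
        for j, d in enumerate(detections):
            cost[i, j] = np.sum((p - d) ** 2) / (2 * sigma**2)
    rows, cols = linear_sum_assignment(cost)
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] < gate]

# Toy usage: two tracklets, three detections in the next frame.
tracks = [np.array([[0., 0., 5.], [0., 0., 6.]]),   # moving in +z
          np.array([[2., 0., 4.], [2., 0., 4.]])]   # stationary
preds = [predict_location(t) for t in tracks]
dets = [np.array([0., 0., 7.]), np.array([2., 0., 4.]), np.array([9., 9., 9.])]
print(associate(preds, dets))  # -> [(0, 0), (1, 1)]; detection 2 would start a new tracklet
```

Matched detections would then be appended to their tracklets so the temporal models can be refit before the next frame, mirroring the update step described in the abstract.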
Related papers
- TAPVid-3D: A Benchmark for Tracking Any Point in 3D [63.060421798990845]
We introduce a new benchmark, TAPVid-3D, for evaluating the task of Tracking Any Point in 3D.
This benchmark will serve as a guidepost to improve our ability to understand precise 3D motion and surface deformation from monocular video.
arXiv Detail & Related papers (2024-07-08T13:28:47Z)
- Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving [3.8073142980733]
We propose jointly training 3D detection and 3D tracking from only monocular videos in an end-to-end manner.
Time3D achieves 21.4% AMOTA, 13.6% AMOTP on the nuScenes 3D tracking benchmark, surpassing all published competitors.
arXiv Detail & Related papers (2022-05-30T06:41:10Z)
- Point2Seq: Detecting 3D Objects as Sequences [58.63662049729309]
We present a simple and effective framework, named Point2Seq, for 3D object detection from point clouds.
We view each 3D object as a sequence of words and reformulate the 3D object detection task as decoding words from 3D scenes in an auto-regressive manner.
arXiv Detail & Related papers (2022-03-25T00:20:31Z)
- Tracking People with 3D Representations [78.97070307547283]
We present a novel approach for tracking multiple people in video.
Unlike past approaches, which employ 2D representations, we employ 3D representations of people located in three-dimensional space.
We find that 3D representations are more effective than 2D representations for tracking in these settings.
arXiv Detail & Related papers (2021-11-15T16:15:21Z)
- Learning Temporal 3D Human Pose Estimation with Pseudo-Labels [3.0954251281114513]
We present a simple yet effective approach for self-supervised 3D human pose estimation.
We rely on triangulating 2D body pose estimates from a multiple-view camera system.
Our method achieves state-of-the-art performance on the Human3.6M and MPI-INF-3DHP benchmarks.
arXiv Detail & Related papers (2021-10-14T17:40:45Z)
- VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild [98.69191256693703]
We present VoxelTrack for multi-person 3D pose estimation and tracking from a few cameras separated by wide baselines.
It employs a multi-branch network to jointly estimate 3D poses and re-identification (Re-ID) features for all people in the environment.
It outperforms state-of-the-art methods by a large margin on three public datasets: Shelf, Campus, and CMU Panoptic.
arXiv Detail & Related papers (2021-08-05T08:35:44Z)
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- Tracking Emerges by Looking Around Static Scenes, with Neural 3D Mapping [23.456046776979903]
We propose to leverage multiview data of static points in arbitrary scenes (static or dynamic) to learn a neural 3D mapping module.
The neural 3D mapper consumes RGB-D data as input, and produces a 3D voxel grid of deep features as output.
We show that our unsupervised 3D object trackers outperform prior unsupervised 2D and 2.5D trackers, and approach the accuracy of supervised trackers.
arXiv Detail & Related papers (2020-08-04T02:59:23Z)