Tracking People with 3D Representations
- URL: http://arxiv.org/abs/2111.07868v1
- Date: Mon, 15 Nov 2021 16:15:21 GMT
- Title: Tracking People with 3D Representations
- Authors: Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra
Malik
- Abstract summary: We present a novel approach for tracking multiple people in video.
Unlike past approaches which employ 2D representations, we employ 3D representations of people, located in three-dimensional space.
We find that 3D representations are more effective than 2D representations for tracking in these settings.
- Score: 78.97070307547283
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel approach for tracking multiple people in video. Unlike
past approaches which employ 2D representations, we focus on using 3D
representations of people, located in three-dimensional space. To this end, we
develop a method, Human Mesh and Appearance Recovery (HMAR) which in addition
to extracting the 3D geometry of the person as a SMPL mesh, also extracts
appearance as a texture map on the triangles of the mesh. This serves as a 3D
representation for appearance that is robust to viewpoint and pose changes.
Given a video clip, we first detect bounding boxes corresponding to people, and
for each one, we extract 3D appearance, pose, and location information using
HMAR. These embedding vectors are then sent to a transformer, which performs
spatio-temporal aggregation of the representations over the duration of the
sequence. The similarity of the resulting representations is used to solve for
associations that assign each person to a tracklet. We evaluate our approach
on the Posetrack, MuPoTs and AVA datasets. We find that 3D representations are
more effective than 2D representations for tracking in these settings, and we
obtain state-of-the-art performance. Code and results are available at:
https://brjathu.github.io/T3DP.
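The final association step described in the abstract (matching detections to tracklets by the similarity of their aggregated representations) can be sketched as follows. This is a minimal illustrative toy, not the paper's method: the `associate` function, the greedy matching, and the `0.5` threshold are assumptions for the sketch, and the paper actually aggregates HMAR embeddings with a transformer before solving the assignment.

```python
import numpy as np

def cosine_similarity(a, b):
    # Pairwise cosine similarity between rows of a (tracklets) and b (detections).
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def associate(track_emb, det_emb, threshold=0.5):
    # Greedily match each tracklet to its most similar unused detection,
    # highest-similarity pairs first, skipping pairs below the threshold.
    sim = cosine_similarity(track_emb, det_emb)
    assignments, used = {}, set()
    for t, d in sorted(np.ndindex(sim.shape), key=lambda td: -sim[td]):
        if t not in assignments and d not in used and sim[t, d] >= threshold:
            assignments[t] = d
            used.add(d)
    return assignments

# Example: three orthogonal tracklet embeddings; detections arrive permuted.
tracklets = np.eye(3)
detections = np.eye(3)[[2, 0, 1]]
print(associate(tracklets, detections))
```

In practice a globally optimal assignment (e.g. the Hungarian algorithm) would replace the greedy loop; the sketch only shows how a per-pair similarity matrix turns into tracklet-to-detection assignments.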
Related papers
- Sampling is Matter: Point-guided 3D Human Mesh Reconstruction [0.0]
This paper presents a simple yet powerful method for 3D human mesh reconstruction from a single RGB image.
Experimental results on benchmark datasets show that the proposed method efficiently improves the performance of 3D human mesh reconstruction.
arXiv Detail & Related papers (2023-04-19T08:45:26Z)
- MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation [91.6658845016214]
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks.
We render a 3D shape from multiple views, and set up a dense correspondence learning task within the contrastive learning framework.
As a result, the learned 2D representations are view-invariant and geometrically consistent.
arXiv Detail & Related papers (2022-08-18T00:48:15Z)
- Gait Recognition in the Wild with Dense 3D Representations and A Benchmark [86.68648536257588]
Existing studies for gait recognition are dominated by 2D representations like the silhouette or skeleton of the human body in constrained scenes.
This paper aims to explore dense 3D representations for gait recognition in the wild.
We build the first large-scale 3D representation-based gait recognition dataset, named Gait3D.
arXiv Detail & Related papers (2022-04-06T03:54:06Z)
- Tracking People by Predicting 3D Appearance, Location & Pose [78.97070307547283]
We first lift people to 3D from a single frame in a robust way.
As we track a person, we collect 3D observations over time in a tracklet representation.
We use these models to predict the future state of the tracklet.
arXiv Detail & Related papers (2021-12-08T18:57:15Z)
- Shape-aware Multi-Person Pose Estimation from Multi-View Images [47.13919147134315]
Our proposed coarse-to-fine pipeline first aggregates noisy 2D observations from multiple camera views into 3D space.
The final pose estimates are attained from a novel optimization scheme which links high-confidence multi-view 2D observations and 3D joint candidates.
arXiv Detail & Related papers (2021-10-05T20:04:21Z)
- VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild [98.69191256693703]
We present VoxelTrack for multi-person 3D pose estimation and tracking from a few cameras which are separated by wide baselines.
It employs a multi-branch network to jointly estimate 3D poses and re-identification (Re-ID) features for all people in the environment.
It outperforms the state-of-the-art methods by a large margin on three public datasets including Shelf, Campus and CMU Panoptic.
arXiv Detail & Related papers (2021-08-05T08:35:44Z)
- Tracking Emerges by Looking Around Static Scenes, with Neural 3D Mapping [23.456046776979903]
We propose to leverage multiview data of static points in arbitrary scenes (static or dynamic) to learn a neural 3D mapping module.
The neural 3D mapper consumes RGB-D data as input, and produces a 3D voxel grid of deep features as output.
We show that our unsupervised 3D object trackers outperform prior unsupervised 2D and 2.5D trackers, and approach the accuracy of supervised trackers.
arXiv Detail & Related papers (2020-08-04T02:59:23Z)
- Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach [76.10879433430466]
We propose to estimate 3D human pose from multi-view images and a few IMUs attached to the person's limbs.
It operates by firstly detecting 2D poses from the two signals, and then lifting them to the 3D space.
The simple two-step approach reduces the error of the state-of-the-art by a large margin on a public dataset.
arXiv Detail & Related papers (2020-03-25T00:26:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.