CoMotion: Concurrent Multi-person 3D Motion
- URL: http://arxiv.org/abs/2504.12186v1
- Date: Wed, 16 Apr 2025 15:40:15 GMT
- Title: CoMotion: Concurrent Multi-person 3D Motion
- Authors: Alejandro Newell, Peiyun Hu, Lahav Lipson, Stephan R. Richter, Vladlen Koltun
- Abstract summary: We introduce an approach for detecting and tracking detailed 3D poses of multiple people from a single monocular camera stream. Our model performs both strong per-frame detection and a learned pose update to track people from frame to frame. We train on numerous image and video datasets leveraging pseudo-labeled annotations to produce a model that matches state-of-the-art systems in 3D pose estimation accuracy.
- Score: 88.27833466761234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce an approach for detecting and tracking detailed 3D poses of multiple people from a single monocular camera stream. Our system maintains temporally coherent predictions in crowded scenes filled with difficult poses and occlusions. Our model performs both strong per-frame detection and a learned pose update to track people from frame to frame. Rather than match detections across time, poses are updated directly from a new input image, which enables online tracking through occlusion. We train on numerous image and video datasets leveraging pseudo-labeled annotations to produce a model that matches state-of-the-art systems in 3D pose estimation accuracy while being faster and more accurate in tracking multiple people through time. Code and weights are provided at https://github.com/apple/ml-comotion.
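To make the pipeline sketched in the abstract concrete, below is a minimal illustration of an online detect-and-update tracking loop: existing tracks are refined directly from each new image rather than by matching detections across frames, and the per-frame detector only spawns tracks for people not already being followed. Every name, shape, and threshold here (detect_poses, update_poses, Track, spawn_radius) is a hypothetical stand-in for illustration, not the released ml-comotion API.

```python
# Minimal, illustrative sketch (not the ml-comotion implementation) of the
# online loop described above: update existing tracks directly from the new
# image, then let the per-frame detector spawn tracks for unseen people.
from dataclasses import dataclass
import numpy as np

@dataclass
class Track:
    track_id: int
    pose: np.ndarray         # hypothetical SMPL-style pose parameters, shape (72,)
    translation: np.ndarray  # root position in camera coordinates, shape (3,)
    missed: int = 0          # consecutive frames with weak image support

def detect_poses(image):
    """Stub for a per-frame multi-person 3D pose detector."""
    n = 2  # pretend two people are visible in every frame
    return np.zeros((n, 72)), np.random.randn(n, 3)

def update_poses(tracks, image):
    """Stub for the learned pose update: a real model would refine every
    track directly from the new image and score how well it is supported."""
    return np.ones(len(tracks))

def track_video(frames, max_missed=30, spawn_radius=0.5):
    tracks, next_id = [], 0
    for image in frames:
        # 1. Update all existing tracks from the new image. No detection
        #    matching happens here, which is what allows identities to
        #    persist through occlusion.
        confidences = update_poses(tracks, image)
        for track, conf in zip(tracks, confidences):
            track.missed = 0 if conf > 0.5 else track.missed + 1
        tracks = [t for t in tracks if t.missed <= max_missed]

        # 2. Detect people in the frame and start new tracks only for those
        #    not already covered by an existing track.
        poses, translations = detect_poses(image)
        for pose, trans in zip(poses, translations):
            if all(np.linalg.norm(t.translation - trans) > spawn_radius for t in tracks):
                tracks.append(Track(next_id, pose, trans))
                next_id += 1

        yield [(t.track_id, t.pose.copy(), t.translation.copy()) for t in tracks]
```

Because step 1 never re-associates detections with tracks, an identity can survive a short occlusion in this sketch: the track simply keeps being updated at lower confidence until it has gone unsupported for max_missed frames.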
Related papers
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z) - Snipper: A Spatiotemporal Transformer for Simultaneous Multi-Person 3D Pose Estimation Tracking and Forecasting on a Video Snippet [24.852728097115744]
Multi-person pose understanding from RGB involves three complex tasks: pose estimation, tracking and motion forecasting.
Most existing works either focus on a single task or employ multi-stage approaches that solve the tasks separately.
We propose Snipper, a unified framework to perform multi-person 3D pose estimation, tracking, and motion forecasting simultaneously in a single stage.
arXiv Detail & Related papers (2022-07-09T18:42:14Z) - Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving [3.8073142980733]
We propose jointly training 3D detection and 3D tracking from only monocular videos in an end-to-end manner.
Time3D achieves 21.4% AMOTA, 13.6% AMOTP on the nuScenes 3D tracking benchmark, surpassing all published competitors.
arXiv Detail & Related papers (2022-05-30T06:41:10Z) - Tracking People by Predicting 3D Appearance, Location & Pose [78.97070307547283]
We first lift people to 3D from a single frame in a robust way.
As we track a person, we collect 3D observations over time in a tracklet representation.
We build models of each person's 3D appearance, location, and pose from these observations and use them to predict the future state of the tracklet.
arXiv Detail & Related papers (2021-12-08T18:57:15Z) - MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z) - VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild [98.69191256693703]
We present VoxelTrack for multi-person 3D pose estimation and tracking from a few cameras separated by wide baselines.
It employs a multi-branch network to jointly estimate 3D poses and re-identification (Re-ID) features for all people in the environment.
It outperforms the state-of-the-art methods by a large margin on three public datasets including Shelf, Campus and CMU Panoptic.
arXiv Detail & Related papers (2021-08-05T08:35:44Z) - Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z) - Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS [13.191601826570786]
We present a novel solution for multi-human 3D pose estimation from multiple calibrated camera views.
It takes 2D poses in different camera coordinates as inputs and estimates accurate 3D poses in the global coordinate frame.
We propose a new large-scale multi-human dataset with 12 to 28 camera views.
arXiv Detail & Related papers (2020-03-09T08:54:00Z)