Stochastic Modeling for Learnable Human Pose Triangulation
- URL: http://arxiv.org/abs/2110.00280v1
- Date: Fri, 1 Oct 2021 09:26:25 GMT
- Title: Stochastic Modeling for Learnable Human Pose Triangulation
- Authors: Kristijan Bartol, David Bojanić, Tomislav Petković, Tomislav Pribanić
- Abstract summary: We propose a modeling framework for 3D human pose triangulation and evaluate its performance across different datasets and spatial camera arrangements.
The proposed pose triangulation model successfully generalizes to different camera arrangements and between two public datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a stochastic modeling framework for 3D human pose triangulation
and evaluate its performance across different datasets and spatial camera
arrangements. The common approach to 3D pose estimation is to first detect 2D
keypoints in images and then apply the triangulation from multiple views.
However, the majority of existing triangulation models are limited to a single
dataset, i.e., a fixed camera arrangement and a fixed number of cameras. Moreover,
they require known camera parameters. The proposed stochastic pose triangulation model
successfully generalizes to different camera arrangements and between two
public datasets. In each step, we generate a set of 3D pose hypotheses obtained
by triangulation from a random subset of views. The hypotheses are evaluated by
a neural network and the expectation of the triangulation error is minimized.
The key novelty is that the network learns to evaluate the poses without taking
into account the spatial camera arrangement, thus improving generalization.
Additionally, we demonstrate that the proposed stochastic framework can also be
used for fundamental matrix estimation, showing promising results towards
relative camera pose estimation from noisy keypoint correspondences.
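The core loop described in the abstract — triangulate 3D pose hypotheses from random subsets of views, then score them — can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the function names (`triangulate_dlt`, `sample_pose_hypotheses`) are hypothetical, standard DLT triangulation is assumed for the per-joint step, and the learned scoring network that evaluates hypotheses is omitted.

```python
import numpy as np

def triangulate_dlt(points_2d, proj_mats):
    """Triangulate one 3D point from >=2 views via the Direct Linear Transform.

    Each view contributes two rows x*P[2]-P[0] and y*P[2]-P[1] to a
    homogeneous system A X = 0, solved by SVD (last right-singular vector).
    """
    A = []
    for (x, y), P in zip(points_2d, proj_mats):
        A.append(x * P[2] - P[0])
        A.append(y * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(A))
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize

def sample_pose_hypotheses(keypoints_2d, proj_mats, n_hyp=20, subset_size=2, rng=None):
    """Generate 3D pose hypotheses by triangulating from random view subsets.

    keypoints_2d: (n_views, n_joints, 2) detected 2D joints per camera.
    proj_mats:    list of n_views 3x4 camera projection matrices.
    Returns an array of shape (n_hyp, n_joints, 3).
    """
    rng = rng or np.random.default_rng()
    n_views, n_joints, _ = keypoints_2d.shape
    hypotheses = []
    for _ in range(n_hyp):
        # Pick a random subset of cameras for this hypothesis.
        views = rng.choice(n_views, size=subset_size, replace=False)
        pose = np.stack([
            triangulate_dlt(keypoints_2d[views, j], [proj_mats[v] for v in views])
            for j in range(n_joints)
        ])
        hypotheses.append(pose)
    return np.stack(hypotheses)
```

In the paper's framework, each hypothesis would then be scored by a neural network (without access to the camera arrangement) and the expected triangulation error minimized; here the sampling step alone is shown.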
Related papers
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z) - Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos [15.532504015622159]
Category-level 3D pose estimation is a fundamentally important problem in computer vision and robotics.
We tackle the problem of learning to estimate the category-level 3D pose only from casually taken object-centric videos.
arXiv Detail & Related papers (2024-07-05T09:43:05Z) - 6D Object Pose Estimation from Approximate 3D Models for Orbital Robotics [19.64111218032901]
We present a novel technique to estimate the 6D pose of objects from single images.
We employ a dense 2D-to-3D correspondence predictor that regresses 3D model coordinates for every pixel.
Our method achieves state-of-the-art performance on the SPEED+ dataset and has won the SPEC2021 post-mortem competition.
arXiv Detail & Related papers (2023-03-23T13:18:05Z) - Learning Implicit Probability Distribution Functions for Symmetric Orientation Estimation from RGB Images Without Pose Labels [23.01797447932351]
We propose an automatic pose labeling scheme for RGB-D images.
We train an ImplicitPDF model to estimate the likelihood of an orientation hypothesis given an RGB image.
An efficient hierarchical sampling of the SO(3) manifold enables tractable generation of the complete set of symmetries.
arXiv Detail & Related papers (2022-11-21T12:07:40Z) - Occupancy Planes for Single-view RGB-D Human Reconstruction [120.5818162569105]
Single-view RGB-D human reconstruction with implicit functions is often formulated as per-point classification.
We propose the occupancy planes (OPlanes) representation, which formulates single-view RGB-D human reconstruction as occupancy prediction on planes that slice through the camera's view frustum.
arXiv Detail & Related papers (2022-08-04T17:59:56Z) - MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z) - VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild [98.69191256693703]
We present VoxelTrack for multi-person 3D pose estimation and tracking from a few cameras which are separated by wide baselines.
It employs a multi-branch network to jointly estimate 3D poses and re-identification (Re-ID) features for all people in the environment.
It outperforms the state-of-the-art methods by a large margin on three public datasets including Shelf, Campus and CMU Panoptic.
arXiv Detail & Related papers (2021-08-05T08:35:44Z) - Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z) - Beyond Weak Perspective for Monocular 3D Human Pose Estimation [6.883305568568084]
We consider the task of predicting 3D joint locations and orientations from a monocular video.
We first infer 2D joint locations with an off-the-shelf pose estimation algorithm.
We then apply the SMPLify algorithm, which receives those initial parameters.
arXiv Detail & Related papers (2020-09-14T16:23:14Z) - Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.