Multi-view object pose estimation from correspondence distributions and
epipolar geometry
- URL: http://arxiv.org/abs/2210.00924v2
- Date: Thu, 23 Mar 2023 13:02:42 GMT
- Title: Multi-view object pose estimation from correspondence distributions and
epipolar geometry
- Authors: Rasmus Laurvig Haugaard, Thorbjørn Mosekjær Iversen
- Abstract summary: We present a multi-view pose estimation method which aggregates learned 2D-3D distributions from multiple views for both the initial estimate and optional refinement.
Our method reduces pose estimation errors by 80-91% compared to the best single-view method, and we present state-of-the-art results on T-LESS with four views, even compared with methods using five and eight views.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many automation tasks involving manipulation of rigid objects, the poses
of the objects must be acquired. Vision-based pose estimation using a single
RGB or RGB-D sensor is especially popular due to its broad applicability.
However, single-view pose estimation is inherently limited by depth ambiguity
and ambiguities imposed by various phenomena like occlusion, self-occlusion,
reflections, etc. Aggregation of information from multiple views can
potentially resolve these ambiguities, but the current state-of-the-art
multi-view pose estimation method only uses multiple views to aggregate
single-view pose estimates, and thus relies on obtaining good single-view
estimates. We present a multi-view pose estimation method which aggregates
learned 2D-3D distributions from multiple views for both the initial estimate
and optional refinement. Our method performs probabilistic sampling of 3D-3D
correspondences under epipolar constraints using learned 2D-3D correspondence
distributions which are implicitly trained to respect visual ambiguities such
as symmetry. Evaluation on the T-LESS dataset shows that our method reduces
pose estimation errors by 80-91% compared to the best single-view method, and
we present state-of-the-art results on T-LESS with four views, even compared
with methods using five and eight views.
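The core geometric idea, restricting correspondence candidates to the epipolar line induced in a second view, can be sketched in a few lines of NumPy. This is a toy illustration, not the authors' implementation: the fundamental matrix below is that of a rectified stereo pair, and the paper's learned 2D-3D correspondence distributions are replaced by a uniform choice of x-coordinates along the line.

```python
import numpy as np

def epipolar_line(F, x1):
    """Epipolar line in view 2 for homogeneous pixel x1 in view 1: l2 = F @ x1.
    Normalized so that (a, b) is a unit vector, making point-line distances direct."""
    l = F @ x1
    return l / np.linalg.norm(l[:2])

def sample_on_line(l, xs):
    """Homogeneous points on the line a*x + b*y + c = 0 at given x-coordinates.
    Assumes the line is not vertical (b != 0)."""
    a, b, c = l
    ys = -(a * xs + c) / b
    return np.stack([xs, ys, np.ones_like(xs)], axis=1)

# Rectified stereo: x2^T F x1 = 0 reduces to y1 == y2.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
x1 = np.array([100.0, 50.0, 1.0])          # query pixel in view 1
l2 = epipolar_line(F, x1)                   # its epipolar line in view 2
cands = sample_on_line(l2, np.array([80.0, 120.0, 160.0]))
```

Every sampled candidate satisfies the epipolar constraint `x2^T F x1 = 0` by construction, so a downstream sampler only needs to weight positions along the line, which is where a learned correspondence distribution would come in.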
Related papers
- Correspondence-Free Pose Estimation with Patterns: A Unified Approach for Multi-Dimensional Vision [10.274601503572715]
A new correspondence-free pose estimation method and its practical algorithms are proposed.
By treating the considered point sets as patterns, feature functions describing these patterns are introduced to establish a sufficient number of equations for optimization.
The proposed method is applicable to nonlinear transformations such as perspective projection and can cover various pose estimations from 3D-to-3D points, 3D-to-2D points, and 2D-to-2D points.
arXiv Detail & Related papers (2025-02-26T14:38:44Z)
- SEMPose: A Single End-to-end Network for Multi-object Pose Estimation [13.131534219937533]
SEMPose is an end-to-end multi-object pose estimation network.
It can perform inference at 32 FPS without requiring inputs other than the RGB image.
It can accurately estimate the poses of multiple objects in real time, with inference time unaffected by the number of target objects.
arXiv Detail & Related papers (2024-11-21T10:37:54Z)
- BOP-Distrib: Revisiting 6D Pose Estimation Benchmark for Better Evaluation under Visual Ambiguities [0.7499722271664147]
6D pose estimation aims at determining the pose of the object that best explains the camera observation.
Currently, 6D pose estimation methods are benchmarked on datasets that consider, for their ground truth annotations, visual ambiguities as only related to global object symmetries.
We propose an automatic method to re-annotate those datasets with a 6D pose distribution specific to each image, taking into account the visibility of the object surface in the image to correctly determine the visual ambiguities.
arXiv Detail & Related papers (2024-08-30T13:52:26Z)
- DPODv2: Dense Correspondence-Based 6 DoF Pose Estimation [24.770767430749288]
We propose a three-stage 6 DoF object detection method called DPODv2 (Dense Pose Object Detector).
We combine a 2D object detector with a dense correspondence estimation network and a multi-view pose refinement method to estimate a full 6 DoF pose.
DPODv2 achieves excellent results across the evaluated datasets while remaining fast and scalable, independent of the data modality and the type of training data.
arXiv Detail & Related papers (2022-07-06T16:48:56Z)
- On Triangulation as a Form of Self-Supervision for 3D Human Pose Estimation [57.766049538913926]
Supervised approaches to 3D pose estimation from single images are remarkably effective when labeled data is abundant.
Much of the recent attention has shifted towards semi- and/or weakly supervised learning.
We propose to impose multi-view geometric constraints by means of a differentiable triangulation and to use it as a form of self-supervision during training when no labels are available.
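The triangulation at the heart of such self-supervision is typically a linear DLT solve. A minimal NumPy version might look like the sketch below; the setup and names are hypothetical, and in practice an autodiff framework would make the SVD-based solve differentiable so gradients can flow back into the 2D pose network.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.
    P1, P2: 3x4 projection matrices; x1, x2: observed pixel coordinates (u, v)."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],   # u1 * p1_row3 - p1_row1 = 0
        x1[1] * P1[2] - P1[1],   # v1 * p1_row3 - p1_row2 = 0
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)  # null vector of A is the homogeneous 3D point
    X = Vt[-1]
    return X[:3] / X[3]

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Synthetic check: two cameras with identity intrinsics, baseline along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.3, -0.2, 4.0])
x1, x2 = project(P1, X_true), project(P2, X_true)
X_hat = triangulate_dlt(P1, P2, x1, x2)
```

With noise-free observations the recovered point matches the ground truth up to floating-point precision; the self-supervision signal is then the reprojection error of the triangulated point in each view.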
arXiv Detail & Related papers (2022-03-29T19:11:54Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
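A plane sweep reduces cross-view fusion to scoring a discrete set of depth hypotheses along each ray. The single-pixel sketch below is a toy illustration of that idea, assuming a canonical first camera P1 = [I | 0] with identity intrinsics (both simplifications, not the paper's setup):

```python
import numpy as np

def sweep_depth(P2, x1, x2_obs, depths):
    """Brute-force depth sweep along the ray through pixel x1 of a canonical
    camera P1 = [I | 0]: back-project x1 at each depth hypothesis, project the
    3D point into view 2, and keep the depth that best matches x2_obs."""
    best_d, best_err = None, np.inf
    for d in depths:
        X = d * np.array([x1[0], x1[1], 1.0])   # back-project at depth d
        x2 = P2 @ np.append(X, 1.0)
        x2 = x2[:2] / x2[2]                      # project into view 2
        err = np.linalg.norm(x2 - x2_obs)
        if err < best_err:
            best_d, best_err = d, err
    return best_d

# Synthetic check: second camera translated along x, known 3D point.
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.2, 0.1, 3.0])
x1 = X_true[:2] / X_true[2]
x2_obs = P2 @ np.append(X_true, 1.0)
x2_obs = x2_obs[:2] / x2_obs[2]
d_hat = sweep_depth(P2, x1, x2_obs, np.linspace(1.0, 5.0, 81))
```

In a learned plane-sweep network, the per-depth matching error above is replaced by feature-correlation scores aggregated over views, so depth selection becomes a soft-argmax over the score volume.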
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation [52.94078950641959]
We present a deployment friendly, fast bottom-up framework for multi-person 3D human pose estimation.
We adopt a novel neural representation of multi-person 3D pose which unifies the position of person instances with their corresponding 3D pose representation.
We propose a practical deployment paradigm where paired 2D or 3D pose annotations are unavailable.
arXiv Detail & Related papers (2020-08-04T07:54:25Z)
- Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods.
Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowd circumstances.
In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
arXiv Detail & Related papers (2020-07-21T17:59:36Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
- Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the Wild [101.70320427145388]
We propose a weakly-supervised approach that does not require 3D annotations and learns to estimate 3D poses from unlabeled multi-view data.
We evaluate our proposed approach on two large scale datasets.
arXiv Detail & Related papers (2020-03-17T08:47:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information (including all listed content) and is not responsible for any consequences of its use.