Category-Level Pose Retrieval with Contrastive Features Learnt with
Occlusion Augmentation
- URL: http://arxiv.org/abs/2208.06195v2
- Date: Tue, 16 Aug 2022 13:35:56 GMT
- Title: Category-Level Pose Retrieval with Contrastive Features Learnt with
Occlusion Augmentation
- Authors: Georgios Kouros and Shubham Shrivastava and Cédric Picron and
Sushruth Nagesh and Punarjay Chakravarty and Tinne Tuytelaars
- Abstract summary: We propose an approach to category-level pose estimation using a contrastive loss with a dynamic margin and a continuous pose-label space.
Our approach achieves state-of-the-art performance on PASCAL3D and OccludedPASCAL3D, as well as high-quality results on KITTI3D.
- Score: 31.73423009695285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pose estimation is usually tackled as either a bin classification problem or
as a regression problem. In both cases, the idea is to directly predict the
pose of an object. This is a non-trivial task because of appearance variations
of similar poses and similarities between different poses. Instead, we follow
the key idea that it is easier to compare two poses than to estimate them.
Render-and-compare approaches have been employed to that end; however, they
tend to be unstable, computationally expensive, and slow for real-time
applications. We propose doing category-level pose estimation by learning an
alignment metric using a contrastive loss with a dynamic margin and a
continuous pose-label space. For efficient inference, we use a simple real-time
image retrieval scheme with a reference set of renderings projected to an
embedding space. To achieve robustness to real-world conditions, we employ
synthetic occlusions, bounding box perturbations, and appearance augmentations.
Our approach achieves state-of-the-art performance on PASCAL3D and
OccludedPASCAL3D, as well as high-quality results on KITTI3D.
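The retrieval scheme and the dynamic-margin idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the loss formula, so the triplet-style hinge below, the cosine distance, the azimuth-only pose label, and every function name (`cosine_distance`, `azimuth_gap_deg`, `dynamic_margin_triplet_loss`, `retrieve_pose`, `unit`) are assumptions made for illustration. The embeddings are toy 2D vectors standing in for the output of a learned encoder.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def azimuth_gap_deg(a, b):
    """Smallest absolute difference between two azimuth angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def dynamic_margin_triplet_loss(anchor, closer, farther, scale=1.0):
    """Hypothetical dynamic-margin loss over (embedding, azimuth_deg) pairs.

    `closer` is the sample whose pose is nearer to the anchor's pose.
    The margin is not a constant: it grows with how much farther in pose
    the `farther` sample is, which is how the continuous pose labels enter.
    """
    (emb_a, pose_a), (emb_c, pose_c), (emb_f, pose_f) = anchor, closer, farther
    margin = scale * (azimuth_gap_deg(pose_a, pose_f)
                      - azimuth_gap_deg(pose_a, pose_c)) / 180.0
    return max(0.0, cosine_distance(emb_a, emb_c)
               - cosine_distance(emb_a, emb_f) + margin)


def retrieve_pose(query_embedding, reference_set):
    """Return the pose label of the reference embedding nearest to the query.

    `reference_set` is a list of (embedding, azimuth_deg) pairs, standing in
    for renderings that were projected to the embedding space offline.
    """
    _, best_pose = min(
        reference_set,
        key=lambda ref: cosine_distance(query_embedding, ref[0]),
    )
    return best_pose


def unit(deg):
    """Toy 'encoder': map an azimuth to a point on the unit circle."""
    r = math.radians(deg)
    return (math.cos(r), math.sin(r))


# Four reference renderings at 0, 90, 180 and 270 degrees of azimuth.
reference_set = [(unit(deg), float(deg)) for deg in (0, 90, 180, 270)]
print(retrieve_pose((0.1, 0.9), reference_set))
```

At inference time only `retrieve_pose` is needed, which is why a retrieval scheme can be fast: the reference embeddings are computed once, and each query costs one encoder pass plus a nearest-neighbour lookup.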
Related papers
- DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses [59.51874686414509]
Current approaches approximate the continuous pose representation with a large number of discrete pose hypotheses.
We present a Deep Voxel Matching Network (DVMNet) that eliminates the need for pose hypotheses and computes the relative object pose in a single pass.
Our method delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-03-20T15:41:32Z)
- iComMa: Inverting 3D Gaussian Splatting for Camera Pose Estimation via Comparing and Matching [14.737266480464156]
We present a method named iComMa to address the 6D camera pose estimation problem in computer vision.
We propose an efficient method for accurate camera pose estimation by inverting 3D Gaussian Splatting (3DGS).
arXiv Detail & Related papers (2023-12-14T15:31:33Z)
- ContraNeRF: 3D-Aware Generative Model via Contrastive Learning with Unsupervised Implicit Pose Embedding [40.36882490080341]
We propose a novel 3D-aware GAN optimization technique through contrastive learning with implicit pose embeddings.
We make the discriminator estimate a high-dimensional implicit pose embedding from a given image and perform contrastive learning on the pose embedding.
The proposed approach can be applied to datasets where the canonical camera pose is ill-defined, because it does not look up or estimate camera poses.
arXiv Detail & Related papers (2023-04-27T07:53:13Z)
- PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate model-free one-shot object pose estimator.
We create a new training pipeline for object to image matching based on a three-view system.
To enable PoseMatcher to attend to distinct input modalities, an image and a pointcloud, we introduce IO-Layer.
arXiv Detail & Related papers (2023-04-03T21:14:59Z)
- Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose Estimation [63.199549837604444]
3D human pose estimation approaches leverage different forms of strong (2D/3D pose) or weak (multi-view or depth) paired supervision.
We cast 3D pose learning as a self-supervised adaptation problem that aims to transfer the task knowledge from a labeled source domain to a completely unpaired target.
We evaluate different self-adaptation settings and demonstrate state-of-the-art 3D human pose estimation performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-05T03:52:57Z)
- RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization [46.144194562841435]
We propose a framework based on a recurrent neural network (RNN) for object pose refinement.
The problem is formulated as a non-linear least squares problem based on the estimated correspondence field.
The correspondence field estimation and pose refinement are conducted alternately in each iteration to recover accurate object poses.
arXiv Detail & Related papers (2022-03-24T06:24:55Z)
- Wide-Depth-Range 6D Object Pose Estimation in Space [124.94794113264194]
6D pose estimation in space poses unique challenges that are not commonly encountered in the terrestrial setting.
One of the most striking differences is the lack of atmospheric scattering, allowing objects to be visible from a great distance.
We propose a single-stage hierarchical end-to-end trainable network that is more robust to scale variations.
arXiv Detail & Related papers (2021-04-01T08:39:26Z)
- Deep Dual Consecutive Network for Human Pose Estimation [44.41818683253614]
We propose a novel multi-frame human pose estimation framework, leveraging abundant temporal cues between video frames to facilitate keypoint detection.
Our method ranks No. 1 in the Multi-frame Person Pose Challenge on the large-scale benchmark datasets PoseTrack 2017 and PoseTrack 2018.
arXiv Detail & Related papers (2021-03-12T13:11:27Z)
- Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose Estimation [74.76155168705975]
Deep Bingham Networks (DBN) can handle pose-related uncertainties and ambiguities arising in almost all real life applications concerning 3D data.
DBN extends the state-of-the-art direct pose regression networks by (i) a multi-hypotheses prediction head which can yield different distribution modes.
We propose new training strategies so as to avoid mode or posterior collapse during training and to improve numerical stability.
arXiv Detail & Related papers (2020-12-20T19:20:26Z)
- Few-shot Action Recognition with Implicit Temporal Alignment and Pair Similarity Optimization [37.010005936995334]
Few-shot learning aims to recognize instances from novel classes with few labeled samples.
Video-based few-shot action recognition has not been explored well and remains challenging.
This paper presents 1) a specific setting to evaluate the performance of few-shot action recognition algorithms; 2) an implicit sequence-alignment algorithm for better video-level similarity comparison; 3) an advanced loss for few-shot learning to optimize pair similarity with limited data.
arXiv Detail & Related papers (2020-10-13T07:56:06Z)
- Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction [118.21363599332493]
We present a method to leverage photometric consistency across time when annotations are only available for a sparse subset of frames in a video.
Our model is trained end-to-end on color images to jointly reconstruct hands and objects in 3D by inferring their poses.
We achieve state-of-the-art results on 3D hand-object reconstruction benchmarks and demonstrate that our approach allows us to improve the pose estimation accuracy.
arXiv Detail & Related papers (2020-04-28T12:03:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.