Fusing Local Similarities for Retrieval-based 3D Orientation Estimation
of Unseen Objects
- URL: http://arxiv.org/abs/2203.08472v1
- Date: Wed, 16 Mar 2022 08:53:00 GMT
- Title: Fusing Local Similarities for Retrieval-based 3D Orientation Estimation
of Unseen Objects
- Authors: Chen Zhao, Yinlin Hu, Mathieu Salzmann
- Abstract summary: We tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images.
We follow a retrieval-based strategy and prevent the network from learning object-specific features.
Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields a significantly better generalization to unseen objects than previous works.
- Score: 70.49392581592089
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we tackle the task of estimating the 3D orientation of
previously-unseen objects from monocular images. This task contrasts with the
one considered by most existing deep learning methods which typically assume
that the testing objects have been observed during training. To handle the
unseen objects, we follow a retrieval-based strategy and prevent the network
from learning object-specific features by computing multi-scale local
similarities between the query image and synthetically-generated reference
images. We then introduce an adaptive fusion module that robustly aggregates
the local similarities into a global similarity score of pairwise images.
Furthermore, we speed up the retrieval process by developing a fast
clustering-based retrieval strategy. Our experiments on the LineMOD,
LineMOD-Occluded, and T-LESS datasets show that our method yields a
significantly better generalization to unseen objects than previous works.
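The retrieval idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the feature-pyramid layout `(C, H, W)`, and the softmax-weighted fusion are all assumptions made for illustration. In particular, the softmax weighting is only a hand-written stand-in for the learned adaptive fusion module, and the clustering-based speed-up is not shown.

```python
import numpy as np


def local_similarity(query_feat, ref_feat, eps=1e-8):
    """Cosine similarity at every spatial location of two (C, H, W) feature maps."""
    q = query_feat / (np.linalg.norm(query_feat, axis=0, keepdims=True) + eps)
    r = ref_feat / (np.linalg.norm(ref_feat, axis=0, keepdims=True) + eps)
    return (q * r).sum(axis=0)  # shape (H, W)


def fuse(sim_map, temperature=1.0):
    """Aggregate local similarities into one score.

    Hypothetical stand-in for a learned fusion module: a softmax-weighted
    average that gives more weight to locations with higher similarity.
    """
    s = sim_map.ravel()
    w = np.exp((s - s.max()) / temperature)
    w /= w.sum()
    return float((w * s).sum())


def global_score(query_pyramid, ref_pyramid):
    """Average the fused score over all scales of a feature pyramid."""
    per_scale = [fuse(local_similarity(q, r))
                 for q, r in zip(query_pyramid, ref_pyramid)]
    return float(np.mean(per_scale))


def retrieve(query_pyramid, ref_pyramids):
    """Return the index of the best-matching reference and all scores."""
    scores = [global_score(query_pyramid, ref) for ref in ref_pyramids]
    return int(np.argmax(scores)), scores


# Toy data: random arrays stand in for network features of rendered references.
rng = np.random.default_rng(0)
shapes = [(8, 16, 16), (8, 8, 8), (8, 4, 4)]  # three scales, 8 channels each
refs = [[rng.standard_normal(s) for s in shapes] for _ in range(5)]

# The query is a slightly perturbed copy of reference 3.
query = [f + 0.05 * rng.standard_normal(f.shape) for f in refs[3]]

best, scores = retrieve(query, refs)
print("best reference index:", best)
```

In a real pipeline, each reference would be a synthetic rendering with a known 3D orientation, so the orientation attached to the retrieved reference would serve as the estimate for the query.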
Related papers
- Cyclic Refiner: Object-Aware Temporal Representation Learning for Multi-View 3D Detection and Tracking [37.186306646752975]
We propose a unified object-aware temporal learning framework for multi-view 3D detection and tracking tasks.
The proposed model achieves consistent performance gains over baselines of different designs.
arXiv Detail & Related papers (2024-07-03T16:10:19Z)
- Shape Anchor Guided Holistic Indoor Scene Understanding [9.463220988312218]
We propose a shape anchor guided learning strategy (AncLearn) for robust holistic indoor scene understanding.
AncLearn generates anchors that dynamically fit instance surfaces to unmix noise and target-related features, offering reliable proposals at the detection stage.
We embed AncLearn into a reconstruction-from-detection learning system (AncRec) to generate high-quality semantic scene models.
arXiv Detail & Related papers (2023-09-20T08:30:20Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Variable Radiance Field for Real-Life Category-Specifc Reconstruction
from Single Image [27.290232027686237]
We present a novel framework that can reconstruct category-specific objects from a single image without known camera parameters.
We parameterize the geometry and appearance of the object using a multi-scale global feature extractor.
We also propose a contrastive learning-based pretraining strategy to improve the feature extractor.
arXiv Detail & Related papers (2023-06-08T12:12:02Z)
- Learning-based Relational Object Matching Across Views [63.63338392484501]
We propose a learning-based approach which combines local keypoints with novel object-level features for matching object detections between RGB images.
We train our object-level matching features based on appearance and inter-frame and cross-frame spatial relations between objects in an associative graph neural network.
arXiv Detail & Related papers (2023-05-03T19:36:51Z)
- De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects.
We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding.
We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z)
- DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
arXiv Detail & Related papers (2022-12-15T14:18:47Z)
- Object-aware Contrastive Learning for Debiased Scene Representation [74.30741492814327]
We develop a novel object-aware contrastive learning framework that localizes objects in a self-supervised manner.
We also introduce two data augmentations based on ContraCAM, object-aware random crop and background mixup, which reduce contextual and background biases during contrastive self-supervised learning.
arXiv Detail & Related papers (2021-07-30T19:24:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.