Learning-based Relational Object Matching Across Views
- URL: http://arxiv.org/abs/2305.02398v1
- Date: Wed, 3 May 2023 19:36:51 GMT
- Title: Learning-based Relational Object Matching Across Views
- Authors: Cathrin Elich, Iro Armeni, Martin R. Oswald, Marc Pollefeys, Joerg Stueckler
- Abstract summary: We propose a learning-based approach which combines local keypoints with novel object-level features for matching object detections between RGB images.
We train our object-level matching features based on appearance and inter-frame and cross-frame spatial relations between objects in an associative graph neural network.
- Score: 63.63338392484501
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intelligent robots require object-level scene understanding to reason about
possible tasks and interactions with the environment. Moreover, many perception
tasks such as scene reconstruction, image retrieval, or place recognition can
benefit from reasoning on the level of objects. While keypoint-based matching
can yield strong results for finding correspondences between images with small
to medium viewpoint changes, matching semantically on the object level becomes
advantageous for large viewpoint changes. In this paper, we propose a
learning-based approach which combines local keypoints with novel object-level
features for matching object detections between RGB images. We train our
object-level matching features based on appearance and inter-frame and
cross-frame spatial relations between objects in an associative graph neural
network. We demonstrate our approach on realistically rendered synthetic images
covering a large variety of views. Our approach compares favorably to previous
state-of-the-art object-level matching approaches and achieves improved
performance over a purely keypoint-based approach for large viewpoint changes.
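As a rough illustration of the idea described in the abstract, the sketch below shows how per-detection appearance features and pairwise spatial relations could be combined in a small graph neural network to produce a cross-view affinity matrix between object detections. This is a minimal sketch under my own assumptions (class names, feature dimensions, and the mean-aggregation scheme are illustrative), not the authors' implementation; in the paper, such object-level scores are additionally combined with local keypoint matches.

```python
# Minimal sketch (not the authors' code): matching object detections across two
# views by combining appearance features with relational message passing in a
# small graph neural network. All names and dimensions are illustrative.
import torch
import torch.nn as nn


class RelationalObjectMatcher(nn.Module):
    """Scores correspondences between object detections of two RGB views."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # message function over pairwise spatial relations (e.g. box offsets)
        self.spatial_mlp = nn.Sequential(
            nn.Linear(4, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim)
        )
        # node update combining appearance and aggregated relational messages
        self.update_mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim)
        )

    def propagate(self, app_feats: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        # intra-frame relations: every object receives messages from every other object
        offsets = boxes[:, None, :] - boxes[None, :, :]           # (N, N, 4)
        messages = self.spatial_mlp(offsets).mean(dim=1)          # (N, D)
        return self.update_mlp(torch.cat([app_feats, messages], dim=-1))

    def forward(self, feats_a, boxes_a, feats_b, boxes_b):
        emb_a = self.propagate(feats_a, boxes_a)                  # (Na, D)
        emb_b = self.propagate(feats_b, boxes_b)                  # (Nb, D)
        # cross-frame affinity: cosine similarity between object embeddings
        emb_a = nn.functional.normalize(emb_a, dim=-1)
        emb_b = nn.functional.normalize(emb_b, dim=-1)
        return emb_a @ emb_b.T                                    # (Na, Nb)


if __name__ == "__main__":
    matcher = RelationalObjectMatcher()
    feats_a, boxes_a = torch.randn(5, 128), torch.rand(5, 4)      # view A: 5 detections
    feats_b, boxes_b = torch.randn(6, 128), torch.rand(6, 4)      # view B: 6 detections
    affinity = matcher(feats_a, boxes_a, feats_b, boxes_b)
    # greedy assignment: each object in view A matched to its best candidate in B;
    # keypoint-based scores could be fused into `affinity` before this step
    print(affinity.argmax(dim=1))
```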
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms representative models in both objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- Variable Radiance Field for Real-Life Category-Specific Reconstruction from Single Image [27.290232027686237]
We present a novel framework that can reconstruct category-specific objects from a single image without known camera parameters.
We parameterize the geometry and appearance of the object using a multi-scale global feature extractor.
We also propose a contrastive learning-based pretraining strategy to improve the feature extractor.
arXiv Detail & Related papers (2023-06-08T12:12:02Z)
- Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects [70.49392581592089]
We tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images.
We follow a retrieval-based strategy and prevent the network from learning object-specific features.
Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields a significantly better generalization to unseen objects than previous works.
arXiv Detail & Related papers (2022-03-16T08:53:00Z)
- Semantically Grounded Object Matching for Robust Robotic Scene Rearrangement [21.736603698556042]
We present a novel approach to object matching that uses a large pre-trained vision-language model to match objects in a cross-instance setting.
We demonstrate that this provides considerably improved matching performance in cross-instance settings (see the illustrative sketch after this list).
arXiv Detail & Related papers (2021-11-15T18:39:43Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
- Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z)
- Improving Few-shot Learning by Spatially-aware Matching and CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)
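One of the entries above, "Semantically Grounded Object Matching for Robust Robotic Scene Rearrangement", matches objects across instances using a large pre-trained vision-language model. Purely as an illustration of that idea (the model choice, helper names, and the Hungarian-assignment step are my assumptions, not the paper's code), a minimal version could look like this:

```python
# Minimal sketch: cross-instance object matching with a pre-trained
# vision-language model (CLIP via Hugging Face, an assumed choice).
import torch
from PIL import Image
from scipy.optimize import linear_sum_assignment
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def embed_crops(crops: list[Image.Image]) -> torch.Tensor:
    """Embed per-object image crops with the frozen vision encoder."""
    inputs = processor(images=crops, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(pixel_values=inputs["pixel_values"])
    return torch.nn.functional.normalize(feats, dim=-1)


def match_objects(crops_a, crops_b):
    """Return index pairs (i, j) assigning objects of scene A to scene B."""
    sim = embed_crops(crops_a) @ embed_crops(crops_b).T   # cosine similarities
    rows, cols = linear_sum_assignment(-sim.numpy())      # maximize total similarity
    return list(zip(rows.tolist(), cols.tolist()))
```

In practice, the crops would come from object detections in each image; geometric consistency checks or keypoint matches could then be used to filter the resulting pairs.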
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.