Learning Geodesic-Aware Local Features from RGB-D Images
- URL: http://arxiv.org/abs/2203.12016v1
- Date: Tue, 22 Mar 2022 19:52:49 GMT
- Title: Learning Geodesic-Aware Local Features from RGB-D Images
- Authors: Guilherme Potje, Renato Martins, Felipe Cadar, Erickson R. Nascimento
- Abstract summary: We propose a new approach to compute descriptors from RGB-D images that are invariant to non-rigid deformations.
Our proposed description strategies are grounded on the key idea of learning feature representations on undistorted local image patches.
In experiments on real and publicly available RGB-D benchmarks, our descriptors consistently outperform state-of-the-art handcrafted and learning-based image and RGB-D descriptors.
- Score: 8.115075181267109
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most of the existing handcrafted and learning-based local descriptors are
still at best approximately invariant to affine image transformations, often
disregarding deformable surfaces. In this paper, we take one step further by
proposing a new approach to compute descriptors from RGB-D images (where RGB
refers to the pixel color brightness and D stands for depth information) that
are invariant to isometric non-rigid deformations, as well as to scale changes
and rotation. Our proposed description strategies are grounded on the key idea
of learning feature representations on undistorted local image patches using
surface geodesics. We design two complementary local descriptor strategies to
compute geodesic-aware features efficiently: one efficient binary descriptor
based on handcrafted binary tests (named GeoBit), and one learning-based
descriptor (GeoPatch) with convolutional neural networks (CNNs) to compute
features. In different experiments using real and publicly available RGB-D data
benchmarks, they consistently outperform state-of-the-art handcrafted and
learning-based image and RGB-D descriptors in matching scores, as well as in
object retrieval and non-rigid surface tracking experiments, with comparable
processing times. We also provide to the community a new dataset with accurate
matching annotations of RGB-D images of different objects (shirts, cloths,
paintings, bags) subjected to strong non-rigid deformations, to serve as an
evaluation benchmark for deformable surface correspondence algorithms.
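To make the binary variant concrete, below is a minimal Python sketch of the test stage of a GeoBit-style descriptor. It assumes a geodesic-rectified patch has already been extracted (the undistortion from depth and surface geodesics is the paper's core contribution and is omitted here); the function name and the random test pattern are illustrative, not the authors' implementation.

```python
import numpy as np

def binary_descriptor(rectified_patch, pairs):
    """Compare intensities at pre-defined pixel pairs of a
    geodesic-rectified patch and pack the results into bytes
    (GeoBit-style binary tests, sketched)."""
    bits = (rectified_patch[pairs[:, 0], pairs[:, 1]]
            < rectified_patch[pairs[:, 2], pairs[:, 3]])
    return np.packbits(bits.astype(np.uint8))

rng = np.random.default_rng(0)
patch = rng.random((32, 32))           # stand-in for a real rectified patch
pairs = rng.integers(0, 32, (256, 4))  # fixed test pattern shared by all keypoints
desc = binary_descriptor(patch, pairs)
print(desc.shape)                      # (32,): 256 bits packed into 32 bytes
```

Matching such descriptors reduces to Hamming distance on the packed bits, which is what makes the binary variant efficient.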
Related papers
- Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer [10.982521876026281]
We introduce a diffusion-based framework to address the RGB-D semantic segmentation problem.
We demonstrate that utilizing a Deformable Attention Transformer as the encoder to extract features from depth images effectively captures the characteristics of invalid regions in depth measurements.
arXiv Detail & Related papers (2024-09-23T15:23:01Z)
- MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images [57.71600854525037]
We propose a Fuse-Describe-Match strategy for 6D pose estimation from RGB-D images.
MatchU is a generic approach that fuses 2D texture and 3D geometric cues for 6D pose prediction of unseen objects.
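As a hedged sketch of the match stage such a Fuse-Describe-Match pipeline could end with: mutual nearest-neighbour search over fused descriptors. The concatenation-based fusion here is a placeholder for MatchU's learned fusion, and all names are hypothetical.

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """Mutual nearest-neighbour matching over L2-normalised descriptors."""
    sim = desc_a @ desc_b.T          # cosine similarity matrix
    nn_ab = sim.argmax(axis=1)       # best b for each a
    nn_ba = sim.argmax(axis=0)       # best a for each b
    a_idx = np.arange(desc_a.shape[0])
    keep = nn_ba[nn_ab] == a_idx     # keep only mutually consistent pairs
    return np.stack([a_idx[keep], nn_ab[keep]], axis=1)

rng = np.random.default_rng(0)
# Placeholder fusion: concatenate per-point texture and geometry features.
fuse = lambda rgb, geo: np.concatenate([rgb, geo], axis=1)
model = fuse(rng.standard_normal((200, 64)), rng.standard_normal((200, 64)))
scene = model + 0.1 * rng.standard_normal(model.shape)  # noisy observation
model /= np.linalg.norm(model, axis=1, keepdims=True)
scene /= np.linalg.norm(scene, axis=1, keepdims=True)
matches = mutual_nn_matches(model, scene)
```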
arXiv Detail & Related papers (2024-03-03T14:01:03Z) - Clothes Grasping and Unfolding Based on RGB-D Semantic Segmentation [21.950751953721817]
We propose a novel Bi-directional Fractal Cross Fusion Network (BiFCNet) for semantic segmentation.
We use RGB images with rich color features as input to our network in which the Fractal Cross Fusion module fuses RGB and depth data.
To reduce the cost of real data collection, we propose a data augmentation method based on an adversarial strategy.
arXiv Detail & Related papers (2023-05-05T03:21:55Z)
- Depth-Adapted CNNs for RGB-D Semantic Segmentation [2.341385717236931]
We propose a novel framework to incorporate depth information into the RGB convolutional neural network (CNN).
Specifically, our Z-ACN generates a 2D depth-adapted offset which is fully constrained by low-level features to guide the feature extraction on RGB images.
With the generated offset, we introduce two intuitive and effective operations to replace basic CNN operators.
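The following toy sketch conveys the flavour of depth-adapted sampling, not the paper's actual Z-ACN operator: offsets are derived from the local depth gradient (a purely geometric, low-level cue, as the summary describes), and sampling is simplified to nearest-neighbour lookup where a real deformable convolution would use a bilinear kernel.

```python
import numpy as np

def depth_adapted_sampling(feat, depth, scale=4.0):
    """Shift each sampling location along the local depth gradient before
    reading the RGB feature map: a simplified stand-in for a deformable
    convolution whose offsets are constrained by low-level geometry."""
    H, W = depth.shape
    gy, gx = np.gradient(depth)        # geometric cue, no learning involved
    ys, xs = np.mgrid[0:H, 0:W]
    sy = np.clip(np.round(ys + scale * gy), 0, H - 1).astype(int)
    sx = np.clip(np.round(xs + scale * gx), 0, W - 1).astype(int)
    return feat[sy, sx]                # nearest-neighbour resample

rng = np.random.default_rng(0)
out = depth_adapted_sampling(rng.standard_normal((48, 64)), rng.random((48, 64)))
print(out.shape)  # (48, 64)
```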
arXiv Detail & Related papers (2022-06-08T14:59:40Z)
- Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images [89.81919625224103]
Training deep models for RGB-D salient object detection (SOD) often requires a large number of labeled RGB-D images.
We present a Dual-Semi RGB-D Salient Object Detection Network (DS-Net) to leverage unlabeled RGB images for boosting RGB-D saliency detection.
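A minimal sketch of the generic semi-supervised objective this hints at: a supervised term on labeled RGB-D pairs plus a consistency term on unlabeled RGB predictions. DS-Net's actual losses and architecture differ; this only illustrates the structure of such an objective.

```python
import numpy as np

def semi_supervised_loss(pred_l, target_l, pred_u1, pred_u2, w=0.5):
    """Supervised error on labeled RGB-D pairs plus a consistency term
    between two predictions for the same unlabeled RGB image
    (e.g., under different augmentations)."""
    sup = np.mean((pred_l - target_l) ** 2)  # labeled branch
    con = np.mean((pred_u1 - pred_u2) ** 2)  # unlabeled consistency branch
    return sup + w * con
```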
arXiv Detail & Related papers (2022-01-01T03:02:27Z)
- Extracting Deformation-Aware Local Features by Learning to Deform [3.364554138758565]
We present a new approach to compute features from still images that are robust to non-rigid deformations.
We train the model architecture end-to-end by applying non-rigid deformations to objects in a simulated environment.
Experiments show that our method outperforms state-of-the-art handcrafted, learning-based image, and RGB-D descriptors on different datasets.
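As a hedged illustration of training on simulated deformations, the sketch below warps an image with a sinusoidal field and returns the dense ground-truth correspondences that would supervise descriptor learning; the paper's simulated environment is more elaborate than this toy warp.

```python
import numpy as np

def sinusoidal_warp(img, amp=3.0, freq=0.15):
    """Produce a non-rigidly deformed copy of `img` together with the
    dense ground-truth correspondence map used to supervise training."""
    H, W = img.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_y = np.clip(ys + amp * np.sin(freq * xs), 0, H - 1).round().astype(int)
    src_x = np.clip(xs + amp * np.sin(freq * ys), 0, W - 1).round().astype(int)
    return img[src_y, src_x], (src_y, src_x)

rng = np.random.default_rng(0)
warped, gt_corr = sinusoidal_warp(rng.random((64, 64)))
```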
arXiv Detail & Related papers (2021-11-20T15:46:33Z)
- Semantic-embedded Unsupervised Spectral Reconstruction from Single RGB Images in the Wild [48.44194221801609]
We propose a new lightweight and end-to-end learning-based framework to tackle this challenge.
We progressively spread the differences between input RGB images and re-projected RGB images from recovered HS images via effective camera spectral response function estimation.
Our method significantly outperforms state-of-the-art unsupervised methods and even exceeds the latest supervised method under some settings.
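The re-projection step reduces to integrating the recovered hyperspectral bands against the estimated camera spectral response function (CSRF). A toy numpy sketch, with random stand-ins for the HS image and the CSRF:

```python
import numpy as np

rng = np.random.default_rng(0)
bands = 31                               # e.g. 400-700 nm at 10 nm steps
hs = rng.random((64, 64, bands))         # recovered HS image (toy data)
csrf = rng.random((bands, 3))            # estimated CSRF (toy values)
csrf /= csrf.sum(axis=0, keepdims=True)  # normalise each channel's response
rgb_reprojected = hs @ csrf              # (64, 64, 3)
# Training minimises the difference between `rgb_reprojected` and the input RGB.
```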
arXiv Detail & Related papers (2021-08-15T05:19:44Z)
- Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images [69.5662419067878]
Grounding referring expressions in RGBD images is an emerging field.
We present a novel task of 3D visual grounding in single-view RGBD image where the referred objects are often only partially scanned due to occlusion.
Our approach first fuses the language and the visual features at the bottom level to generate a heatmap that localizes the relevant regions in the RGBD image.
Then our approach conducts an adaptive feature learning based on the heatmap and performs the object-level matching with another visio-linguistic fusion to finally ground the referred object.
arXiv Detail & Related papers (2021-03-14T11:18:50Z)
- Category-Level 3D Non-Rigid Registration from Single-View RGB Images [28.874008960264202]
We propose a novel approach to solve the 3D non-rigid registration problem from RGB images using CNNs.
Our objective is to find a deformation field that warps a given 3D canonical model into a novel instance observed by a single-view RGB image.
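In code, the core of such a formulation is a per-vertex displacement added to the canonical model. A minimal stand-in, with a random placeholder where the paper's CNN-regressed deformation field would go:

```python
import numpy as np

rng = np.random.default_rng(0)
canonical_verts = rng.random((1000, 3))                     # canonical category model
deformation_field = 0.05 * rng.standard_normal((1000, 3))   # CNN output stand-in
instance_verts = canonical_verts + deformation_field        # warped observed instance
```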
arXiv Detail & Related papers (2020-08-17T10:35:19Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient cross-modality guided encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
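A toy gate in the spirit of this recalibrate-and-aggregate idea (a sketch only; the actual SA-Gate structure is more involved, and all weights here are random placeholders):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_recalibration(rgb_feat, depth_feat, w, b):
    """A learned gate decides, per element, how much of each modality to
    trust, then aggregates the two recalibrated representations."""
    gate = sigmoid(np.concatenate([rgb_feat, depth_feat], axis=-1) @ w + b)
    return gate * rgb_feat + (1.0 - gate) * depth_feat

rng = np.random.default_rng(0)
C = 32
rgb_feat = rng.standard_normal((100, C))
depth_feat = rng.standard_normal((100, C))
fused = gated_recalibration(rgb_feat, depth_feat,
                            0.1 * rng.standard_normal((2 * C, C)), np.zeros(C))
```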
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders [81.5490760424213]
We propose the first framework (UCNet) to employ uncertainty for RGB-D saliency detection by learning from the data labeling process.
Inspired by the saliency data labeling process, we propose a probabilistic RGB-D saliency detection network.
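A probabilistic formulation of this kind typically samples several latent codes and decodes each into a distinct saliency hypothesis. A minimal reparameterization-style sketch with a toy linear decoder (all names hypothetical, not UC-Net's actual API):

```python
import numpy as np

def sample_saliency(mu, log_var, decode, n=5):
    """Draw several saliency hypotheses from the latent space: each
    sampled code decodes to one plausible map, mirroring the ambiguity
    of human saliency labeling."""
    rng = np.random.default_rng(0)
    maps = []
    for _ in range(n):
        z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)
        maps.append(decode(z))            # reparameterization-style sample
    return np.stack(maps)

w_dec = np.random.default_rng(1).standard_normal((8, 256)) * 0.1
decode = lambda z: (z @ w_dec).reshape(16, 16)  # toy linear "decoder"
hypotheses = sample_saliency(np.zeros(8), np.zeros(8), decode)
print(hypotheses.shape)                         # (5, 16, 16)
```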
arXiv Detail & Related papers (2020-04-13T04:12:59Z)