HDD-Net: Hybrid Detector Descriptor with Mutual Interactive Learning
- URL: http://arxiv.org/abs/2005.05777v2
- Date: Thu, 26 Nov 2020 09:14:34 GMT
- Title: HDD-Net: Hybrid Detector Descriptor with Mutual Interactive Learning
- Authors: Axel Barroso-Laguna, Yannick Verdie, Benjamin Busam, Krystian
Mikolajczyk
- Abstract summary: Local feature extraction remains an active research area due to the advances in fields such as SLAM, 3D reconstructions, or AR applications.
We propose a method that treats both extractions independently and focuses on their interaction in the learning process.
We show improvements over the state of the art in terms of image matching on HPatches and 3D reconstruction quality while keeping on par on camera localisation tasks.
- Score: 24.13425816781179
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Local feature extraction remains an active research area due to the advances
in fields such as SLAM, 3D reconstructions, or AR applications. The success in
these applications relies on the performance of the feature detector and
descriptor. While most methods base the detector-descriptor interaction
on unifying detections and descriptors in a single network, we propose a
method that treats both extractions independently and focuses on their
interaction in the learning process rather than on parameter sharing. We formulate the
classical hard-mining triplet loss as a new detector optimisation term to
refine candidate positions based on the descriptor map. We propose a dense
descriptor that uses a multi-scale approach and a hybrid combination of
hand-crafted and learned features to obtain rotation and scale robustness by
design. We evaluate our method extensively on different benchmarks and show
improvements over the state of the art in terms of image matching on HPatches
and 3D reconstruction quality while keeping on par on camera localisation
tasks.
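The abstract's central idea of turning the classical hard-mining triplet loss into a detector optimisation term can be illustrated with in-batch hardest-negative mining over descriptor vectors. The sketch below is a minimal, generic version of that loss (function name, margin value, and the NumPy formulation are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np

def hard_mining_triplet_loss(anchors, positives, margin=1.0):
    """Hardest in-batch negative triplet loss: a minimal sketch of the
    classical hard-mining formulation the abstract builds on.

    anchors, positives: (N, D) descriptor arrays where anchors[i] matches
    positives[i]; every other row of `positives` is a candidate negative
    for anchor i.
    """
    # Pairwise Euclidean distances between every anchor and every positive.
    diff = anchors[:, None, :] - positives[None, :, :]   # (N, N, D)
    dists = np.sqrt((diff ** 2).sum(-1) + 1e-12)         # (N, N)

    pos = np.diag(dists)                                 # distances of true matches
    # Mask the diagonal so a true match is never selected as a negative.
    masked = dists + np.eye(len(dists)) * 1e6
    hardest_neg = masked.min(axis=1)                     # closest non-matching descriptor

    # Triplet hinge: pull matches together, push the hardest negative apart.
    return np.maximum(pos - hardest_neg + margin, 0.0).mean()
```

In HDD-Net this signal is used not only to train the descriptor but also to refine candidate keypoint positions, coupling the two networks during learning without sharing their parameters.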
Related papers
- GSPR: Multimodal Place Recognition Using 3D Gaussian Splatting for Autonomous Driving [9.023864430027333]
Multimodal place recognition has gained increasing attention due to its ability to overcome the weaknesses of uni-sensor systems.
We propose a 3D Gaussian-based multimodal place recognition neural network dubbed GSPR.
arXiv Detail & Related papers (2024-10-01T00:43:45Z)
- Mismatched: Evaluating the Limits of Image Matching Approaches and Benchmarks [9.388897214344572]
Three-dimensional (3D) reconstruction from two-dimensional images is an active research field in computer vision.
Traditionally, parametric techniques have been employed for this task.
Recent advancements have seen a shift towards learning-based methods.
arXiv Detail & Related papers (2024-08-29T11:16:34Z)
- Cross-Cluster Shifting for Efficient and Effective 3D Object Detection in Autonomous Driving [69.20604395205248]
We present a new 3D point-based detector model, named Shift-SSD, for precise 3D object detection in autonomous driving.
We introduce an intriguing Cross-Cluster Shifting operation to unleash the representation capacity of the point-based detector.
We conduct extensive experiments on the KITTI and nuScenes datasets, and the results demonstrate the state-of-the-art performance of Shift-SSD.
arXiv Detail & Related papers (2024-03-10T10:36:32Z)
- Feature Decoupling-Recycling Network for Fast Interactive Segmentation [79.22497777645806]
Recent interactive segmentation methods iteratively take the source image, user guidance, and previously predicted mask as input.
We propose the Feature Decoupling-Recycling Network (FDRN), which decouples the modeling components based on their intrinsic discrepancies.
arXiv Detail & Related papers (2023-08-07T12:26:34Z)
- Enhancing Deformable Local Features by Jointly Learning to Detect and Describe Keypoints [8.390939268280235]
Local feature extraction is a standard approach in computer vision for tackling important tasks such as image matching and retrieval.
We propose DALF, a novel deformation-aware network for jointly detecting and describing keypoints.
Our approach also enhances the performance of two real-world applications: deformable object retrieval and non-rigid 3D surface registration.
arXiv Detail & Related papers (2023-04-02T18:01:51Z)
- DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
arXiv Detail & Related papers (2022-12-15T14:18:47Z)
- Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects [70.49392581592089]
We tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images.
We follow a retrieval-based strategy and prevent the network from learning object-specific features.
Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields a significantly better generalization to unseen objects than previous works.
arXiv Detail & Related papers (2022-03-16T08:53:00Z)
- Improving 3D Object Detection with Channel-wise Transformer [58.668922561622466]
We propose a two-stage 3D object detection framework (CT3D) with minimal hand-crafted design.
CT3D simultaneously performs proposal-aware embedding and channel-wise context aggregation.
It achieves the AP of 81.77% in the moderate car category on the KITTI test 3D detection benchmark.
arXiv Detail & Related papers (2021-08-23T02:03:40Z)
- Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition [89.0152015268929]
We propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition.
The proposed method includes two key components: 1) enhanced temporal representation via the 3D Central Difference Convolution (3D-CDC) family, and 2) optimized backbones for multi-modal-rate branches and lateral connections.
The resultant multi-rate network provides a new perspective to understand the relationship between RGB and depth modalities and their temporal dynamics.
arXiv Detail & Related papers (2020-08-21T10:45:09Z)
- SEKD: Self-Evolving Keypoint Detection and Description [42.114065439674036]
We propose a self-supervised framework to learn an advanced local feature model from unlabeled natural images.
We benchmark the proposed method on homography estimation, relative pose estimation, and structure-from-motion tasks.
We will release our code along with the trained model publicly.
arXiv Detail & Related papers (2020-06-09T06:56:50Z)
- D2D: Keypoint Extraction with Describe to Detect Approach [48.0325745125635]
We present a novel approach that exploits the information within the descriptor space to propose keypoint locations.
We propose an approach that inverts this process by first describing and then detecting the keypoint locations.
arXiv Detail & Related papers (2020-05-27T19:27:46Z)
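The describe-then-detect idea in D2D (proposing keypoints from the descriptor map rather than a separate detector) can be sketched as scoring each pixel by the norm of its dense descriptor and keeping local maxima. The saliency measure, window size, and function name below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def describe_then_detect(desc_map, window=3):
    """Toy describe-then-detect scheme: score each pixel by the L2 norm of
    its dense descriptor, then keep strict local maxima of that saliency
    map as keypoints.

    desc_map: (H, W, D) dense descriptor map.
    Returns a list of (y, x) keypoint coordinates.
    """
    h, w, _ = desc_map.shape
    saliency = np.linalg.norm(desc_map, axis=-1)   # (H, W) per-pixel score
    r = window // 2
    keypoints = []
    for y in range(r, h - r):
        for x in range(r, w - r):
            patch = saliency[y - r:y + r + 1, x - r:x + r + 1]
            # Keep pixels that dominate their neighbourhood and are non-trivial.
            if saliency[y, x] == patch.max() and saliency[y, x] > 0:
                keypoints.append((y, x))
    return keypoints
```

This inverts the usual detect-then-describe pipeline: keypoint locations fall out of the descriptor space itself, which is the interaction HDD-Net also exploits when refining detections from the descriptor map.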
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.