ReDFeat: Recoupling Detection and Description for Multimodal Feature Learning
- URL: http://arxiv.org/abs/2205.07439v1
- Date: Mon, 16 May 2022 04:24:22 GMT
- Title: ReDFeat: Recoupling Detection and Description for Multimodal Feature Learning
- Authors: Yuxin Deng and Jiayi Ma
- Abstract summary: We recouple independent constraints of detection and description of multimodal feature learning with a mutual weighting strategy.
We propose a detector that possesses a large receptive field and is equipped with learnable non-maximum suppression layers.
We build a benchmark that contains cross visible, infrared, near-infrared and synthetic aperture radar image pairs for evaluating the performance of features in feature matching and image registration tasks.
- Score: 51.07496081296863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep-learning-based local feature extraction algorithms that combine
detection and description have made significant progress in visible image
matching. However, the end-to-end training of such frameworks is notoriously
unstable due to the lack of strong supervision of detection and the
inappropriate coupling between detection and description. The problem is
magnified in cross-modal scenarios, in which most methods rely heavily on
pre-training. In this paper, we recouple independent constraints of detection
and description of multimodal feature learning with a mutual weighting
strategy, in which the detected probabilities of robust features are forced to
peak and repeat, while features with high detection scores are emphasized
during optimization. Different from previous works, those weights are detached
from back propagation so that the detected probability of indistinct features
would not be directly suppressed and the training would be more stable.
Moreover, we propose the Super Detector, a detector that possesses a large
receptive field and is equipped with learnable non-maximum suppression layers,
to fulfill the harsh terms of detection. Finally, we build a benchmark that
contains cross visible, infrared, near-infrared and synthetic aperture radar
image pairs for evaluating the performance of features in feature matching and
image registration tasks. Extensive experiments demonstrate that features
trained with the recoupled detection and description, named ReDFeat, surpass
previous state-of-the-arts in the benchmark, while the model can be readily
trained from scratch.
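The mutual weighting strategy in the abstract can be illustrated with a minimal numpy sketch. This is an assumption-laden toy, not the paper's implementation: the exact weighting functions (`det_scores` as the description weight, `exp(-desc_dist)` as the detection weight) are illustrative choices, and "detached" here simply means the weights are treated as constants that carry no gradient.

```python
import numpy as np

def mutual_weighted_losses(det_scores, desc_dist):
    """Toy mutual weighting: detection confidence weights the description
    loss, and descriptor quality weights the detection loss. Both weights
    are held constant (detached), so neither term directly suppresses the
    score of an indistinct feature.

    det_scores: per-keypoint detection probabilities in [0, 1].
    desc_dist:  per-keypoint descriptor matching distance (lower is better).
    """
    # Description term: emphasize features the detector is confident about.
    w_desc = det_scores.copy()          # detached: a constant, no gradient
    desc_loss = np.mean(w_desc * desc_dist)

    # Detection term: push scores to peak where descriptors are discriminative.
    w_det = np.exp(-desc_dist)          # detached: a constant, no gradient
    det_loss = np.mean(w_det * (1.0 - det_scores))
    return desc_loss, det_loss

scores = np.array([0.9, 0.2, 0.7])
dists = np.array([0.1, 0.8, 0.3])
desc_l, det_l = mutual_weighted_losses(scores, dists)
```

In an autodiff framework the detachment would correspond to stopping gradient flow through the weights (e.g. `Tensor.detach()` in PyTorch), which is what the abstract credits for the improved training stability.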
Related papers
- DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z) - Infrared Small Target Detection Using Double-Weighted Multi-Granularity Patch Tensor Model With Tensor-Train Decomposition [6.517559383143804]
This paper proposes a novel double-weighted multi-granularity infrared patch tensor (DWMGIPT) model.
The proposed algorithm is robust to noise and different scenes.
arXiv Detail & Related papers (2023-10-09T02:17:31Z) - MMNet: Multi-Collaboration and Multi-Supervision Network for Sequential Deepfake Detection [81.59191603867586]
Sequential deepfake detection aims to identify forged facial regions with the correct sequence for recovery.
The recovery of forged images requires knowledge of the manipulation model to implement inverse transformations.
We propose Multi-Collaboration and Multi-Supervision Network (MMNet) that handles various spatial scales and sequential permutations in forged face images.
arXiv Detail & Related papers (2023-07-06T02:32:08Z) - ReAct: Temporal Action Detection with Relational Queries [84.76646044604055]
This work aims at advancing temporal action detection (TAD) using an encoder-decoder framework with action queries.
We first propose a relational attention mechanism in the decoder, which guides the attention among queries based on their relations.
Lastly, we propose to predict the localization quality of each action query at inference in order to distinguish high-quality queries.
arXiv Detail & Related papers (2022-07-14T17:46:37Z) - Adversarially-Aware Robust Object Detector [85.10894272034135]
We propose a Robust Detector (RobustDet) based on adversarially-aware convolution to disentangle gradients for model learning on clean and adversarial images.
Our model effectively disentangles gradients and significantly enhances the detection robustness with maintaining the detection ability on clean images.
arXiv Detail & Related papers (2022-07-13T13:59:59Z) - Decoupling Makes Weakly Supervised Local Feature Better [39.17900986173409]
We propose a decoupled describe-then-detect pipeline tailored for weakly supervised local feature learning.
Within our pipeline, the detection step is decoupled from the description step and postponed until discriminative and robust descriptors are learned.
In addition, we introduce a line-to-window search strategy to explicitly use the camera pose information for better descriptor learning.
arXiv Detail & Related papers (2022-01-08T16:51:02Z) - Unsupervised Change Detection in Hyperspectral Images using Feature Fusion Deep Convolutional Autoencoders [15.978029004247617]
The proposed work aims to build a novel feature extraction system using a feature fusion deep convolutional autoencoder.
It is found that the proposed method clearly outperformed state-of-the-art methods in unsupervised change detection on all the datasets.
arXiv Detail & Related papers (2021-09-10T16:52:31Z) - Generalizing Face Forgery Detection with High-frequency Features [63.33397573649408]
Current CNN-based detectors tend to overfit to method-specific color textures and thus fail to generalize.
We propose to utilize the high-frequency noises for face forgery detection.
The first is the multi-scale high-frequency feature extraction module that extracts high-frequency noises at multiple scales.
The second is the residual-guided spatial attention module that guides the low-level RGB feature extractor to concentrate more on forgery traces from a new perspective.
arXiv Detail & Related papers (2021-03-23T08:19:21Z)