Looking Beyond Corners: Contrastive Learning of Visual Representations
  for Keypoint Detection and Description Extraction
- URL: http://arxiv.org/abs/2112.12002v1
- Date: Wed, 22 Dec 2021 16:27:11 GMT
- Title: Looking Beyond Corners: Contrastive Learning of Visual Representations
  for Keypoint Detection and Description Extraction
- Authors: Henrique Siqueira, Patrick Ruhkamp, Ibrahim Halfaoui, Markus Karmann,
  Onay Urfalioglu
- Abstract summary: Learnable keypoint detectors and descriptors are beginning to outperform classical hand-crafted feature extraction methods.
Recent studies on self-supervised learning of visual representations have driven the increasing performance of learnable models based on deep networks.
We propose the Correspondence Network (CorrNet) that learns to detect repeatable keypoints and to extract discriminative descriptions.
- Score: 1.5749416770494706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   Learnable keypoint detectors and descriptors are beginning to outperform
classical hand-crafted feature extraction methods. Recent studies on
self-supervised learning of visual representations have driven the increasing
performance of learnable models based on deep networks. By leveraging
traditional data augmentations and homography transformations, these networks
learn to detect corners under adverse conditions such as extreme illumination
changes. However, their generalization capabilities are limited to corner-like
features detected a priori by classical methods or synthetically generated
data.
  In this paper, we propose the Correspondence Network (CorrNet) that learns to
detect repeatable keypoints and to extract discriminative descriptions via
unsupervised contrastive learning under spatial constraints. Our experiments
show that CorrNet is not only able to detect low-level features such as
corners, but also high-level features that represent similar objects present in
a pair of input images through our proposed joint guided backpropagation of
their latent space. Our approach obtains competitive results under viewpoint
changes and achieves state-of-the-art performance under illumination changes.
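
The abstract mentions homography transformations and contrastive learning under spatial constraints but gives no formulas. As a rough illustration only, and not CorrNet's actual objective, the sketch below warps keypoint coordinates through a known homography to obtain correspondences and scores descriptor pairs with a generic InfoNCE-style loss. The function names `warp_points` and `info_nce`, the temperature value, and the choice of InfoNCE are all assumptions made for this example.

```python
import numpy as np


def warp_points(H, pts):
    """Map (N, 2) pixel coordinates through a 3x3 homography H."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coordinates
    warped = homog @ H.T
    return warped[:, :2] / warped[:, 2:3]  # back to Cartesian coordinates


def info_nce(desc_a, desc_b, temperature=0.1):
    """Generic InfoNCE loss (assumed here, not taken from the paper).

    Row i of desc_a is the positive for row i of desc_b; every other
    row of desc_b acts as a negative.
    """
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature  # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


# Hypothetical usage: a pure translation homography and toy descriptors.
H = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0], [1.0, 1.0]])
warped_pts = warp_points(H, pts)

descriptors = np.eye(4)
loss_aligned = info_nce(descriptors, descriptors)
loss_shuffled = info_nce(descriptors, np.roll(descriptors, 1, axis=0))
```

In this toy setup the loss is lower when corresponding rows hold matching descriptors than when the rows are shuffled, which is the behavior a contrastive objective of this kind is meant to reward.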

Related papers
- Rethinking Contrastive Learning in Graph Anomaly Detection: A Clean-View Perspective [54.605073936695575]
 Graph anomaly detection aims to identify unusual patterns in graph-based data, with wide applications in fields such as web security and financial fraud detection.
Existing methods rely on contrastive learning, assuming that a lower similarity between a node and its local subgraph indicates abnormality.
The presence of interfering edges invalidates this assumption, since it introduces disruptive noise that compromises the contrastive learning process.
We propose a Clean-View Enhanced Graph Anomaly Detection framework (CVGAD), which includes a multi-scale anomaly awareness module to identify key sources of interference in the contrastive learning process.
 arXiv  Detail & Related papers  (2025-05-23T15:05:56Z)
- "Principal Components" Enable A New Language of Images [79.45806370905775]
 We introduce a novel visual tokenization framework that embeds a provable PCA-like structure into the latent token space.
Our approach achieves state-of-the-art reconstruction performance and enables better interpretability to align with the human vision system.
 arXiv  Detail & Related papers  (2025-03-11T17:59:41Z)
- From classical techniques to convolution-based models: A review of object detection algorithms [0.562479170374811]
 Object detection is a fundamental task in computer vision and image understanding.
Traditional methods, which relied on handcrafted features and shallow models, struggled with complex visual data and showed limited performance.
Deep learning, especially Convolutional Neural Networks (CNNs), addressed these limitations by automatically learning rich, hierarchical features directly from data.
 arXiv  Detail & Related papers  (2024-12-06T18:32:54Z)
- Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
 Deepfake techniques for facial synthesis and editing pose serious risks for generative models.
In this paper, we investigate how detection performance varies across model backbones, types, and datasets.
We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
 arXiv  Detail & Related papers  (2024-11-28T13:04:45Z)
- Efficient Visualization of Neural Networks with Generative Models and Adversarial Perturbations [0.0]
 This paper presents a novel approach for deep visualization via a generative network, offering an improvement over existing methods.
Our model simplifies the architecture by reducing the number of networks used, requiring only a generator and a discriminator.
Our model requires less prior training knowledge and uses a non-adversarial training process, where the discriminator acts as a guide.
 arXiv  Detail & Related papers  (2024-09-20T14:59:25Z)
- Learning Object-Centric Representation via Reverse Hierarchy Guidance [73.05170419085796]
 Object-Centric Learning (OCL) seeks to enable Neural Networks to identify individual objects in visual scenes.
RHGNet introduces a top-down pathway that works in different ways in the training and inference processes.
Our model achieves SOTA performance on several commonly used datasets.
 arXiv  Detail & Related papers  (2024-05-17T07:48:27Z)
- Automatic Discovery of Visual Circuits [66.99553804855931]
 We explore scalable methods for extracting the subgraph of a vision model's computational graph that underlies recognition of a specific visual concept.
We find that our approach extracts circuits that causally affect model output, and that editing these circuits can defend large pretrained models from adversarial attacks.
 arXiv  Detail & Related papers  (2024-04-22T17:00:57Z)
- High-Discriminative Attribute Feature Learning for Generalized Zero-Shot Learning [54.86882315023791]
 We propose an innovative approach called High-Discriminative Attribute Feature Learning for Generalized Zero-Shot Learning (HDAFL).
HDAFL utilizes multiple convolutional kernels to automatically learn discriminative regions highly correlated with attributes in images.
We also introduce a Transformer-based attribute discrimination encoder to enhance the discriminative capability among attributes.
 arXiv  Detail & Related papers  (2024-04-07T13:17:47Z)
- Enhancing Deformable Local Features by Jointly Learning to Detect and
  Describe Keypoints [8.390939268280235]
 Local feature extraction is a standard approach in computer vision for tackling important tasks such as image matching and retrieval.
We propose DALF, a novel deformation-aware network for jointly detecting and describing keypoints.
Our approach also enhances the performance of two real-world applications: deformable object retrieval and non-rigid 3D surface registration.
 arXiv  Detail & Related papers  (2023-04-02T18:01:51Z)
- Active Visual Exploration Based on Attention-Map Entropy [13.064016215754163]
 We introduce a new technique called Attention-Map Entropy (AME) to determine the most informative observations.
AME does not require additional loss components, which simplifies the training.
We show that such simplified training significantly improves the performance of reconstruction, segmentation and classification on publicly available datasets.
 arXiv  Detail & Related papers  (2023-03-11T17:14:30Z)
- Learning Common Rationale to Improve Self-Supervised Representation for
  Fine-Grained Visual Recognition Problems [61.11799513362704]
 We propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes.
We show that a common rationale detector can be learned by simply exploiting the GradCAM induced from the SSL objective.
 arXiv  Detail & Related papers  (2023-03-03T02:07:40Z)
- Adversarial Feature Augmentation and Normalization for Visual
  Recognition [109.6834687220478]
 Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
 arXiv  Detail & Related papers  (2021-03-22T20:36:34Z)
- RoRD: Rotation-Robust Descriptors and Orthographic Views for Local
  Feature Matching [32.10261486751993]
 We present a novel framework that combines learning of invariant descriptors through data augmentation and viewpoint projection.
We evaluate the effectiveness of the proposed approach on key tasks including pose estimation and visual place recognition.
 arXiv  Detail & Related papers  (2021-03-15T17:40:25Z)
- Revisiting Edge Detection in Convolutional Neural Networks [3.5281112495479245]
 We show that edges cannot be represented properly in the first convolutional layer of a neural network.
We propose edge-detection units and show that they reduce performance loss and generate qualitatively different representations.
 arXiv  Detail & Related papers  (2020-12-25T13:53:04Z)
- Learning What Makes a Difference from Counterfactual Examples and
  Gradient Supervision [57.14468881854616]
 We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
 arXiv  Detail & Related papers  (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.