Towards Interpretable Deep Metric Learning with Structural Matching
- URL: http://arxiv.org/abs/2108.05889v1
- Date: Thu, 12 Aug 2021 17:59:09 GMT
- Title: Towards Interpretable Deep Metric Learning with Structural Matching
- Authors: Wenliang Zhao, Yongming Rao, Ziyi Wang, Jiwen Lu, Jie Zhou
- Abstract summary: We present a deep interpretable metric learning (DIML) method for more transparent embedding learning.
Our method is model-agnostic and can be applied to off-the-shelf backbone networks and metric learning methods.
We evaluate our method on three major benchmarks of deep metric learning including CUB200-2011, Cars196, and Stanford Online Products.
- Score: 86.16700459215383
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How do neural networks distinguish two images? Understanding the
matching mechanism of deep models is of critical importance for developing
reliable intelligent systems in high-risk visual applications such as
surveillance and access control. However, most existing deep metric learning
methods match images by comparing feature vectors, which ignores the
spatial structure of images and thus lacks interpretability. In this paper, we
present a deep interpretable metric learning (DIML) method for more transparent
embedding learning. Unlike conventional metric learning methods based on
feature vector comparison, we propose a structural matching strategy that
explicitly aligns the spatial embeddings by computing an optimal matching flow
between feature maps of the two images. Our method enables deep models to learn
metrics in a more human-friendly way, where the similarity of two images can be
decomposed into several part-wise similarities and their contributions to the
overall similarity. Our method is model-agnostic and can be applied to
off-the-shelf backbone networks and metric learning methods. We evaluate our
method on three major benchmarks of deep metric learning including CUB200-2011,
Cars196, and Stanford Online Products, and achieve substantial improvements
over popular metric learning methods with better interpretability. Code is
available at https://github.com/wl-zhao/DIML
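A minimal sketch of the structural matching idea: compute cosine similarities between all pairs of spatial embeddings, solve an entropy-regularized optimal transport problem (Sinkhorn iterations) for the matching flow, and read off part-wise contributions. The uniform marginals and all function names below are simplifying assumptions; the official implementation in the linked repository may differ.

```python
import torch
import torch.nn.functional as F

def sinkhorn(cost, eps=0.05, n_iter=50):
    # Entropy-regularized optimal transport; marginals are uniform here
    # (a simplification -- the paper may derive them differently).
    n, m = cost.shape
    K = torch.exp(-cost / eps)              # Gibbs kernel
    r = torch.full((n,), 1.0 / n)           # source marginal
    c = torch.full((m,), 1.0 / m)           # target marginal
    v = torch.ones(m)
    for _ in range(n_iter):
        u = r / (K @ v)
        v = c / (K.T @ u)
    return u[:, None] * K * v[None, :]      # transport plan T

def structural_similarity(fa, fb):
    # fa, fb: backbone feature maps of shape (C, H, W).
    C, H, W = fa.shape
    xa = F.normalize(fa.reshape(C, -1).T, dim=1)   # (HW, C) spatial embeddings
    xb = F.normalize(fb.reshape(C, -1).T, dim=1)
    sim = xa @ xb.T                         # cosine similarity of spatial parts
    T = sinkhorn(1.0 - sim)                 # optimal matching flow
    part_scores = T * sim                   # part-wise contributions
    return part_scores.sum(), part_scores   # overall score + its decomposition
```

The returned `part_scores` matrix is what makes the metric inspectable: each entry is one location pair's contribution to the overall similarity, so the decision can be visualized as part-to-part matches.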
Related papers
- GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning [51.677086019209554]
We propose a Generalized Structural Sparse Function to capture powerful relationships across modalities for pair-wise similarity learning.
The distance metric encapsulates two structured formats, diagonal and block-diagonal terms (sketched below).
Experiments on cross-modal and two extra uni-modal retrieval tasks have validated its superiority and flexibility.
arXiv Detail & Related papers (2024-10-20T03:45:50Z)
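The diagonal versus block-diagonal structure of such a metric can be illustrated with a minimal sketch. The parameterization below (per-dimension weights; positive semi-definite blocks via `B.T @ B`) is an illustrative assumption, not GSSF's exact formulation.

```python
import torch

def diagonal_metric(x, y, w):
    # Diagonal term: one learnable non-negative weight per feature dimension.
    d = x - y
    return (w * d * d).sum(-1)

def block_diagonal_metric(x, y, blocks):
    # Block-diagonal term: a small learnable matrix per feature group;
    # B.T @ B keeps each block positive semi-definite.
    d, out, start = x - y, 0.0, 0
    for B in blocks:
        b = B.shape[0]
        seg = d[..., start:start + b]
        out = out + torch.einsum('...i,ij,...j->...', seg, B.T @ B, seg)
        start += b
    return out
```

The diagonal form weights dimensions independently, while the block-diagonal form additionally models interactions within each feature group.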
- Deep Learning Meets Satellite Images -- An Evaluation on Handcrafted and Learning-based Features for Multi-date Satellite Stereo Images [18.253174056710684]
Off-track (or multi-date) satellite stereo images can challenge the performance of feature matching.
We compare the performance of different feature extraction and matching methods applied to satellite imagery.
arXiv Detail & Related papers (2024-09-04T15:43:10Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used to mitigate the data-hungry nature of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results in low-shot settings and strong results under various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
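The mechanics of predicting masked online codebook assignments can be pictured generically: a teacher branch assigns each token to its nearest codebook entry, and the student must predict those assignments at masked positions. The tokenization, teacher update, and assignment details below are assumptions for illustration, not MOCA's exact recipe.

```python
import torch
import torch.nn.functional as F

def masked_assignment_loss(student_tokens, teacher_tokens, codebook, mask):
    # student_tokens, teacher_tokens: (N, L, D); codebook: (K, D); mask: (N, L) bool.
    with torch.no_grad():
        sims = F.normalize(teacher_tokens, dim=-1) @ F.normalize(codebook, dim=-1).T
        targets = sims.argmax(-1)            # (N, L) hard codebook assignments
    logits = student_tokens @ codebook.T     # student predicts the assignments
    return F.cross_entropy(logits[mask], targets[mask])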
- Training Methods of Multi-label Prediction Classifiers for Hyperspectral Remote Sensing Images [0.0]
We propose a multi-label, patch-level classification method for hyperspectral remote sensing images.
We use patches of reduced spatial dimension but complete spectral depth, extracted from the remote sensing images (see the sketch below).
arXiv Detail & Related papers (2023-01-17T13:30:03Z)
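A minimal sketch of the patch extraction step: slide a small spatial window over the hyperspectral cube while retaining every spectral band. The function name and patch size are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def extract_patches(cube, patch=5):
    # cube: hyperspectral image of shape (H, W, bands).
    # Small spatial window, full spectral depth per patch.
    H, W, B = cube.shape
    r = patch // 2
    out = []
    for i in range(r, H - r):
        for j in range(r, W - r):
            out.append(cube[i - r:i + r + 1, j - r:j + r + 1, :])
    return np.stack(out)    # (N, patch, patch, bands)
```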
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
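Image-level contrastive objectives of this kind typically take the InfoNCE form. The generic sketch below is the standard formulation, not necessarily the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def image_level_info_nce(z1, z2, tau=0.07):
    # z1[i], z2[i]: embeddings of two views of image i; the rest of the
    # batch serves as negatives (positives lie on the diagonal).
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau
    labels = torch.arange(z1.shape[0], device=z1.device)
    return F.cross_entropy(logits, labels)
```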
- Deep Relational Metric Learning [84.95793654872399]
This paper presents a deep relational metric learning framework for image clustering and retrieval.
We learn an ensemble of features that characterizes an image from different aspects to model both interclass and intraclass distributions.
Experiments on the widely-used CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate that our framework improves existing deep metric learning methods and achieves very competitive results.
arXiv Detail & Related papers (2021-08-23T09:31:18Z)
- BSNet: Bi-Similarity Network for Few-shot Fine-grained Image Classification [35.50808687239441]
We propose a so-called Bi-Similarity Network (BSNet).
The bi-similarity module learns feature maps according to two similarity measures of diverse characteristics.
In this way, the model is enabled to learn more discriminative and less similarity-biased features from few shots of fine-grained images.
arXiv Detail & Related papers (2020-11-29T08:38:17Z)
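The bi-similarity idea can be sketched generically as combining two measures of different characteristics. The concrete pair below (a fixed cosine similarity plus a small learned relation score) and the name `relation_net` are illustrative assumptions, not BSNet's exact modules.

```python
import torch
import torch.nn.functional as F

def bi_similarity(query, support, relation_net):
    # Fixed metric: cosine similarity between embeddings.
    cos = F.cosine_similarity(query, support, dim=-1)
    # Learned metric: a small network scores the concatenated pair.
    learned = relation_net(torch.cat([query, support], dim=-1)).squeeze(-1)
    return cos + learned
```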
- MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT Prostate Segmentation via Online Sampling [66.01558025094333]
We propose a two-stage framework, with the first stage to quickly localize the prostate region and the second stage to precisely segment the prostate.
We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network.
Our method can effectively learn more representative voxel-level features than conventional methods trained with cross-entropy or Dice loss (a toy version of such a voxel-level objective is sketched below).
arXiv Detail & Related papers (2020-05-15T10:37:02Z)
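Voxel-wise online metric learning can be pictured as sampling voxel-feature pairs during training and applying a metric loss alongside the segmentation loss. A minimal sketch, assuming random pair sampling and a contrastive hinge loss; the paper's actual sampling strategy and loss may differ.

```python
import torch
import torch.nn.functional as F

def voxel_metric_loss(feat, mask, n_pairs=256, margin=0.5):
    # feat: (C, D, H, W) voxel features; mask: (D, H, W) integer labels.
    C = feat.shape[0]
    f = feat.reshape(C, -1).T                 # (V, C) voxel embeddings
    y = mask.reshape(-1)                      # (V,) voxel labels
    i = torch.randint(0, f.shape[0], (n_pairs,))
    j = torch.randint(0, f.shape[0], (n_pairs,))
    d = F.pairwise_distance(f[i], f[j])
    same = (y[i] == y[j]).float()
    # Pull same-label voxels together; push different-label voxels apart.
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()
```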