Coded Residual Transform for Generalizable Deep Metric Learning
- URL: http://arxiv.org/abs/2210.04180v1
- Date: Sun, 9 Oct 2022 06:17:31 GMT
- Title: Coded Residual Transform for Generalizable Deep Metric Learning
- Authors: Shichao Kan, Yixiong Liang, Min Li, Yigang Cen, Jianxin Wang, Zhihai
He
- Abstract summary: We introduce a new method called coded residual transform (CRT) for deep metric learning to significantly improve its generalization capability.
CRT represents and encodes the feature map from a set of complimentary perspectives based on projections onto diversified prototypes.
Our experimental results and ablation studies demonstrate that the proposed CRT method outperform the state-of-the-art deep metric learning methods by large margins.
- Score: 34.100840501900706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A fundamental challenge in deep metric learning is the generalization
capability of the feature embedding network model since the embedding network
learned on training classes need to be evaluated on new test classes. To
address this challenge, in this paper, we introduce a new method called coded
residual transform (CRT) for deep metric learning to significantly improve its
generalization capability. Specifically, we learn a set of diversified
prototype features, project the feature map onto each prototype, and then
encode its features using their projection residuals weighted by their
correlation coefficients with each prototype. The proposed CRT method has the
following two unique characteristics. First, it represents and encodes the
feature map from a set of complimentary perspectives based on projections onto
diversified prototypes. Second, unlike existing transformer-based feature
representation approaches which encode the original values of features based on
global correlation analysis, the proposed coded residual transform encodes the
relative differences between the original features and their projected
prototypes. Embedding space density and spectral decay analysis show that this
multi-perspective projection onto diversified prototypes and coded residual
representation are able to achieve significantly improved generalization
capability in metric learning. Finally, to further enhance the generalization
performance, we propose to enforce the consistency on their feature similarity
matrices between coded residual transforms with different sizes of projection
prototypes and embedding dimensions. Our extensive experimental results and
ablation studies demonstrate that the proposed CRT method outperform the
state-of-the-art deep metric learning methods by large margins and improving
upon the current best method by up to 4.28% on the CUB dataset.
Related papers
- cDP-MIL: Robust Multiple Instance Learning via Cascaded Dirichlet Process [23.266122629592807]
Multiple instance learning (MIL) has been extensively applied to whole slide histoparametric image (WSI) analysis.
The existing aggregation strategy in MIL, which primarily relies on the first-order distance between instances, fails to accurately approximate the true feature distribution of each instance.
We propose a new Bayesian nonparametric framework for multiple instance learning, which adopts a cascade of Dirichlet processes (cDP) to incorporate the instance-to-bag characteristic of the WSIs.
arXiv Detail & Related papers (2024-07-16T07:28:39Z) - RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching)
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z) - Engineering the Neural Collapse Geometry of Supervised-Contrastive Loss [28.529476019629097]
Supervised-contrastive loss (SCL) is an alternative to cross-entropy (CE) for classification tasks.
We propose methods to engineer the geometry of learnt feature embeddings by modifying the contrastive loss.
arXiv Detail & Related papers (2023-10-02T04:23:17Z) - Multi-scale and Cross-scale Contrastive Learning for Semantic
Segmentation [5.281694565226513]
We apply contrastive learning to enhance the discriminative power of the multi-scale features extracted by semantic segmentation networks.
By first mapping the encoder's multi-scale representations to a common feature space, we instantiate a novel form of supervised local-global constraint.
arXiv Detail & Related papers (2022-03-25T01:24:24Z) - Adversarial Feature Augmentation and Normalization for Visual
Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z) - Exploring Complementary Strengths of Invariant and Equivariant
Representations for Few-Shot Learning [96.75889543560497]
In many real-world problems, collecting a large number of labeled samples is infeasible.
Few-shot learning is the dominant approach to address this issue, where the objective is to quickly adapt to novel categories in presence of a limited number of samples.
We propose a novel training mechanism that simultaneously enforces equivariance and invariance to a general set of geometric transformations.
arXiv Detail & Related papers (2021-03-01T21:14:33Z) - Dual-constrained Deep Semi-Supervised Coupled Factorization Network with
Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To ex-tract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z) - MetaSDF: Meta-learning Signed Distance Functions [85.81290552559817]
Generalizing across shapes with neural implicit representations amounts to learning priors over the respective function space.
We formalize learning of a shape space as a meta-learning problem and leverage gradient-based meta-learning algorithms to solve this task.
arXiv Detail & Related papers (2020-06-17T05:14:53Z) - Prototypical Contrastive Learning of Unsupervised Representations [171.3046900127166]
Prototypical Contrastive Learning (PCL) is an unsupervised representation learning method.
PCL implicitly encodes semantic structures of the data into the learned embedding space.
PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks.
arXiv Detail & Related papers (2020-05-11T09:53:36Z) - Learned Multi-layer Residual Sparsifying Transform Model for Low-dose CT
Reconstruction [11.470070927586017]
Sparsifying transform learning involves highly efficient sparse coding and operator update steps.
We propose a Multi-layer Residual Sparsifying Transform (MRST) learning model wherein the transform domain residuals are jointly sparsified over layers.
arXiv Detail & Related papers (2020-05-08T02:36:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.