Cross-modal Center Loss
- URL: http://arxiv.org/abs/2008.03561v1
- Date: Sat, 8 Aug 2020 17:26:35 GMT
- Title: Cross-modal Center Loss
- Authors: Longlong Jing and Elahe Vahdani and Jiaxing Tan and Yingli Tian
- Abstract summary: Cross-modal retrieval aims to learn discriminative and modal-invariant features for data from different modalities.
We propose an approach to jointly train the components of the cross-modal retrieval framework with metadata.
The proposed framework significantly outperforms the state-of-the-art methods on the ModelNet40 dataset.
- Score: 28.509817129759014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-modal retrieval aims to learn discriminative and modal-invariant
features for data from different modalities. Unlike the existing methods which
usually learn from the features extracted by offline networks, in this paper,
we propose an approach to jointly train the components of the cross-modal retrieval
framework with metadata, and enable the network to find optimal features. The
proposed end-to-end framework is updated with three loss functions: 1) a novel
cross-modal center loss to eliminate cross-modal discrepancy, 2) cross-entropy
loss to maximize inter-class variations, and 3) mean-square-error loss to
reduce modality variations. In particular, our proposed cross-modal center loss
minimizes the distances of features from objects belonging to the same class
across all modalities. Extensive experiments have been conducted on the
retrieval tasks across multiple modalities, including 2D image, 3D point cloud,
and mesh data. The proposed framework significantly outperforms the
state-of-the-art methods on the ModelNet40 dataset.
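The abstract describes the objective only in words, so the following is a minimal PyTorch-style sketch of how the three terms could be combined. It assumes one encoder and one classifier head per modality producing features of equal dimension; the module and function names, the center parameterization, and the loss weights `lambda_center` / `lambda_mse` are illustrative assumptions rather than details taken from the paper.

```python
# Hypothetical sketch of the three-term objective described in the abstract.
# Names, weights, and the center parameterization are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalCenterLoss(nn.Module):
    """Pulls features of the same class toward a single shared center,
    regardless of which modality (image / point cloud / mesh) they come from."""
    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        # One learnable center per class, shared across all modalities.
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # features: (N, feat_dim) from any mix of modalities; labels: (N,)
        centers = self.centers[labels]                      # (N, feat_dim)
        return ((features - centers) ** 2).sum(dim=1).mean()

def total_loss(img_feat, pc_feat, mesh_feat,         # (B, D) features per modality
               img_logits, pc_logits, mesh_logits,   # (B, C) classifier outputs
               labels, center_loss, lambda_center=1.0, lambda_mse=0.1):
    feats = torch.cat([img_feat, pc_feat, mesh_feat], dim=0)
    lbls = labels.repeat(3)
    # 1) cross-modal center loss: same-class features across modalities share a center
    l_center = center_loss(feats, lbls)
    # 2) cross-entropy on each modality's classifier to keep classes separated
    l_ce = (F.cross_entropy(img_logits, labels)
            + F.cross_entropy(pc_logits, labels)
            + F.cross_entropy(mesh_logits, labels))
    # 3) MSE between features of the same object in different modalities
    l_mse = (F.mse_loss(img_feat, pc_feat)
             + F.mse_loss(img_feat, mesh_feat)
             + F.mse_loss(pc_feat, mesh_feat))
    return l_ce + lambda_center * l_center + lambda_mse * l_mse
```

As with the classic center loss, the class centers would normally be optimized alongside (often with a separate learning rate from) the network parameters; the relative weighting of the three terms is a tuning choice, not a value reported here.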
Related papers
- Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data [10.908771426089512]
Building cross-modal applications is challenging due to limited paired multi-modal data.
Recent works have shown that leveraging a pre-trained multi-modal contrastive representation space enables cross-modal tasks to be learned from uni-modal data.
We introduce a three-step method, $C3$ (Connect, Collapse, Corrupt), to bridge the modality gap, enhancing the interchangeability of embeddings.
arXiv Detail & Related papers (2024-01-16T18:52:27Z)
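The $C3$ summary above only names its three steps, so here is a rough sketch, under stated assumptions, of how the "Collapse" and "Corrupt" steps are commonly realized on top of a frozen contrastive embedding space: estimate the constant offset between the two modality clusters, shift one modality by it, and inject Gaussian noise during training. The encoder placeholders and the noise scale `sigma` are assumptions for illustration, not values from the paper.

```python
# Hedged sketch of the "Collapse" and "Corrupt" steps on a pre-trained
# contrastive space. The embeddings are assumed to come from any frozen
# multi-modal encoder pair; sigma is an illustrative noise scale.
import torch

@torch.no_grad()
def modality_gap(image_embs: torch.Tensor, text_embs: torch.Tensor) -> torch.Tensor:
    # Estimate the constant offset ("gap") between the two modality clusters.
    return image_embs.mean(dim=0) - text_embs.mean(dim=0)

def collapse(text_emb: torch.Tensor, gap: torch.Tensor) -> torch.Tensor:
    # Shift text embeddings toward the image cluster so the two modalities
    # become (approximately) interchangeable downstream.
    return text_emb + gap

def corrupt(emb: torch.Tensor, sigma: float = 0.05) -> torch.Tensor:
    # Add Gaussian noise during training so the downstream module does not
    # overfit to the exact location of one modality's embeddings.
    return emb + sigma * torch.randn_like(emb)
```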
- Modality Unifying Network for Visible-Infrared Person Re-Identification [24.186989535051623]
Visible-infrared person re-identification (VI-ReID) is a challenging task due to large cross-modality discrepancies and intra-class variations.
Existing methods mainly focus on learning modality-shared representations by embedding different modalities into the same feature space.
We propose a novel Modality Unifying Network (MUN) to explore a robust auxiliary modality for VI-ReID.
arXiv Detail & Related papers (2023-09-12T14:22:22Z)
- Instance-Variant Loss with Gaussian RBF Kernel for 3D Cross-modal Retrieval [52.41252219453429]
Existing methods treat all instances equally, applying the same penalty strength to instances with varying degrees of difficulty.
This can result in ambiguous convergence or local optima, severely compromising the separability of the feature space.
We propose an Instance-Variant loss to assign different penalty strengths to different instances, improving the space separability.
arXiv Detail & Related papers (2023-05-07T10:12:14Z)
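The Instance-Variant entry above states the idea (different penalty strengths for instances of different difficulty) without giving a formula. The sketch below is one plausible, heavily simplified reading: each instance's cross-entropy term is scaled by a weight derived from a Gaussian RBF similarity to its class prototype, so instances far from their prototype are penalized more. The prototype source, the bandwidth `gamma`, and the weighting rule are all assumptions, not the paper's formulation.

```python
# Illustrative sketch only: instance-dependent penalty strengths via an RBF
# similarity to a class prototype. Not the paper's exact formulation.
import torch
import torch.nn.functional as F

def instance_weighted_ce(logits, features, prototypes, labels, gamma: float = 0.5):
    # logits: (N, C), features: (N, D), prototypes: (C, D), labels: (N,)
    proto = prototypes[labels]                                       # (N, D)
    sim = torch.exp(-gamma * ((features - proto) ** 2).sum(dim=1))   # RBF similarity in (0, 1]
    weight = 2.0 - sim     # hard instances (low similarity) get a larger penalty
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    return (weight * per_sample).mean()
```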
- GARNet: Global-Aware Multi-View 3D Reconstruction Network and the Cost-Performance Tradeoff [10.8606881536924]
We propose a global-aware attention-based fusion approach that correlates each view branch with a global representation to provide a comprehensive basis for inferring the fusion weights.
To further strengthen the network, we introduce a novel loss function that supervises the overall shape.
Experiments on ShapeNet verify that our method outperforms existing SOTA methods.
arXiv Detail & Related papers (2022-11-04T07:45:19Z)
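The GARNet summary above describes weighting each view branch by its correlation with a global representation. Below is a minimal sketch of that kind of global-aware fusion, where the global descriptor is simply the mean of the per-view features and a small MLP scores each (view, global) pair; both the pooling choice and the scorer are assumptions, not the paper's architecture.

```python
# Minimal sketch of global-aware attention fusion over per-view features.
# The mean-pooled "global" descriptor and the 2-layer scorer are assumptions.
import torch
import torch.nn as nn

class GlobalAwareFusion(nn.Module):
    def __init__(self, feat_dim: int, hidden: int = 128):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (B, V, D) -- one feature per view branch
        global_feat = view_feats.mean(dim=1, keepdim=True)              # (B, 1, D)
        pairs = torch.cat([view_feats,
                           global_feat.expand_as(view_feats)], dim=-1)  # (B, V, 2D)
        weights = torch.softmax(self.scorer(pairs), dim=1)              # (B, V, 1)
        return (weights * view_feats).sum(dim=1)                        # (B, D) fused feature
```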
- Multi-Modal Mutual Information Maximization: A Novel Approach for Unsupervised Deep Cross-Modal Hashing [73.29587731448345]
We propose a novel method, dubbed Cross-Modal Info-Max Hashing (CMIMH)
We learn informative representations that can preserve both intra- and inter-modal similarities.
The proposed method consistently outperforms other state-of-the-art cross-modal retrieval methods.
arXiv Detail & Related papers (2021-12-13T08:58:03Z)
- MSO: Multi-Feature Space Joint Optimization Network for RGB-Infrared Person Re-Identification [35.97494894205023]
RGB-infrared cross-modality person re-identification (ReID) task aims to recognize the images of the same identity between the visible modality and the infrared modality.
Existing methods mainly use a two-stream architecture to eliminate the discrepancy between the two modalities in the final common feature space.
We present a novel multi-feature space joint optimization (MSO) network, which can learn modality-sharable features in both the single-modality space and the common space.
arXiv Detail & Related papers (2021-10-21T16:45:23Z)
- Exploring Modality-shared Appearance Features and Modality-invariant Relation Features for Cross-modality Person Re-Identification [72.95858515157603]
Cross-modality person re-identification methods rely on discriminative modality-shared features.
Despite some initial success, such modality-shared appearance features cannot capture enough modality-invariant information.
A novel cross-modality quadruplet loss is proposed to further reduce the cross-modality variations.
arXiv Detail & Related papers (2021-04-23T11:14:07Z)
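The quadruplet-loss entry above does not spell out the loss, so the snippet below sketches a generic quadruplet formulation with the anchor and positive drawn from different modalities: the usual triplet-style margin plus an extra constraint against a pair of negatives. The margins `m1`, `m2` and the Euclidean distance are illustrative choices rather than the paper's exact definition.

```python
# Generic quadruplet-style loss sketch with a cross-modality anchor/positive
# pair. Margins m1, m2 and the Euclidean distance are illustrative choices.
import torch
import torch.nn.functional as F

def cross_modality_quadruplet(anchor_rgb, positive_ir, negative1, negative2,
                              m1: float = 0.3, m2: float = 0.1):
    # anchor_rgb:  visible-modality feature of identity A          (B, D)
    # positive_ir: infrared-modality feature of the same identity  (B, D)
    # negative1:   feature of a different identity                 (B, D)
    # negative2:   feature of yet another identity                 (B, D)
    d_ap = F.pairwise_distance(anchor_rgb, positive_ir)
    d_an = F.pairwise_distance(anchor_rgb, negative1)
    d_nn = F.pairwise_distance(negative1, negative2)
    # Standard term: positive pair closer than negative pair by margin m1.
    term1 = F.relu(d_ap - d_an + m1)
    # Extra quadruplet term: positive pair closer than any negative-negative pair.
    term2 = F.relu(d_ap - d_nn + m2)
    return (term1 + term2).mean()
```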
- InverseForm: A Loss Function for Structured Boundary-Aware Segmentation [80.39674800972182]
We present a novel boundary-aware loss term for semantic segmentation using an inverse-transformation network.
This plug-in loss term complements the cross-entropy loss in capturing boundary transformations.
We analyze the quantitative and qualitative effects of our loss function on three indoor and outdoor segmentation benchmarks.
arXiv Detail & Related papers (2021-04-06T18:52:45Z)
- Recurrent Multi-view Alignment Network for Unsupervised Surface Registration [79.72086524370819]
Learning non-rigid registration in an end-to-end manner is challenging due to the inherent high degrees of freedom and the lack of labeled training data.
We propose to represent the non-rigid transformation with a point-wise combination of several rigid transformations.
We also introduce a differentiable loss function that measures the 3D shape similarity on the projected multi-view 2D depth images.
arXiv Detail & Related papers (2020-11-24T14:22:42Z)
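The registration entry above represents a non-rigid deformation as a point-wise combination of several rigid transformations; the function below sketches exactly that: each point receives softmax weights over K rigid transforms, and its new position is the weighted sum of the rigidly transformed copies. The rotations, translations, and per-point logits are assumed to come from a network, and the multi-view depth-projection loss mentioned in the summary is not reproduced here.

```python
# Sketch of a point-wise blend of K rigid transformations.
# Rotations R, translations t, and per-point logits would come from a network.
import torch

def blend_rigid_transforms(points: torch.Tensor,   # (N, 3) source points
                           R: torch.Tensor,        # (K, 3, 3) rotation matrices
                           t: torch.Tensor,        # (K, 3) translations
                           logits: torch.Tensor):  # (N, K) per-point scores
    weights = torch.softmax(logits, dim=-1)                   # (N, K) blend weights
    # Apply every rigid transform to every point: (K, N, 3)
    transformed = torch.einsum('kij,nj->kni', R, points) + t[:, None, :]
    # Weighted point-wise combination of the K rigidly transformed copies.
    return torch.einsum('nk,kni->ni', weights, transformed)   # (N, 3) deformed points
```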
- Universal Weighting Metric Learning for Cross-Modal Matching [79.32133554506122]
Cross-modal matching has been a highlighted research topic in both vision and language areas.
We propose a simple and interpretable universal weighting framework for cross-modal matching.
arXiv Detail & Related papers (2020-10-07T13:16:45Z)
- Parameter Sharing Exploration and Hetero-Center based Triplet Loss for Visible-Thermal Person Re-Identification [17.402673438396345]
This paper focuses on the visible-thermal cross-modality person re-identification (VT Re-ID) task.
Our proposed method distinctly outperforms the state-of-the-art methods by large margins.
arXiv Detail & Related papers (2020-08-14T07:40:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.