Improving the Modality Representation with Multi-View Contrastive
Learning for Multimodal Sentiment Analysis
- URL: http://arxiv.org/abs/2210.15824v1
- Date: Fri, 28 Oct 2022 01:25:16 GMT
- Title: Improving the Modality Representation with Multi-View Contrastive
Learning for Multimodal Sentiment Analysis
- Authors: Peipei Liu, Xin Zheng, Hong Li, Jie Liu, Yimo Ren, Hongsong Zhu, Limin
Sun
- Abstract summary: This study investigates approaches to improving modality representations with contrastive learning.
We devise a three-stage framework with multi-view contrastive learning to refine representations for specific objectives.
We conduct experiments on three open datasets, and the results show the advantages of our model.
- Score: 15.623293264871181
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modality representation learning is an important problem for multimodal
sentiment analysis (MSA), since highly distinguishable representations can
contribute to improving the analysis. Previous works on MSA have usually
focused on multimodal fusion strategies, while modality representation learning
has received less attention. Recently, contrastive learning has been confirmed
to be effective at endowing the learned representations with stronger
discriminative ability. Inspired by this, we explore approaches to improving
modality representations with contrastive learning in this study. To this end,
we devise a three-stage framework with multi-view contrastive learning to
refine representations for specific objectives. In the first stage, to improve
the unimodal representations, we employ supervised contrastive learning to pull
samples of the same class together while pushing other samples apart. In the
second stage, self-supervised contrastive learning is designed to improve the
distilled unimodal representations after cross-modal interaction. Finally, we
again leverage supervised contrastive learning to enhance the fused multimodal
representation. After all contrastive training stages, we perform the
classification task on the frozen representations. We conduct experiments on
three open datasets, and the results show the advantages of our model.
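The abstract outlines the training scheme but not its implementation, so the following is a minimal PyTorch sketch of the two loss families the three-stage framework relies on: a supervised contrastive loss for stages 1 and 3 (pulling same-class samples together while pushing others apart) and a self-supervised contrastive loss for stage 2. The function names, the positive-pair construction for stage 2, and the temperature value are illustrative assumptions rather than the authors' exact formulation.

import torch
import torch.nn.functional as F


def supervised_contrastive_loss(features, labels, temperature=0.07):
    # Supervised contrastive loss (Khosla et al., 2020): for each anchor,
    # samples with the same label are positives, all other samples negatives.
    # features: (N, D) embeddings of one modality (stage 1) or of the fused
    # multimodal vector (stage 3); labels: (N,) sentiment class ids.
    features = F.normalize(features, dim=1)
    sim = features @ features.T / temperature          # (N, N) cosine logits
    n = features.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    sim = sim.masked_fill(self_mask, -1e9)             # drop self-comparisons
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0                             # anchors with >= 1 positive
    mean_log_prob_pos = (log_prob * pos_mask).sum(dim=1)[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()


def self_supervised_contrastive_loss(view_a, view_b, temperature=0.07):
    # NT-Xent-style loss for stage 2: row i of view_a and row i of view_b are
    # treated as a positive pair (e.g. a unimodal representation before and
    # after cross-modal interaction -- an assumption, since the paper does not
    # state its exact view construction); all other rows act as negatives.
    a = F.normalize(view_a, dim=1)
    b = F.normalize(view_b, dim=1)
    logits = a @ b.T / temperature                     # (N, N)
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))


# Illustrative usage (encoder and variable names are hypothetical):
#   loss_stage1 = supervised_contrastive_loss(text_encoder(x_text), y_sentiment)
#   loss_stage2 = self_supervised_contrastive_loss(h_text, h_text_after_interaction)
#   loss_stage3 = supervised_contrastive_loss(fused_repr, y_sentiment)
# After the contrastive stages, the representations are frozen and a classifier
# head is trained on top of them for the final sentiment prediction.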
Related papers
- On the Comparison between Multi-modal and Single-modal Contrastive Learning [50.74988548106031]
We introduce a theoretical foundation for understanding the differences between multi-modal and single-modal contrastive learning.
We identify the signal-to-noise ratio (SNR) as the critical factor that impacts the generalizability of both multi-modal and single-modal contrastive learning in downstream tasks.
Our analysis provides a unified framework that can characterize the optimization and generalization of both single-modal and multi-modal contrastive learning.
arXiv Detail & Related papers (2024-11-05T06:21:17Z) - Revealing Multimodal Contrastive Representation Learning through Latent
Partial Causal Models [85.67870425656368]
We introduce a unified causal model specifically designed for multimodal data.
We show that multimodal contrastive representation learning excels at identifying latent coupled variables.
Experiments demonstrate the robustness of our findings, even when the assumptions are violated.
arXiv Detail & Related papers (2024-02-09T07:18:06Z) - Constrained Multiview Representation for Self-supervised Contrastive
Learning [4.817827522417457]
We introduce a novel approach predicated on representation distance-based mutual information (MI) for measuring the significance of different views.
We harness multi-view representations extracted from the frequency domain, re-evaluating their significance based on mutual information.
arXiv Detail & Related papers (2024-02-05T19:09:33Z) - Improving Multimodal Sentiment Analysis: Supervised Angular Margin-based
Contrastive Learning for Enhanced Fusion Representation [10.44888349041063]
We introduce a framework called Supervised Angular-based Contrastive Learning for Multimodal Sentiment Analysis.
This framework aims to enhance discrimination and generalizability of the multimodal representation and overcome biases in the fusion vector's modality.
arXiv Detail & Related papers (2023-12-04T02:58:19Z) - I$^2$MD: 3D Action Representation Learning with Inter- and Intra-modal
Mutual Distillation [147.2183428328396]
We introduce a general Inter- and Intra-modal Mutual Distillation (I$^2$MD) framework.
In I$^2$MD, we first re-formulate the cross-modal interaction as a Cross-modal Mutual Distillation (CMD) process.
To alleviate the interference of similar samples and exploit their underlying contexts, we further design the Intra-modal Mutual Distillation (IMD) strategy.
arXiv Detail & Related papers (2023-10-24T07:22:17Z) - Identifiability Results for Multimodal Contrastive Learning [72.15237484019174]
We show that it is possible to recover shared factors in a more general setup than the multi-view setting studied previously.
Our work provides a theoretical basis for multimodal representation learning and explains in which settings multimodal contrastive learning can be effective in practice.
arXiv Detail & Related papers (2023-03-16T09:14:26Z) - Multimodal Contrastive Learning via Uni-Modal Coding and Cross-Modal
Prediction for Multimodal Sentiment Analysis [19.07020276666615]
We propose a novel framework named MultiModal Contrastive Learning (MMCL) for multimodal representation to capture intra- and inter-modality dynamics simultaneously.
We also design two contrastive learning tasks, instance- and sentiment-based contrastive learning, to promote the process of prediction and learn more interactive information related to sentiment.
arXiv Detail & Related papers (2022-10-26T08:24:15Z) - Task Formulation Matters When Learning Continually: A Case Study in
Visual Question Answering [58.82325933356066]
Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge.
We present a detailed study of how different settings affect performance for Visual Question Answering.
arXiv Detail & Related papers (2022-09-30T19:12:58Z) - A Broad Study on the Transferability of Visual Representations with
Contrastive Learning [15.667240680328922]
We study the transferability of learned representations of contrastive approaches for linear evaluation, full-network transfer, and few-shot recognition.
The results show that the contrastive approaches learn representations that are easily transferable to a different downstream task.
Our analysis reveals that the representations learned from the contrastive approaches contain more low/mid-level semantics than cross-entropy models.
arXiv Detail & Related papers (2021-03-24T22:55:04Z) - Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person
Re-Identification [208.1227090864602]
Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem.
Existing VI-ReID methods tend to learn global representations, which have limited discriminability and weak robustness to noisy images.
We propose a novel dynamic dual-attentive aggregation (DDAG) learning method by mining both intra-modality part-level and cross-modality graph-level contextual cues for VI-ReID.
arXiv Detail & Related papers (2020-07-18T03:08:13Z)