ConKI: Contrastive Knowledge Injection for Multimodal Sentiment Analysis
- URL: http://arxiv.org/abs/2306.15796v1
- Date: Tue, 27 Jun 2023 20:51:03 GMT
- Title: ConKI: Contrastive Knowledge Injection for Multimodal Sentiment Analysis
- Authors: Yakun Yu, Mingjun Zhao, Shi-ang Qi, Feiran Sun, Baoxun Wang, Weidong
Guo, Xiaoli Wang, Lei Yang, Di Niu
- Abstract summary: We propose Contrastive Knowledge Injection (ConKI) for multimodal sentiment analysis.
ConKI learns specific-knowledge representations for each modality together with general knowledge representations via knowledge injection.
Experiments on three popular multimodal sentiment analysis benchmarks show that ConKI outperforms all prior methods on a variety of performance metrics.
- Score: 19.53507553138143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal Sentiment Analysis leverages multimodal signals to detect the
sentiment of a speaker. Previous approaches concentrate on performing
multimodal fusion and representation learning based on general knowledge
obtained from pretrained models, neglecting the effect of domain-specific
knowledge. In this paper, we propose Contrastive Knowledge Injection (ConKI)
for multimodal sentiment analysis, where specific-knowledge representations for
each modality can be learned together with general knowledge representations
via knowledge injection based on an adapter architecture. In addition, ConKI
uses a hierarchical contrastive learning procedure performed between knowledge
types within every single modality, across modalities within each sample, and
across samples to facilitate the effective learning of the proposed
representations, hence improving multimodal sentiment predictions. The
experiments on three popular multimodal sentiment analysis benchmarks show that
ConKI outperforms all prior methods on a variety of performance metrics.
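As a rough illustration of the two components described above, adapter-based knowledge injection and hierarchical contrastive learning, the sketch below pairs frozen general-knowledge encoder outputs with adapter-produced specific-knowledge representations and combines contrastive terms within each modality, across modalities, and across samples. It is a minimal PyTorch sketch under stated assumptions: the adapter design, the InfoNCE form of each term, and the positive/negative pairing are illustrative choices, not the authors' implementation.

```python
# Minimal sketch only: an adapter-style knowledge injection module plus a simplified
# hierarchical contrastive term. All names, dimensions, and pairings are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class KnowledgeAdapter(nn.Module):
    """Bottleneck adapter producing a specific-knowledge representation on top of a
    (frozen) pretrained encoder output, which serves as the general-knowledge one."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, general: torch.Tensor) -> torch.Tensor:
        # Residual bottleneck transform: inject specific knowledge around general features.
        return general + self.up(F.relu(self.down(general)))


def info_nce(anchor: torch.Tensor, positive: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE: matching rows of anchor/positive are positives; the other rows in the
    batch act as cross-sample negatives."""
    anchor, positive = F.normalize(anchor, dim=-1), F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)


def hierarchical_contrastive_loss(general: dict, specific: dict) -> torch.Tensor:
    """Simplified stand-in for the hierarchical procedure: (1) between knowledge types
    within each modality, (2) across modalities within each sample; cross-sample
    negatives come from the batch inside info_nce."""
    modalities = list(general.keys())
    loss = 0.0
    for m in modalities:                                    # within-modality, across knowledge types
        loss = loss + info_nce(general[m], specific[m])
    for i in range(len(modalities)):                        # across modalities, within sample
        for j in range(i + 1, len(modalities)):
            loss = loss + info_nce(specific[modalities[i]], specific[modalities[j]])
    return loss


if __name__ == "__main__":
    B, D = 8, 128
    adapters = {m: KnowledgeAdapter(D) for m in ("text", "audio", "vision")}
    general = {m: torch.randn(B, D) for m in adapters}      # pretrained-encoder outputs
    specific = {m: adapters[m](general[m]) for m in adapters}
    print(hierarchical_contrastive_loss(general, specific))
```

Cross-sample contrast here comes only from treating other batch items as negatives inside the InfoNCE term; ConKI's actual hierarchical procedure may pair representations differently.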
Related papers
- On the Comparison between Multi-modal and Single-modal Contrastive Learning [50.74988548106031]
We introduce a theoretical foundation for understanding the differences between multi-modal and single-modal contrastive learning.
We identify the critical factor, which is the signal-to-noise ratio (SNR), that impacts the generalizability in downstream tasks of both multi-modal and single-modal contrastive learning.
Our analysis provides a unified framework that can characterize the optimization and generalization of both single-modal and multi-modal contrastive learning.
arXiv Detail & Related papers (2024-11-05T06:21:17Z)
- Knowledge-Guided Dynamic Modality Attention Fusion Framework for Multimodal Sentiment Analysis [4.344546814121446]
We propose a Knowledge-Guided Dynamic Modality Attention Fusion Framework (KuDA) for multimodal sentiment analysis.
KuDA uses sentiment knowledge to guide the model in dynamically selecting the dominant modality and adjusting the contribution of each modality.
Experiments on four MSA benchmark datasets indicate that KuDA achieves state-of-the-art performance and is able to adapt to different scenarios of dominant modality.
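The dynamic selection of a dominant modality can be pictured as a learned gate that re-weights modality features before fusion. The sketch below is a minimal PyTorch illustration under assumptions: the gate operates on concatenated features and omits the sentiment-knowledge guidance that KuDA actually uses.

```python
# Minimal sketch: a learned gate re-weights modality features before fusion.
# The gate design is an assumption; KuDA's knowledge-guided attention is not shown.
import torch
import torch.nn as nn


class DynamicModalityFusion(nn.Module):
    def __init__(self, dim: int, num_modalities: int = 3):
        super().__init__()
        # Scores each modality so the dominant one can receive a larger weight.
        self.gate = nn.Linear(dim * num_modalities, num_modalities)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        stacked = torch.stack(feats, dim=1)                               # (B, M, D)
        weights = torch.softmax(self.gate(torch.cat(feats, dim=-1)), -1)  # (B, M)
        return (weights.unsqueeze(-1) * stacked).sum(dim=1)               # fused (B, D)


fusion = DynamicModalityFusion(dim=128)
text, audio, vision = (torch.randn(4, 128) for _ in range(3))
fused = fusion([text, audio, vision])                                     # (4, 128)
```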
arXiv Detail & Related papers (2024-10-06T14:10:28Z)
- Unity in Diversity: Multi-expert Knowledge Confrontation and Collaboration for Generalizable Vehicle Re-identification [32.80872775195836]
Generalizable vehicle re-identification (ReID) aims to enable a model trained on diverse source domains to adapt broadly to unknown target domains.
The task still faces the domain shift problem, and models have difficulty generalizing accurately to unknown target domains.
This paper proposes the two-stage Multi-expert Knowledge Confrontation and Collaboration (MiKeCoCo) method.
arXiv Detail & Related papers (2024-07-10T04:06:39Z)
- Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning [51.80447197290866]
Learning high-quality multi-modal entity representations is an important goal of multi-modal knowledge graph (MMKG) representation learning.
Existing methods focus on crafting elegant entity-wise multi-modal fusion strategies.
We introduce a novel framework with Mixture of Modality Knowledge experts (MoMoK) to learn adaptive multi-modal entity representations.
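Conceptually, a mixture of modality knowledge experts routes each entity's modality-specific features through dedicated experts and mixes them with learned weights. The sketch below is a minimal illustration with assumed expert and router designs, not MoMoK's actual architecture.

```python
# Minimal sketch: modality-specific experts mixed by a learned router into one
# adaptive entity embedding. Expert and router designs are assumptions.
import torch
import torch.nn as nn


class ModalityExpertMixture(nn.Module):
    def __init__(self, dims: dict, out_dim: int = 256):
        super().__init__()
        self.experts = nn.ModuleDict({m: nn.Linear(d, out_dim) for m, d in dims.items()})
        self.router = nn.Linear(out_dim * len(dims), len(dims))

    def forward(self, inputs: dict) -> torch.Tensor:
        outs = [self.experts[m](x) for m, x in inputs.items()]             # per-expert outputs
        gate = torch.softmax(self.router(torch.cat(outs, dim=-1)), -1)     # (B, M) mixing weights
        return (gate.unsqueeze(-1) * torch.stack(outs, dim=1)).sum(dim=1)  # adaptive embedding


moe = ModalityExpertMixture({"structure": 100, "image": 512, "text": 768})
entity = moe({"structure": torch.randn(4, 100),
              "image": torch.randn(4, 512),
              "text": torch.randn(4, 768)})
```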
arXiv Detail & Related papers (2024-05-27T06:36:17Z)
- Revealing Multimodal Contrastive Representation Learning through Latent Partial Causal Models [85.67870425656368]
We introduce a unified causal model specifically designed for multimodal data.
We show that multimodal contrastive representation learning excels at identifying latent coupled variables.
Experiments demonstrate the robustness of our findings, even when the assumptions are violated.
arXiv Detail & Related papers (2024-02-09T07:18:06Z)
- Multimodal Sentiment Analysis with Missing Modality: A Knowledge-Transfer Approach [15.54426275761234]
Multimodal sentiment analysis aims to identify the emotions expressed by individuals through visual, language, and acoustic cues.
Most of the existing research efforts assume that all modalities are available during both training and testing, making their algorithms susceptible to the missing modality scenario.
We propose a novel knowledge-transfer network that translates between modalities in order to reconstruct the missing audio modality.
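The translation idea can be sketched as a small network that maps the available text and visual features into the audio feature space and is trained with a reconstruction loss whenever audio is present. The architecture and the plain MSE objective below are assumptions; the cited knowledge-transfer network is more elaborate.

```python
# Minimal sketch: reconstruct the missing audio representation from the available
# text and visual representations. Architecture and MSE objective are assumptions.
import torch
import torch.nn as nn


class AudioReconstructor(nn.Module):
    def __init__(self, text_dim: int, vis_dim: int, audio_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim + vis_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, audio_dim),
        )

    def forward(self, text: torch.Tensor, vision: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([text, vision], dim=-1))   # reconstructed audio features


model = AudioReconstructor(text_dim=768, vis_dim=512, audio_dim=128)
text, vision, audio = torch.randn(4, 768), torch.randn(4, 512), torch.randn(4, 128)
loss = nn.functional.mse_loss(model(text, vision), audio)    # supervise when audio is available
```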
arXiv Detail & Related papers (2023-12-28T06:47:18Z)
- Improving Multimodal Sentiment Analysis: Supervised Angular Margin-based Contrastive Learning for Enhanced Fusion Representation [10.44888349041063]
We introduce a framework called Supervised Angular-based Contrastive Learning for Multimodal Sentiment Analysis.
This framework aims to enhance the discrimination and generalizability of the multimodal representation and to overcome modality biases in the fusion vector.
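One way to realize a supervised angular margin-based contrastive objective is to add an angular margin to the similarity of same-label (positive) pairs inside a supervised contrastive loss, which makes positives harder to match. The sketch below follows that recipe; the margin placement and labeling scheme are assumptions rather than the cited framework's exact formulation.

```python
# Minimal sketch: supervised contrastive loss with an additive angular margin on
# positive (same sentiment label) pairs. Margin placement is an assumption.
import math

import torch
import torch.nn.functional as F


def angular_margin_supcon(features: torch.Tensor, labels: torch.Tensor,
                          margin: float = 0.2, temperature: float = 0.1) -> torch.Tensor:
    z = F.normalize(features, dim=-1)
    cos = z @ z.t()                                          # pairwise cosine similarities
    sin = torch.sqrt((1.0 - cos.pow(2)).clamp(min=1e-6))
    cos_m = cos * math.cos(margin) - sin * math.sin(margin)  # cos(theta + margin)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)        # same label -> positive pair
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    logits = torch.where(same, cos_m, cos) / temperature     # margin penalizes positives
    logits = logits.masked_fill(eye, float("-inf"))          # drop self-pairs
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos = same & ~eye
    per_anchor = log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
    return -per_anchor.mean()


loss = angular_margin_supcon(torch.randn(8, 128), torch.randint(0, 3, (8,)))
```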
arXiv Detail & Related papers (2023-12-04T02:58:19Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
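The implicit-query idea can be pictured as a small set of learnable query vectors that cross-attend over one modality's token features to aggregate global contextual cues. The sketch below is a minimal illustration; the query count and attention configuration are assumptions.

```python
# Minimal sketch: learnable queries aggregate global cues within one modality via
# cross-attention. Query count and attention settings are assumptions.
import torch
import torch.nn as nn


class ImplicitQueryAggregator(nn.Module):
    def __init__(self, dim: int, num_queries: int = 4, num_heads: int = 4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, L, D) modality-specific token features.
        q = self.queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        out, _ = self.attn(q, tokens, tokens)                # queries attend over all tokens
        return out                                           # (B, num_queries, D) aggregated cues


agg = ImplicitQueryAggregator(dim=256)
image_tokens = torch.randn(2, 196, 256)
cues = agg(image_tokens)                                     # (2, 4, 256)
```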
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Domain Generalization for Mammographic Image Analysis with Contrastive Learning [62.25104935889111]
Training an efficacious deep learning model requires large amounts of data with diverse styles and qualities.
A novel contrastive learning scheme is developed to equip deep learning models with better style generalization capability.
The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets.
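A common way to set up style-oriented contrastive learning is to treat two style-perturbed views of the same image as a positive pair under an NT-Xent loss. The sketch below follows that generic recipe; the augmentations, loss form, and the assumed encoder are illustrative choices, not necessarily the paper's scheme.

```python
# Minimal sketch: two style-perturbed views of the same mammogram form a positive pair.
# Augmentation and loss choices are assumptions.
import torch
import torch.nn.functional as F
from torchvision import transforms

# Stand-in for vendor-style variation (brightness/contrast/blur perturbations).
style_augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, contrast=0.4),
    transforms.GaussianBlur(kernel_size=5),
])


def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """SimCLR-style loss: row i of z1 and row i of z2 are the two views of image i."""
    z = F.normalize(torch.cat([z1, z2]), dim=-1)
    sim = z @ z.t() / temperature
    sim.fill_diagonal_(float("-inf"))                        # exclude self-similarity
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)


# Usage (encoder assumed): z1, z2 = encoder(style_augment(images)), encoder(style_augment(images))
# loss = nt_xent(z1, z2)
```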
arXiv Detail & Related papers (2023-04-20T11:40:21Z)
- A Discriminative Vectorial Framework for Multi-modal Feature Representation [19.158947368297557]
A discriminative framework is proposed for multimodal feature representation in knowledge discovery.
It employs multi-modal hashing (MH) and discriminative correlation maximization (DCM) analysis.
The framework is shown to be superior to state-of-the-art statistical machine learning (SML) and deep neural network (DNN) algorithms.
arXiv Detail & Related papers (2021-03-09T18:18:06Z)
- Unpaired Multi-modal Segmentation via Knowledge Distillation [77.39798870702174]
We propose a novel learning scheme for unpaired cross-modality image segmentation.
In our method, we heavily reuse network parameters by sharing all convolutional kernels across CT and MRI.
We have extensively validated our approach on two multi-class segmentation problems.
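Sharing all convolutional kernels across CT and MRI can be sketched as a block whose convolution is modality-agnostic while normalization is kept per modality; the separate-normalization detail and the omission of the distillation objective are assumptions of this sketch.

```python
# Minimal sketch: convolution kernels shared across CT and MRI, with per-modality
# normalization layers (an assumption); the distillation loss is omitted.
import torch
import torch.nn as nn


class SharedConvBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, modalities=("ct", "mri")):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)   # shared kernels
        self.norm = nn.ModuleDict({m: nn.BatchNorm2d(out_ch) for m in modalities})

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        return torch.relu(self.norm[modality](self.conv(x)))


block = SharedConvBlock(1, 16)
ct, mri = torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64)
out_ct, out_mri = block(ct, "ct"), block(mri, "mri")
```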
arXiv Detail & Related papers (2020-01-06T20:03:17Z)