Tailor Versatile Multi-modal Learning for Multi-label Emotion
Recognition
- URL: http://arxiv.org/abs/2201.05834v1
- Date: Sat, 15 Jan 2022 12:02:28 GMT
- Title: Tailor Versatile Multi-modal Learning for Multi-label Emotion
Recognition
- Authors: Yi Zhang, Mingyuan Chen, Jundong Shen, Chongjun Wang
- Abstract summary: Multi-modal Multi-label Emotion Recognition (MMER) aims to identify various human emotions from heterogeneous visual, audio and text modalities.
Previous methods mainly focus on projecting multiple modalities into a common latent space and learning an identical representation for all labels.
We propose versaTile multi-modAl learning for multI-labeL emOtion Recognition (TAILOR), aiming to refine multi-modal representations and enhance the discriminative capacity of each label.
- Score: 7.280460748655983
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-modal Multi-label Emotion Recognition (MMER) aims to identify various
human emotions from heterogeneous visual, audio and text modalities. Previous
methods mainly focus on projecting multiple modalities into a common latent
space and learning an identical representation for all labels, which neglects
the diversity of each modality and fails to capture richer semantic information
for each label from different perspectives. Besides, associated relationships
of modalities and labels have not been fully exploited. In this paper, we
propose versaTile multi-modAl learning for multI-labeL emOtion Recognition
(TAILOR), aiming to refine multi-modal representations and enhance
discriminative capacity of each label. Specifically, we design an adversarial
multi-modal refinement module to sufficiently explore the commonality among
different modalities and strengthen the diversity of each modality. To further
exploit label-modal dependence, we devise a BERT-like cross-modal encoder to
gradually fuse private and common modality representations in a granularity
descent way, as well as a label-guided decoder to adaptively generate a
tailored representation for each label with the guidance of label semantics. In
addition, we conduct experiments on the benchmark MMER dataset CMU-MOSEI in
both aligned and unaligned settings, which demonstrate the superiority of
TAILOR over state-of-the-art methods. Code is available at
https://github.com/kniter1/TAILOR.
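The label-guided decoder described above can be illustrated with a minimal sketch: each label's semantic embedding cross-attends over the fused multi-modal features to produce a per-label ("tailored") representation. This is an assumption-laden toy version, not the released TAILOR code; all shapes, names, and the single-head dot-product attention are illustrative only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def label_guided_decode(label_emb, fused_feats):
    """label_emb: (L, d) label semantic embeddings.
    fused_feats: (T, d) fused modality token features.
    Returns (L, d): one tailored representation per label."""
    d = label_emb.shape[1]
    scores = label_emb @ fused_feats.T / np.sqrt(d)  # (L, T) similarity
    attn = softmax(scores, axis=-1)                  # each row sums to 1
    return attn @ fused_feats                        # (L, d) per-label mix

rng = np.random.default_rng(0)
L, T, d = 6, 10, 16                       # 6 emotion labels, 10 fused tokens
labels = rng.standard_normal((L, d))      # stand-in label semantics
feats = rng.standard_normal((T, d))       # stand-in fused features
tailored = label_guided_decode(labels, feats)
print(tailored.shape)  # (6, 16)
```

Each row of `tailored` would then feed a label-specific classifier head, so different labels can attend to different parts of the fused sequence.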
Related papers
- Meta-Learn Unimodal Signals with Weak Supervision for Multimodal Sentiment Analysis [25.66434557076494]
We propose a novel meta uni-label generation (MUG) framework to address the above problem.
We first design a contrastive-based projection module to bridge the gap between unimodal and multimodal representations.
We then propose unimodal and multimodal denoising tasks to train MUCN with explicit supervision via a bi-level optimization strategy.
arXiv Detail & Related papers (2024-08-28T03:43:01Z)
- CARAT: Contrastive Feature Reconstruction and Aggregation for Multi-Modal Multi-Label Emotion Recognition [18.75994345925282]
Multi-modal multi-label emotion recognition (MMER) aims to identify relevant emotions from multiple modalities.
The challenge of MMER is how to effectively capture discriminative features for multiple labels from heterogeneous data.
This paper presents ContrAstive feature Reconstruction and AggregaTion (CARAT) for the MMER task.
arXiv Detail & Related papers (2023-12-15T20:58:05Z)
- Leveraging Label Information for Multimodal Emotion Recognition [22.318092635089464]
Multimodal emotion recognition (MER) aims to detect the emotional status of a given expression by combining the speech and text information.
We propose a novel approach for MER by leveraging label information.
We devise a novel label-guided attentive fusion module to fuse the label-aware text and speech representations for emotion classification.
arXiv Detail & Related papers (2023-09-05T10:26:32Z)
- Multi-Label Knowledge Distillation [86.03990467785312]
We propose a novel multi-label knowledge distillation method.
On one hand, it exploits the informative semantic knowledge from the logits by dividing the multi-label learning problem into a set of binary classification problems.
On the other hand, it enhances the distinctiveness of the learned feature representations by leveraging the structural information of label-wise embeddings.
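One way to read "dividing the multi-label learning problem into a set of binary classification problems" for distillation is a per-label binary KL divergence between the teacher's and student's softened sigmoid probabilities. The sketch below is a hedged interpretation under that assumption; the function name, temperature handling, and exact loss form are illustrative, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def binary_kd_loss(student_logits, teacher_logits, T=2.0):
    """Mean over labels of KL(teacher || student) between the
    temperature-softened Bernoulli distributions of each label."""
    ps = sigmoid(student_logits / T)   # student per-label probabilities
    pt = sigmoid(teacher_logits / T)   # teacher per-label probabilities
    eps = 1e-8                         # numerical floor for the logs
    kl = pt * np.log((pt + eps) / (ps + eps)) \
       + (1 - pt) * np.log((1 - pt + eps) / (1 - ps + eps))
    return float(kl.mean())

s = np.array([2.0, -1.0, 0.5])   # student logits for 3 labels
t = np.array([2.0, -1.0, 0.5])   # identical teacher logits
print(binary_kd_loss(s, t))      # ~0: matching logits give zero divergence
```

The loss is zero when student and teacher agree on every label and grows as their per-label probabilities diverge, which is the behavior a logit-level distillation term needs.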
arXiv Detail & Related papers (2023-08-12T03:19:08Z)
- DICNet: Deep Instance-Level Contrastive Network for Double Incomplete Multi-View Multi-Label Classification [20.892833511657166]
Multi-view multi-label data in the real world is commonly incomplete due to the uncertain factors of data collection and manual annotation.
We propose a deep instance-level contrastive network, namely DICNet, to deal with the double incomplete multi-view multi-label classification problem.
Our DICNet is adept in capturing consistent discriminative representations of multi-view multi-label data and avoiding the negative effects of missing views and missing labels.
arXiv Detail & Related papers (2023-03-15T04:24:01Z)
- Dual-Perspective Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels [70.36722026729859]
We propose a dual-perspective semantic-aware representation blending (DSRB) that blends multi-granularity category-specific semantic representation across different images.
The proposed DSRB consistently outperforms current state-of-the-art algorithms across all partial-label proportion settings.
arXiv Detail & Related papers (2022-05-26T00:33:44Z)
- Heterogeneous Semantic Transfer for Multi-label Recognition with Partial Labels [70.45813147115126]
Multi-label image recognition with partial labels (MLR-PL) may greatly reduce the cost of annotation and thus facilitate large-scale MLR.
We find that strong semantic correlations exist within each image and across different images.
These correlations can help transfer the knowledge possessed by the known labels to retrieve the unknown labels.
arXiv Detail & Related papers (2022-05-23T08:37:38Z)
- Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels [86.17081952197788]
We propose to blend category-specific representation across different images to transfer information of known labels to complement unknown labels.
Experiments on the MS-COCO, Visual Genome, Pascal VOC 2007 datasets show that the proposed SARB framework obtains superior performance over current leading competitors.
arXiv Detail & Related papers (2022-03-04T07:56:16Z)
- Structured Semantic Transfer for Multi-Label Recognition with Partial Labels [85.6967666661044]
We propose a structured semantic transfer (SST) framework that enables training multi-label recognition models with partial labels.
The framework consists of two complementary transfer modules that explore within-image and cross-image semantic correlations.
Experiments on the Microsoft COCO, Visual Genome and Pascal VOC datasets show that the proposed SST framework obtains superior performance over current state-of-the-art algorithms.
arXiv Detail & Related papers (2021-12-21T02:15:01Z)
- Knowledge-Guided Multi-Label Few-Shot Learning for General Image Recognition [75.44233392355711]
KGGR framework exploits prior knowledge of statistical label correlations with deep neural networks.
It first builds a structured knowledge graph to correlate different labels based on statistical label co-occurrence.
Then, it introduces the label semantics to guide learning semantic-specific features.
It exploits a graph propagation network to explore graph node interactions.
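The KGGR pipeline summarized above (co-occurrence graph, then graph propagation over label features) can be sketched as follows. This is an illustrative reconstruction under assumptions: the conditional-probability adjacency with a binarization threshold and a single symmetric-normalized propagation step are common choices for such label graphs, not necessarily KGGR's exact formulation; all counts and dimensions are made up.

```python
import numpy as np

def cooccurrence_adjacency(label_matrix, thresh=0.3):
    """label_matrix: (N, L) binary label indicators over N samples.
    Returns a binarized adjacency from conditional co-occurrence:
    edge i -> j iff P(label j | label i) >= thresh."""
    counts = label_matrix.T @ label_matrix      # (L, L) co-occurrence counts
    occ = np.diag(counts).clip(min=1)           # per-label occurrence counts
    cond = counts / occ[:, None]                # row i holds P(j | i)
    return (cond >= thresh).astype(float)

def gcn_propagate(A, X):
    """One propagation step over the adjacency with self-loops,
    using the symmetric normalization D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt @ X  # (L, d) propagated features

rng = np.random.default_rng(1)
Y = (rng.random((100, 5)) < 0.4).astype(float)  # fake annotations: 100 samples, 5 labels
A = cooccurrence_adjacency(Y)
X = rng.standard_normal((5, 8))                 # initial label (semantic) features
H = gcn_propagate(A, X)
print(H.shape)  # (5, 8)
```

After propagation, each label's feature vector mixes in the features of statistically correlated labels, which is what lets knowledge flow between related labels in the few-shot setting.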
arXiv Detail & Related papers (2020-09-20T15:05:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including any of the information) and is not responsible for any consequences of its use.