CARAT: Contrastive Feature Reconstruction and Aggregation for Multi-Modal Multi-Label Emotion Recognition
- URL: http://arxiv.org/abs/2312.10201v3
- Date: Sat, 13 Jan 2024 06:05:33 GMT
- Title: CARAT: Contrastive Feature Reconstruction and Aggregation for Multi-Modal Multi-Label Emotion Recognition
- Authors: Cheng Peng, Ke Chen, Lidan Shou, Gang Chen
- Abstract summary: Multi-modal multi-label emotion recognition (MMER) aims to identify relevant emotions from multiple modalities.
The challenge of MMER is how to effectively capture discriminative features for multiple labels from heterogeneous data.
This paper presents ContrAstive feature Reconstruction and AggregaTion (CARAT) for the MMER task.
- Score: 18.75994345925282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal multi-label emotion recognition (MMER) aims to identify relevant
emotions from multiple modalities. The challenge of MMER is how to effectively
capture discriminative features for multiple labels from heterogeneous data.
Recent studies are mainly devoted to exploring various fusion strategies to
integrate multi-modal information into a unified representation for all labels.
However, such a learning scheme not only overlooks the specificity of each
modality but also fails to capture individual discriminative features for
different labels. Moreover, dependencies of labels and modalities cannot be
effectively modeled. To address these issues, this paper presents ContrAstive
feature Reconstruction and AggregaTion (CARAT) for the MMER task. Specifically,
we devise a reconstruction-based fusion mechanism to better model fine-grained
modality-to-label dependencies by contrastively learning modal-separated and
label-specific features. To further exploit the modality complementarity, we
introduce a shuffle-based aggregation strategy to enrich co-occurrence
collaboration among labels. Experiments on two benchmark datasets, CMU-MOSEI and
M3ED, demonstrate the effectiveness of CARAT over state-of-the-art methods. Code
is available at https://github.com/chengzju/CARAT.
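As a concrete illustration of the two mechanisms the abstract names, the sketch below pairs label-specific encoders per modality with (i) a cross-modal contrastive objective over the resulting modal-separated, label-specific embeddings plus a reconstruction term, and (ii) a batch-level shuffle of per-label embeddings to synthesize new label co-occurrences. This is a minimal PyTorch sketch under our own assumptions; the module names, dimensions, and the specific InfoNCE/MSE losses are illustrative, and the authors' actual implementation is at the GitHub link above.

```python
# Hypothetical sketch of CARAT-style ideas, NOT the official implementation
# (see https://github.com/chengzju/CARAT for the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class LabelSpecificEncoder(nn.Module):
    """Maps one modality's feature vector to one embedding per label."""

    def __init__(self, in_dim: int, emb_dim: int, num_labels: int):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(in_dim, emb_dim) for _ in range(num_labels)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim) -> (batch, num_labels, emb_dim)
        return torch.stack([head(x) for head in self.heads], dim=1)


def cross_modal_infonce(za: torch.Tensor, zb: torch.Tensor, tau: float = 0.1):
    """InfoNCE that pulls together two modalities' embeddings of the same
    (sample, label) pair and pushes apart other samples in the batch."""
    batch, num_labels = za.shape[0], za.shape[1]
    za = F.normalize(za, dim=-1).transpose(0, 1)      # (L, B, D)
    zb = F.normalize(zb, dim=-1).transpose(0, 1)      # (L, B, D)
    logits = torch.bmm(za, zb.transpose(1, 2)) / tau  # (L, B, B)
    target = torch.arange(batch).repeat(num_labels)   # diagonal = positives
    return F.cross_entropy(logits.reshape(-1, batch), target)


def reconstruction_loss(embs, x, decoder):
    """Decode label-specific embeddings back to the raw modality feature,
    keeping them faithful to their source modality."""
    return F.mse_loss(decoder(embs.flatten(1)), x)


def shuffle_aggregate(embs: torch.Tensor) -> torch.Tensor:
    """Independently permute each label's embeddings across the batch so the
    classifier sees novel label co-occurrence combinations (one plausible
    reading of the shuffle-based aggregation; the exact scheme may differ)."""
    out = embs.clone()
    for l in range(embs.shape[1]):
        out[:, l] = embs[torch.randperm(embs.shape[0]), l]
    return out


if __name__ == "__main__":
    torch.manual_seed(0)
    B, L, D_IN, D = 8, 6, 128, 64              # toy batch, labels, dims
    text_enc = LabelSpecificEncoder(D_IN, D, L)
    audio_enc = LabelSpecificEncoder(D_IN, D, L)
    text_dec = nn.Linear(L * D, D_IN)           # toy reconstruction decoder
    xt, xa = torch.randn(B, D_IN), torch.randn(B, D_IN)
    zt, za = text_enc(xt), audio_enc(xa)        # (B, L, D) each
    loss = cross_modal_infonce(zt, za) + reconstruction_loss(zt, xt, text_dec)
    fused = shuffle_aggregate(torch.cat([zt, za], dim=-1))  # (B, L, 2D)
    print(loss.item(), fused.shape)
```

Shuffling per-label embeddings across samples is only one way to read "enrich co-occurrence collaboration among labels"; the repository above is authoritative on the exact scheme.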
Related papers
- Adaptive Collaborative Correlation Learning-based Semi-Supervised Multi-Label Feature Selection [25.195711274756334]
We propose an Adaptive Collaborative Correlation lEarning-based Semi-Supervised Multi-label Feature Selection (Access-MFS) method to address these issues.
Specifically, a generalized regression model equipped with an extended uncorrelated constraint is introduced to select features that are discriminative yet mutually uncorrelated.
Instance correlation and label correlation are integrated into the proposed regression model to adaptively learn both the sample similarity graph and the label similarity graph.
arXiv Detail & Related papers (2024-06-18T01:47:38Z)
- Exploring Homogeneous and Heterogeneous Consistent Label Associations for Unsupervised Visible-Infrared Person ReID [62.81466902601807]
Unsupervised visible-infrared person re-identification (USL-VI-ReID) aims to retrieve pedestrian images of the same identity from different modalities without annotations.
We introduce a Modality-Unified Label Transfer (MULT) module that simultaneously accounts for both homogeneous and heterogeneous fine-grained instance-level structures.
It models both homogeneous and heterogeneous affinities, leverages them to define a pseudo-label inconsistency measure, and then minimizes it, yielding pseudo-labels that remain aligned across modalities and consistent with intra-modality structures.
arXiv Detail & Related papers (2024-02-01T15:33:17Z)
- Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification [78.52704557647438]
We propose a novel FIne-grained Representation and Recomposition (FIRe$^2$) framework to tackle both limitations without any auxiliary annotation or data.
Experiments demonstrate that FIRe$^2$ can achieve state-of-the-art performance on five widely-used cloth-changing person Re-ID benchmarks.
arXiv Detail & Related papers (2023-08-21T12:59:48Z)
- Multi-Label Knowledge Distillation [86.03990467785312]
We propose a novel multi-label knowledge distillation method.
On the one hand, it exploits the informative semantic knowledge in the logits by dividing the multi-label learning problem into a set of binary classification problems (a generic sketch of this decomposition appears after this list).
On the other hand, it enhances the distinctiveness of the learned feature representations by leveraging the structural information of label-wise embeddings.
arXiv Detail & Related papers (2023-08-12T03:19:08Z)
- Reliable Representations Learning for Incomplete Multi-View Partial Multi-Label Classification [78.15629210659516]
In this paper, we propose an incomplete multi-view partial multi-label classification network named RANK.
We move beyond the fixed view-level weights inherent in existing methods and propose a quality-aware sub-network that dynamically assigns quality scores to each view of each sample.
Our model is not only able to handle complete multi-view multi-label datasets, but also works on datasets with missing instances and labels.
arXiv Detail & Related papers (2023-03-30T03:09:25Z)
- Learning Disentangled Label Representations for Multi-label Classification [39.97251974500034]
The prevailing One-shared-Feature-for-Multiple-Labels (OFML) mechanism is not conducive to learning discriminative label features.
We introduce the One-specific-Feature-for-One-Label (OFOL) mechanism and propose a novel disentangled label feature learning framework.
We achieve state-of-the-art performance on eight datasets.
arXiv Detail & Related papers (2022-12-02T21:49:34Z)
- Multi-Label Continual Learning using Augmented Graph Convolutional Network [7.115602040521868]
Multi-Label Continual Learning (MLCL) builds a class-incremental framework over a sequential multi-label image recognition data stream.
The study proposes an Augmented Graph Convolutional Network (AGCN++) that can construct the cross-task label relationships in MLCL.
The proposed method is evaluated using two multi-label image benchmarks.
arXiv Detail & Related papers (2022-11-27T08:40:19Z)
- Multimodal Emotion Recognition with Modality-Pairwise Unsupervised Contrastive Loss [80.79641247882012]
We focus on unsupervised feature learning for Multimodal Emotion Recognition (MER).
We consider discrete emotions and use text, audio, and vision as the modalities.
Our method, based on a contrastive loss between pairwise modalities, is the first such attempt in the MER literature.
arXiv Detail & Related papers (2022-07-23T10:11:24Z)
- Tailor Versatile Multi-modal Learning for Multi-label Emotion Recognition [7.280460748655983]
Multi-modal Multi-label Emotion Recognition (MMER) aims to identify various human emotions from heterogeneous visual, audio and text modalities.
Previous methods mainly focus on projecting multiple modalities into a common latent space and learning an identical representation for all labels.
We propose versaTile multi-modAl learning for multI-labeL emOtion Recognition (TAILOR), aiming to refine multi-modal representations and enhance the discriminative capacity of each label.
arXiv Detail & Related papers (2022-01-15T12:02:28Z)
- Unsupervised Person Re-identification via Multi-label Classification [55.65870468861157]
This paper formulates unsupervised person ReID as a multi-label classification task to progressively seek true labels.
Our method starts by assigning each person image with a single-class label, then evolves to multi-label classification by leveraging the updated ReID model for label prediction.
To boost the ReID model training efficiency in multi-label classification, we propose the memory-based multi-label classification loss (MMCL).
arXiv Detail & Related papers (2020-04-20T12:13:43Z)
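The Multi-Label Knowledge Distillation entry above splits the multi-label problem into per-label binary problems before distilling logits. The generic sketch below illustrates that decomposition with a temperature-smoothed Bernoulli KL per label; it is our own hypothetical rendering of the idea, not that paper's actual loss.

```python
# Hypothetical illustration of binary-decomposed multi-label distillation;
# not the code of the "Multi-Label Knowledge Distillation" paper.
import torch
import torch.nn.functional as F


def binary_decomposed_kd(student_logits: torch.Tensor,
                         teacher_logits: torch.Tensor,
                         temperature: float = 2.0) -> torch.Tensor:
    """Treat an L-label problem as L binary problems and distill the teacher's
    per-label Bernoulli distribution into the student via KL divergence."""
    ps = torch.sigmoid(student_logits / temperature)  # (batch, L) student
    pt = torch.sigmoid(teacher_logits / temperature)  # (batch, L) teacher
    eps = 1e-7
    kl = pt * torch.log((pt + eps) / (ps + eps)) \
        + (1 - pt) * torch.log((1 - pt + eps) / (1 - ps + eps))
    # the usual T^2 factor keeps gradient magnitudes comparable across T
    return (temperature ** 2) * kl.mean()


if __name__ == "__main__":
    student = torch.randn(4, 6)   # toy student logits for 6 labels
    teacher = torch.randn(4, 6)   # toy teacher logits for 6 labels
    print(binary_decomposed_kd(student, teacher).item())
```

In practice such a distillation term would be added to the usual binary cross-entropy against the ground-truth label vector.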
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.