Weakly-supervised Multi-task Learning for Multimodal Affect Recognition
- URL: http://arxiv.org/abs/2104.11560v1
- Date: Fri, 23 Apr 2021 12:36:19 GMT
- Title: Weakly-supervised Multi-task Learning for Multimodal Affect Recognition
- Authors: Wenliang Dai, Samuel Cahyawijaya, Yejin Bang, Pascale Fung
- Abstract summary: We propose to leverage datasets using weakly-supervised multi-task learning to improve generalization performance.
Specifically, we explore three multimodal affect recognition tasks: 1) emotion recognition; 2) sentiment analysis; and 3) sarcasm recognition.
Our experimental results show that multi-tasking can benefit all these tasks, achieving improvements of up to 2.9% in accuracy and 3.3% in F1-score.
- Score: 33.7929682119287
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal affect recognition is an important aspect of enhancing
interpersonal relationships in human-computer interaction. However, relevant
data is hard to come by and notably costly to annotate, which poses a
challenging barrier to building robust multimodal affect recognition systems.
Models trained on these relatively small datasets tend to overfit, and the
improvement gained by using complex state-of-the-art models is marginal
compared to simple baselines. Meanwhile, there are many different multimodal
affect recognition datasets, though each may be small. In this paper, we
propose to leverage these datasets using weakly-supervised multi-task learning
to improve the generalization performance on each of them. Specifically, we
explore three multimodal affect recognition tasks: 1) emotion recognition; 2)
sentiment analysis; and 3) sarcasm recognition. Our experimental results show
that multi-tasking can benefit all these tasks, achieving improvements of up
to 2.9% in accuracy and 3.3% in F1-score. Furthermore, our method also helps
to improve the stability of model performance. In addition, our analysis
suggests that weak supervision can provide a comparable contribution to strong
supervision if the tasks are highly correlated.
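To make the setup concrete, the sketch below shows one minimal way to train a single model jointly on emotion, sentiment, and sarcasm datasets in which each sample carries labels for only some tasks. It is an illustrative reading of the abstract, not the authors' architecture: the module names, feature dimensions, fusion scheme, and masked multi-task loss are all assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskAffectModel(nn.Module):
    """Shared multimodal encoder with one head per affect task (illustrative only)."""

    def __init__(self, text_dim=300, audio_dim=74, visual_dim=35,
                 hidden_dim=128, num_emotions=6):
        super().__init__()
        # Fuse pre-extracted utterance-level features by simple concatenation.
        self.encoder = nn.Sequential(
            nn.Linear(text_dim + audio_dim + visual_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        # Task-specific heads: emotion (multi-class), sentiment (regression), sarcasm (binary).
        self.emotion_head = nn.Linear(hidden_dim, num_emotions)
        self.sentiment_head = nn.Linear(hidden_dim, 1)
        self.sarcasm_head = nn.Linear(hidden_dim, 1)

    def forward(self, text, audio, visual):
        z = self.encoder(torch.cat([text, audio, visual], dim=-1))
        return {
            "emotion": self.emotion_head(z),
            "sentiment": self.sentiment_head(z).squeeze(-1),
            "sarcasm": self.sarcasm_head(z).squeeze(-1),
        }


def multitask_loss(outputs, labels, masks):
    """Sum per-task losses, counting only samples whose source dataset labels that task.

    masks[task] is a float tensor of 0/1 flags marking which samples in the
    batch are annotated (strongly or weakly) for that task.
    """
    ce = nn.functional.cross_entropy(outputs["emotion"], labels["emotion"], reduction="none")
    mse = nn.functional.mse_loss(outputs["sentiment"], labels["sentiment"], reduction="none")
    bce = nn.functional.binary_cross_entropy_with_logits(
        outputs["sarcasm"], labels["sarcasm"], reduction="none")
    total = 0.0
    for per_sample, m in ((ce, masks["emotion"]),
                          (mse, masks["sentiment"]),
                          (bce, masks["sarcasm"])):
        total = total + (per_sample * m).sum() / m.sum().clamp(min=1.0)
    return total
```

Under this reading, samples from a dataset that lacks annotations for a given task simply contribute zero loss to that head (or contribute through automatically derived weak labels), while every sample still updates the shared encoder.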
Related papers
- MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct [148.39859547619156]
We propose MMEvol, a novel multimodal instruction data evolution framework.
MMEvol iteratively improves data quality through a refined combination of fine-grained perception, cognitive reasoning, and interaction evolution.
Our approach reaches state-of-the-art (SOTA) performance on nine tasks while using significantly less data than previous state-of-the-art models.
arXiv Detail & Related papers (2024-09-09T17:44:00Z)
- Beyond Unimodal Learning: The Importance of Integrating Multiple Modalities for Lifelong Learning [23.035725779568587]
We study the role and interactions of multiple modalities in mitigating forgetting in deep neural networks (DNNs).
Our findings demonstrate that leveraging multiple views and complementary information from multiple modalities enables the model to learn more accurate and robust representations.
We propose a method for integrating and aligning the information from different modalities by utilizing the relational structural similarities between the data points in each modality.
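One plausible way to read "utilizing the relational structural similarities between the data points in each modality" is to penalize the difference between within-batch similarity matrices of the two modalities. The PyTorch sketch below illustrates that reading only; it is not the paper's exact objective, and the function name and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def relational_alignment_loss(feat_a, feat_b):
    """Encourage two modalities to share relational structure (illustrative sketch).

    feat_a, feat_b: (batch, dim_a) and (batch, dim_b) embeddings of the same
    batch of samples seen through two modalities. The loss compares the
    pairwise cosine-similarity matrices computed within each modality.
    """
    sim_a = F.cosine_similarity(feat_a.unsqueeze(1), feat_a.unsqueeze(0), dim=-1)
    sim_b = F.cosine_similarity(feat_b.unsqueeze(1), feat_b.unsqueeze(0), dim=-1)
    return F.mse_loss(sim_a, sim_b)

# Example: 16 samples embedded by a 512-d image encoder and a 128-d audio encoder.
loss = relational_alignment_loss(torch.randn(16, 512), torch.randn(16, 128))
```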
arXiv Detail & Related papers (2024-05-04T22:02:58Z)
- Exploring Contrastive Learning for Long-Tailed Multi-Label Text Classification [48.81069245141415]
We introduce a novel contrastive loss function for multi-label text classification.
It attains Micro-F1 scores that either match or surpass those obtained with other frequently employed loss functions.
It demonstrates a significant improvement in Macro-F1 scores across three multi-label datasets.
arXiv Detail & Related papers (2024-04-12T11:12:16Z)
- Identifiability Results for Multimodal Contrastive Learning [72.15237484019174]
We show that it is possible to recover shared factors in a more general setup than the multi-view setting studied previously.
Our work provides a theoretical basis for multimodal representation learning and explains in which settings multimodal contrastive learning can be effective in practice.
arXiv Detail & Related papers (2023-03-16T09:14:26Z)
- Multi-dataset Training of Transformers for Robust Action Recognition [75.5695991766902]
We study the problem of learning robust feature representations that generalize well across multiple action recognition datasets.
We propose a novel multi-dataset training paradigm, MultiTrain, built around two new loss terms: an informative loss and a projection loss.
We verify the effectiveness of our method on five challenging datasets: Kinetics-400, Kinetics-700, Moments-in-Time, ActivityNet, and Something-Something-V2.
arXiv Detail & Related papers (2022-09-26T01:30:43Z)
- Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under a rigorous theoretical guarantee, our approach enables the information bottleneck (IB) to capture the intrinsic correlation between observations and semantic labels.
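The summary names variational information bottlenecks without spelling them out. As a point of reference, a standard variational information bottleneck (a stochastic encoder with a KL regularizer toward a standard normal prior) looks like the sketch below; this is the generic formulation, not the paper's specific multi-view design, and the class and parameter names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VIBClassifier(nn.Module):
    """Generic variational information bottleneck: x -> stochastic z -> label."""

    def __init__(self, in_dim=512, z_dim=64, num_classes=10):
        super().__init__()
        self.stats = nn.Linear(in_dim, 2 * z_dim)   # mean and log-variance of q(z|x)
        self.classifier = nn.Linear(z_dim, num_classes)

    def forward(self, x):
        mu, logvar = self.stats(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.classifier(z), mu, logvar


def vib_loss(logits, labels, mu, logvar, beta=1e-3):
    """Cross-entropy plus beta-weighted KL(q(z|x) || N(0, I))."""
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).sum(dim=-1).mean()
    return F.cross_entropy(logits, labels) + beta * kl
```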
arXiv Detail & Related papers (2022-06-20T03:09:46Z)
- Contrastive Learning with Cross-Modal Knowledge Mining for Multimodal Human Activity Recognition [1.869225486385596]
We explore the hypothesis that leveraging multiple modalities can lead to better recognition.
We extend a number of recent contrastive self-supervised approaches for the task of Human Activity Recognition.
We propose a flexible, general-purpose framework for performing multimodal self-supervised learning.
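The framework itself is only described at a high level here; a common building block for this kind of multimodal contrastive self-supervision is a symmetric InfoNCE objective between two modality encoders, sketched below as an illustration rather than the paper's specific formulation (the function name, temperature, and modality pairing are assumptions).

```python
import torch
import torch.nn.functional as F

def cross_modal_info_nce(emb_a, emb_b, temperature=0.07):
    """Symmetric InfoNCE between two modality embeddings of the same clips.

    emb_a, emb_b: (batch, dim) projections (e.g. inertial vs. skeleton streams);
    row i of each tensor comes from the same time window, so matching rows are
    the positive pairs and all other rows in the batch serve as negatives.
    """
    a = F.normalize(emb_a, dim=-1)
    b = F.normalize(emb_b, dim=-1)
    logits = a @ b.t() / temperature                     # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)   # positives sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```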
arXiv Detail & Related papers (2022-05-20T10:39:16Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis [11.368438990334397]
We develop a self-supervised learning strategy to acquire independent unimodal supervisions.
We conduct extensive experiments on three public multimodal baseline datasets.
Our method achieves performance comparable to that obtained with human-annotated unimodal labels.
arXiv Detail & Related papers (2021-02-09T14:05:02Z)
- Attend And Discriminate: Beyond the State-of-the-Art for Human Activity Recognition using Wearable Sensors [22.786406177997172]
Wearables are fundamental to improving our understanding of human activities.
We rigorously explore new opportunities to learn enriched and highly discriminating activity representations.
Our contributions achieve new state-of-the-art performance on four diverse activity recognition benchmarks.
arXiv Detail & Related papers (2020-07-14T16:44:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.