Variants of BERT, Random Forests and SVM approach for Multimodal
Emotion-Target Sub-challenge
- URL: http://arxiv.org/abs/2007.13928v1
- Date: Tue, 28 Jul 2020 01:15:50 GMT
- Title: Variants of BERT, Random Forests and SVM approach for Multimodal
Emotion-Target Sub-challenge
- Authors: Hoang Manh Hung, Hyung-Jeong Yang, Soo-Hyung Kim, and Guee-Sang Lee
- Abstract summary: We present and discuss our classification methodology for MuSe-Topic Sub-challenge.
We ensemble two language models which are ALBERT and RoBERTa to predict 10 classes of topics.
- Score: 11.71437054341057
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emotion recognition has become a major problem in computer vision in
recent years, and researchers have devoted considerable effort to overcoming its
difficulties. In the field of affective computing, emotion recognition has a
wide range of applications, such as healthcare, robotics, and human-computer
interaction. Owing to its practical importance for other tasks, many techniques
and approaches have been investigated for different problems and various data
sources. Nevertheless, comprehensively fusing the audio-visual and language
modalities so as to benefit from all of them remains an open problem. In this
paper, we present and discuss our classification methodology for the MuSe-Topic
Sub-challenge, as well as the data and results. For topic classification, we
ensemble two language models, ALBERT and RoBERTa, to predict 10 classes of
topics. Moreover, for the classification of valence and arousal, SVM and Random
Forests are employed in conjunction with feature selection to enhance
performance.
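
To make the methodology concrete, here is a minimal sketch of the topic-classification ensemble, assuming the Hugging Face transformers library. The checkpoint names below are generic base models, not the authors' fine-tuned weights, and probability averaging is one plausible reading of "ensemble", not the paper's confirmed fusion rule.

```python
# Minimal sketch (not the authors' code): average the class probabilities
# of an ALBERT and a RoBERTa classifier over the 10 MuSe-Topic classes.
# Checkpoint names are assumed generic base models; real use would load
# fine-tuned weights instead.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NUM_TOPICS = 10

def load_classifier(name):
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(
        name, num_labels=NUM_TOPICS)
    model.eval()
    return tokenizer, model

classifiers = [load_classifier("albert-base-v2"),  # assumed checkpoint
               load_classifier("roberta-base")]    # assumed checkpoint

@torch.no_grad()
def predict_topic(text: str) -> int:
    probs = []
    for tokenizer, model in classifiers:
        inputs = tokenizer(text, return_tensors="pt", truncation=True)
        probs.append(model(**inputs).logits.softmax(dim=-1))
    # Late fusion: average the two distributions, then take the argmax.
    return torch.stack(probs).mean(dim=0).argmax(dim=-1).item()
```

Likewise, a minimal sketch of the valence/arousal side under stated assumptions: scikit-learn pipelines with univariate feature selection in front of an SVM and a Random Forest, run on synthetic placeholder features rather than the MuSe data, and with an illustrative k rather than a tuned one.

```python
# Minimal sketch (assumptions: scikit-learn, synthetic placeholder data):
# univariate feature selection feeding an SVM and a Random Forest,
# mirroring the valence/arousal classification step.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))   # placeholder feature matrix
y = rng.integers(0, 3, size=200)  # placeholder labels (e.g., low/mid/high)

svm_clf = make_pipeline(StandardScaler(),
                        SelectKBest(f_classif, k=40),  # keep top 40 features
                        SVC(kernel="rbf", C=1.0))
rf_clf = make_pipeline(SelectKBest(f_classif, k=40),
                       RandomForestClassifier(n_estimators=300,
                                              random_state=0))

for clf in (svm_clf, rf_clf):
    clf.fit(X, y)
    print(type(clf[-1]).__name__, "train accuracy:", clf.score(X, y))
```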
Related papers
- RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, demonstrating their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
- Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective [34.76568708378833]
Multimodal affective computing (MAC) has garnered increasing attention due to its broad applications in analyzing human behaviors and intentions.
This survey presents recent trends in multimodal affective computing from an NLP perspective through four hot tasks.
The goal of this survey is to explore the current landscape of multimodal affective research, identify development trends, and highlight the similarities and differences across various tasks.
arXiv Detail & Related papers (2024-09-11T16:24:06Z)
- BiosERC: Integrating Biography Speakers Supported by LLMs for ERC Tasks [2.9873893715462176]
This work introduces a novel framework named BiosERC, which investigates speaker characteristics in a conversation.
By employing Large Language Models (LLMs), we extract the "biographical information" of the speaker within a conversation.
Our proposed method achieved state-of-the-art (SOTA) results on three well-known benchmark datasets.
arXiv Detail & Related papers (2024-07-05T06:25:34Z)
- A Multi-Task, Multi-Modal Approach for Predicting Categorical and Dimensional Emotions [0.0]
We propose a multi-task, multi-modal system that predicts categorical and dimensional emotions.
Results emphasise the importance of cross-regularisation between the two types of emotions.
arXiv Detail & Related papers (2023-12-31T16:48:03Z)
- Deep Imbalanced Learning for Multimodal Emotion Recognition in Conversations [15.705757672984662]
Multimodal Emotion Recognition in Conversations (MERC) is a significant development direction for machine intelligence.
Much of the data in MERC naturally exhibits an imbalanced distribution of emotion categories, yet researchers often ignore the negative impact of imbalanced data on emotion recognition.
We propose the Class Boundary Enhanced Representation Learning (CBERL) model to address the imbalanced distribution of emotion categories in raw data.
We have conducted extensive experiments on the IEMOCAP and MELD benchmark datasets, and the results show that CBERL achieves a clear improvement in emotion recognition performance.
arXiv Detail & Related papers (2023-12-11T12:35:17Z)
- Machine Unlearning: A Survey [56.79152190680552]
A special need has arisen where, due to privacy, usability, and/or the right to be forgotten, information about specific samples needs to be removed from a trained model; this process is called machine unlearning.
This emerging technology has drawn significant interest from both academics and industry due to its innovation and practicality.
No study has analyzed this complex topic or compared the feasibility of existing unlearning solutions in different kinds of scenarios.
The survey concludes by highlighting some of the outstanding issues with unlearning techniques, along with some feasible directions for new research opportunities.
arXiv Detail & Related papers (2023-06-06T10:18:36Z)
- Vision+X: A Survey on Multimodal Learning in the Light of Data [64.03266872103835]
Multimodal machine learning, which incorporates data from various sources, has become an increasingly popular research area.
We analyze the commonness and uniqueness of each data format, mainly covering vision, audio, text, and motion.
We investigate the existing literature on multimodal learning from both the representation learning and downstream application levels.
arXiv Detail & Related papers (2022-10-05T13:14:57Z)
- Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions [68.6358773622615]
This paper provides an overview of the computational and theoretical foundations of multimodal machine learning.
We propose a taxonomy of 6 core technical challenges: representation, alignment, reasoning, generation, transference, and quantification.
Recent technical achievements will be presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches.
arXiv Detail & Related papers (2022-09-07T19:21:19Z)
- Channel Exchanging Networks for Multimodal and Multitask Dense Image Prediction [125.18248926508045]
We propose Channel-Exchanging-Network (CEN) which is self-adaptive, parameter-free, and more importantly, applicable for both multimodal fusion and multitask learning.
CEN dynamically exchanges channels between subnetworks of different modalities.
For the application of dense image prediction, the validity of CEN is tested by four different scenarios.
arXiv Detail & Related papers (2021-12-04T05:47:54Z)
- A Review on Explainability in Multimodal Deep Neural Nets [2.3204178451683264]
Multimodal AI techniques have achieved much success in several application domains.
Despite their outstanding performance, the complex, opaque and black-box nature of the deep neural nets limits their social acceptance and usability.
This paper extensively reviews the present literature to present a comprehensive survey and commentary on the explainability in multimodal deep neural nets.
arXiv Detail & Related papers (2021-05-17T14:17:49Z)
- Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs [57.74359320513427]
Methods have been proposed for pretraining vision and language BERTs to tackle challenges at the intersection of these two key areas of AI.
We study the differences between these two categories, and show how they can be unified under a single theoretical framework.
We conduct controlled experiments to discern the empirical differences between five V&L BERTs.
arXiv Detail & Related papers (2020-11-30T18:55:24Z)