Multimodal Emotion Recognition by Fusing Video Semantic in MOOC Learning Scenarios
- URL: http://arxiv.org/abs/2404.07484v1
- Date: Thu, 11 Apr 2024 05:44:27 GMT
- Title: Multimodal Emotion Recognition by Fusing Video Semantic in MOOC Learning Scenarios
- Authors: Yuan Zhang, Xiaomei Tao, Hanxu Ai, Tao Chen, Yanling Gan
- Abstract summary: In Massive Open Online Courses (MOOC), semantic information of instructional videos has a crucial impact on learners' emotional state.
This paper proposes a multimodal emotion recognition method that fuses video semantic information with physiological signals.
The experimental results show that our method has significantly improved emotion recognition performance.
- Score: 6.987099464814016
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the Massive Open Online Courses (MOOC) learning scenario, the semantic information of instructional videos has a crucial impact on learners' emotional state. Learners mainly acquire knowledge by watching instructional videos, and the semantic information in the videos directly affects learners' emotional states. However, few studies have paid attention to the potential influence of the semantic information of instructional videos on learners' emotional states. To deeply explore the impact of video semantic information on learners' emotions, this paper innovatively proposes a multimodal emotion recognition method by fusing video semantic information and physiological signals. We generate video descriptions through a pre-trained large language model (LLM) to obtain high-level semantic information about instructional videos. Using the cross-attention mechanism for modal interaction, the semantic information is fused with the eye movement and PhotoPlethysmoGraphy (PPG) signals to obtain the features containing the critical information of the three modes. The accurate recognition of learners' emotional states is realized through the emotion classifier. The experimental results show that our method has significantly improved emotion recognition performance, providing a new perspective and efficient method for emotion recognition research in MOOC learning scenarios. The method proposed in this paper not only contributes to a deeper understanding of the impact of instructional videos on learners' emotional states but also provides a beneficial reference for future research on emotion recognition in MOOC learning scenarios.
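The pipeline described in the abstract (LLM-generated video descriptions, cross-attention fusion with eye-movement and PPG features, then an emotion classifier) can be illustrated with a minimal PyTorch sketch. The module below is an assumption-laden illustration rather than the authors' implementation: the feature dimensions, number of attention heads, pooling strategy, and four-class output are all hypothetical choices.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Minimal sketch: fuse LLM-derived video-semantic features with
    eye-movement and PPG features via cross-attention, then classify.
    Dimensions, layer choices, and fusion order are illustrative assumptions."""

    def __init__(self, dim=256, num_heads=4, num_classes=4):
        super().__init__()
        # Cross-attention blocks: physiological streams attend to video semantics.
        self.eye_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ppg_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, num_classes)
        )

    def forward(self, sem, eye, ppg):
        # sem: (B, Ls, dim) tokens encoded from an LLM-generated video description
        # eye: (B, Le, dim) eye-movement features; ppg: (B, Lp, dim) PPG features
        eye_fused, _ = self.eye_attn(query=eye, key=sem, value=sem)
        ppg_fused, _ = self.ppg_attn(query=ppg, key=sem, value=sem)
        # Pool over time and concatenate the two semantics-conditioned streams.
        fused = torch.cat([eye_fused.mean(dim=1), ppg_fused.mean(dim=1)], dim=-1)
        return self.classifier(fused)

# Example usage with random tensors standing in for real features.
model = CrossAttentionFusion()
logits = model(torch.randn(8, 32, 256), torch.randn(8, 64, 256), torch.randn(8, 128, 256))
print(logits.shape)  # torch.Size([8, 4])
```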
Related papers
- Enhancing Micro Gesture Recognition for Emotion Understanding via Context-aware Visual-Text Contrastive Learning [4.819180187758932]
Micro Gestures (MG) are closely linked to human emotions.
Existing Micro Gesture Recognition (MGR) methods utilize only a single modality (e.g., RGB or skeleton) while overlooking crucial textual information.
We propose a simple but effective visual-text contrastive learning solution that utilizes text information for MGR (a generic sketch of this kind of contrastive loss follows the related-papers list).
arXiv Detail & Related papers (2024-05-03T07:11:25Z)
- Exploring Emotion Expression Recognition in Older Adults Interacting with a Virtual Coach [22.00225071959289]
The EMPATHIC project aimed to design an emotionally expressive virtual coach capable of engaging healthy seniors to improve well-being and promote independent aging.
This paper outlines the development of the emotion expression recognition module of the virtual coach, encompassing data collection, annotation design, and a first methodological approach.
arXiv Detail & Related papers (2023-11-09T18:22:32Z)
- StyleEDL: Style-Guided High-order Attention Network for Image Emotion Distribution Learning [69.06749934902464]
We propose a style-guided high-order attention network for image emotion distribution learning termed StyleEDL.
StyleEDL interactively learns stylistic-aware representations of images by exploring the hierarchical stylistic information of visual contents.
In addition, we introduce a stylistic graph convolutional network to dynamically generate the content-dependent emotion representations.
arXiv Detail & Related papers (2023-08-06T03:22:46Z)
- Learning Emotion Representations from Verbal and Nonverbal Communication [7.747924294389427]
We present EmotionCLIP, the first pre-training paradigm to extract visual emotion representations from verbal and nonverbal communication.
We guide EmotionCLIP to attend to nonverbal emotion cues through subject-aware context encoding and verbal emotion cues using sentiment-guided contrastive learning.
EmotionCLIP will address the prevailing issue of data scarcity in emotion understanding, thereby fostering progress in related domains.
arXiv Detail & Related papers (2023-05-22T21:36:55Z)
- A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot [67.00455874279383]
We propose verbalizing long videos to generate descriptions in natural language, then performing video-understanding tasks on the generated story as opposed to the original video.
Our method, despite being zero-shot, achieves significantly better results than supervised baselines for video understanding.
To alleviate the lack of story-understanding benchmarks, we publicly release the first dataset on persuasion strategy identification, a crucial task in computational social science.
arXiv Detail & Related papers (2023-05-16T19:13:11Z)
- How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios [73.24092762346095]
We introduce two large-scale datasets with over 60,000 videos annotated for emotional response and subjective wellbeing.
The Video Cognitive Empathy dataset contains annotations for distributions of fine-grained emotional responses, allowing models to gain a detailed understanding of affective states.
The Video to Valence dataset contains annotations of relative pleasantness between videos, which enables predicting a continuous spectrum of wellbeing.
arXiv Detail & Related papers (2022-10-18T17:58:25Z)
- SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network [83.27291945217424]
We propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images.
To mine the emotional relationships between distinct objects, we first build up an Emotion Graph based on semantic concepts and visual features.
We also design a Scene-Object Fusion Module to integrate scenes and objects, which exploits scene features to guide the fusion process of object features with the proposed scene-based attention mechanism.
arXiv Detail & Related papers (2021-10-24T02:41:41Z)
- Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies [106.62835060095532]
We discuss several key aspects of multi-modal emotion recognition (MER).
We begin with a brief introduction on widely used emotion representation models and affective modalities.
We then summarize existing emotion annotation strategies and corresponding computational tasks.
Finally, we outline several real-world applications and discuss some future directions.
arXiv Detail & Related papers (2021-08-18T21:55:20Z)
- Recognizing Emotions evoked by Movies using Multitask Learning [3.4290619267487488]
Methods for recognizing evoked emotions are usually trained on human annotated data.
We propose two deep learning architectures: a Single-Task (ST) architecture and a Multi-Task (MT) architecture.
Our results show that the MT approach can more accurately model each viewer and the aggregated annotation when compared to methods that are directly trained on the aggregated annotations.
arXiv Detail & Related papers (2021-07-30T10:21:40Z)
- Affective Image Content Analysis: Two Decades Review and New Perspectives [132.889649256384]
We will comprehensively review the development of affective image content analysis (AICA) in the recent two decades.
We will focus on the state-of-the-art methods with respect to three main challenges -- the affective gap, perception subjectivity, and label noise and absence.
We discuss some challenges and promising research directions in the future, such as image content and context understanding, group emotion clustering, and viewer-image interaction.
arXiv Detail & Related papers (2021-06-30T15:20:56Z)
- Meta Transfer Learning for Emotion Recognition [42.61707533351803]
We propose a PathNet-based transfer learning method that is able to transfer emotional knowledge learned from one visual/audio emotion domain to another visual/audio emotion domain.
Our proposed system improves emotion recognition performance, performing substantially better than recently proposed transfer learning methods based on fine-tuning pre-trained models.
arXiv Detail & Related papers (2020-06-23T00:25:28Z)
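Several entries in the list above (the micro-gesture recognition paper and EmotionCLIP) rely on visual-text contrastive learning. As referenced there, the function below is a generic symmetric InfoNCE-style contrastive loss over paired visual and text embeddings; it is a sketch of the general technique only, not the objective used in either paper, and the temperature and batch size are arbitrary.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(visual_emb, text_emb, temperature=0.07):
    """Generic symmetric contrastive (InfoNCE-style) loss between paired
    visual and text embeddings; an illustration of the technique, not any
    specific paper's exact objective."""
    v = F.normalize(visual_emb, dim=-1)   # (B, D)
    t = F.normalize(text_emb, dim=-1)     # (B, D)
    logits = v @ t.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(v.size(0), device=v.device)
    # Matched pairs sit on the diagonal; score retrieval in both directions.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# Example usage with random embeddings standing in for encoder outputs.
loss = contrastive_loss(torch.randn(16, 256), torch.randn(16, 256))
```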
This list is automatically generated from the titles and abstracts of papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.