Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective
- URL: http://arxiv.org/abs/2409.07388v2
- Date: Wed, 30 Oct 2024 15:42:55 GMT
- Title: Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective
- Authors: Guimin Hu, Yi Xin, Weimin Lyu, Haojian Huang, Chang Sun, Zhihong Zhu, Lin Gui, Ruichu Cai, Erik Cambria, Hasti Seifi
- Abstract summary: Multimodal affective computing (MAC) has garnered increasing attention due to its broad applications in analyzing human behaviors and intentions.
This survey presents the recent trends of multimodal affective computing from an NLP perspective through four hot tasks.
The goal of this survey is to explore the current landscape of multimodal affective research, identify development trends, and highlight the similarities and differences across various tasks.
- Score: 34.76568708378833
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal affective computing (MAC) has garnered increasing attention due to its broad applications in analyzing human behaviors and intentions, especially in the text-dominated multimodal affective computing field. This survey presents the recent trends of multimodal affective computing from an NLP perspective through four hot tasks: multimodal sentiment analysis, multimodal emotion recognition in conversation, multimodal aspect-based sentiment analysis, and multimodal multi-label emotion recognition. The goal of this survey is to explore the current landscape of multimodal affective research, identify development trends, and highlight the similarities and differences across various tasks, offering a comprehensive report on recent progress in multimodal affective computing from an NLP perspective. This survey covers the formalization of tasks, provides an overview of relevant works, describes benchmark datasets, and details the evaluation metrics for each task. It also briefly discusses research in multimodal affective computing involving facial expressions, acoustic signals, physiological signals, and emotion causes, and examines the technical approaches, challenges, and future directions of the field. To support further research, we have released a repository that compiles related works in multimodal affective computing, providing detailed resources and references for the community.
Related papers
- MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics [37.30869857541657]
We introduce a multimodal deception dataset MDPE.
Besides deception features, this dataset also includes individual differences information in personality and emotional expression characteristics.
It comprises over 104 hours of deception and emotional videos from 193 subjects.
arXiv Detail & Related papers (2024-07-17T02:44:26Z) - Multi-Task Learning for Affect Analysis [0.0]
This project investigates two primary approaches: uni-task solutions and a multi-task approach to the same problems.
The project utilizes an existing neural network architecture, adapting it for multi-task learning by modifying output layers and loss functions.
The research aspires to contribute to the burgeoning field of affective computing, with applications spanning healthcare, marketing, and human-computer interaction.
arXiv Detail & Related papers (2024-06-30T12:36:37Z) - Modality Influence in Multimodal Machine Learning [0.0]
The study examines Multimodal Sentiment Analysis, Multimodal Emotion Recognition, Multimodal Hate Speech Recognition, and Multimodal Disease Detection.
The research aims to identify the most influential modality or set of modalities for each task and draw conclusions for diverse multimodal classification tasks.
arXiv Detail & Related papers (2023-06-10T16:28:52Z) - Expanding the Role of Affective Phenomena in Multimodal Interaction Research [57.069159905961214]
We examined over 16,000 papers from selected conferences in multimodal interaction, affective computing, and natural language processing.
We identify 910 affect-related papers and present our analysis of the role of affective phenomena in these papers.
We find limited research on how affect and emotion predictions might be used by AI systems to enhance machine understanding of human social behaviors and cognitive states.
arXiv Detail & Related papers (2023-05-18T09:08:39Z) - Vision+X: A Survey on Multimodal Learning in the Light of Data [64.03266872103835]
Multimodal machine learning, which incorporates data from various sources, has become an increasingly popular research area.
We analyze the commonness and uniqueness of each data format mainly ranging from vision, audio, text, and motions.
We investigate the existing literature on multimodal learning from both the representation learning and downstream application levels.
arXiv Detail & Related papers (2022-10-05T13:14:57Z) - Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions [68.6358773622615]
This paper provides an overview of the computational and theoretical foundations of multimodal machine learning.
We propose a taxonomy of 6 core technical challenges: representation, alignment, reasoning, generation, transference, and quantification.
Recent technical achievements will be presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches.
arXiv Detail & Related papers (2022-09-07T19:21:19Z) - Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations: the most frequently used nonverbal cue is speaking activity, the most common computational method is support vector machines, and the most common interaction environment and sensing approach are meetings of 3-4 persons equipped with microphones and cameras.
arXiv Detail & Related papers (2022-07-20T13:37:57Z) - Multimodal Image Synthesis and Editing: The Generative AI Era [131.9569600472503]
Multimodal image synthesis and editing has become a hot research topic in recent years.
We comprehensively contextualize recent advances in multimodal image synthesis and editing.
We describe benchmark datasets and evaluation metrics as well as corresponding experimental results.
arXiv Detail & Related papers (2021-12-27T10:00:16Z) - A Review on Explainability in Multimodal Deep Neural Nets [2.3204178451683264]
Multimodal AI techniques have achieved much success in several application domains.
Despite their outstanding performance, the complex, opaque and black-box nature of the deep neural nets limits their social acceptance and usability.
This paper extensively reviews the present literature to present a comprehensive survey and commentary on the explainability in multimodal deep neural nets.
arXiv Detail & Related papers (2021-05-17T14:17:49Z) - Variants of BERT, Random Forests and SVM approach for Multimodal Emotion-Target Sub-challenge [11.71437054341057]
We present and discuss our classification methodology for MuSe-Topic Sub-challenge.
We ensemble two language models, ALBERT and RoBERTa, to predict 10 classes of topics.
arXiv Detail & Related papers (2020-07-28T01:15:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.