Fugu-MT Paper Translation (Abstract): Multimodal Large Language Models Meet Multimodal Emotion Recognition and Reasoning: A Survey

Paper overview: Multimodal Large Language Models Meet Multimodal Emotion Recognition and Reasoning: A Survey

arxiv url: http://arxiv.org/abs/2509.24322v1
Date: Mon, 29 Sep 2025 06:13:14 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.777593
Title: Multimodal Large Language Models Meet Multimodal Emotion Recognition and Reasoning: A Survey
Title (reference translation): Multimodal Large Language Models and Multimodal Emotion Recognition and Reasoning
Authors: Yuntao Shou, Tao Meng, Wei Ai, Keqin Li
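Abstract summary: In AI for Science, multimodal emotion recognition and reasoning has become a rapidly growing frontier. This paper is the first attempt to comprehensively survey the intersection of MLLMs with multimodal emotion recognition and reasoning.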
Reference score (independently computed attention level): 40.20905051575087
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In recent years, large language models (LLMs) have driven major advances in language understanding, marking a significant step toward artificial general intelligence (AGI). With increasing demands for higher-level semantics and cross-modal fusion, multimodal large language models (MLLMs) have emerged, integrating diverse information sources (e.g., text, vision, and audio) to enhance modeling and reasoning in complex scenarios. In AI for Science, multimodal emotion recognition and reasoning has become a rapidly growing frontier. While LLMs and MLLMs have achieved notable progress in this area, the field still lacks a systematic review that consolidates recent developments. To address this gap, this paper provides a comprehensive survey of LLMs and MLLMs for emotion recognition and reasoning, covering model architectures, datasets, and performance benchmarks. We further highlight key challenges and outline future research directions, aiming to offer researchers both an authoritative reference and practical insights for advancing this domain. To the best of our knowledge, this paper is the first attempt to comprehensively survey the intersection of MLLMs with multimodal emotion recognition and reasoning. A summary of the existing methods is available on our GitHub: https://github.com/yuntaoshou/Awesome-Emotion-Reasoning.
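Abstract (reference translation): In recent years, large language models (LLMs) have brought major advances in language understanding, a significant step toward artificial general intelligence (AGI). As demand grows for higher-level semantics and cross-modal fusion, multimodal large language models (MLLMs) have emerged, integrating diverse information sources (e.g., text, vision, and audio) to strengthen modeling and reasoning in complex scenarios. In AI for Science, multimodal emotion recognition and reasoning has become a rapidly growing frontier. LLMs and MLLMs have made notable progress in this area, but a systematic review that consolidates recent developments is still lacking. To address this gap, the paper presents a comprehensive survey of LLMs and MLLMs for emotion recognition and reasoning, covering model architectures, datasets, and performance benchmarks. It further highlights key challenges and outlines future research directions, aiming to give researchers both an authoritative reference and practical insights for advancing the domain. This work is the first attempt to comprehensively survey the intersection of MLLMs with multimodal emotion recognition and reasoning. A summary of the existing methods is available on GitHub: https://github.com/yuntaoshou/Awesome-Emotion-Reasoning.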

Related papers