Fugu-MT Paper Translation (Abstract): DaQ-MSA: Denoising and Qualifying Diffusion Augmentations for Multimodal Sentiment Analysis

Paper overview: DaQ-MSA: Denoising and Qualifying Diffusion Augmentations for Multimodal Sentiment Analysis

arxiv url: http://arxiv.org/abs/2601.06870v1
Date: Sun, 11 Jan 2026 11:29:21 GMT
Status: Translation complete
Last updated in system: 2026-01-13 19:08:01.041437
Title: DaQ-MSA: Denoising and Qualifying Diffusion Augmentations for Multimodal Sentiment Analysis
Title (reference translation): DaQ-MSA: Denoising and Qualifying Diffusion Augmentations for Multimodal Sentiment Analysis
Authors: Jiazhang Liang, Jianheng Dai, Miaosen Luo, Menghua Jiang, Sijie Mai,
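Abstract summary: Multimodal large language models (MLLMs) have shown strong performance on vision-language tasks. However, their effectiveness on multimodal sentiment analysis is limited by the scarcity of high-quality training data. This paper proposes a quality scoring module that evaluates the reliability of samples and assigns adaptive training weights.
Reference score (attention score computed independently by the site): 5.214131153441384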
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal large language models (MLLMs) have demonstrated strong performance on vision-language tasks, yet their effectiveness on multimodal sentiment analysis remains constrained by the scarcity of high-quality training data, which limits accurate multimodal understanding and generalization. To alleviate this bottleneck, we leverage diffusion models to perform semantics-preserving augmentation on the video and audio modalities, expanding the multimodal training distribution. However, increasing data quantity alone is insufficient, as diffusion-generated samples exhibit substantial quality variation and noisy augmentations may degrade performance. We therefore propose DaQ-MSA (Denoising and Qualifying Diffusion Augmentations for Multimodal Sentiment Analysis), which introduces a quality scoring module to evaluate the reliability of augmented samples and assign adaptive training weights. By down-weighting low-quality samples and emphasizing high-fidelity ones, DaQ-MSA enables more stable learning. By integrating the generative capability of diffusion models with the semantic understanding of MLLMs, our approach provides a robust and generalizable automated augmentation strategy for training MLLMs without any human annotation or additional supervision.
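Abstract (reference translation): Multimodal large language models (MLLMs) have shown strong performance on vision-language tasks, but their effectiveness on multimodal sentiment analysis is still limited by a shortage of high-quality training data, which restricts accurate multimodal understanding and generalization. To ease this bottleneck, we use diffusion models to perform semantics-preserving augmentation on the video and audio modalities, broadening the multimodal training distribution. Increasing the amount of data is not enough on its own, however: diffusion-generated samples vary considerably in quality, and noisy augmentations can degrade performance. We therefore propose DaQ-MSA (Denoising and Qualifying Diffusion Augmentations for Multimodal Sentiment Analysis), which introduces a quality scoring module that evaluates the reliability of augmented samples and assigns adaptive training weights. By down-weighting low-quality samples and emphasizing high-fidelity ones, DaQ-MSA enables more stable learning. By combining the generative capability of diffusion models with the semantic understanding of MLLMs, the approach provides a robust, generalizable automated augmentation strategy for training MLLMs without human annotation or additional supervision.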
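The abstract describes assigning adaptive training weights from quality scores but gives no formula. The following is only a minimal sketch of that general idea, not the paper's method: the function names `quality_weights` and `weighted_loss`, the softmax mapping, and the `temperature` parameter are all assumptions introduced for illustration.

```python
import math


def quality_weights(scores, temperature=1.0):
    """Turn per-sample quality scores into training weights.

    Assumed scheme (not taken from the paper): a softmax over the scores,
    rescaled so the weights average to 1. Higher scores get larger weights.
    """
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    n = len(scores)
    return [n * e / total for e in exps]


def weighted_loss(losses, weights):
    """Average of per-sample losses, each scaled by its quality weight."""
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


if __name__ == "__main__":
    # Hypothetical quality scores and per-sample losses for three augmented samples.
    scores = [0.9, 0.2, 0.6]
    losses = [0.5, 2.0, 1.0]
    weights = quality_weights(scores)
    print("weights:", weights)
    print("weighted loss:", weighted_loss(losses, weights))
    print("unweighted loss:", sum(losses) / len(losses))
```

With equal scores every weight is 1 and the result reduces to the ordinary mean loss. A lower `temperature` sharpens the contrast between high-quality and low-quality samples.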

Related papers