Fugu-MT paper translation (summary): Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models

Paper summary: Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models

arxiv url: http://arxiv.org/abs/2202.08974v1
Date: Wed, 16 Feb 2022 00:23:42 GMT
Status: Translation complete
Last updated in system: 2022-02-21 14:52:10.062458
Title: Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models
Authors: Sarala Padi, Seyed Omid Sadjadi, Dinesh Manocha and Ram D. Sriram
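Abstract summary: This paper proposes a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from the speech and text modalities. The multimodal approach is evaluated on the interactive emotional dyadic motion capture (IEMOCAP) dataset.
Reference score (attention score computed by the site): 53.31917090073727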
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Automatic emotion recognition plays a key role in computer-human interaction as it has the potential to enrich the next-generation artificial intelligence with emotional intelligence. It finds applications in customer and/or representative behavior analysis in call centers, gaming, personal assistants, and social robots, to mention a few. Therefore, there has been an increasing demand to develop robust automatic methods to analyze and recognize the various emotions. In this paper, we propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities. More specifically, we i) adapt a residual network (ResNet) based model trained on a large-scale speaker recognition task using transfer learning along with a spectrogram augmentation approach to recognize emotions from speech, and ii) use a fine-tuned bidirectional encoder representations from transformers (BERT) based model to represent and recognize emotions from the text. The proposed system then combines the ResNet and BERT-based model scores using a late fusion strategy to further improve the emotion recognition performance. The proposed multimodal solution addresses the data scarcity limitation in emotion recognition using transfer learning, data augmentation, and fine-tuning, thereby improving the generalization performance of the emotion recognition models. We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture (IEMOCAP) dataset. Experimental results indicate that both audio and text-based models improve the emotion recognition performance and that the proposed multimodal solution achieves state-of-the-art results on the IEMOCAP benchmark.
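The late fusion step described in the abstract can be illustrated with a minimal sketch. The abstract only states that the ResNet and BERT-based model scores are combined by late fusion; the weighted average of per-class posteriors used here, the `speech_weight` parameter, the four-class label set, and the example scores are all assumptions made for illustration, not details taken from the paper.

```python
from math import exp

# Hypothetical label set; the abstract does not list the emotion classes.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(speech_logits, text_logits, speech_weight=0.5):
    """Weighted average of per-class posteriors from two independent models.

    speech_weight is an assumed tunable parameter, not a value from the paper.
    """
    if len(speech_logits) != len(text_logits):
        raise ValueError("both models must score the same classes")
    if not 0.0 <= speech_weight <= 1.0:
        raise ValueError("speech_weight must be in [0, 1]")
    speech_probs = softmax(speech_logits)
    text_probs = softmax(text_logits)
    return [
        speech_weight * s + (1.0 - speech_weight) * t
        for s, t in zip(speech_probs, text_probs)
    ]


def predict(speech_logits, text_logits, speech_weight=0.5):
    """Return the top-scoring emotion label and the fused probabilities."""
    fused = late_fusion(speech_logits, text_logits, speech_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best], fused


if __name__ == "__main__":
    # Made-up scores for one utterance: the speech model leans "angry",
    # the text model leans "sad".
    label, fused = predict([2.0, 0.1, 0.3, 0.5], [0.2, 0.1, 0.4, 3.0])
    print(label, [round(p, 3) for p in fused])
```

Because each model is scored independently and only the outputs are merged, either branch can be retrained or replaced without touching the other; in practice the fusion weight would typically be tuned on held-out data.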

Related papers

Emotion Detection Using Conditional Generative Adversarial Networks (cGAN): A Deep Learning Approach [0.0]
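This paper presents a deep learning approach to emotion detection using cGANs. Unlike traditional unimodal methods that rely on a single data type, it explores a multimodal framework that integrates text, audio, and facial expressions. The proposed cGAN architecture is trained to generate synthetic emotion-rich data and to improve classification accuracy across multiple modalities.
Reference translation of the paper (metadata) (2025-08-06T14:32:22Z)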
AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations [57.99479708224221]
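We propose a new framework, AIMDiT, to address the problem of multimodal fusion of deep features. Experiments with the AIMDiT framework on the public benchmark dataset MELD show improvements of 2.34% and 2.87% in the Acc-7 and w-F1 metrics, respectively.
Reference translation of the paper (metadata) (2024-04-12T11:31:18Z)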
Deep Imbalanced Learning for Multimodal Emotion Recognition in Conversations [15.705757672984662]
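Multimodal emotion recognition in conversations (MERC) is an important development direction for machine intelligence. Much MERC data naturally exhibits an imbalanced distribution of emotion categories, and researchers have overlooked the negative impact of imbalanced data on emotion recognition. We propose the Class Boundary Enhanced Representation Learning (CBERL) model to address the imbalanced distribution of emotion categories in raw data. Extensive experiments on the IEMOCAP and MELD benchmark datasets show that CBERL achieves a measurable improvement in emotion recognition performance.
Reference translation of the paper (metadata) (2023-12-11T12:35:17Z)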
A Contextualized Real-Time Multimodal Emotion Recognition for Conversational Agents using Graph Convolutional Networks in Reinforcement Learning [0.800062359410795]
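We propose a novel paradigm for contextualized emotion recognition using graph convolutional networks with reinforcement learning (conER-GRL). Conversations are partitioned into smaller groups of utterances for effective extraction of contextual information. The system uses gated recurrent units (GRUs) to extract multimodal features from these groups of utterances.
Reference translation of the paper (metadata) (2023-10-24T14:31:17Z)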
EmoDiarize: Speaker Diarization and Emotion Identification from Speech Signals using Convolutional Neural Networks [0.0]
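This study explores the integration of deep learning techniques in speech recognition. It introduces a framework that combines an existing speaker diarization pipeline with an emotion identification model built on a convolutional neural network (CNN). The proposed model achieves an unweighted accuracy of 63%, demonstrating notable efficiency in accurately identifying emotional states in speech signals.
Reference translation of the paper (metadata) (2023-10-19T16:02:53Z)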
A Comparative Study of Data Augmentation Techniques for Deep Learning Based Emotion Recognition [11.928873764689458]
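We present a comprehensive evaluation of popular deep learning approaches for emotion recognition. We show that long-range dependencies in the speech signal are important for emotion recognition. Speed/rate augmentation offers the most robust performance gain across models.
Reference translation of the paper (metadata) (2022-11-09T17:27:03Z)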
Interpretability for Multimodal Emotion Recognition using Concept Activation Vectors [0.0]
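We address the interpretability of neural networks in emotion recognition using concept activation vectors (CAVs). We define human-understandable concepts specific to Emotion AI and map them to the widely used IEMOCAP multimodal database. We then evaluate the influence of the proposed concepts at multiple layers of a bi-directional contextual LSTM (BC-LSTM) network.
Reference translation of the paper (metadata) (2022-02-02T15:02:42Z)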
MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
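We propose MEmoBERT, a pre-training model for multimodal emotion recognition. Unlike the conventional "pre-train, fine-tune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as masked text prediction. The proposed MEmoBERT significantly improves emotion recognition performance.
Reference translation of the paper (metadata) (2021-10-27T09:57:00Z)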
Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
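Speech emotion recognition (SER) is a challenging task that plays an important role in human-computer interaction. One of the main challenges in SER is data scarcity. This paper proposes a transfer learning strategy combined with spectrogram augmentation.
Reference translation of the paper (metadata) (2021-08-05T10:39:39Z)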
An Attribute-Aligned Strategy for Learning Speech Representation [57.891727280493015]
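We propose an attribute-aligned learning strategy to derive speech representations that can flexibly address these issues through an attribute-selection mechanism. Specifically, we propose a layered-representation variational autoencoder (LR-VAE) that factorizes speech representations into attribute-dependent nodes. The proposed method achieves competitive performance on identity-free SER and better performance on emotionless SV.
Reference translation of the paper (metadata) (2021-06-05T06:19:14Z)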
Continuous Emotion Recognition via Deep Convolutional Autoencoder and Support Vector Regressor [70.2226417364135]
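It is essential that machines can recognize users' emotional states with high accuracy. Deep neural networks have achieved great success in recognizing emotions. We propose a new model for continuous emotion recognition based on facial expression recognition.
Reference translation of the paper (metadata) (2020-01-31T17:47:16Z)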

The related-papers list is generated automatically from the titles and abstracts of papers on this site.
