Fugu-MT Paper Translation (Abstract): MFAS: Emotion Recognition through Multiple Perspectives Fusion Architecture Search Emulating Human Cognition

Paper overview: MFAS: Emotion Recognition through Multiple Perspectives Fusion Architecture Search Emulating Human Cognition

arxiv url: http://arxiv.org/abs/2306.09361v2
Date: Mon, 25 Dec 2023 01:57:40 GMT
Status: Translation complete
Last updated in system: 2023-12-28 02:00:01.323454
Title: MFAS: Emotion Recognition through Multiple Perspectives Fusion Architecture Search Emulating Human Cognition
Title (reference translation): MFAS: Emotion Recognition via Multi-Perspective Fusion Architecture Search Modeled on Human Cognition
Authors: Haiyang Sun, Fulin Zhang, Zheng Lian, Yingying Guo, Shilei Zhang
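Abstract summary: Speech emotion recognition aims to identify and analyze emotional states in target speech as humans do. The authors show that understanding speech content from a continuous perspective captures more comprehensive emotional information. They propose a novel framework called Multiple perspectives Fusion Architecture Search (MFAS).
Reference score (independently calculated attention score): 10.998461754606131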
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Speech emotion recognition aims to identify and analyze emotional states in target speech similar to humans. Perfect emotion recognition can greatly benefit a wide range of human-machine interaction tasks. Inspired by the human process of understanding emotions, we demonstrate that compared to quantized modeling, understanding speech content from a continuous perspective, akin to human-like comprehension, enables the model to capture more comprehensive emotional information. Additionally, considering that humans adjust their perception of emotional words in textual semantic based on certain cues present in speech, we design a novel search space and search for the optimal fusion strategy for the two types of information. Experimental results further validate the significance of this perception adjustment. Building on these observations, we propose a novel framework called Multiple perspectives Fusion Architecture Search (MFAS). Specifically, we utilize continuous-based knowledge to capture speech semantic and quantization-based knowledge to learn textual semantic. Then, we search for the optimal fusion strategy for them. Experimental results demonstrate that MFAS surpasses existing models in comprehensively capturing speech emotion information and can automatically adjust fusion strategy.
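Abstract (reference translation): Speech emotion recognition aims to identify and analyze emotional states in target speech as humans do. Perfect emotion recognition would greatly benefit a wide range of human-machine interaction tasks. Inspired by the human process of understanding emotions, we show that, compared with quantized modeling, understanding speech content from a continuous perspective, much as humans comprehend it, lets the model capture more comprehensive emotional information. In addition, since humans adjust their perception of emotional words in the textual semantics according to certain cues present in speech, we design a novel search space and search for the optimal fusion strategy for the two types of information. Experimental results further validate the significance of this perception adjustment. Building on these observations, we propose a novel framework called Multiple perspectives Fusion Architecture Search (MFAS). Specifically, we use continuous-based knowledge to capture speech semantics and quantization-based knowledge to learn textual semantics. We then search for the optimal fusion strategy for them. Experimental results show that MFAS surpasses existing models in comprehensively capturing speech emotion information and can automatically adjust its fusion strategy.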
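
The abstract says that MFAS searches for the optimal strategy to fuse speech-semantic and textual-semantic information, but it does not describe the search space or the search algorithm. The following is only a minimal sketch of one common way such a search can be set up, a softmax-weighted mixture over candidate operations in the style of differentiable architecture search. This is an assumption, not the authors' method: the candidate operations, the names `CANDIDATE_OPS`, `mixed_fusion`, and `select_fusion`, and the feature values are all hypothetical, and the training loop that would actually learn the architecture weights is omitted.

```python
import math

# Hypothetical candidate fusion operations over two equal-length feature
# vectors (speech-semantic and text-semantic). Not taken from the paper.
CANDIDATE_OPS = {
    "sum": lambda s, t: [a + b for a, b in zip(s, t)],
    "product": lambda s, t: [a * b for a, b in zip(s, t)],
    "max": lambda s, t: [max(a, b) for a, b in zip(s, t)],
    "speech_only": lambda s, t: list(s),
    "text_only": lambda s, t: list(t),
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_fusion(speech_feat, text_feat, alphas):
    """Relaxed search step: blend every candidate op by its softmax weight."""
    weights = softmax([alphas[name] for name in CANDIDATE_OPS])
    fused = [0.0] * len(speech_feat)
    for w, op in zip(weights, CANDIDATE_OPS.values()):
        out = op(speech_feat, text_feat)
        fused = [f + w * o for f, o in zip(fused, out)]
    return fused


def select_fusion(alphas):
    """After the search, keep the op with the largest architecture weight."""
    return max(CANDIDATE_OPS, key=lambda name: alphas[name])


# Usage with made-up features and untrained (uniform) architecture weights.
speech = [0.2, -1.0, 0.5]
text = [1.0, 0.3, -0.4]
alphas = {name: 0.0 for name in CANDIDATE_OPS}
fused = mixed_fusion(speech, text, alphas)
```

In a real search the values in `alphas` would be updated by gradient descent on a validation loss, and `select_fusion` would be called once at the end to fix the fusion operation.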

Related papers

Contrastive Decoupled Representation Learning and Regularization for Speech-Preserving Facial Expression Manipulation [58.189703277322224]
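Speech-preserving facial expression manipulation (SPFEM) aims to modify a talking head so that it displays a specific reference emotion. The emotion and content information present in the reference and source inputs can provide direct and accurate supervision signals for SPFEM models. The authors propose learning content and emotion priors as guidance, using contrastive learning to obtain decoupled content and emotion representations.
Paper reference translation (metadata) (2025-04-08T04:34:38Z)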
Enriching Multimodal Sentiment Analysis through Textual Emotional Descriptions of Visual-Audio Content [56.62027582702816]
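Multimodal sentiment analysis aims to unravel human emotions by fusing text, audio, and visual data. However, recognizing subtle emotional nuances in audio and video expressions remains a formidable challenge. The authors introduce DEVA, a progressive fusion framework based on textual emotional descriptions.
Paper reference translation (metadata) (2024-12-12T11:30:41Z)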
Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition [54.952250732643115]
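The authors study acoustic word embeddings (AWEs), fixed-length features derived from continuous representations, and examine their advantages. AWEs were previously shown to be useful for capturing acoustic discriminability. The findings reveal the acoustic context conveyed by AWEs and show highly competitive speech emotion recognition accuracy.
Paper reference translation (metadata) (2024-02-04T21:24:54Z)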
TelME: Teacher-leading Multimodal Fusion Network for Emotion Recognition in Conversation [0.78452977096722]
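TelME incorporates cross-modal knowledge distillation to transfer information from a language model acting as the teacher to the non-verbal students. It then combines multimodal features using a shifting fusion approach in which the student networks support the teacher.
Paper reference translation (metadata) (2024-01-16T07:18:41Z)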
MSAC: Multiple Speech Attribute Control Method for Reliable Speech Emotion Recognition [7.81011775615268]
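The authors introduce MSAC-SERNet, a novel unified SER framework that can handle single-corpus and cross-corpus SER simultaneously. Considering the information overlap among speech attributes, they propose a novel learning paradigm based on the correlations between different speech attributes. Experiments in both single-corpus and cross-corpus SER scenarios show that MSAC-SERNet outperforms state-of-the-art SER approaches.
Paper reference translation (metadata) (2023-08-08T03:43:24Z)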
EMERSK -- Explainable Multimodal Emotion Recognition with Situational Knowledge [0.0]
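The authors present Explainable Multimodal Emotion Recognition with Situational Knowledge (EMERSK). EMERSK is a general-purpose system for recognizing and explaining human emotions using visual information. The system can process multiple modalities, such as facial expressions, posture, and gait, in a flexible and modular manner.
Paper reference translation (metadata) (2023-06-14T17:52:37Z)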
Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning [70.30713251031052]
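This work proposes a data-driven deep learning model, namely StrengthNet. Experimental results show that the emotion strength predicted by the proposed StrengthNet correlates highly with the ground truth for both seen and unseen speech.
Paper reference translation (metadata) (2022-06-15T01:25:32Z)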
M2FNet: Multi-modal Fusion Network for Emotion Recognition in Conversation [1.3864478040954673]
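The authors propose a Multi-modal Fusion Network (M2FNet) that extracts emotion-relevant features from the visual, audio, and text modalities. A fusion mechanism based on multi-head attention combines the emotion-rich latent representations of the input data. The proposed feature extractor is trained with a novel adaptive margin-based triplet loss function to learn emotion-relevant features from audio and visual data.
Paper reference translation (metadata) (2022-06-05T14:18:58Z)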
MMER: Multimodal Multi-task Learning for Speech Emotion Recognition [48.32879363033598]
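MMER is a multimodal multi-task learning approach for speech emotion recognition. In practice, MMER outperforms all baselines and achieves state-of-the-art performance on the IEMOCAP benchmark.
Paper reference translation (metadata) (2022-03-31T04:51:32Z)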
Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
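This paper proposes a neural-network-based emotion recognition framework that fuses transfer-learned and fine-tuned models from the speech and text modalities. The effectiveness of the multimodal approach is evaluated on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset.
Paper reference translation (metadata) (2022-02-16T00:23:42Z)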
MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
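The authors propose MEmoBERT, a pre-training model for multimodal emotion recognition. Unlike the conventional "pre-train, fine-tune" paradigm, they propose a prompt-based method that reformulates the downstream emotion classification task as masked text prediction. The proposed MEmoBERT significantly improves emotion recognition performance.
Paper reference translation (metadata) (2021-10-27T09:57:00Z)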
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation [57.68765353264689]
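Speech enhancement and speech separation are two related tasks. Traditionally, these tasks have been tackled with signal processing and machine learning techniques. Deep learning has since been used to achieve strong performance.
Paper reference translation (metadata) (2020-08-21T17:24:09Z)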
Temporal aggregation of audio-visual modalities for emotion recognition [0.5352699766206808]
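This work proposes a multimodal fusion technique for emotion recognition that combines audio-visual modalities from temporal windows with different temporal offsets. The proposed method outperforms other methods from the literature as well as human accuracy ratings.
Paper reference translation (metadata) (2020-07-08T18:44:15Z)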

The related papers list is generated automatically from the titles and abstracts of the papers on this site.

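This page shows information on the specified paper.
The operator of this site does not guarantee the quality of the site (including all information and translations) and accepts no responsibility for any consequences arising from its use.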