Fugu-MT Paper Translation (Abstract): Mamba-VA: A Mamba-based Approach for Continuous Emotion Recognition in Valence-Arousal Space

Paper summary: Mamba-VA: A Mamba-based Approach for Continuous Emotion Recognition in Valence-Arousal Space

arxiv url: http://arxiv.org/abs/2503.10104v1
Date: Thu, 13 Mar 2025 07:02:07 GMT
Status: Translation complete
Last updated in system: 2025-03-14 21:36:22.564562
Title: Mamba-VA: A Mamba-based Approach for Continuous Emotion Recognition in Valence-Arousal Space
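Title (reference translation): Mamba-VA: A Mamba-based Approach for Continuous Emotion Recognition in Valence-Arousal Space
Authors: Yuheng Liang, Zheyu Wang, Feng Liu, Mingzhou Liu, Yu Yao
Abstract summary: Continuous Emotion Recognition (CER) plays a crucial role in intelligent human-computer interaction, mental health monitoring, and autonomous driving. This paper proposes Mamba-VA, a novel emotion recognition model that uses the Mamba architecture to efficiently model sequential emotional variations in video frames.
Reference score (attention score computed independently by this site): 13.235058335538607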
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Continuous Emotion Recognition (CER) plays a crucial role in intelligent human-computer interaction, mental health monitoring, and autonomous driving. Emotion modeling based on the Valence-Arousal (VA) space enables a more nuanced representation of emotional states. However, existing methods still face challenges in handling long-term dependencies and capturing complex temporal dynamics. To address these issues, this paper proposes a novel emotion recognition model, Mamba-VA, which leverages the Mamba architecture to efficiently model sequential emotional variations in video frames. First, the model employs a Masked Autoencoder (MAE) to extract deep visual features from video frames, enhancing the robustness of temporal information. Then, a Temporal Convolutional Network (TCN) is utilized for temporal modeling to capture local temporal dependencies. Subsequently, Mamba is applied for long-sequence modeling, enabling the learning of global emotional trends. Finally, a fully connected (FC) layer performs regression analysis to predict continuous valence and arousal values. Experimental results on the Valence-Arousal (VA) Estimation task of the 8th competition on Affective Behavior Analysis in-the-wild (ABAW) demonstrate that the proposed model achieves valence and arousal scores of 0.5362 (0.5036) and 0.4310 (0.4119) on the validation (test) set, respectively, outperforming the baseline. The source code is available on GitHub: https://github.com/FreedomPuppy77/Charon.
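Abstract (reference translation): Continuous Emotion Recognition (CER) plays an important role in intelligent human-computer interaction, mental health monitoring, and autonomous driving. Emotion modeling based on the Valence-Arousal (VA) space enables a more nuanced representation of emotional states. However, existing methods still face challenges in handling long-term dependencies and capturing complex temporal dynamics. To address this, the paper proposes a novel emotion recognition model, Mamba-VA, to efficiently model sequential emotional variations in video frames. First, a Masked Autoencoder (MAE) extracts deep visual features from video frames, enhancing the robustness of temporal information. Next, a Temporal Convolutional Network (TCN) is used for temporal modeling to capture local temporal dependencies. Mamba is then applied for long-sequence modeling, enabling the learning of global emotional trends. Finally, a fully connected (FC) layer performs regression analysis to predict continuous valence and arousal values. On the Valence-Arousal (VA) Estimation task of the 8th competition on Affective Behavior Analysis in-the-wild (ABAW), the proposed model achieved valence and arousal scores of 0.5362 (0.5036) and 0.4310 (0.4119) on the validation (test) set, respectively. The source code is available on GitHub.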
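
The four stages named in the abstract (MAE features, TCN, Mamba, FC regression) can be sketched roughly as follows. This is a toy pure-Python illustration under stated assumptions, not the authors' implementation: random vectors stand in for MAE features, the TCN is reduced to one depthwise dilated causal convolution, Mamba is reduced to a simplified per-channel selective state-space recurrence, the tanh output range of [-1, 1] is assumed, and every function and parameter name (`predict_va`, `selective_scan`, `make_toy_params`, and so on) is hypothetical.

```python
import math
import random


def dilated_causal_conv(seq, kernel, dilation=1):
    """TCN-style layer: y[t] = sum_j kernel[j] * seq[t - j*dilation], zero-padded past."""
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for j, w in enumerate(kernel):
            src = t - j * dilation
            if src >= 0:
                acc += w * seq[src]
        out.append(acc)
    return out


def softplus(z):
    """Numerically stable softplus, used to keep the step size positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(seq, A, w_delta, w_B, w_C):
    """Simplified selective SSM over one channel with a diagonal state of size len(A).

    h[t] = exp(delta_t * A) * h[t-1] + delta_t * B_t * x_t,   y[t] = C_t . h[t]
    where delta_t, B_t and C_t depend on the current input x_t (assumed scalar maps).
    """
    h = [0.0] * len(A)
    out = []
    for x in seq:
        delta = softplus(w_delta * x)
        y = 0.0
        for n in range(len(A)):
            B_t = w_B[n] * x
            C_t = w_C[n] * x
            h[n] = math.exp(delta * A[n]) * h[n] + delta * B_t * x
            y += C_t * h[n]
        out.append(y)
    return out


def predict_va(features, params):
    """features: T frames x D channels. Returns T (valence, arousal) pairs."""
    T, D = len(features), len(features[0])
    channels = []
    for d in range(D):
        seq = [features[t][d] for t in range(T)]
        local = dilated_causal_conv(seq, params["kernel"], params["dilation"])
        channels.append(selective_scan(local, params["A"], params["w_delta"],
                                       params["w_B"], params["w_C"]))
    preds = []
    for t in range(T):
        pair = []
        for weights, bias in zip(params["fc_w"], params["fc_b"]):
            z = bias + sum(w * channels[d][t] for d, w in enumerate(weights))
            pair.append(math.tanh(z))  # assumed squashing into [-1, 1]
        preds.append(tuple(pair))
    return preds


def make_toy_params(D, N=4, seed=0):
    """Untrained, randomly initialised parameters (illustrative values only)."""
    rng = random.Random(seed)
    return {
        "kernel": [0.5, 0.3, 0.2],
        "dilation": 2,
        "A": [-(n + 1) / N for n in range(N)],  # negative, so the state decays
        "w_delta": 1.0,
        "w_B": [rng.uniform(-1, 1) for _ in range(N)],
        "w_C": [rng.uniform(-1, 1) for _ in range(N)],
        "fc_w": [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(2)],
        "fc_b": [0.0, 0.0],
    }


# Random stand-in for per-frame MAE features: 16 frames, 8 channels.
rng = random.Random(1)
features = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]
params = make_toy_params(D=8)
predictions = predict_va(features, params)
print(predictions[:3])
```

The convolution only sees a short window of recent frames, while the recurrence carries a state across the whole sequence, which mirrors the local-versus-global split described in the abstract. The weights here are untrained, so the printed values carry no meaning beyond showing the data flow.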

Related papers

Emotion Recognition with CLIP and Sequential Learning [5.66758879852618]
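This paper addresses the Valence-Arousal (VA) Estimation Challenge, the Expression Recognition Challenge, and the Action Unit (AU) Detection Challenge. The method introduces a novel framework aimed at advancing continuous emotion recognition.
Reference translation of the paper (metadata) (2025-03-13T01:02:06Z)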
ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer [95.80384464922147]
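Continuous visual generation requires a full-sequence diffusion-based approach. This paper proposes ACDiT, an autoregressive blockwise conditional diffusion transformer. It demonstrates that ACDiT can be used seamlessly for visual understanding tasks while being trained with a diffusion objective.
Reference translation of the paper (metadata) (2024-12-10T18:13:20Z)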
Time-Dependent VAE for Building Latent Representations from Visual Neural Activity with Complex Dynamics [25.454851828755054]
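TiDeSPL-VAE can effectively analyze complex visual neural activity and model temporal relationships in a natural way. The results show that the model not only achieves the best decoding performance on naturalistic scenes/movements but also extracts explicit neural dynamics.
Reference translation of the paper (metadata) (2024-08-15T03:27:23Z)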
MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking [51.28485682954006]
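This work proposes a pure Mamba-based framework (MambaVT). Specifically, it devises a long-range cross-frame integration component that adapts globally to changes in target appearance. Experiments show the potential of vision Mamba for RGB-T tracking, with MambaVT achieving state-of-the-art performance on four major benchmarks.
Reference translation of the paper (metadata) (2024-08-15T02:29:00Z)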
Mamba-Spike: Enhancing the Mamba Architecture with a Spiking Front-End for Efficient Temporal Data Processing [4.673285689826945]
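Mamba-Spike is a novel neuromorphic architecture that integrates a spiking front-end with a Mamba backbone for efficient temporal data processing. The architecture consistently outperforms state-of-the-art baselines, achieving higher accuracy, lower latency, and improved energy efficiency.
Reference translation of the paper (metadata) (2024-08-04T14:10:33Z)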
Vision Mamba: A Comprehensive Survey and Taxonomy [11.025533218561284]
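A State Space Model (SSM) is a mathematical model used to describe and analyze the behavior of dynamical systems. Building on the latest state space models, Mamba merges time-varying parameters into the SSM and formulates a hardware-aware algorithm for efficient training and inference. Mamba is expected to become a new AI architecture that may outperform the Transformer.
Reference translation of the paper (metadata) (2024-05-07T15:30:14Z)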
Boosting Continuous Emotion Recognition with Self-Pretraining using Masked Autoencoders, Temporal Convolutional Networks, and Transformers [3.951847822557829]
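This study tackles the Valence-Arousal (VA) Estimation Challenge, the Expression (Expr) Classification Challenge, and the Action Unit (AU) Detection Challenge. It advocates a novel approach to improving continuous emotion recognition. This is achieved by pre-training Masked Autoencoders (MAE) on facial datasets and then fine-tuning on the Aff-Wild2 dataset annotated with expression (Expr) labels.
Reference translation of the paper (metadata) (2024-03-18T03:28:01Z)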
Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining [85.08169822181685]
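This paper introduces Swin-UMamba, a novel Mamba-based model designed specifically for medical image segmentation. Swin-UMamba shows superior performance compared with CNNs, ViTs, and the latest Mamba-based models.
Reference translation of the paper (metadata) (2024-02-05T18:58:11Z)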
From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos [88.08209394979178]
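Dynamic facial expression recognition (DFER) in the wild is still hindered by data limitations. This paper proposes a novel Static-to-Dynamic model (S2D) that leverages existing SFER knowledge and the dynamic information implicitly encoded in extracted facial landmark-aware features.
Reference translation of the paper (metadata) (2023-12-09T03:16:09Z)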
Leveraging TCN and Transformer for effective visual-audio fusion in continuous emotion recognition [0.5370906227996627]
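This paper presents an approach to the Valence-Arousal (VA) Estimation Challenge, the Expression (Expr) Classification Challenge, and the Action Unit (AU) Detection Challenge. It proposes a novel multimodal fusion model that uses a Temporal Convolutional Network (TCN) and a Transformer to improve the performance of continuous emotion recognition.
Reference translation of the paper (metadata) (2023-03-15T04:15:57Z)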
A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition [72.36055502078193]
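This paper proposes a hierarchical framework based on chain regression models for emotion recognition from vocal bursts. To address the challenge of data sparsity, it also uses self-supervised learning (SSL) representations with layer-wise and temporal aggregation modules. The proposed system participated in the ACII Affective Vocal Burst (A-VB) Challenge 2022 and ranked first in the "TWO" and "CULTURE" tasks.
Reference translation of the paper (metadata) (2023-03-14T16:08:45Z)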
Continuous Emotion Recognition with Spatiotemporal Convolutional Neural Networks [82.54695985117783]
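This paper investigates the suitability of state-of-the-art deep learning architectures for continuous emotion recognition using long video sequences captured in-the-wild. The authors developed a convolutional recurrent neural network that combines 2D-CNNs with long short-term memory units, and an inflated 3D-CNN model built by inflating the weights of a 2D-CNN model during fine-tuning.
Reference translation of the paper (metadata) (2020-11-18T13:42:05Z)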
A Multi-term and Multi-task Analyzing Framework for Affective Analysis in-the-wild [0.2216657815393579]
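This paper introduces the emotion recognition method submitted to the Affective Behavior Analysis in-the-wild (ABAW) 2020 Contest. Because affective behavior has many observable features, each with its own time frame, the authors introduced multiple optimized time windows. They built an emotion recognition model for each time window and ensembled these models.
Reference translation of the paper (metadata) (2020-09-29T09:24:29Z)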
Learn to cycle: Time-consistent feature discovery for action recognition [83.43682368129072]
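Generalizing over temporal variations is a prerequisite for effective action recognition in videos. The paper introduces Squeeze and Recursion Temporal Gates (SRTG). With SRTG blocks, the increase in the number of GFLOPs is kept minimal and consistent improvements are observed.
Reference translation of the paper (metadata) (2020-06-15T09:36:28Z)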

The related papers list is generated automatically from the titles and abstracts of papers on this site.

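Information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).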