$M^3$T: Multi-Modal Continuous Valence-Arousal Estimation in the Wild
- URL: http://arxiv.org/abs/2002.02957v1
- Date: Fri, 7 Feb 2020 18:53:13 GMT
- Title: $M^3$T: Multi-Modal Continuous Valence-Arousal Estimation in the Wild
- Authors: Yuan-Hang Zhang, Rulin Huang, Jiabei Zeng, Shiguang Shan and Xilin
Chen
- Abstract summary: This report describes a multi-modal multi-task ($M^3$T) approach underlying our submission to the valence-arousal estimation track of the Affective Behavior Analysis in-the-wild (ABAW) Challenge.
In the proposed $M^3$T framework, we fuse both visual features from videos and acoustic features from the audio tracks to estimate the valence and arousal.
We evaluated the $M^3$T framework on the validation set provided by ABAW and it significantly outperforms the baseline method.
- Score: 86.40973759048957
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This report describes a multi-modal multi-task ($M^3$T) approach underlying
our submission to the valence-arousal estimation track of the Affective
Behavior Analysis in-the-wild (ABAW) Challenge, held in conjunction with the
IEEE International Conference on Automatic Face and Gesture Recognition (FG)
2020. In the proposed $M^3$T framework, we fuse both visual features from
videos and acoustic features from the audio tracks to estimate the valence and
arousal. The spatio-temporal visual features are extracted with a 3D
convolutional network and a bidirectional recurrent neural network. Considering
the correlations between valence/arousal, emotions, and facial actions, we
also explore mechanisms to benefit from other tasks. We evaluated the $M^3$T
framework on the validation set provided by ABAW and it significantly
outperforms the baseline method.
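As a rough illustration of the pipeline the abstract describes, here is a minimal PyTorch sketch of the audio-visual fusion: a small 3D convolutional backbone and a bidirectional GRU produce spatio-temporal visual features, which are concatenated with frame-aligned acoustic features and regressed to per-frame valence and arousal. All layer sizes, the GRU choice, and the late-fusion scheme are assumptions for illustration; the report's actual networks are larger and more elaborate.

```python
import torch
import torch.nn as nn

class M3TSketch(nn.Module):
    """Minimal sketch of the described pipeline: 3D-CNN spatio-temporal
    visual features -> bidirectional RNN, fused with acoustic features,
    regressed to per-frame (valence, arousal). All sizes are assumptions."""

    def __init__(self, audio_dim=40, hidden=128):
        super().__init__()
        # Stand-in 3D convolutional backbone (the paper's backbone is larger).
        self.visual_cnn = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # pool space, keep the time axis
        )
        # Bidirectional recurrent model over the visual feature sequence.
        self.visual_rnn = nn.GRU(16, hidden, bidirectional=True, batch_first=True)
        # Late fusion of visual and acoustic features, then V/A regression.
        self.head = nn.Sequential(
            nn.Linear(2 * hidden + audio_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # valence, arousal
            nn.Tanh(),             # both targets lie in [-1, 1]
        )

    def forward(self, frames, audio):
        # frames: (B, 3, T, H, W); audio: (B, T, audio_dim), frame-aligned.
        v = self.visual_cnn(frames)                    # (B, 16, T, 1, 1)
        v = v.squeeze(-1).squeeze(-1).transpose(1, 2)  # (B, T, 16)
        v, _ = self.visual_rnn(v)                      # (B, T, 2*hidden)
        return self.head(torch.cat([v, audio], dim=-1))  # (B, T, 2)

model = M3TSketch()
va = model(torch.randn(2, 3, 16, 64, 64), torch.randn(2, 16, 40))
print(va.shape)  # torch.Size([2, 16, 2])
```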
Related papers
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering
Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
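The summary names a rendering-assisted distillation paradigm without detailing the objective. As a generic illustration of distilling per-voxel semantic logits from a teacher into a student occupancy network, one might write something like the following; the KL-based loss, temperature, and tensor shapes are assumptions, not RadOcc's actual formulation.

```python
import torch
import torch.nn.functional as F

def occupancy_distillation_loss(student_logits, teacher_logits, tau=2.0):
    """Generic knowledge-distillation loss over per-voxel semantic logits.
    Shapes: (B, C, X, Y, Z). The temperature and loss form are assumptions;
    RadOcc's actual rendering-assisted objective is more involved."""
    s = F.log_softmax(student_logits / tau, dim=1)
    t = F.softmax(teacher_logits / tau, dim=1)
    # Mean KL divergence across voxels, scaled by tau^2 as in standard KD.
    return F.kl_div(s, t, reduction="batchmean") * tau * tau

student = torch.randn(1, 18, 8, 8, 4)   # e.g. 18 semantic classes (assumed)
teacher = torch.randn(1, 18, 8, 8, 4)
print(occupancy_distillation_loss(student, teacher).item())
```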
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- A Novel Energy based Model Mechanism for Multi-modal Aspect-Based
Sentiment Analysis [85.77557381023617]
We propose a novel framework called DQPSA for multi-modal sentiment analysis.
The PDQ module uses the prompt as both a visual query and a language query to extract prompt-aware visual information.
The EPE module models the boundary pairing of the analysis target from the perspective of an energy-based model.
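A hedged sketch of the "prompt as visual query" idea: treating the prompt embedding as the query in standard cross-attention over visual tokens yields prompt-aware visual information. The dimensions and the use of nn.MultiheadAttention here are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Prompt embeddings attend over visual tokens via standard cross-attention.
attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)

prompt_tokens = torch.randn(2, 5, 256)    # (B, prompt_len, D) language prompt
visual_tokens = torch.randn(2, 49, 256)   # (B, patches, D) image features

# Query = prompt, keys/values = visual features -> prompt-aware visual info.
prompt_aware_visual, _ = attn(prompt_tokens, visual_tokens, visual_tokens)
print(prompt_aware_visual.shape)  # torch.Size([2, 5, 256])
```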
arXiv Detail & Related papers (2023-12-13T12:00:46Z)
- Intensity Profile Projection: A Framework for Continuous-Time
Representation Learning for Dynamic Networks [50.2033914945157]
We present a representation learning framework, Intensity Profile Projection, for continuous-time dynamic network data.
The framework consists of three stages, including estimating pairwise intensity functions and learning a projection that minimises a notion of intensity reconstruction error.
Moreover, we develop estimation theory providing tight control on the error of any estimated trajectory, indicating that the representations could even be used in quite noise-sensitive follow-on analyses.
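To make the two named stages concrete, here is a toy NumPy sketch: pairwise intensity functions are estimated by kernel-smoothing interaction events, and a rank-d projection of the resulting intensity matrix gives each node a position whose sweep over time traces a continuous trajectory. The Gaussian kernel, the bandwidth, and the SVD-based projection are simplifying assumptions, not the paper's exact estimators.

```python
import numpy as np

def intensity_matrix(events, t, n_nodes, bandwidth=0.1):
    """Kernel-smoothed estimate of the pairwise intensity matrix at time t.
    `events` is a list of (i, j, time) interaction events. The Gaussian
    kernel and bandwidth are assumptions; the paper develops this carefully."""
    lam = np.zeros((n_nodes, n_nodes))
    for i, j, s in events:
        w = np.exp(-0.5 * ((t - s) / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))
        lam[i, j] += w
        lam[j, i] += w
    return lam

def project(lam, d=2):
    """Rank-d projection of the intensity matrix: each row of the result is a
    node's position at this time, so sweeping t yields continuous trajectories."""
    U, S, _ = np.linalg.svd(lam)
    return U[:, :d] * np.sqrt(S[:d])

rng = np.random.default_rng(0)
events = [(rng.integers(5), rng.integers(5), rng.uniform()) for _ in range(200)]
traj_t = project(intensity_matrix(events, t=0.5, n_nodes=5))
print(traj_t.shape)  # (5, 2)
```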
arXiv Detail & Related papers (2023-06-09T15:38:25Z)
- Assessor360: Multi-sequence Network for Blind Omnidirectional Image
Quality Assessment [50.82681686110528]
Blind Omnidirectional Image Quality Assessment (BOIQA) aims to objectively assess the human perceptual quality of omnidirectional images (ODIs).
The quality assessment of ODIs is severely hampered by the fact that the existing BOIQA pipeline lacks the modeling of the observer's browsing process.
We propose a novel multi-sequence network for BOIQA called Assessor360, which is derived from the realistic multi-assessor ODI quality assessment procedure.
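The multi-assessor intuition can be illustrated as follows: score several simulated viewport (browsing) sequences of the same ODI independently and aggregate the per-sequence scores, mimicking multiple human assessors. This sketch assumes a toy encoder and mean aggregation; Assessor360's actual sequence generation and backbone differ.

```python
import torch
import torch.nn as nn

class MultiSequenceScorer(nn.Module):
    """Hedged sketch: score S viewport sequences per ODI, then aggregate.
    The backbone and all sizes are assumptions for illustration."""

    def __init__(self, feat_dim=64):
        super().__init__()
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, sequences):
        # sequences: (B, S, T, 3, H, W) -- S viewport sequences per ODI.
        B, S, T, C, H, W = sequences.shape
        f = self.frame_encoder(sequences.reshape(B * S * T, C, H, W))
        f = f.reshape(B * S, T, -1)
        _, h = self.temporal(f)                      # (1, B*S, feat_dim)
        per_seq = self.score(h.squeeze(0)).reshape(B, S)
        return per_seq.mean(dim=1)                   # aggregate over "assessors"

scorer = MultiSequenceScorer()
print(scorer(torch.randn(2, 3, 4, 3, 32, 32)).shape)  # torch.Size([2])
```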
arXiv Detail & Related papers (2023-05-18T13:55:28Z)
- A Dual Branch Network for Emotional Reaction Intensity Estimation [12.677143408225167]
We propose a solution to the ERI challenge of the fifth Affective Behavior Analysis in-the-wild (ABAW) Competition: a dual-branch, multi-output regression model.
Spatial attention is used to better extract visual features, and Mel-Frequency Cepstral Coefficients (MFCCs) are used to extract acoustic features.
Our method achieves excellent results on the official validation set.
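For the acoustic branch, MFCC extraction is standard; here is a minimal example with librosa, assuming 16 kHz audio and 40 coefficients (the summary does not state the paper's exact settings).

```python
import numpy as np
import librosa

sr = 16000
# Stand-in 1-second 440 Hz tone in place of a real audio track.
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)

# 40 MFCCs per frame; the coefficient count is an assumption.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
print(mfcc.shape)  # (40, n_frames)
```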
arXiv Detail & Related papers (2023-03-16T10:31:40Z)
- An Efficient End-to-End Transformer with Progressive Tri-modal Attention
for Multi-modal Emotion Recognition [27.96711773593048]
We propose the multi-modal end-to-end transformer (ME2ET), which can effectively model the tri-modal features interaction.
At the low level, we propose progressive tri-modal attention, which can model the tri-modal feature interactions by adopting a two-pass strategy.
At the high level, we introduce a tri-modal feature fusion layer to explicitly aggregate the semantic representations of the three modalities.
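A hedged sketch of what a high-level tri-modal feature fusion layer can look like: project each modality's semantic representation to a shared width, concatenate, and mix. All dimensions are assumptions; ME2ET's actual layer may differ.

```python
import torch
import torch.nn as nn

class TriModalFusion(nn.Module):
    """Project text/audio/video representations to a shared width and
    aggregate them with a single linear mixing layer. Sizes are assumed."""

    def __init__(self, dims=(128, 64, 32), shared=128):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, shared) for d in dims)
        self.out = nn.Linear(3 * shared, shared)

    def forward(self, text, audio, video):
        parts = [p(x) for p, x in zip(self.proj, (text, audio, video))]
        return self.out(torch.cat(parts, dim=-1))

fuse = TriModalFusion()
z = fuse(torch.randn(4, 128), torch.randn(4, 64), torch.randn(4, 32))
print(z.shape)  # torch.Size([4, 128])
```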
arXiv Detail & Related papers (2022-09-20T14:51:38Z)
- An Ensemble Approach for Multiple Emotion Descriptors Estimation Using
Multi-task Learning [12.589338141771385]
This paper illustrates our submission method to the fourth Affective Behavior Analysis in-the-Wild (ABAW) Competition.
Instead of using only face information, we employ the full information from the provided dataset, which contains both the face and the context around it.
The proposed system achieves the performance of 0.917 on the MTL Challenge validation dataset.
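The face-plus-context idea can be sketched as a two-stream encoder whose concatenated features feed one head per task. The encoders, head sizes, and the particular task set below (valence-arousal, expressions, action units) are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class FaceContextMultiTask(nn.Module):
    """Encode the face crop and surrounding context separately, concatenate,
    and attach one head per task. All sizes here are assumptions."""

    def __init__(self, feat=64):
        super().__init__()
        def encoder():
            return nn.Sequential(
                nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.face_enc, self.ctx_enc = encoder(), encoder()
        self.va = nn.Linear(2 * feat, 2)      # valence-arousal regression
        self.expr = nn.Linear(2 * feat, 8)    # expression classes (assumed 8)
        self.au = nn.Linear(2 * feat, 12)     # action units (assumed 12)

    def forward(self, face, context):
        h = torch.cat([self.face_enc(face), self.ctx_enc(context)], dim=-1)
        return self.va(h), self.expr(h), self.au(h)

m = FaceContextMultiTask()
va, expr, au = m(torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64))
print(va.shape, expr.shape, au.shape)
```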
arXiv Detail & Related papers (2022-07-22T04:57:56Z)
- Estimation of Reliable Proposal Quality for Temporal Action Detection [71.5989469643732]
We propose a new method that gives insights into moment and region perspectives simultaneously to align the two tasks by acquiring reliable proposal quality.
For the moment perspective, a Boundary Evaluate Module (BEM) is designed that focuses on local appearance and motion evolvement to estimate boundary quality.
For the region perspective, we introduce a Region Evaluate Module (REM), which uses a new and efficient sampling method for proposal feature representation.
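The two perspectives can be combined into a single ranking signal; a toy sketch follows, where the geometric fusion of boundary- and region-quality scores and the weight alpha are assumptions rather than the paper's actual formulation.

```python
import torch

def rank_proposals(boundary_q, region_q, alpha=0.5):
    """Fuse a boundary-quality score (moment view) with a region-quality
    score (region view) and rank proposals by the fused value. The
    geometric fusion and alpha are assumptions."""
    quality = boundary_q ** alpha * region_q ** (1 - alpha)
    return torch.argsort(quality, descending=True)

bq = torch.tensor([0.9, 0.4, 0.7])   # e.g. from a Boundary Evaluate Module
rq = torch.tensor([0.6, 0.8, 0.7])   # e.g. from a Region Evaluate Module
print(rank_proposals(bq, rq))        # tensor([0, 2, 1])
```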
arXiv Detail & Related papers (2022-04-25T14:33:49Z)
- Prior Aided Streaming Network for Multi-task Affective Recognition at the
2nd ABAW2 Competition [9.188777864190204]
We introduce our submission to the 2nd Affective Behavior Analysis in-the-wild (ABAW2) Competition.
In dealing with different emotion representations, we propose a multi-task streaming network.
We leverage an advanced facial expression embedding as prior knowledge.
arXiv Detail & Related papers (2021-07-08T09:35:08Z)
- Facial Affect Recognition in the Wild Using Multi-Task Learning
Convolutional Network [0.0]
This paper presents a neural network based method submitted to the Affective Behavior Analysis in-the-Wild Challenge in FG 2020.
By utilizing multi-task learning, this network can estimate and recognize three quantified affective models.
arXiv Detail & Related papers (2020-02-03T09:02:26Z)