Using Scene and Semantic Features for Multi-modal Emotion Recognition
- URL: http://arxiv.org/abs/2308.00228v1
- Date: Tue, 1 Aug 2023 01:54:55 GMT
- Title: Using Scene and Semantic Features for Multi-modal Emotion Recognition
- Authors: Zhifeng Wang and Ramesh Sankaranarayana
- Abstract summary: We propose to use combined scene and semantic features, along with personal features, for multi-modal emotion recognition.
We use a modified EmbraceNet to extract features from the images, which is trained to learn both the body and pose features simultaneously.
We report an average precision of 40.39% across the 26 emotion categories, which is a 5% improvement over previous approaches.
- Score: 1.0152838128195467
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Automatic emotion recognition is an active research topic with a
wide range of applications, and much work has been done in the area in recent
years. The focus has been mainly on using the characteristics of a person,
such as speech, facial expression and pose, while the use of scene and
semantic features for emotion recognition has received limited exploration.
In this paper, we propose to use combined scene and semantic features, along
with personal features, for multi-modal emotion recognition. Scene features
describe the environment or context in which the target person is operating;
semantic features can include the objects present in the environment, as well
as their attributes and relationships with the target person. In addition, we
use a modified EmbraceNet to extract features from the images, trained to
learn body and pose features simultaneously. By fusing body and pose
features, the EmbraceNet improves the accuracy and robustness of the model,
particularly when dealing with partially missing data: together, the two
feature types give a more complete representation of the subject, which helps
the model make accurate predictions even when some parts of the body are
missing. We demonstrate the effectiveness of our method on the benchmark
EMOTIC dataset, reporting an average precision of 40.39% across the 26
emotion categories, a 5% improvement over previous approaches.
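To make the fusion step concrete, below is a minimal PyTorch sketch of an EmbraceNet-style fusion layer of the kind the abstract describes. The class name, layer sizes, and the choice of four input streams (body, pose, scene, semantic) are illustrative assumptions, not the authors' implementation; the per-dimension multinomial selection follows the published EmbraceNet design and is what keeps the fused vector well defined when a stream is missing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbraceFusion(nn.Module):
    """EmbraceNet-style fusion of several feature streams (a sketch).

    Each stream is projected ("docked") to a shared width; a per-dimension
    multinomial mask then selects which stream supplies each component of
    the fused vector. Zeroing the selection probability of a missing stream
    keeps the output well defined, which is the source of the robustness
    to partially missing data mentioned in the abstract.
    """

    def __init__(self, in_dims, embed_dim=256):
        super().__init__()
        self.docks = nn.ModuleList(nn.Linear(d, embed_dim) for d in in_dims)
        self.embed_dim = embed_dim

    def forward(self, streams, availability=None):
        # streams: list of (B, in_dims[k]) tensors, e.g. body/pose/scene/semantic.
        # availability: (B, K) binary mask; 0 marks a missing stream.
        docked = torch.stack(
            [F.relu(dock(x)) for dock, x in zip(self.docks, streams)],
            dim=1)                                                      # (B, K, D)
        B, K, D = docked.shape
        if availability is None:
            availability = torch.ones(B, K, device=docked.device)
        probs = availability / availability.sum(dim=1, keepdim=True)
        # Sample one source stream per embedding dimension.
        choice = torch.multinomial(probs, D, replacement=True)         # (B, D)
        mask = F.one_hot(choice, num_classes=K).permute(0, 2, 1)       # (B, K, D)
        return (docked * mask.to(docked.dtype)).sum(dim=1)             # (B, D)

# Hypothetical usage: fuse four streams, then score the 26 EMOTIC
# categories with a multi-label head (dimensions are made up).
fusion = EmbraceFusion(in_dims=[512, 128, 512, 300])
head = nn.Linear(256, 26)
feats = [torch.randn(4, d) for d in (512, 128, 512, 300)]
logits = head(fusion(feats))  # (4, 26), one logit per emotion category
```

In this sketch, zeroing a stream's entry in the availability mask plays the role of modality dropout: the model learns to predict from whichever streams remain, which matches the robustness to partially missing data claimed above.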
Related papers
- Multi-Branch Network for Imagery Emotion Prediction [4.618814297494939]
We present a novel Multi-Branch Network (MBN) to predict both discrete and continuous emotions in an image.
Our proposed method significantly outperforms state-of-the-art methods with 28.4% in mAP and 0.93 in MAE.
arXiv Detail & Related papers (2023-12-12T18:34:56Z)
- EmoSet: A Large-scale Visual Emotion Dataset with Rich Attributes [53.95428298229396]
We introduce EmoSet, the first large-scale visual emotion dataset annotated with rich attributes.
EmoSet comprises 3.3 million images in total, with 118,102 of these images carefully labeled by human annotators.
Motivated by psychological studies, in addition to emotion category, each image is also annotated with a set of describable emotion attributes.
arXiv Detail & Related papers (2023-07-16T06:42:46Z)
- High-Level Context Representation for Emotion Recognition in Images [4.987022981158291]
We propose an approach for high-level context representation extraction from images.
The model relies on a single cue and a single encoding stream to correlate this representation with emotions.
Our approach is more efficient than previous models and can be easily deployed to address real-world problems related to emotion recognition.
arXiv Detail & Related papers (2023-05-05T13:20:41Z)
- DisPositioNet: Disentangled Pose and Identity in Semantic Image Manipulation [83.51882381294357]
DisPositioNet is a model that learns a disentangled representation for each object for the task of image manipulation using scene graphs.
Our framework enables the disentanglement of the variational latent embeddings as well as the feature representation in the graph.
arXiv Detail & Related papers (2022-11-10T11:47:37Z)
- Multi-Cue Adaptive Emotion Recognition Network [4.570705738465714]
We propose a new deep learning approach for emotion recognition based on adaptive multi-cues.
We compare the proposed approach with state-of-the-art approaches on the CAER-S dataset.
arXiv Detail & Related papers (2021-11-03T15:08:55Z)
- SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network [83.27291945217424]
We propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images.
To mine the emotional relationships between distinct objects, we first build up an Emotion Graph based on semantic concepts and visual features.
We also design a Scene-Object Fusion Module to integrate scenes and objects, which exploits scene features to guide the fusion process of object features with the proposed scene-based attention mechanism.
arXiv Detail & Related papers (2021-10-24T02:41:41Z)
- Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality [84.69595956853908]
We present Affect2MM, a learning method for time-series emotion prediction for multimedia content.
Our goal is to automatically capture the varying emotions depicted by characters in real-life human-centric situations and behaviors.
arXiv Detail & Related papers (2021-03-11T09:07:25Z)
- Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition [55.44502358463217]
We propose a modality-transferable model with emotion embeddings to tackle the aforementioned issues.
Our model achieves state-of-the-art performance on most of the emotion categories.
Our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.
arXiv Detail & Related papers (2020-09-21T06:10:39Z)
- Facial Expression Editing with Continuous Emotion Labels [76.36392210528105]
Deep generative models have achieved impressive results in the field of automated facial expression editing.
We propose a model that can be used to manipulate facial expressions in facial images according to continuous two-dimensional emotion labels.
arXiv Detail & Related papers (2020-06-22T13:03:02Z)
- Context Based Emotion Recognition using EMOTIC Dataset [22.631542327834595]
We present EMOTIC, a dataset of images of people annotated with their apparent emotion.
Using the EMOTIC dataset we train different CNN models for emotion recognition.
Our results show how scene context provides important information to automatically recognize emotional states.
arXiv Detail & Related papers (2020-03-30T12:38:50Z)
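The headline figure above, like those in several related entries, is a mean average precision over the 26 EMOTIC emotion categories. Below is a minimal sketch of that metric under the standard multi-label formulation, using scikit-learn; the function name and toy data are illustrative, not any paper's evaluation code.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Mean AP over emotion categories.

    y_true:  (N, C) binary multi-label ground truth (C = 26 for EMOTIC).
    y_score: (N, C) predicted per-category scores.
    AP is computed independently per category and then averaged, which is
    how figures such as 40.39% are typically reported.
    """
    aps = [average_precision_score(y_true[:, c], y_score[:, c])
           for c in range(y_true.shape[1])]
    return float(np.mean(aps))

# Toy example: random scores over 26 categories for 100 images.
rng = np.random.default_rng(0)
y_true = (rng.random((100, 26)) > 0.8).astype(int)
y_score = rng.random((100, 26))
print(f"mAP: {mean_average_precision(y_true, y_score):.4f}")
```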
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.