SGED: A Benchmark dataset for Performance Evaluation of Spiking Gesture
Emotion Recognition
- URL: http://arxiv.org/abs/2304.14714v1
- Date: Fri, 28 Apr 2023 09:32:09 GMT
- Title: SGED: A Benchmark dataset for Performance Evaluation of Spiking Gesture
Emotion Recognition
- Authors: Binqiang Wang and Gang Dong and Yaqian Zhao and Rengang Li and Lu Cao
and Lihua Lu
- Abstract summary: We collect and label a new homogeneous multimodal gesture emotion recognition dataset, designed after an analysis of existing datasets.
We also propose a pseudo dual-flow network built on this dataset and use it to verify the dataset's application potential for the affective computing community.
- Score: 12.396844568607522
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the field of affective computing, researchers have improved the
performance of models and algorithms by exploiting the complementarity of
multimodal information. However, as more and more modalities become available,
dataset development has not kept pace with existing modal sensing equipment.
Collecting and studying multimodal data is complex but significant work. To
address the partial absence of such data in the community, we collected and
labeled a new homogeneous multimodal gesture emotion recognition dataset based
on an analysis of existing datasets. This dataset fills a gap in homogeneous
multimodal data and provides a new research direction for emotion recognition.
Moreover, we propose a pseudo dual-flow network based on this dataset and
verify the dataset's application potential for the affective computing
community. The experimental results demonstrate that it is feasible to use
traditional visual information together with spiking visual information from
homogeneous multimodal data for visual emotion recognition. The dataset is
available at \url{https://github.com/201528014227051/SGED}
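The abstract does not spell out the pseudo dual-flow network's architecture. As a rough illustration only, the sketch below shows one plausible two-branch setup in PyTorch that fuses conventional RGB frames with accumulated spike/event frames for emotion classification; the layer sizes, the 7-class output, and late fusion by concatenation are assumptions, not details taken from the paper.

```python
# Hypothetical two-branch ("dual-flow") classifier: one branch for conventional
# RGB frames, one for spike/event data accumulated into a single-channel frame.
# All layer sizes, the class count, and concatenation fusion are assumptions.
import torch
import torch.nn as nn

class DualFlowEmotionNet(nn.Module):
    def __init__(self, num_classes: int = 7):
        super().__init__()
        # Conventional visual branch (3-channel RGB frames).
        self.rgb_branch = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Spiking visual branch (spike counts accumulated into 1-channel frames).
        self.spike_branch = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Late fusion by concatenation, followed by a linear classifier.
        self.classifier = nn.Linear(32 + 32, num_classes)

    def forward(self, rgb: torch.Tensor, spikes: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.rgb_branch(rgb), self.spike_branch(spikes)], dim=1)
        return self.classifier(fused)

# Example: a batch of 4 samples, one RGB frame and one accumulated spike frame each.
model = DualFlowEmotionNet()
logits = model(torch.randn(4, 3, 128, 128), torch.randn(4, 1, 128, 128))
print(logits.shape)  # torch.Size([4, 7])
```

A real implementation might use a spiking neural network or a temporal event encoder for the second branch; the sketch only illustrates how two homogeneous visual streams can be fused and trained jointly.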
Related papers
- EEG-based Multimodal Representation Learning for Emotion Recognition [26.257531037300325]
We introduce a novel multimodal framework that accommodates not only conventional modalities such as video, images, and audio, but also incorporates EEG data.
Our framework is designed to flexibly handle varying input sizes, while dynamically adjusting attention to account for feature importance across modalities.
arXiv Detail & Related papers (2024-10-29T01:35:17Z)
- Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification [2.5091334993691206]
Development of a robust deep-learning model for retinal disease diagnosis requires a substantial dataset for training.
The capacity to generalize effectively on smaller datasets remains a persistent challenge.
We've combined a wide range of data sources to improve performance and generalization to new data.
arXiv Detail & Related papers (2024-09-17T17:22:35Z)
- UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction [93.77809355002591]
We introduce UniTraj, a comprehensive framework that unifies various datasets, models, and evaluation criteria.
We conduct extensive experiments and find that model performance significantly drops when transferred to other datasets.
We provide insights into dataset characteristics to explain these findings.
arXiv Detail & Related papers (2024-03-22T10:36:50Z)
- Unity by Diversity: Improved Representation Learning in Multimodal VAEs [24.691068754720106]
We show that a better latent representation can be obtained by replacing hard constraints with a soft constraint.
We show improved learned latent representations and imputation of missing data modalities compared to existing methods.
arXiv Detail & Related papers (2024-03-08T13:29:46Z)
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
- Multi-dataset Training of Transformers for Robust Action Recognition [75.5695991766902]
We study the task of robust feature representations, aiming to generalize well on multiple datasets for action recognition.
Here, we propose a novel multi-dataset training paradigm, MultiTrain, with the design of two new loss terms, namely informative loss and projection loss.
We verify the effectiveness of our method on five challenging datasets, Kinetics-400, Kinetics-700, Moments-in-Time, Activitynet and Something-something-v2.
arXiv Detail & Related papers (2022-09-26T01:30:43Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- A graph representation based on fluid diffusion model for multimodal data analysis: theoretical aspects and enhanced community detection [14.601444144225875]
We introduce a novel model for graph definition based on fluid diffusion.
Our method is able to strongly outperform state-of-the-art schemes for community detection in multimodal data analysis.
arXiv Detail & Related papers (2021-12-07T16:30:03Z)
- Perceptual Score: What Data Modalities Does Your Model Perceive? [73.75255606437808]
We introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features.
We find that recent, more accurate multi-modal models for visual question-answering tend to perceive the visual data less than their predecessors.
Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions.
arXiv Detail & Related papers (2021-10-27T12:19:56Z)
- The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset: Collection, Insights and Improvements [14.707930573950787]
We present MuSe-CaR, a first-of-its-kind multimodal dataset.
The data is publicly available as it recently served as the testing bed for the 1st Multimodal Sentiment Analysis Challenge.
arXiv Detail & Related papers (2021-01-15T10:40:37Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.