A Comparative Study of Transfer Learning for Emotion Recognition using CNN and Modified VGG16 Models
- URL: http://arxiv.org/abs/2407.14576v1
- Date: Fri, 19 Jul 2024 17:41:46 GMT
- Title: A Comparative Study of Transfer Learning for Emotion Recognition using CNN and Modified VGG16 Models
- Authors: Samay Nathani
- Abstract summary: We investigate the performance of CNN and Modified VGG16 models for emotion recognition tasks across two datasets: FER2013 and AffectNet.
Our findings reveal that both models achieve reasonable performance on the FER2013 dataset, with the Modified VGG16 model demonstrating slightly increased accuracy.
When evaluated on the AffectNet dataset, performance declines for both models, with the Modified VGG16 model continuing to outperform the CNN.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emotion recognition is a critical aspect of human interaction. This topic has garnered significant attention in the field of artificial intelligence. In this study, we investigate the performance of convolutional neural network (CNN) and Modified VGG16 models for emotion recognition tasks across two datasets: FER2013 and AffectNet. Our aim is to measure the effectiveness of these models in identifying emotions and their ability to generalize to different and broader datasets. Our findings reveal that both models achieve reasonable performance on the FER2013 dataset, with the Modified VGG16 model demonstrating slightly increased accuracy. When evaluated on the AffectNet dataset, performance declines for both models, with the Modified VGG16 model continuing to outperform the CNN. Our study emphasizes the importance of dataset diversity in emotion recognition and discusses open problems and future research directions, including the exploration of multi-modal approaches and the development of more comprehensive datasets.
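To make the setup concrete, below is a minimal PyTorch sketch of the kind of transfer-learning model the study compares: an ImageNet-pretrained VGG16 with its classifier head replaced for emotion classification. The paper's exact "Modified VGG16" layers and training configuration are not reproduced here, so the function name build_modified_vgg16, the frozen-backbone choice, and the 7-class FER2013 head are illustrative assumptions.

```python
# Hypothetical sketch of a VGG16 transfer-learning setup for emotion recognition.
# The paper's exact "Modified VGG16" architecture is not specified here.
import torch
import torch.nn as nn
from torchvision import models

NUM_EMOTIONS = 7  # FER2013 classes: angry, disgust, fear, happy, sad, surprise, neutral


def build_modified_vgg16(freeze_backbone: bool = True) -> nn.Module:
    """Load an ImageNet-pretrained VGG16 and swap its classifier for emotion recognition."""
    model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
    if freeze_backbone:
        # Keep the convolutional features fixed and fine-tune only the classifier head.
        for param in model.features.parameters():
            param.requires_grad = False
    # Replace the final fully connected layer (originally 1000 ImageNet classes).
    in_features = model.classifier[6].in_features
    model.classifier[6] = nn.Linear(in_features, NUM_EMOTIONS)
    return model


if __name__ == "__main__":
    model = build_modified_vgg16()
    # FER2013 images are 48x48 grayscale; a common workaround is to resize them and
    # replicate the single channel to match VGG16's expected 3x224x224 input.
    dummy = torch.randn(1, 3, 224, 224)
    print(model(dummy).shape)  # torch.Size([1, 7])
```

A real experiment would train this head on FER2013 or AffectNet and compare it against a plain CNN baseline, as the abstract describes; the printout above only confirms the output shape.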
Related papers
- DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z) - StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z) - Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z) - Texture-Based Input Feature Selection for Action Recognition [3.9596068699962323]
We propose a novel method to identify the task-irrelevant content in inputs that increases the domain discrepancy.
We show that our proposed model is superior to existing models for action recognition on the HMDB-51 dataset and the Penn Action dataset.
arXiv Detail & Related papers (2023-02-28T23:56:31Z) - A Comparative Study of Data Augmentation Techniques for Deep Learning Based Emotion Recognition [11.928873764689458]
We conduct a comprehensive evaluation of popular deep learning approaches for emotion recognition.
We show that long-range dependencies in the speech signal are critical for emotion recognition.
Speed/rate augmentation offers the most robust performance gain across models.
arXiv Detail & Related papers (2022-11-09T17:27:03Z) - Exploring the Effects of Data Augmentation for Drivable Area Segmentation [0.0]
We focus on investigating the benefits of data augmentation by analyzing pre-existing image datasets.
Our results show that the performance and robustness of existing state-of-the-art (SOTA) models can be increased dramatically.
arXiv Detail & Related papers (2022-08-06T03:39:37Z) - An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z) - Towards Unbiased Visual Emotion Recognition via Causal Intervention [63.74095927462]
We propose a novel Interventional Emotion Recognition Network (IERN) to alleviate the negative effects brought by dataset bias.
A series of designed tests validate the effectiveness of IERN, and experiments on three emotion benchmarks demonstrate that IERN outperforms other state-of-the-art approaches.
arXiv Detail & Related papers (2021-07-26T10:40:59Z) - Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z) - Facial Emotion Recognition: State of the Art Performance on FER2013 [0.0]
We achieve the highest single-network classification accuracy on the FER2013 dataset.
Our model achieves state-of-the-art single-network accuracy of 73.28% on FER2013 without using extra training data.
arXiv Detail & Related papers (2021-05-08T04:20:53Z) - A Neural Architecture for Detecting Confusion in Eye-tracking Data [1.8655840060559168]
We introduce an architecture that uses RNN and CNN sub-models in parallel to take advantage of the temporal and visuospatial aspects of our data.
Our model outperforms an existing model based on Random Forests resulting in a 22% improvement in combined sensitivity & specificity.
arXiv Detail & Related papers (2020-03-13T18:20:39Z)