GPT-4V with Emotion: A Zero-shot Benchmark for Generalized Emotion Recognition
- URL: http://arxiv.org/abs/2312.04293v3
- Date: Mon, 18 Mar 2024 01:50:08 GMT
- Title: GPT-4V with Emotion: A Zero-shot Benchmark for Generalized Emotion Recognition
- Authors: Zheng Lian, Licai Sun, Haiyang Sun, Kang Chen, Zhuofan Wen, Hao Gu, Bin Liu, Jianhua Tao
- Abstract summary: GPT-4 with Vision (GPT-4V) has demonstrated remarkable visual capabilities across various tasks, but its performance in emotion recognition has not been fully evaluated.
We present the quantitative evaluation results of GPT-4V on 21 benchmark datasets covering 6 tasks.
- Score: 38.2581985358104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, GPT-4 with Vision (GPT-4V) has demonstrated remarkable visual capabilities across various tasks, but its performance in emotion recognition has not been fully evaluated. To bridge this gap, we present the quantitative evaluation results of GPT-4V on 21 benchmark datasets covering 6 tasks: visual sentiment analysis, tweet sentiment analysis, micro-expression recognition, facial emotion recognition, dynamic facial emotion recognition, and multimodal emotion recognition. This paper collectively refers to these tasks as "Generalized Emotion Recognition (GER)". Through experimental analysis, we observe that GPT-4V exhibits strong visual understanding capabilities in GER tasks. Meanwhile, GPT-4V shows the ability to integrate multimodal clues and exploit temporal information, which is also critical for emotion recognition. However, it is worth noting that GPT-4V is primarily designed for general domains and cannot recognize micro-expressions that require specialized knowledge. To the best of our knowledge, this paper provides the first quantitative assessment of GPT-4V for GER tasks. We have open-sourced the code and encourage subsequent researchers to broaden the evaluation scope by including more tasks and datasets. Our code and evaluation results are available at: https://github.com/zeroQiaoba/gpt4v-emotion.
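In practice, the zero-shot protocol the abstract describes reduces to sending each test image to GPT-4V together with the dataset's label set and parsing the one-word reply. The sketch below illustrates such a query with the OpenAI Python SDK; the prompt wording, label list, and model name are illustrative assumptions rather than the authors' exact setup, which is available in the repository linked above.

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical label set; each benchmark in the paper defines its own.
LABELS = ["happy", "sad", "angry", "fearful", "surprised", "disgusted", "neutral"]

def classify_emotion(image_path: str) -> str:
    """Ask GPT-4V for a single emotion label for one image (zero-shot)."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # the GPT-4V endpoint at the time of the paper
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Classify the facial emotion shown in this image. "
                         f"Answer with exactly one word from: {', '.join(LABELS)}."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        max_tokens=10,
        temperature=0,  # deterministic replies are easier to map back to labels
    )
    return response.choices[0].message.content.strip().lower()

print(classify_emotion("face.jpg"))
```

For the dynamic and multimodal tasks, a common zero-shot variant is to sample several video frames and pass them as additional image_url entries in the same message, which is consistent with the abstract's observation that GPT-4V can exploit temporal information.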
Related papers
- ChatGPT Meets Iris Biometrics [10.902536447343465]
This study utilizes the advanced capabilities of the GPT-4 multimodal Large Language Model (LLM) to explore its potential in iris recognition.
We investigate how well AI tools like ChatGPT can understand and analyze iris images.
Our findings suggest a promising path for future research and the development of more adaptable, efficient, robust and interactive biometric security solutions.
arXiv Detail & Related papers (2024-08-09T05:13:07Z)
- GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing [74.68232970965595]
Multimodal large language models (MLLMs) are designed to process and integrate information from multiple sources, such as text, speech, images, and videos.
This paper assesses the application of MLLMs with 5 crucial abilities for affective computing, spanning visual affective tasks and reasoning tasks.
arXiv Detail & Related papers (2024-03-09T13:56:25Z)
- GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? [82.40761196684524]
This paper centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks.
We conduct extensive experiments to evaluate GPT-4's performance across images, videos, and point clouds.
Our findings show that GPT-4, enhanced with rich linguistic descriptions, significantly improves zero-shot recognition.
arXiv Detail & Related papers (2023-11-27T11:29:10Z)
- Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks [53.936643052339]
We evaluate the reasoning abilities of text-only and multimodal versions of GPT-4.
Our experimental results support the conclusion that neither version of GPT-4 has developed robust abstraction abilities at humanlike levels.
arXiv Detail & Related papers (2023-11-14T04:33:49Z)
- GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks [70.98062518872999]
We validate GPT-4V's capabilities for evaluation purposes, addressing tasks ranging from foundational image-to-text and text-to-image synthesis to high-level image-to-image translations and multi-image-to-text alignment.
Notably, GPT-4V shows promising agreement with humans across various tasks and evaluation methods, demonstrating immense potential for multi-modal LLMs as evaluators.
arXiv Detail & Related papers (2023-11-02T16:11:09Z)
- An Early Evaluation of GPT-4V(ision) [40.866323649060696]
We evaluate different abilities of GPT-4V including visual understanding, language understanding, visual puzzle solving, and understanding of other modalities such as depth, thermal, video, and audio.
To estimate GPT-4V's performance, we manually construct 656 test instances and carefully evaluate its results.
arXiv Detail & Related papers (2023-10-25T10:33:17Z)
- The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) [121.42924593374127]
We analyze the latest model, GPT-4V, to deepen the understanding of LMMs.
GPT-4V's unprecedented ability to process arbitrarily interleaved multimodal inputs makes it a powerful multimodal generalist system.
GPT-4V's unique capability of understanding visual markers drawn on input images can give rise to new human-computer interaction methods.
arXiv Detail & Related papers (2023-09-29T17:34:51Z)