Zero-shot Compound Expression Recognition with Visual Language Model at the 6th ABAW Challenge
- URL: http://arxiv.org/abs/2403.11450v1
- Date: Mon, 18 Mar 2024 03:59:24 GMT
- Title: Zero-shot Compound Expression Recognition with Visual Language Model at the 6th ABAW Challenge
- Authors: Jiahe Wang, Jiale Huang, Bingzhao Cai, Yifan Cao, Xin Yun, Shangfei Wang
- Abstract summary: We propose a zero-shot approach for recognizing compound expressions by leveraging a pretrained visual language model integrated with traditional CNNs.
- Score: 11.49671335206114
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Conventional approaches to facial expression recognition primarily focus on the classification of six basic facial expressions. Nevertheless, real-world situations present a wider range of complex compound expressions that consist of combinations of these basic ones, and comprehensive training datasets for them remain scarce. The 6th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW) offered unlabeled datasets containing compound expressions. In this study, we propose a zero-shot approach for recognizing compound expressions by leveraging a pretrained visual language model integrated with traditional CNNs.
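The abstract describes the method only at a high level. Below is a minimal sketch of the idea, assuming CLIP (via Hugging Face transformers) as the pretrained visual language model; the prompt template, the listed compound classes, and the late-fusion weight `alpha` for combining VLM and CNN probabilities are illustrative assumptions, not details taken from the paper.

```python
# Sketch: zero-shot compound expression recognition with a pretrained VLM,
# late-fused with a conventional CNN classifier. Class names follow the
# seven compound expressions used in the ABAW CER track (assumption).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

COMPOUND_CLASSES = [
    "Fearfully Surprised", "Happily Surprised", "Sadly Surprised",
    "Disgustedly Surprised", "Angrily Surprised", "Sadly Fearful", "Sadly Angry",
]
# Hypothetical prompt template; the paper does not specify its prompts.
PROMPTS = [f"a photo of a face that looks {c.lower()}" for c in COMPOUND_CLASSES]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def zero_shot_probs(face: Image.Image) -> torch.Tensor:
    """Probability of each compound class from image-text similarity."""
    inputs = processor(text=PROMPTS, images=face, return_tensors="pt", padding=True)
    logits = model(**inputs).logits_per_image  # shape (1, 7)
    return logits.softmax(dim=-1).squeeze(0)

def fuse(vlm_probs: torch.Tensor, cnn_probs: torch.Tensor, alpha: float = 0.5) -> str:
    """Late fusion of VLM and CNN probabilities; alpha is a hypothetical weight.
    cnn_probs would come from a conventional expression CNN, not shown here."""
    probs = alpha * vlm_probs + (1.0 - alpha) * cnn_probs
    return COMPOUND_CLASSES[int(probs.argmax())]
```

Usage amounts to cropping a face, calling `zero_shot_probs`, and fusing with a CNN's softmax output; since the VLM branch needs no compound-expression labels, the pipeline can run on the unlabeled ABAW data.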
Related papers
- Distilling Vision-Language Foundation Models: A Data-Free Approach via Prompt Diversification [49.41632476658246]
We discuss the extension of Data-Free Knowledge Distillation (DFKD) to Vision-Language Foundation Models without access to billion-scale image-text datasets.
The objective is to customize a student model for distribution-agnostic downstream tasks with given category concepts.
We propose three novel Prompt Diversification methods to encourage image synthesis with diverse styles.
arXiv Detail & Related papers (2024-07-21T13:26:30Z) - Compound Expression Recognition via Multi Model Ensemble for the ABAW7 Challenge [6.26485278174662]
Compound Expression Recognition (CER) is vital for effective interpersonal interactions.
In this paper, we propose an ensemble learning-based solution to address this complexity.
Our method demonstrates high accuracy on the RAF-DB dataset and is capable of recognizing expressions in certain portions of the C-EXPR-DB through zero-shot learning.
arXiv Detail & Related papers (2024-07-17T01:59:34Z) - The Solution for the 5th GCAIAC Zero-shot Referring Expression Comprehension Challenge [3.92894296845466]
This report presents a solution for the zero-shot referring expression comprehension task.
Our approach achieved accuracy rates of 84.825 on the A leaderboard and 71.460 on the B leaderboard, securing first place.
arXiv Detail & Related papers (2024-07-06T08:31:33Z) - 7th ABAW Competition: Multi-Task Learning and Compound Expression Recognition [46.730335566738006]
This paper describes the 7th Affective Behavior Analysis in-the-wild (ABAW) Competition.
The ABAW Competition addresses novel challenges in understanding human expressions and behaviors.
arXiv Detail & Related papers (2024-07-04T11:04:29Z) - Compound Expression Recognition via Multi Model Ensemble [8.529105068848828]
Compound Expression Recognition plays a crucial role in interpersonal interactions.
We propose a solution based on ensemble learning methods for Compound Expression Recognition.
Our method achieves high accuracy on RAF-DB and is able to recognize expressions in certain portions of C-EXPR-DB through zero-shot learning.
arXiv Detail & Related papers (2024-03-19T09:30:56Z) - Exploring Facial Expression Recognition through Semi-Supervised Pretraining and Temporal Modeling [8.809586885539002]
This paper presents our approach for the upcoming 6th Affective Behavior Analysis in-the-Wild (ABAW) competition.
In the 6th ABAW competition, our method achieved outstanding results on the official validation set.
arXiv Detail & Related papers (2024-03-18T16:36:54Z) - Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation [114.72734384299476]
We propose a Language-Driven Visual Consensus (LDVC) approach, fostering improved alignment of semantic and visual information.
We leverage class embeddings as anchors due to their discrete and abstract nature, steering vision features toward them.
Our approach significantly boosts the capacity of segmentation models for unseen classes.
arXiv Detail & Related papers (2024-03-13T11:23:55Z) - Visual In-Context Learning for Large Vision-Language Models [62.5507897575317]
In Large Visual Language Models (LVLMs), the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interactions and representation disparities.
We introduce a novel Visual In-Context Learning (VICL) method comprising Visual Demonstration Retrieval, Intent-Oriented Image Summarization, and Intent-Oriented Demonstration Composition.
Our approach retrieves images via a "Retrieval & Rerank" paradigm, summarizes images with task intent and task-specific visual parsing, and composes language-based demonstrations.
arXiv Detail & Related papers (2024-02-18T12:43:38Z) - Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction [61.16125290912494]
$\text{EVL}_\text{Gen}$ is a framework designed for the pre-training of visually conditioned language generation models.
We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z) - Retrieval-based Disentangled Representation Learning with Natural Language Supervision [61.75109410513864]
We present Vocabulary Disentangled Retrieval (VDR), a retrieval-based framework that harnesses natural language as proxies of the underlying data variation to drive disentangled representation learning.
Our approach employs a bi-encoder model to represent both data and natural language in a vocabulary space, enabling the model to distinguish intrinsic dimensions that capture characteristics within data through their natural-language counterparts, thereby achieving disentanglement.
arXiv Detail & Related papers (2022-12-15T10:20:42Z) - Idiomatic Expression Identification using Semantic Compatibility [8.355785779504869]
We study the task of detecting whether a sentence has an idiomatic expression and localizing it.
We propose a multi-stage neural architecture with the attention flow mechanism for identifying these expressions.
A salient feature of the model is its ability to identify idioms unseen during training with gains from 1.4% to 30.8% over competitive baselines.
arXiv Detail & Related papers (2021-10-19T15:44:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.