LEAF: Unveiling Two Sides of the Same Coin in Semi-supervised Facial Expression Recognition
- URL: http://arxiv.org/abs/2404.15041v2
- Date: Fri, 26 Apr 2024 13:38:08 GMT
- Title: LEAF: Unveiling Two Sides of the Same Coin in Semi-supervised Facial Expression Recognition
- Authors: Fan Zhang, Zhi-Qi Cheng, Jian Zhao, Xiaojiang Peng, Xuelong Li,
- Abstract summary: Semi-supervised learning has emerged as a promising approach to tackle the challenge of label scarcity in facial expression recognition.
We propose a unified framework termed hierarchicaL dEcoupling And Fusing to coordinate expression-relevant representations and pseudo-labels.
We show that LEAF outperforms state-of-the-art semi-supervised FER methods, effectively leveraging both labeled and unlabeled data.
- Score: 56.22672276092373
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-supervised learning has emerged as a promising approach to tackle the challenge of label scarcity in the facial expression recognition (FER) task. However, current state-of-the-art methods primarily focus on one side of the coin, i.e., generating high-quality pseudo-labels, while overlooking the other side: enhancing expression-relevant representations. In this paper, we unveil both sides of the coin by proposing a unified framework termed hierarchicaL dEcoupling And Fusing (LEAF) to coordinate expression-relevant representations and pseudo-labels for semi-supervised FER. LEAF introduces a hierarchical expression-aware aggregation strategy that operates at three levels: semantic, instance, and category. (1) At the semantic and instance levels, LEAF decouples representations into expression-agnostic and expression-relevant components, and adaptively fuses them using learnable gating weights. (2) At the category level, LEAF assigns ambiguous pseudo-labels by decoupling predictions into positive and negative parts, and employs a consistency loss to ensure agreement between two augmented views of the same image. Extensive experiments on benchmark datasets demonstrate that by unveiling and harmonizing both sides of the coin, LEAF outperforms state-of-the-art semi-supervised FER methods, effectively leveraging both labeled and unlabeled data. Moreover, the proposed expression-aware aggregation strategy can be seamlessly integrated into existing semi-supervised frameworks, leading to significant performance gains. Our code is available at https://anonymous.4open.science/r/LEAF-BC57/.
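The abstract names two mechanisms but gives no equations for them. The following pure-Python sketch illustrates one plausible reading under stated assumptions; it is not the authors' implementation, and the repository linked above remains the authoritative source. Assumptions: (1) the "learnable gating weights" are modeled as a per-dimension sigmoid gate, g * relevant + (1 - g) * agnostic, with the gate logits passed in as plain numbers rather than trained; (2) the category-level step treats the first augmented view's prediction as the source of pseudo-labels (a weak/strong split borrowed from FixMatch-style training), marks every class above a threshold as positive (so several ambiguous classes can be positive at once) and every class below a lower threshold as negative, then penalizes the second view with -log(positive mass) plus -log(1 - q_c) for each negative class. The function names, thresholds, and exact loss form are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities (max-shifted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(agnostic: Sequence[float], relevant: Sequence[float],
                 gate_logits: Sequence[float]) -> List[float]:
    """Hypothetical per-dimension gate: g near 1 keeps the expression-relevant
    feature, g near 0 keeps the expression-agnostic one. In training the gate
    logits would be learnable parameters; here they are plain inputs."""
    if not len(agnostic) == len(relevant) == len(gate_logits):
        raise ValueError("feature vectors and gate logits must have the same length")
    fused = []
    for a, r, w in zip(agnostic, relevant, gate_logits):
        g = sigmoid(w)
        fused.append(g * r + (1.0 - g) * a)
    return fused


def decouple_prediction(probs: Sequence[float], pos_threshold: float = 0.2,
                        neg_threshold: float = 0.05) -> Tuple[Set[int], Set[int]]:
    """Split one prediction into a positive set (possibly several ambiguous
    classes) and a negative set (classes the image very likely is not)."""
    positives = {c for c, p in enumerate(probs) if p >= pos_threshold}
    negatives = {c for c, p in enumerate(probs) if p < neg_threshold}
    return positives, negatives


def decoupled_consistency_loss(weak_logits: Sequence[float], strong_logits: Sequence[float],
                               pos_threshold: float = 0.2, neg_threshold: float = 0.05,
                               eps: float = 1e-8) -> float:
    """Hypothetical consistency loss between two augmented views of one image.
    The first view defines the positive/negative sets; the second view is pushed
    toward the positive set and away from each negative class. Always >= 0."""
    if len(weak_logits) != len(strong_logits):
        raise ValueError("both views must score the same set of classes")
    positives, negatives = decouple_prediction(softmax(weak_logits), pos_threshold, neg_threshold)
    strong = softmax(strong_logits)
    loss = 0.0
    if positives:
        # Positive part: -log of the total probability the second view gives the positive set.
        loss -= math.log(max(sum(strong[c] for c in positives), eps))
    for c in negatives:
        # Negative part: -log(1 - q_c) for every class judged negative by the first view.
        loss -= math.log(max(1.0 - strong[c], eps))
    return loss


# Usage on toy inputs.
print(gated_fusion([1.0, 0.0], [0.0, 1.0], [0.0, 0.0]))  # [0.5, 0.5]
print(decouple_prediction([0.45, 0.40, 0.12, 0.03]))  # ({0, 1}, {3})
agree = decoupled_consistency_loss([2.0, 1.9, 0.0, -3.0], [2.0, 2.0, 0.0, -3.0])
disagree = decoupled_consistency_loss([2.0, 1.9, 0.0, -3.0], [-3.0, -3.0, 0.0, 3.0])
print(agree < disagree)  # True
```

Keeping classes 0 and 1 together as positives is what lets the loss tolerate an ambiguous expression (for instance, one that sits between two categories) instead of forcing a single hard pseudo-label. Classes that fall between the two thresholds are simply ignored.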
Related papers
- A Unified Label-Aware Contrastive Learning Framework for Few-Shot Named Entity Recognition [6.468625143772815]
We propose a unified label-aware token-level contrastive learning framework.
Our approach enriches the context by utilizing label semantics as suffix prompts.
It simultaneously optimizes context-native and context-label contrastive learning objectives.
(2024-04-26T06:19:21Z)
- Synergistic Anchored Contrastive Pre-training for Few-Shot Relation Extraction [4.7220779071424985]
Few-shot Relation Extraction (FSRE) aims to extract facts from a sparse set of labeled corpora.
Recent studies have shown promising results in FSRE by employing pre-trained language models.
We introduce a novel synergistic anchored contrastive pre-training framework.
(2023-12-19T10:16:24Z)
- BERM: Training the Balanced and Extractable Representation for Matching to Improve Generalization Ability of Dense Retrieval [54.66399120084227]
We propose BERM, a novel method that improves the generalization of dense retrieval by capturing the matching signal.
Dense retrieval has shown promise in the first-stage retrieval process when trained on in-domain labeled datasets.
(2023-05-18T15:43:09Z)
- Associating Spatially-Consistent Grouping with Text-supervised Semantic Segmentation [117.36746226803993]
We introduce self-supervised spatially-consistent grouping with text-supervised semantic segmentation.
Considering the part-like grouped results, we further adapt a text-supervised model from image-level to region-level recognition.
Our method achieves 59.2% mIoU and 32.4% mIoU on the Pascal VOC and Pascal Context benchmarks, respectively.
(2023-04-03T16:24:39Z)
- BEST: BERT Pre-Training for Sign Language Recognition with Coupling Tokenization [135.73436686653315]
We aim to leverage the success of BERT pre-training and to model domain-specific statistics in order to improve the sign language recognition (SLR) model.
Considering the dominance of hand and body in sign language expression, we organize them as pose triplet units and feed them into the Transformer backbone.
Pre-training is performed via reconstructing the masked triplet unit from the corrupted input sequence.
It adaptively extracts the discrete pseudo label from the pose triplet unit, which represents the semantic gesture/body state.
(2023-02-10T06:23:44Z)
- 1st Place Solution for PSG competition with ECCV'22 SenseHuman Workshop [1.5362025549031049]
Panoptic Scene Graph (PSG) generation aims to generate scene graph representations based on panoptic segmentation instead of rigid bounding boxes.
We propose GRNet, a Global Relation Network in a two-stage paradigm, where the pre-extracted local object features and their corresponding masks are fed into a transformer with class embeddings.
We conduct comprehensive experiments on the OpenPSG dataset and achieve state-of-the-art performance on the leaderboard.
(2023-02-06T09:47:46Z)
- Inter-class Discrepancy Alignment for Face Recognition [55.578063356210144]
We propose a unified framework called Inter-class Discrepancy Alignment (IDA).
IDA-DAO is used to align the similarity scores, taking into account the discrepancy between each image and its neighbors.
IDA-SSE can provide convincing inter-class neighbors by introducing virtual candidate images generated with a GAN.
(2021-03-02T08:20:08Z)
- MatchGAN: A Self-Supervised Semi-Supervised Conditional Generative Adversarial Network [51.84251358009803]
We present a novel self-supervised learning approach for conditional generative adversarial networks (GANs) under a semi-supervised setting.
We perform augmentation by randomly sampling sensible labels from the label space of the few labelled examples available.
Our method surpasses the baseline with only 20% of the labelled examples used to train the baseline.
(2020-06-11T17:14:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.