QCS: Feature Refining from Quadruplet Cross Similarity for Facial Expression Recognition
- URL: http://arxiv.org/abs/2411.01988v5
- Date: Sat, 25 Jan 2025 02:46:53 GMT
- Title: QCS: Feature Refining from Quadruplet Cross Similarity for Facial Expression Recognition
- Authors: Chengpeng Wang, Li Chen, Lili Wang, Zhaofan Li, Xuebin Lv
- Abstract summary: We introduce Cross Similarity Attention to mine richer intrinsic information from image pairs. We design a four-branch centrally symmetric network, named Quadruplet Cross Similarity (QCS), which alleviates gradient conflicts. Our proposed method achieves state-of-the-art performance on several FER datasets.
- Score: 17.7824127337701
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Facial expression recognition faces challenges where labeled significant features in datasets are mixed with unlabeled redundant ones. In this paper, we introduce Cross Similarity Attention (CSA) to mine richer intrinsic information from image pairs, overcoming a limitation that arises when the Scaled Dot-Product Attention of ViT is directly applied to calculate the similarity between two different images. Based on CSA, we simultaneously minimize intra-class differences and maximize inter-class differences at the fine-grained feature level through interactions among multiple branches. Contrastive residual distillation is utilized to transfer the information learned in the cross module back to the base network. We ingeniously design a four-branch centrally symmetric network, named Quadruplet Cross Similarity (QCS), which alleviates gradient conflicts arising from the cross module and achieves balanced and stable training. It can adaptively extract discriminative features while isolating redundant ones. The cross-attention modules exist only during training, and only one base branch is retained during inference, so inference time does not increase. Extensive experiments show that our proposed method achieves state-of-the-art performance on several FER datasets.
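The abstract does not give the CSA equations, so the following is only a minimal sketch of the underlying setting: attention computed across two different images, with queries taken from one image's patch tokens and keys/values from the other's. Class and argument names are illustrative assumptions, and the paper's CSA further changes how the similarity itself is formed, which plain cross-attention does not capture.

```python
# Hedged sketch: cross-attention between patch tokens of two different images,
# the setting in which CSA replaces ViT's single-image Scaled Dot-Product Attention.
# Names, dimensions, and the plain softmax(QK^T)V form are assumptions for illustration.
import torch
import torch.nn as nn

class CrossImageAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.to_q = nn.Linear(dim, dim, bias=False)        # queries from image A
        self.to_kv = nn.Linear(dim, 2 * dim, bias=False)   # keys/values from image B
        self.proj = nn.Linear(dim, dim)

    def forward(self, tokens_a: torch.Tensor, tokens_b: torch.Tensor) -> torch.Tensor:
        # tokens_a, tokens_b: (batch, num_tokens, dim) patch embeddings of two images
        b, n, d = tokens_a.shape
        h = self.num_heads
        q = self.to_q(tokens_a).view(b, n, h, d // h).transpose(1, 2)
        k, v = self.to_kv(tokens_b).chunk(2, dim=-1)
        k = k.view(b, -1, h, d // h).transpose(1, 2)
        v = v.view(b, -1, h, d // h).transpose(1, 2)
        sim = (q @ k.transpose(-2, -1)) * self.scale        # token-to-token similarity A -> B
        out = (sim.softmax(dim=-1) @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)
```

Per the abstract, four such branches interact pairwise during QCS training to shrink intra-class and enlarge inter-class differences at the feature level; at inference only one base branch remains, so a module like the sketch above would be a training-time component only.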
Related papers
- Semi-supervised Semantic Segmentation for Remote Sensing Images via Multi-scale Uncertainty Consistency and Cross-Teacher-Student Attention [59.19580789952102]
This paper proposes a novel semi-supervised Multi-Scale Uncertainty and Cross-Teacher-Student Attention (MUCA) model for RS image semantic segmentation tasks.
MUCA constrains the consistency among feature maps at different layers of the network by introducing a multi-scale uncertainty consistency regularization.
MUCA utilizes a Cross-Teacher-Student attention mechanism to guide the student network toward constructing more discriminative feature representations.
arXiv Detail & Related papers (2025-01-18T11:57:20Z)
- Exploring Homogeneous and Heterogeneous Consistent Label Associations for Unsupervised Visible-Infrared Person ReID [57.500045584556794]
We introduce a Modality-Unified Label Transfer (MULT) module that simultaneously accounts for both homogeneous and heterogeneous fine-grained instance-level structures.
The proposed MULT ensures that the generated pseudo-labels maintain alignment across modalities while upholding structural consistency within each modality.
Experiments demonstrate that our proposed method outperforms existing state-of-the-art USL-VI-ReID methods.
arXiv Detail & Related papers (2024-02-01T15:33:17Z)
- Efficient Bilateral Cross-Modality Cluster Matching for Unsupervised Visible-Infrared Person ReID [56.573905143954015]
We propose a novel bilateral cluster matching-based learning framework to reduce the modality gap by matching cross-modality clusters.
Under such a supervisory signal, a Modality-Specific and Modality-Agnostic (MSMA) contrastive learning framework is proposed to align features jointly at the cluster level.
Experiments on the public SYSU-MM01 and RegDB datasets demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2023-05-22T03:27:46Z)
- Decoupled Multi-task Learning with Cyclical Self-Regulation for Face Parsing [71.19528222206088]
We propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation (DML-CSR) approach for face parsing.
Specifically, DML-CSR designs a multi-task model which comprises face parsing, binary edge, and category edge detection.
Our method achieves the new state-of-the-art performance on the Helen, CelebA-HQ, and LapaMask datasets.
arXiv Detail & Related papers (2022-03-28T02:12:30Z)
- CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification [11.894289991529496]
Few-shot classification is a challenging problem that aims to learn a model that can adapt to unseen classes given a few labeled samples.
Recent approaches pre-train a feature extractor and then fine-tune it through episodic meta-learning.
We propose a strategy to cross-attend and re-weight discriminative features for few-shot classification.
arXiv Detail & Related papers (2022-03-25T06:14:51Z)
- Relational Embedding for Few-Shot Classification [32.12002195421671]
We propose to address the problem of few-shot classification by meta-learning "what to observe" and "where to attend" from a relational perspective.
Our method leverages patterns within and between images via self-correlational representation (SCR) and cross-correlational attention (CCA).
Our Embedding Network (RENet) combines the two relational modules to learn relational embedding in an end-to-end manner.
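As a rough illustration of the cross-correlational attention (CCA) idea, the sketch below builds a 4D correlation tensor between two L2-normalized feature maps. Shapes, names, and the crude attention reduction at the end are assumptions; RENet additionally refines such correlations with learnable layers.

```python
# Hedged sketch: 4D cross-correlation between a query and a support feature map,
# the kind of relational structure CCA operates on. Illustrative, not RENet's code.
import torch
import torch.nn.functional as F

def cross_correlation(feat_q: torch.Tensor, feat_s: torch.Tensor) -> torch.Tensor:
    # feat_q, feat_s: (B, C, H, W) feature maps of a query and a support image
    fq = F.normalize(feat_q, dim=1)
    fs = F.normalize(feat_s, dim=1)
    # corr[b, h, w, u, v] = cosine similarity between query position (h, w)
    # and support position (u, v)
    return torch.einsum('bchw,bcuv->bhwuv', fq, fs)

if __name__ == "__main__":
    fq, fs = torch.randn(2, 64, 5, 5), torch.randn(2, 64, 5, 5)
    corr = cross_correlation(fq, fs)               # (2, 5, 5, 5, 5)
    # crude query-side attention: best match of each query position in the support map
    attn_q = corr.flatten(3).max(dim=-1).values    # (2, 5, 5)
```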
arXiv Detail & Related papers (2021-08-22T08:44:55Z)
- Dual-Cross Central Difference Network for Face Anti-Spoofing [54.81222020394219]
Face anti-spoofing (FAS) plays a vital role in securing face recognition systems.
Central difference convolution (CDC) has shown its excellent representation capacity for the FAS task.
We propose two Cross Central Difference Convolutions (C-CDC), which exploit the difference between central and surrounding sparse local features.
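For context, the sketch below implements a plain central difference convolution (CDC), the operator that C-CDC specializes by sampling only a sparse subset of the local neighborhood; theta blends the ordinary convolution with the central-difference term. The kernel size, theta value, and class name are illustrative assumptions.

```python
# Hedged sketch of vanilla central difference convolution:
#   y(p0) = sum_p w(p) * x(p0 + p)  -  theta * x(p0) * sum_p w(p)
# i.e. ordinary convolution minus a theta-weighted central term.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralDifferenceConv2d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, theta: float = 0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.theta = theta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv(x)                          # ordinary 3x3 convolution
        if self.theta == 0.0:
            return out
        # the central-difference term only needs the kernel weights summed over
        # the spatial window, applied to the center pixel
        kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)  # (out_ch, in_ch, 1, 1)
        return out - self.theta * F.conv2d(x, kernel_sum)
```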
arXiv Detail & Related papers (2021-05-04T05:11:47Z)
- Cross-Level Cross-Scale Cross-Attention Network for Point Cloud Representation [8.76786786874107]
The self-attention mechanism has recently achieved impressive advances in the Natural Language Processing (NLP) and Image Processing domains.
We propose an end-to-end architecture, dubbed Cross-Level Cross-Scale Cross-Attention Network (CLCSCANet) for point cloud representation learning.
arXiv Detail & Related papers (2021-04-27T09:01:14Z)
- CrossATNet - A Novel Cross-Attention Based Framework for Sketch-Based Image Retrieval [30.249581102239645]
We propose a novel framework for cross-modal zero-shot learning (ZSL) in the context of sketch-based image retrieval (SBIR).
While we define a cross-modal triplet loss to ensure the discriminative nature of the shared space, an innovative cross-modal attention learning strategy is also proposed to guide feature extraction from the image domain.
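In its generic form, the cross-modal triplet loss referred to above is a margin-based ranking loss over a sketch anchor, a matching image embedding, and a non-matching one in the shared space; a minimal sketch follows, with the margin and squared Euclidean distance as illustrative choices rather than the paper's exact settings.

```python
# Hedged sketch of a cross-modal triplet loss in a shared sketch-image embedding space.
import torch
import torch.nn.functional as F

def cross_modal_triplet_loss(sketch_emb: torch.Tensor,
                             pos_img_emb: torch.Tensor,
                             neg_img_emb: torch.Tensor,
                             margin: float = 0.3) -> torch.Tensor:
    # all embeddings: (B, D), assumed L2-normalized
    d_pos = (sketch_emb - pos_img_emb).pow(2).sum(dim=1)   # anchor vs matching image
    d_neg = (sketch_emb - neg_img_emb).pow(2).sum(dim=1)   # anchor vs non-matching image
    return F.relu(d_pos - d_neg + margin).mean()
```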
arXiv Detail & Related papers (2021-04-20T12:11:12Z)
- Progressive Co-Attention Network for Fine-grained Visual Classification [20.838908090777885]
Fine-grained visual classification aims to recognize images belonging to multiple sub-categories within the same category.
Most existing methods take only an individual image as input.
We propose an effective method called progressive co-attention network (PCA-Net) to tackle this problem.
arXiv Detail & Related papers (2021-01-21T10:19:02Z)
- Robust Facial Landmark Detection by Cross-order Cross-semantic Deep Network [58.843211405385205]
We propose a cross-order cross-semantic deep network (CCDN) to boost semantic feature learning for robust facial landmark detection.
Specifically, a cross-order two-squeeze multi-excitation (CTM) module is proposed to introduce cross-order channel correlations for more discriminative representation learning.
A novel cross-order cross-semantic (COCS) regularizer is designed to drive the network to learn cross-order cross-semantic features from different activations for facial landmark detection.
arXiv Detail & Related papers (2020-11-16T08:19:26Z)
- Symbiotic Adversarial Learning for Attribute-based Person Search [86.7506832053208]
We present a symbiotic adversarial learning framework, called SAL. Two GANs sit at the base of the framework in a symbiotic learning scheme.
Specifically, two different types of generative adversarial networks learn collaboratively throughout the training process.
arXiv Detail & Related papers (2020-07-19T07:24:45Z)
- Self-Challenging Improves Cross-Domain Generalization [81.99554996975372]
Convolutional Neural Networks (CNNs) conduct image classification by activating dominant features that correlate with labels.
We introduce a simple training method, Self-Challenging Representation (RSC), that significantly improves the generalization of CNNs to out-of-domain data.
RSC iteratively challenges the dominant features activated on the training data and forces the network to activate the remaining features that correlate with labels.
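A minimal sketch of the self-challenging step described above: the feature dimensions the network currently relies on most (identified via gradients of the ground-truth class scores) are muted, and the loss is recomputed so the remaining features must carry the prediction. The gradient signal, drop percentile, and per-sample masking granularity are illustrative assumptions, not the paper's exact recipe.

```python
# Hedged sketch of representation self-challenging on penultimate features.
import torch
import torch.nn.functional as F

def self_challenging_loss(features: torch.Tensor, classifier: torch.nn.Module,
                          labels: torch.Tensor, drop_percentile: float = 0.66) -> torch.Tensor:
    # features: (B, D) penultimate representations that require grad (backbone output)
    logits = classifier(features)
    # gradients of the ground-truth class scores identify the dominant dimensions
    gt_scores = logits.gather(1, labels.unsqueeze(1)).sum()
    grads = torch.autograd.grad(gt_scores, features, retain_graph=True)[0]
    # mute the top-gradient dimensions per sample and recompute the loss
    threshold = torch.quantile(grads, drop_percentile, dim=1, keepdim=True)
    mask = (grads < threshold).float()
    return F.cross_entropy(classifier(features * mask), labels)
```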
arXiv Detail & Related papers (2020-07-05T21:42:26Z)
- ReMarNet: Conjoint Relation and Margin Learning for Small-Sample Image Classification [49.87503122462432]
We introduce a novel neural network termed Relation-and-Margin learning Network (ReMarNet).
Our method assembles two networks with different backbones to learn features that perform well under both of the aforementioned classification mechanisms.
Experiments on four image datasets demonstrate that our approach is effective in learning discriminative features from a small set of labeled samples.
arXiv Detail & Related papers (2020-06-27T13:50:20Z)
- Multi-Margin based Decorrelation Learning for Heterogeneous Face Recognition [90.26023388850771]
This paper presents a deep neural network approach to extract decorrelation representations in a hyperspherical space for cross-domain face images.
The proposed framework can be divided into two components: a heterogeneous representation network and decorrelation representation learning.
Experimental results on two challenging heterogeneous face databases show that our approach achieves superior performance on both verification and recognition tasks.
arXiv Detail & Related papers (2020-05-25T07:01:12Z)
- ResNeSt: Split-Attention Networks [86.25490825631763]
We present a modularized architecture that applies channel-wise attention on different network branches to leverage their success in capturing cross-feature interactions and learning diverse representations.
Our model, named ResNeSt, outperforms EfficientNet in the accuracy-latency trade-off on image classification.
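A minimal sketch of the split-attention idea: the input is processed by several parallel branches, a gating vector computed from their pooled sum yields per-branch, per-channel weights via a softmax across branches, and the branches are fused with those weights. Cardinality groups, exact layer widths, and batch normalization are omitted, and all names are illustrative.

```python
# Hedged sketch of a split-attention block in the spirit of ResNeSt (simplified:
# no cardinality groups, no batch norm, illustrative layer sizes).
import torch
import torch.nn as nn

class SplitAttention(nn.Module):
    def __init__(self, channels: int, radix: int = 2, reduction: int = 4):
        super().__init__()
        self.radix = radix
        inner = max(channels // reduction, 8)
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1, bias=False) for _ in range(radix)]
        )
        self.fc1 = nn.Conv2d(channels, inner, 1)
        self.fc2 = nn.Conv2d(inner, channels * radix, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        splits = [branch(x) for branch in self.branches]         # radix x (B, C, H, W)
        gap = sum(splits).mean(dim=(2, 3), keepdim=True)         # pooled sum (B, C, 1, 1)
        attn = self.fc2(torch.relu(self.fc1(gap)))               # (B, C*radix, 1, 1)
        attn = attn.view(x.size(0), self.radix, -1, 1, 1).softmax(dim=1)  # softmax across branches
        return sum(w * s for w, s in zip(attn.unbind(dim=1), splits))
```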
arXiv Detail & Related papers (2020-04-19T20:40:31Z)