Multi-threshold Deep Metric Learning for Facial Expression Recognition
- URL: http://arxiv.org/abs/2406.16434v1
- Date: Mon, 24 Jun 2024 08:27:31 GMT
- Title: Multi-threshold Deep Metric Learning for Facial Expression Recognition
- Authors: Wenwu Yang, Jinyi Yu, Tuo Chen, Zhenguang Liu, Xun Wang, Jianbing Shen
- Abstract summary: We present the multi-threshold deep metric learning technique, which avoids the difficult threshold validation.
We find that each threshold of the triplet loss intrinsically determines a distinctive distribution of inter-class variations.
Partitioning the embedding layer into a set of slices, one per sampled threshold, makes it a more informative and discriminative feature.
- Score: 60.26967776920412
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effective expression feature representations generated by triplet-based deep metric learning are highly advantageous for facial expression recognition (FER). The performance of triplet-based deep metric learning is contingent upon identifying the best threshold for the triplet loss. Threshold validation, however, is challenging, as the ideal threshold changes among datasets and even across classes within the same dataset. In this paper, we present the multi-threshold deep metric learning technique, which not only avoids the difficult threshold validation but also vastly increases the capacity of triplet loss learning to construct expression feature representations. We find that each threshold of the triplet loss intrinsically determines a distinctive distribution of inter-class variations and thus corresponds to a unique expression feature representation. Therefore, rather than selecting a single optimal threshold from a valid threshold range, we thoroughly sample thresholds across the range, allowing the representation characteristics manifested by thresholds within the range to be fully extracted and leveraged for FER. To realize this approach, we partition the embedding layer of the deep metric learning network into a collection of slices and formulate the training of these embedding slices as an end-to-end multi-threshold deep metric learning problem. Each embedding slice corresponds to a sampled threshold and is learned by enforcing the corresponding triplet loss, yielding a set of distinct expression features, one for each embedding slice. This makes the embedding layer, which is composed of a set of slices, a more informative and discriminative feature, hence enhancing the FER accuracy. Extensive evaluations demonstrate the superior performance of the proposed approach on both posed and spontaneous facial expression datasets.
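The slicing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the evenly spaced threshold sampling, the squared Euclidean distance, the per-slice L2 normalization, the summing of per-slice losses, and all function names below are assumptions made for illustration only.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (assumed here; left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, threshold):
    """Standard hinge triplet loss: max(0, d(a, p) - d(a, n) + threshold)."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + threshold)


def split_into_slices(embedding, num_slices):
    """Partition an embedding into equally sized, contiguous slices."""
    if len(embedding) % num_slices != 0:
        raise ValueError("embedding length must be divisible by num_slices")
    size = len(embedding) // num_slices
    return [embedding[i * size:(i + 1) * size] for i in range(num_slices)]


def sample_thresholds(low, high, num_slices):
    """Evenly spaced thresholds over [low, high] (hypothetical sampling scheme)."""
    if num_slices == 1:
        return [(low + high) / 2.0]
    step = (high - low) / (num_slices - 1)
    return [low + i * step for i in range(num_slices)]


def multi_threshold_triplet_loss(anchor, positive, negative, thresholds):
    """Sum of per-slice triplet losses, one threshold per embedding slice."""
    n = len(thresholds)
    slice_triplets = zip(
        split_into_slices(anchor, n),
        split_into_slices(positive, n),
        split_into_slices(negative, n),
    )
    total = 0.0
    for (a, p, neg), threshold in zip(slice_triplets, thresholds):
        total += triplet_loss(
            l2_normalize(a), l2_normalize(p), l2_normalize(neg), threshold
        )
    return total


# Example: an 8-dimensional embedding split into 4 slices of 2 dimensions each.
thresholds = sample_thresholds(0.1, 0.4, 4)
anchor = [1.0, 0.0] * 4
easy_negative = [0.0, 1.0] * 4
print(multi_threshold_triplet_loss(anchor, anchor, easy_negative, thresholds))
```

In this sketch each slice is penalized against its own threshold, so one embedding is trained under several margins at once. Following the abstract, the full embedding (all slices together) would then serve as the expression feature.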
Related papers
- Interpretable Triplet Importance for Personalized Ranking [5.409302364904161]
We propose a Shapley value-based method to measure the triplet importance in an interpretable manner.
Our model consistently outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2024-07-28T11:46:55Z)
- ASM: Adaptive Sample Mining for In-The-Wild Facial Expression Recognition [19.846612021056565]
We introduce a novel approach called Adaptive Sample Mining to address ambiguity and noise within each expression category.
Our method can effectively mine both ambiguity and noise, and outperform SOTA methods on both synthetic noisy and original datasets.
arXiv Detail & Related papers (2023-10-09T11:18:22Z)
- Hodge-Aware Contrastive Learning [101.56637264703058]
Simplicial complexes prove effective in modeling data with multiway dependencies.
We develop a contrastive self-supervised learning approach for processing simplicial data.
arXiv Detail & Related papers (2023-09-14T00:40:07Z)
- Triplet Loss-less Center Loss Sampling Strategies in Facial Expression Recognition Scenarios [5.672538282456803]
Deep neural networks (DNNs) accompanied by deep metric learning (DML) techniques boost the discriminative ability of the model in FER applications.
We developed three strategies: fully-synthesized, semi-synthesized, and prediction-based negative sample selection strategies.
To achieve better results, we introduce a selective attention module that provides a combination of pixel-wise and element-wise attention coefficients.
arXiv Detail & Related papers (2023-02-08T15:03:36Z)
- Rank-Consistency Deep Hashing for Scalable Multi-Label Image Search [90.30623718137244]
We propose a novel deep hashing method for scalable multi-label image search.
A new rank-consistency objective is applied to align the similarity orders from two spaces.
A powerful loss function is designed to penalize the samples whose semantic similarity and Hamming distance are mismatched.
arXiv Detail & Related papers (2021-02-02T13:46:58Z)
- Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation [90.87105131054419]
We present a framework for semi-supervised semantic segmentation, which is enhanced by self-supervised monocular depth estimation from unlabeled image sequences.
We validate the proposed model on the Cityscapes dataset, where all three modules demonstrate significant performance gains.
arXiv Detail & Related papers (2020-12-19T21:18:03Z)
- Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation [54.49894381464853]
We propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation.
We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining.
Experiments show that the proposed method improves the performance significantly compared with the supervised method learned from labeled data only.
arXiv Detail & Related papers (2020-07-21T13:27:09Z)
- MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT Prostate Segmentation via Online Sampling [66.01558025094333]
We propose a two-stage framework: the first stage quickly localizes the prostate region, and the second stage precisely segments the prostate.
We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network.
Our method can effectively learn more representative voxel-level features compared with the conventional learning methods with cross-entropy or Dice loss.
arXiv Detail & Related papers (2020-05-15T10:37:02Z)
- Personalized Activity Recognition with Deep Triplet Embeddings [2.1320960069210475]
We present an approach to personalized activity recognition based on deep embeddings derived from a fully convolutional neural network.
We evaluate these methods on three publicly available inertial human activity recognition data sets.
arXiv Detail & Related papers (2020-01-15T19:17:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.