A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark
- URL: http://arxiv.org/abs/2307.04570v3
- Date: Mon, 25 Mar 2024 13:31:33 GMT
- Title: A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark
- Authors: Jakub Paplham, Vojtech Franc
- Abstract summary: We offer an extensive comparative analysis for state-of-the-art facial age estimation methods.
We find that the performance differences between the methods are negligible compared to the effect of other factors.
We propose using FaRL as the backbone model and demonstrate its effectiveness on all public datasets.
- Score: 2.156208381257605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Comparing different age estimation methods poses a challenge due to the unreliability of published results stemming from inconsistencies in the benchmarking process. Previous studies have reported continuous performance improvements over the past decade using specialized methods; however, our findings challenge these claims. This paper identifies two trivial, yet persistent issues with the currently used evaluation protocol and describes how to resolve them. We offer an extensive comparative analysis for state-of-the-art facial age estimation methods. Surprisingly, we find that the performance differences between the methods are negligible compared to the effect of other factors, such as facial alignment, facial coverage, image resolution, model architecture, or the amount of data used for pretraining. We use the gained insights to propose using FaRL as the backbone model and demonstrate its effectiveness on all public datasets. We make the source code and exact data splits public on GitHub.
Related papers
- A Survey on Deep Learning-based Gaze Direction Regression: Searching for the State-of-the-art [0.0]
We present a survey of deep learning-based methods for the regression of gaze direction vector from head and eye images.
We describe in detail numerous published methods with a focus on the input data, architecture of the model, and loss function used to supervise the model.
We present a list of datasets that can be used to train and evaluate gaze direction regression methods.
arXiv Detail & Related papers (2024-10-22T15:07:07Z)
- DualView: Data Attribution from the Dual Perspective [16.083769847895336]
We present DualView, a novel method for post-hoc data attribution based on surrogate modelling.
We find that DualView requires considerably lower computational resources than other methods, while demonstrating comparable performance across evaluation metrics.
arXiv Detail & Related papers (2024-02-19T13:13:16Z)
- Too Good To Be True: performance overestimation in (re)current practices for Human Activity Recognition [49.1574468325115]
Sliding windows for data segmentation followed by standard random k-fold cross validation produce biased results.
It is important to raise awareness in the scientific community about this problem, whose negative effects are being overlooked.
Several experiments with different types of datasets and different types of classification models allow us to exhibit the problem and show it persists independently of the method or dataset.
arXiv Detail & Related papers (2023-10-18T13:24:05Z)
- Better Understanding Differences in Attribution Methods via Systematic Evaluations [57.35035463793008]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods over a wide range of models.
arXiv Detail & Related papers (2023-03-21T14:24:58Z)
- CVTT: Cross-Validation Through Time [0.0]
We argue that leaving out a method's continuous performance can lead to losing valuable insight into joint data-method effects.
Using the proposed technique, we conduct a detailed analysis of popular RecSys algorithms' performance against various metrics and datasets.
Our results show that model performance can vary significantly over time, and both data and evaluation setup can have a marked effect on it.
arXiv Detail & Related papers (2022-05-11T10:30:38Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- Revisiting Consistency Regularization for Semi-Supervised Learning [80.28461584135967]
We propose an improved consistency regularization framework by a simple yet effective technique, FeatDistLoss.
Experimental results show that our model defines a new state of the art for various datasets and settings.
arXiv Detail & Related papers (2021-12-10T20:46:13Z)
- FP-Age: Leveraging Face Parsing Attention for Facial Age Estimation in the Wild [50.8865921538953]
We propose a method to explicitly incorporate facial semantics into age estimation.
We design a face parsing-based network to learn semantic information at different scales.
We show that our method consistently outperforms all existing age estimation methods.
arXiv Detail & Related papers (2021-06-21T14:31:32Z)
- A Critical Assessment of State-of-the-Art in Entity Alignment [1.7725414095035827]
We investigate two state-of-the-art (SotA) methods for the task of Entity Alignment in Knowledge Graphs.
We first carefully examine the benchmarking process and identify several shortcomings, which make the results reported in the original works not always comparable.
arXiv Detail & Related papers (2020-10-30T15:09:19Z)
- Learning Expectation of Label Distribution for Facial Age and Attractiveness Estimation [65.5880700862751]
We analyze the essential relationship between two state-of-the-art methods (Ranking-CNN and DLDL) and show that the Ranking method is in fact learning label distribution implicitly.
We propose a lightweight network architecture and a unified framework that can jointly learn the facial attribute distribution and regress the attribute value.
Our method achieves new state-of-the-art results using a single model with 36$\times$ fewer parameters and 3$\times$ faster inference speed on facial age/attractiveness estimation.
arXiv Detail & Related papers (2020-07-03T15:46:53Z)
- On the Ambiguity of Rank-Based Evaluation of Entity Alignment or Link Prediction Methods [27.27230441498167]
We take a closer look at the evaluation of two families of methods for enriching information from knowledge graphs: Link Prediction and Entity Alignment.
In particular, we demonstrate that all existing scores can hardly be used to compare results across different datasets.
We show that this leads to various problems in the interpretation of results, which may support misleading conclusions.
arXiv Detail & Related papers (2020-02-17T12:26:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.