FRoundation: Are Foundation Models Ready for Face Recognition?
        - URL: http://arxiv.org/abs/2410.23831v2
 - Date: Fri, 01 Nov 2024 12:11:29 GMT
 - Title: FRoundation: Are Foundation Models Ready for Face Recognition?
 - Authors: Tahar Chettaoui, Naser Damer, Fadi Boutros
 - Abstract summary: We propose and demonstrate the adaptation of foundation models for face recognition across different levels of data availability.
Our results indicate that, despite their versatility, pre-trained foundation models underperform in face recognition.
Fine-tuning foundation models yields promising results, often surpassing models trained from scratch when training data is limited.
 - Score: 8.045296450065019
 - License: http://creativecommons.org/licenses/by-nc-sa/4.0/
 - Abstract:   Foundation models are predominantly trained in an unsupervised or self-supervised manner on highly diverse and large-scale datasets, making them broadly applicable to various downstream tasks. In this work, we investigate for the first time whether such models are suitable for the specific domain of face recognition. We further propose and demonstrate the adaptation of these models for face recognition across different levels of data availability. Extensive experiments are conducted on multiple foundation models and datasets of varying scales for training and fine-tuning, with evaluation on a wide range of benchmarks. Our results indicate that, despite their versatility, pre-trained foundation models underperform in face recognition compared to similar architectures trained specifically for this task. However, fine-tuning foundation models yields promising results, often surpassing models trained from scratch when training data is limited. Even with access to large-scale face recognition training datasets, fine-tuned foundation models perform comparably to models trained from scratch, but with lower training computational costs and without relying on the assumption of extensive data availability. Our analysis also explores bias in face recognition, with slightly higher bias observed in some settings when using foundation models. 
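The summary above does not spell out the fine-tuning recipe, so here is a minimal, hypothetical PyTorch sketch of one common adaptation strategy: freeze a pre-trained backbone (stubbed below with a single linear layer standing in for a ViT such as CLIP or DINOv2), train a small projection head, and optimize a CosFace-style margin-penalty softmax over identities. The hyperparameters (s=64, m=0.35, 512-d embeddings, 1000 identities) and module names are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch: adapt a frozen foundation-model backbone to face
# recognition with a small trainable head and a CosFace-style margin loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MarginSoftmax(nn.Module):
    """CosFace-style loss: subtract a margin m from the target-class cosine."""
    def __init__(self, embed_dim, num_ids, s=64.0, m=0.35):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_ids, embed_dim))
        self.s, self.m = s, m

    def forward(self, emb, labels):
        cos = F.linear(F.normalize(emb), F.normalize(self.weight))
        logits = self.s * (cos - self.m * F.one_hot(labels, cos.size(1)).float())
        return F.cross_entropy(logits, labels)

# Stand-in backbone: in practice a pre-trained ViT with its weights frozen.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(112 * 112 * 3, 768))
for p in backbone.parameters():
    p.requires_grad = False                 # keep pre-trained weights fixed
head = nn.Linear(768, 512)                  # small trainable FR projection
loss_fn = MarginSoftmax(512, num_ids=1000)  # identity count is illustrative

opt = torch.optim.AdamW(list(head.parameters()) + list(loss_fn.parameters()), lr=1e-4)
imgs = torch.randn(8, 3, 112, 112)          # dummy batch of aligned face crops
labels = torch.randint(0, 1000, (8,))
opt.zero_grad()
loss = loss_fn(head(backbone(imgs)), labels)
loss.backward()
opt.step()
```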
 
       
      
        Related papers
        - Approximating Language Model Training Data from Weights [70.08614275061689]
We formalize the problem of data approximation from model weights and propose several baselines and metrics.
We develop a gradient-based approach that selects the highest-matching data from a large public text corpus.
Even when none of the true training data is known, our method is able to locate a small subset of public Web documents.
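A hedged sketch of the gradient-based selection idea, assuming access to the initial and final model weights: score each public candidate example by how well its (negated) single-example gradient aligns with the observed weight change. The scoring rule below is an illustrative assumption; the paper's exact objective may differ.

```python
# Hedged sketch: rank public examples by alignment between their per-example
# gradients and the net weight change theta_final - theta_init.
import torch
import torch.nn.functional as F

def grad_alignment_scores(model_init, model_final, loss_fn, candidates):
    # Flatten the total weight delta once.
    delta = torch.cat([(pf - pi).flatten()
                       for pi, pf in zip(model_init.parameters(),
                                         model_final.parameters())])
    scores = []
    for x, y in candidates:                 # candidates: iterable of (input, target)
        model_final.zero_grad()
        loss_fn(model_final(x), y).backward()
        g = torch.cat([(p.grad if p.grad is not None else torch.zeros_like(p)).flatten()
                       for p in model_final.parameters()])
        # Gradient descent moves weights along -g, so alignment uses -g.
        scores.append(F.cosine_similarity(-g, delta, dim=0).item())
    return scores   # higher score = candidate better explains the training run
```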
arXiv  Detail & Related papers  (2025-06-18T15:26:43Z)
- SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation [81.36747103102459]
Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications.
Current state-of-the-art methods focus on training innovative architectural designs on confined datasets.
We investigate the impact of scaling up EHPS towards a family of generalist foundation models.
arXiv  Detail & Related papers  (2025-01-16T18:59:46Z)
- High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models.
To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence.
Experiments on two widely used datasets demonstrate the effectiveness of our approach.
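One way the coarse-correspondence step mentioned above could look, sketched as a simple prototype baseline over frozen foundation-model features; this is a generic construction, not necessarily the paper's exact approach.

```python
# Generic prototype baseline: pool the support image's foreground features and
# score query locations by cosine similarity to get a coarse correspondence map.
import torch
import torch.nn.functional as F

def coarse_correspondence(support_feats, support_mask, query_feats):
    # support_feats, query_feats: (C, H, W) frozen foundation-model features;
    # support_mask: (H, W) binary foreground mask for the support image.
    prototype = support_feats[:, support_mask.bool()].mean(dim=1)   # (C,)
    sim = F.cosine_similarity(query_feats, prototype[:, None, None], dim=0)
    return sim   # (H, W) map; threshold or refine downstream for a coarse mask
```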
arXiv  Detail & Related papers  (2024-09-10T08:04:11Z)
- Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review [50.78587571704713]
Learn-Focus-Review (LFR) is a dynamic training approach that adapts to the model's learning progress.
LFR tracks the model's learning performance across data blocks (sequences of tokens) and prioritizes revisiting challenging regions of the dataset.
Compared to baseline models trained on the full datasets, LFR consistently achieved lower perplexity and higher accuracy.
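A minimal sketch of an LFR-style loop, assuming a train_step callable that updates the model on a data block and returns its loss; the block granularity and review fraction are illustrative assumptions, not the paper's exact recipe.

```python
# Illustrative LFR-style schedule: one full "learn" pass per epoch, then a
# "focus and review" pass over the highest-loss data blocks.
def lfr_schedule(blocks, train_step, epochs=3, review_frac=0.5):
    losses = {i: float("inf") for i in range(len(blocks))}
    for _ in range(epochs):
        for i, block in enumerate(blocks):          # learn: record block losses
            losses[i] = train_step(block)
        hardest = sorted(losses, key=losses.get, reverse=True)
        for i in hardest[: int(review_frac * len(blocks))]:
            losses[i] = train_step(blocks[i])       # review the challenging regions
```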
arXiv  Detail & Related papers  (2024-09-10T00:59:18Z)
- When are Foundation Models Effective? Understanding the Suitability for Pixel-Level Classification Using Multispectral Imagery [23.464350453312584]
Foundation models, i.e. very large deep learning models, have demonstrated impressive performances in various language and vision tasks.
Are foundation models always a suitable choice for different remote sensing tasks, and when or when not?
This work aims to enhance the understanding of the status and suitability of foundation models for pixel-level classification.
arXiv  Detail & Related papers  (2024-04-17T23:30:48Z)
- A Lightweight Measure of Classification Difficulty from Application Dataset Characteristics [4.220363193932374]
We propose an efficient cosine similarity-based classification difficulty measure S.
It is calculated from the number of classes and intra- and inter-class similarity metrics of the dataset.
We show how a practitioner can use this measure to select an efficient model, 6 to 29x faster than through repeated training and testing.
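A hedged sketch of such a measure: combine intra-class tightness, inter-class separation, and the class count over pre-computed features. The exact formula for S in the paper may differ; this combination only illustrates the ingredients named above.

```python
# Illustrative difficulty measure from class count and cosine similarities;
# features and labels are assumed to be NumPy arrays.
import numpy as np

def difficulty_score(features, labels):
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    classes = np.unique(labels)
    centroids = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    # Intra-class: similarity of samples to their own class centroid.
    intra = np.concatenate([feats[labels == c] @ centroids[i]
                            for i, c in enumerate(classes)]).mean()
    # Inter-class: similarity between distinct class centroids.
    pair = centroids @ centroids.T
    inter = pair[~np.eye(len(classes), dtype=bool)].mean()
    # More classes, looser classes, and closer classes all raise difficulty.
    return len(classes) * (1.0 - intra + inter)
```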
arXiv  Detail & Related papers  (2024-04-09T03:27:09Z)
- No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance [68.18779562801762]
Multimodal models require exponentially more data to achieve linear improvements in downstream "zero-shot" performance.
Our study reveals an exponential need for training data which implies that the key to "zero-shot" generalization capabilities under large-scale training paradigms remains to be found.
arXiv  Detail & Related papers  (2024-04-04T17:58:02Z)
- On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation [47.95611203419802]
Foundation models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach.
We compare the generalization performance to unseen domains of various pre-trained models after being fine-tuned on the same in-distribution dataset.
We further developed a new Bayesian uncertainty estimation for frozen models and used them as an indicator to characterize the model's performance on out-of-distribution data.
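A generic stand-in for uncertainty estimation on a frozen model, assuming a segmentation network that returns per-pixel logits: average softmax outputs over stochastic input perturbations and use the entropy of the mean prediction as an out-of-distribution indicator. This is not the paper's actual Bayesian estimator, only an illustrative proxy.

```python
# Generic stand-in: predictive entropy of a frozen model under input noise,
# used as a rough out-of-distribution indicator (not the paper's estimator).
import torch

@torch.no_grad()
def predictive_entropy(model, x, n_samples=8, noise=0.05):
    probs = torch.stack([
        torch.softmax(model(x + noise * torch.randn_like(x)), dim=1)
        for _ in range(n_samples)
    ]).mean(0)                                       # mean predictive distribution
    ent = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)  # per-pixel entropy
    return ent.mean()   # higher mean entropy suggests out-of-distribution input
```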
arXiv  Detail & Related papers  (2023-11-18T14:52:10Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv  Detail & Related papers  (2023-10-26T17:59:46Z)
- A study on the impact of pre-trained model on Just-In-Time defect prediction [10.205110163570502]
We build six models: RoBERTaJIT, CodeBERTJIT, BARTJIT, PLBARTJIT, GPT2JIT, and CodeGPTJIT, each with a distinct pre-trained model as its backbone.
We investigate the performance of the models when using Commit code and Commit message as inputs, as well as the relationship between training efficiency and model distribution.
arXiv  Detail & Related papers  (2023-09-05T15:34:22Z)
- A Critical Look at the Current Usage of Foundation Model for Dense Recognition Task [26.938332354370814]
Large models trained on huge amounts of cross-modality data, usually termed foundation models, achieve conspicuous accomplishments in many fields.
It is still unclear whether those foundation models can be applied to other different downstream tasks.
arXiv  Detail & Related papers  (2023-07-06T08:57:53Z)
- Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
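A hedged sketch of what a parameter-free CLIP-distillation objective could look like: use CLIP's image-text similarities as soft targets for the adapted model on unlabeled target data. The temperature and KL formulation below are illustrative assumptions, not necessarily the paper's exact design.

```python
# Hedged sketch of a CLIP-distillation objective: KL-match the adapted model's
# predictions to soft targets derived from CLIP image-text similarities.
import torch
import torch.nn.functional as F

def clip_distill_loss(student_logits, clip_image_emb, clip_text_emb, T=2.0):
    # Soft targets: cosine similarities between images and class-text embeddings.
    sims = F.normalize(clip_image_emb, dim=-1) @ F.normalize(clip_text_emb, dim=-1).T
    teacher = F.softmax(sims / T, dim=-1)
    student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean") * T * T
```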
arXiv  Detail & Related papers  (2023-05-18T16:28:29Z)
- Imputing Knowledge Tracing Data with Subject-Based Training via LSTM Variational Autoencoders Frameworks [6.24828623162058]
We adopt a subject-based training method to split and impute data by student IDs instead of row number splitting.
We leverage two existing deep generative frameworks, namely Variational Autoencoders (VAE) and Longitudinal Variational Autoencoders (LVAE).
We demonstrate that the generated data from LSTM-VAE and LSTM-LVAE can boost the original model performance by about 50%.
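A minimal sketch of the subject-based split described above, assuming a pandas DataFrame with a hypothetical student_id column: all rows from one student land in the same fold, instead of splitting by row number.

```python
# Illustrative subject-based split: hold out whole students, not random rows.
# The student_id column name is a hypothetical placeholder.
import numpy as np
import pandas as pd

def split_by_student(df, id_col="student_id", test_frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    ids = df[id_col].unique()
    test_ids = set(rng.choice(ids, size=int(test_frac * len(ids)), replace=False))
    test_mask = df[id_col].isin(test_ids)
    return df[~test_mask], df[test_mask]    # train rows, test rows
```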
arXiv  Detail & Related papers  (2023-02-24T21:56:03Z)
- Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers [66.36045164286854]
We analyze a set of existing bias features and demonstrate there is no single model that works best for all the cases.
By choosing an appropriate bias model, we can obtain a better robustness result than baselines with a more sophisticated model design.
arXiv  Detail & Related papers  (2022-10-28T17:52:10Z)
- Rethinking Bias Mitigation: Fairer Architectures Make for Fairer Face Recognition [107.58227666024791]
Face recognition systems are widely deployed in safety-critical applications, including law enforcement.
They exhibit bias across a range of socio-demographic dimensions, such as gender and race.
Previous works on bias mitigation largely focused on pre-processing the training data.
arXiv  Detail & Related papers  (2022-10-18T15:46:05Z)
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
We are instead given access to a set of expert models and their predictions, alongside some limited information about the dataset used to train them.
arXiv  Detail & Related papers  (2022-10-11T10:20:31Z)
- No One Representation to Rule Them All: Overlapping Features of Training Methods [12.58238785151714]
High-performing models tend to make similar predictions regardless of training methodology.
Recent work has made very different training techniques, such as large-scale contrastive learning, yield competitively high accuracy.
We show that these models specialize in how they generalize over the data, leading to higher ensemble performance.
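A minimal sketch of how such complementary models could be combined, assuming classifiers that return logits: average their softmax outputs, which pays off most when the members make weakly correlated errors. The combination rule is a standard baseline, not the paper's specific method.

```python
# Simple ensemble over differently-trained models: average softmax outputs,
# which benefits most when the members make weakly correlated errors.
import torch

@torch.no_grad()
def ensemble_predict(models, x):
    probs = torch.stack([torch.softmax(m(x), dim=-1) for m in models]).mean(0)
    return probs.argmax(dim=-1)   # class with highest averaged probability
```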
arXiv  Detail & Related papers  (2021-10-20T21:29:49Z)
- Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration, and how some models that score lower on standard benchmarks perform as well as the best-performing models when trained on the same training data.
arXiv  Detail & Related papers  (2020-12-08T18:03:21Z) 
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     