On the Pitfalls of Learning with Limited Data: A Facial Expression
Recognition Case Study
- URL: http://arxiv.org/abs/2104.02653v1
- Date: Fri, 2 Apr 2021 18:53:41 GMT
- Title: On the Pitfalls of Learning with Limited Data: A Facial Expression
Recognition Case Study
- Authors: Miguel Rodríguez Santander, Juan Hernández Albarracín, Adín Ramírez Rivera
- Abstract summary: We focus on the problem of Facial Expression Recognition from videos.
We performed an extensive study with four databases of different complexity and nine deep-learning architectures for video classification.
We found that complex training sets translate better to more stable test sets when trained with transfer learning and synthetically generated data.
- Score: 0.5249805590164901
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep learning models need large amounts of data for training. In video
recognition and classification, significant advances were achieved with the
introduction of new large databases. However, the creation of large databases
for training is infeasible in several scenarios. Thus, existing or small
collected databases are typically joined and amplified to train these models.
Nevertheless, training neural networks on limited data is not straightforward
and comes with a set of problems. In this paper, we explore the effects of
stacking databases, model initialization, and data amplification techniques
when training with limited data on deep learning models' performance. We
focused on the problem of Facial Expression Recognition from videos. We
performed an extensive study with four databases of different complexity and
nine deep-learning architectures for video classification. We found that (i)
complex training sets translate better to more stable test sets when trained
with transfer learning and synthetically generated data, but their performance
yields a high variance; (ii) training with more detailed data translates to
more stable performance on novel scenarios (albeit with lower performance);
(iii) merging heterogeneous data is not a straightforward improvement, as the
type of augmentation and initialization is crucial; (iv) classical data
augmentation cannot fill the holes created by joining largely separated
datasets; and (v) inductive biases help to bridge the gap when paired with
synthetic data, but this data is not enough when working with standard
initialization techniques.
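To make two of the studied ingredients concrete, here is a minimal, hypothetical PyTorch sketch of transfer-learning initialization and classical per-frame data amplification for a video FER classifier. The ResNet-18 backbone, the seven-class label set, and the clip-level averaging are illustrative assumptions, not the paper's architectures or settings.

import torch
import torch.nn as nn
from torchvision import models, transforms

NUM_EXPRESSIONS = 7  # assumed label set; the paper's databases differ

# Transfer learning: start from ImageNet-pretrained weights instead of a
# standard random initialization, then swap the classification head.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_EXPRESSIONS)

# Classical data amplification: geometric and photometric jitter per frame.
frame_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

def classify_clip(frames):
    # frames: list of already-augmented frame tensors of shape (3, H, W)
    backbone.eval()
    with torch.no_grad():
        logits = torch.stack([backbone(f.unsqueeze(0)) for f in frames]).mean(dim=0)
    return logits.argmax(dim=1)

Classical augmentation of this kind is the baseline that the paper contrasts with synthetically generated data and inductive biases (findings iv and v above).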
Related papers
- Efficient Grammatical Error Correction Via Multi-Task Training and
Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences.
We formulate each task as a sequence-to-sequence problem and perform multi-task training.
We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv Detail & Related papers (2023-11-20T14:50:12Z)
- Exploring Data Redundancy in Real-world Image Classification through Data Selection [20.389636181891515]
Deep learning models often require large amounts of data for training, leading to increased costs.
We present two data valuation metrics based on Synaptic Intelligence and gradient norms, respectively, to study redundancy in real-world image data.
Online and offline data selection algorithms are then proposed via clustering and grouping based on the examined data values.
arXiv Detail & Related papers (2023-06-25T03:31:05Z)
- On Inductive Biases for Machine Learning in Data Constrained Settings [0.0]
This thesis explores a different answer to the problem of learning expressive models in data constrained settings.
Instead of relying on big datasets to learn neural networks, we replace some modules with known functions reflecting the structure of the data.
Our approach falls under the umbrella of "inductive biases", which can be defined as hypotheses about the data at hand that restrict the space of models to explore.
arXiv Detail & Related papers (2023-02-21T14:22:01Z)
- Deep invariant networks with differentiable augmentation layers [87.22033101185201]
Methods for learning data augmentation policies require held-out data and are based on bilevel optimization problems.
We show that our approach is easier and faster to train than modern automatic data augmentation techniques.
arXiv Detail & Related papers (2022-02-04T14:12:31Z)
- Deep Learning on a Data Diet: Finding Important Examples Early in Training [35.746302913918484]
In vision datasets, simple scores can be used to identify important examples very early in training.
We propose two such scores: the Gradient Normed (GraNd) and the Error L2-Norm (EL2N); a rough sketch of both appears after this list.
arXiv Detail & Related papers (2021-07-15T02:12:20Z)
- Synthesizing Irreproducibility in Deep Networks [2.28438857884398]
Modern-day deep networks suffer from irreproducibility (also referred to as nondeterminism or underspecification).
We show that even with a single nonlinearity and for very simple data and models, irreproducibility occurs.
Model complexity and the choice of nonlinearity also play significant roles in making deep models irreproducible.
arXiv Detail & Related papers (2021-02-21T21:51:28Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To reduce the burden of training on this enlarged dataset, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- Deep transfer learning for improving single-EEG arousal detection [63.52264764099532]
The two datasets do not share exactly the same recording setup, which leads to degraded performance in single-EEG models.
We train a baseline model and replace the first two layers to prepare the architecture for single-channel electroencephalography data.
Using a fine-tuning strategy, our model yields similar performance to the baseline model and was significantly better than a comparable single-channel model.
arXiv Detail & Related papers (2020-04-10T16:51:06Z)
- DeGAN: Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and the lack of relevant data for the future learning tasks of a trained network.
We use the available data, that may be an imbalanced subset of the original training dataset, or a related domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)
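As a hedged illustration of the GraNd and EL2N scores mentioned in the "Deep Learning on a Data Diet" entry above, the sketch below computes a common single-checkpoint approximation of both; the function names and the assumption of a plain softmax classifier are illustrative, not the paper's code (the paper additionally averages GraNd over several initializations).

import torch
import torch.nn.functional as F

def el2n_scores(model, inputs, targets, num_classes):
    # EL2N: L2 norm of the error vector (softmax probabilities minus one-hot labels).
    with torch.no_grad():
        probs = F.softmax(model(inputs), dim=1)
    one_hot = F.one_hot(targets, num_classes).float()
    return (probs - one_hot).norm(dim=1)  # one score per example; larger = harder

def grand_scores(model, inputs, targets):
    # GraNd: per-example norm of the loss gradient w.r.t. the model parameters,
    # evaluated here at a single early-training checkpoint.
    params = [p for p in model.parameters() if p.requires_grad]
    scores = []
    for x, y in zip(inputs, targets):
        loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        scores.append(torch.cat([g.flatten() for g in grads]).norm())
    return torch.stack(scores)

In that paper's setting, examples with the largest scores are treated as the most important ones to keep when pruning the training set early in training.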