How Sensitive are Meta-Learners to Dataset Imbalance?
- URL: http://arxiv.org/abs/2104.05344v1
- Date: Mon, 12 Apr 2021 10:47:42 GMT
- Title: How Sensitive are Meta-Learners to Dataset Imbalance?
- Authors: Mateusz Ochal, Massimiliano Patacchiola, Amos Storkey, Jose Vazquez,
Sen Wang
- Abstract summary: We show that ML methods are more robust against meta-dataset imbalance than imbalance at the task-level.
Overall, these results highlight an implicit strength of ML algorithms, capable of learning generalizable features under dataset imbalance and domain-shift.
- Score: 13.60699610822265
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Meta-Learning (ML) has proven to be a useful tool for training Few-Shot
Learning (FSL) algorithms by exposure to batches of tasks sampled from a
meta-dataset. However, the standard training procedure overlooks the dynamic
nature of the real-world where object classes are likely to occur at different
frequencies. While it is generally understood that imbalanced tasks harm the
performance of supervised methods, there is no significant research examining
the impact of imbalanced meta-datasets on the FSL evaluation task. This study
exposes the magnitude and extent of this problem. Our results show that ML
methods are more robust against meta-dataset imbalance than imbalance at the
task-level with a similar imbalance ratio ($\rho<20$), with the effect holding
even in long-tail datasets under a larger imbalance ($\rho=65$). Overall, these
results highlight an implicit strength of ML algorithms, capable of learning
generalizable features under dataset imbalance and domain-shift. The code to
reproduce the experiments is released under an open-source license.
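The imbalance ratios quoted above ($\rho<20$, $\rho=65$) can be made concrete with a minimal sketch, assuming $\rho$ is the ratio of the largest class count to the smallest (the class counts below are hypothetical, for illustration only):

```python
from collections import Counter

def imbalance_ratio(labels):
    """Imbalance ratio rho: largest class count divided by smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Hypothetical meta-dataset labels: class "a" occurs far more often than "c".
labels = ["a"] * 100 + ["b"] * 40 + ["c"] * 5
rho = imbalance_ratio(labels)  # 100 / 5 = 20.0
```

The same function applies at either level: over all class counts of the meta-dataset, or over the per-class counts within a single sampled task.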
Related papers
- Conformal-in-the-Loop for Learning with Imbalanced Noisy Data [5.69777817429044]
Class imbalance and label noise are pervasive in large-scale datasets.
Much of machine learning research assumes well-labeled, balanced data, which rarely reflects real world conditions.
We propose Conformal-in-the-Loop (CitL), a novel training framework that addresses both challenges with a conformal prediction-based approach.
arXiv Detail & Related papers (2024-11-04T17:09:58Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]

We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- Class Imbalance in Object Detection: An Experimental Diagnosis and Study of Mitigation Strategies [0.5439020425818999]
This study introduces a benchmarking framework utilizing the YOLOv5 single-stage detector to address the problem of foreground-foreground class imbalance.
We scrutinized three established techniques: sampling, loss weighting, and data augmentation.
Our comparative analysis reveals that sampling and loss reweighting methods, while beneficial in two-stage detector settings, do not translate as effectively into improving YOLOv5's performance.
arXiv Detail & Related papers (2024-03-11T19:06:04Z)
- Exploring Vision-Language Models for Imbalanced Learning [29.235472353759388]
Vision-Language models (VLMs) that use contrastive language-image pre-training have shown promising zero-shot classification performance.
Our study highlights the significance of imbalanced learning algorithms in the face of VLMs pre-trained on huge amounts of data.
arXiv Detail & Related papers (2023-04-04T01:56:16Z)
- An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning [103.65758569417702]
Semi-supervised learning (SSL) has shown great promise in leveraging unlabeled data to improve model performance.
We consider a more realistic and challenging setting called imbalanced SSL, where imbalanced class distributions occur in both labeled and unlabeled data.
We study a simple yet overlooked baseline -- SimiS -- which tackles data imbalance by simply supplementing labeled data with pseudo-labels.
arXiv Detail & Related papers (2022-11-20T21:18:41Z)
- Scale-Equivalent Distillation for Semi-Supervised Object Detection [57.59525453301374]
Recent Semi-Supervised Object Detection (SS-OD) methods are mainly based on self-training, generating hard pseudo-labels by a teacher model on unlabeled data as supervisory signals.
We analyze the challenges these methods meet with the empirical experiment results.
We introduce a novel approach, Scale-Equivalent Distillation (SED), which is a simple yet effective end-to-end knowledge distillation framework robust to large object size variance and class imbalance.
arXiv Detail & Related papers (2022-03-23T07:33:37Z)
- BASIL: Balanced Active Semi-supervised Learning for Class Imbalanced Datasets [14.739359755029353]
Current semi-supervised learning (SSL) methods assume a balance between the number of data points available for each class in both the labeled and the unlabeled data sets.
We propose BASIL, a novel algorithm that optimizes submodular mutual information (SMI) functions in a per-class fashion to gradually select a balanced dataset in an active learning loop.
arXiv Detail & Related papers (2022-03-10T21:34:08Z) - CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep
Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z)
- Self-supervised Learning is More Robust to Dataset Imbalance [65.84339596595383]
We investigate self-supervised learning under dataset imbalance.
Off-the-shelf self-supervised representations are already more robust to class imbalance than supervised representations.
We devise a re-weighted regularization technique that consistently improves the SSL representation quality on imbalanced datasets.
arXiv Detail & Related papers (2021-10-11T06:29:56Z)
- Few-Shot Learning with Class Imbalance [13.60699610822265]
Few-shot learning aims to train models on a limited number of labeled samples given in a support set in order to generalize to unseen samples from a query set.
In the standard setup, the support set contains an equal amount of data points for each class.
We present a detailed study of few-shot class-imbalance along three axes: meta-dataset vs. task imbalance, effect of different imbalance distributions (linear, step, random), and effect of rebalancing techniques.
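The three imbalance distributions named here can be sketched as class-size generators. A minimal illustration, assuming "linear" interpolates class sizes between a minimum and maximum, "step" splits classes into a minority and a majority group, and "random" draws each size uniformly (these interpretations are assumptions for illustration, not the paper's exact definitions):

```python
import random

def linear_imbalance(n_classes, k_min, k_max):
    """Class sizes interpolate linearly from k_min up to k_max."""
    if n_classes == 1:
        return [k_max]
    step = (k_max - k_min) / (n_classes - 1)
    return [round(k_min + i * step) for i in range(n_classes)]

def step_imbalance(n_classes, k_min, k_max, n_minority):
    """First n_minority classes get k_min samples each, the rest k_max."""
    return [k_min] * n_minority + [k_max] * (n_classes - n_minority)

def random_imbalance(n_classes, k_min, k_max, seed=0):
    """Each class size drawn uniformly between k_min and k_max."""
    rng = random.Random(seed)
    return [rng.randint(k_min, k_max) for _ in range(n_classes)]
```

For example, `linear_imbalance(5, 1, 9)` yields sizes `[1, 3, 5, 7, 9]`, a 5-class support set with imbalance ratio 9.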
arXiv Detail & Related papers (2021-01-07T12:54:32Z)
- Machine Learning Pipeline for Pulsar Star Dataset [58.720142291102135]
This work brings together some of the most common machine learning (ML) algorithms.
The objective is to compare the results these algorithms obtain on an unbalanced dataset.
arXiv Detail & Related papers (2020-05-03T23:35:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.