Imbalanced Continual Learning with Partitioning Reservoir Sampling
- URL: http://arxiv.org/abs/2009.03632v1
- Date: Tue, 8 Sep 2020 10:28:18 GMT
- Title: Imbalanced Continual Learning with Partitioning Reservoir Sampling
- Authors: Chris Dongjoo Kim, Jinseo Jeong, and Gunhee Kim
- Abstract summary: Continual learning from a sequential stream of data is a crucial challenge for machine learning research.
We identify an unanticipated adversity innately present in many multi-label datasets: the long-tailed distribution.
We propose a new sampling strategy for replay-based approaches named Partitioning Reservoir Sampling (PRS).
- Score: 46.427023757892336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning from a sequential stream of data is a crucial challenge
for machine learning research. Most studies have been conducted on this topic
under the single-label classification setting, with an assumption of a
balanced label distribution. This work expands that research horizon towards
multi-label classification. In doing so, we identify an unanticipated adversity
innately present in many multi-label datasets: the long-tailed distribution.
We jointly address the two previously independently studied problems, Catastrophic
Forgetting and the long-tailed label distribution, by first empirically showing
a new challenge: the destructive forgetting of minority concepts on the tail.
Then, we curate two benchmark datasets, COCOseq and NUS-WIDEseq, that allow the
study of both intra- and inter-task imbalances. Lastly, we propose a new
sampling strategy for replay-based approaches named Partitioning Reservoir
Sampling (PRS), which allows the model to maintain a balanced knowledge of both
head and tail classes. We publicly release the datasets and the code on our
project page.
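Since the abstract describes PRS only at a high level, here is a minimal Python sketch of the general idea, a replay memory partitioned per class so that tail classes keep their share of slots. It is not the authors' implementation: the equal-share allocation rule and all identifiers are illustrative assumptions, and it takes a single label per call even though the paper targets multi-label streams.

```python
import random
from collections import defaultdict

class PartitionedReservoir:
    """Illustrative sketch of a class-partitioned replay reservoir.

    Plain reservoir sampling keeps every stream element with equal
    probability, so head classes crowd tail classes out of the memory.
    Partitioning the memory per class reserves slots for each class seen
    so far. The equal-share budget below is an assumption; PRS itself
    uses a more refined allocation to balance head and tail knowledge.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.partitions = defaultdict(list)  # class id -> stored samples
        self.seen = defaultdict(int)         # class id -> samples observed

    def _budget(self):
        # Assumed allocation rule: split capacity equally among classes.
        return max(self.capacity // max(len(self.partitions), 1), 1)

    def add(self, sample, label):
        part = self.partitions[label]
        self.seen[label] += 1
        if len(part) < self._budget():
            part.append(sample)
        else:
            # Classic reservoir step inside the partition: replace a
            # stored sample with probability len(part) / seen[label].
            j = random.randrange(self.seen[label])
            if j < len(part):
                part[j] = sample
        # Each new class shrinks every budget; trim over-full partitions.
        for p in self.partitions.values():
            while len(p) > self._budget():
                p.pop(random.randrange(len(p)))

    def replay_batch(self, k):
        pool = [s for p in self.partitions.values() for s in p]
        return random.sample(pool, min(k, len(pool)))
```

A multi-label sample would have to be assigned to one or several of its label partitions; deciding which samples enter and leave the memory under multi-label statistics is exactly the part PRS refines beyond this sketch.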
Related papers
- Continuous Contrastive Learning for Long-Tailed Semi-Supervised Recognition [50.61991746981703]
Current state-of-the-art long-tailed semi-supervised learning (LTSSL) approaches rely on high-quality pseudo-labels for large-scale unlabeled data.
This paper introduces a novel probabilistic framework that unifies various recent proposals in long-tail learning.
We introduce a continuous contrastive learning method, CCL, extending our framework to unlabeled data using reliable and smoothed pseudo-labels.
arXiv Detail & Related papers (2024-10-08T15:06:10Z)
- Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition [70.00984078351927]
This paper focuses on reducing noise based on some inherent properties of multi-label classification and long-tailed learning under noisy cases.
We propose a Stitch-Up augmentation to synthesize a cleaner sample, which directly reduces multi-label noise.
A Heterogeneous Co-Learning framework is further designed to leverage the inconsistency between long-tailed and balanced distributions.
arXiv Detail & Related papers (2023-07-03T09:20:28Z)
- Label-Noise Learning with Intrinsically Long-Tailed Data [65.41318436799993]
We propose a learning framework for label-noise learning with intrinsically long-tailed data.
Specifically, we propose two-stage bi-dimensional sample selection (TABASCO) to better separate clean samples from noisy samples.
arXiv Detail & Related papers (2022-08-21T07:47:05Z)
- Learning from Label Proportions by Learning with Label Noise [30.7933303912474]
Learning from label proportions (LLP) is a weakly supervised classification problem where data points are grouped into bags and only the label proportions of each bag are observed.
We provide a theoretically grounded approach to LLP based on a reduction to learning with label noise (see the sketch after this list).
Our approach demonstrates improved empirical performance in deep learning scenarios across multiple datasets and architectures.
arXiv Detail & Related papers (2022-03-04T18:52:21Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
- Semi-supervised Long-tailed Recognition using Alternate Sampling [95.93760490301395]
Main challenges in long-tailed recognition come from the imbalanced data distribution and sample scarcity in its tail classes.
We propose a new recognition setting, namely semi-supervised long-tailed recognition.
We demonstrate significant accuracy improvements over other competitive methods on two datasets.
arXiv Detail & Related papers (2021-05-01T00:43:38Z)
- Instance Credibility Inference for Few-Shot Learning [45.577880041135785]
Few-shot learning aims to recognize new objects with extremely limited training data for each category.
This paper presents a simple statistical approach, dubbed Instance Credibility Inference (ICI) to exploit the distribution support of unlabeled instances for few-shot learning.
Our simple approach establishes new state-of-the-art results on four widely used few-shot learning benchmarks.
arXiv Detail & Related papers (2020-03-26T12:01:15Z)
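As noted above, the reduction in the Learning from Label Proportions entry can be made concrete. Under one natural reading, every instance in a bag receives a pseudo-label drawn from its bag's label proportions, which turns LLP into ordinary learning with label noise whose rates are determined by the proportions. The Python sketch below assumes this reading; pseudo_label_bags and the toy data are hypothetical, and the noise-robust training procedure the paper builds on top of the reduction is omitted.

```python
import random

def pseudo_label_bags(bags, proportions, num_classes):
    """Reduce LLP to learning with label noise (illustrative sketch).

    Each bag carries class proportions instead of per-instance labels.
    Drawing a pseudo-label for every instance from its bag's proportions
    yields an ordinary noisily-labeled dataset whose noise rates follow
    from the proportions, so a noise-robust learner can be applied
    downstream.
    """
    classes = list(range(num_classes))
    labeled = []
    for bag, props in zip(bags, proportions):
        for x in bag:
            y_noisy = random.choices(classes, weights=props, k=1)[0]
            labeled.append((x, y_noisy))
    return labeled

# Usage: two bags of 2-d feature vectors with known label proportions.
bags = [[(0.1, 0.2), (0.3, 0.1)], [(0.9, 0.8), (0.7, 0.6), (0.5, 0.4)]]
proportions = [[0.5, 0.5], [0.2, 0.8]]  # per-bag class frequencies
dataset = pseudo_label_bags(bags, proportions, num_classes=2)
```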
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.