Open-Sampling: Exploring Out-of-Distribution data for Re-balancing Long-tailed datasets
- URL: http://arxiv.org/abs/2206.08802v1
- Date: Fri, 17 Jun 2022 14:29:52 GMT
- Title: Open-Sampling: Exploring Out-of-Distribution data for Re-balancing Long-tailed datasets
- Authors: Hongxin Wei, Lue Tao, Renchunzi Xie, Lei Feng, Bo An
- Abstract summary: Deep neural networks usually perform poorly when the training dataset suffers from extreme class imbalance.
Recent studies found that directly training with out-of-distribution data in a semi-supervised manner would harm the generalization performance.
We propose a novel method called Open-sampling, which utilizes open-set noisy labels to re-balance the class priors of the training dataset.
- Score: 24.551465814633325
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Deep neural networks usually perform poorly when the training dataset suffers
from extreme class imbalance. Recent studies found that directly training with
out-of-distribution data (i.e., open-set samples) in a semi-supervised manner
would harm the generalization performance. In this work, we theoretically show
that out-of-distribution data can still be leveraged to augment the minority
classes from a Bayesian perspective. Based on this motivation, we propose a
novel method called Open-sampling, which utilizes open-set noisy labels to
re-balance the class priors of the training dataset. For each open-set
instance, the label is sampled from our pre-defined distribution that is
complementary to the distribution of original class priors. We empirically show
that Open-sampling not only re-balances the class priors but also encourages
the neural network to learn separable representations. Extensive experiments
demonstrate that our proposed method significantly outperforms existing data
re-balancing methods and can boost the performance of existing state-of-the-art
methods.
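As a concrete illustration of the label-sampling step, below is a minimal NumPy sketch of one natural instantiation: open-set pseudo-labels are drawn in proportion to each class's deficit relative to the head class, so that the combined priors flatten out. The function names and this particular choice of complementary distribution are illustrative; see the paper for the exact definition.

```python
import numpy as np

def complementary_distribution(class_counts):
    """One natural 'complementary' label distribution: put more mass on
    classes that sit further below the head class, so assigning these labels
    to open-set samples tops every class up toward the head-class count.
    (Illustrative instantiation, not necessarily the paper's exact choice.)"""
    counts = np.asarray(class_counts, dtype=np.float64)
    deficit = counts.max() - counts            # distance of each class from the head class
    if deficit.sum() == 0:                     # already balanced: fall back to uniform
        return np.full_like(counts, 1.0 / len(counts))
    return deficit / deficit.sum()

def sample_open_set_labels(class_counts, num_open_set, rng=None):
    """Draw a pseudo-label for every open-set instance from the complementary
    distribution, so the combined (original + open-set) priors are flatter."""
    rng = np.random.default_rng(rng)
    q = complementary_distribution(class_counts)
    return rng.choice(len(class_counts), size=num_open_set, p=q)

# Example: a long-tailed 4-class dataset.
counts = [1000, 300, 60, 10]
labels = sample_open_set_labels(counts, num_open_set=2630, rng=0)
print(np.asarray(counts) + np.bincount(labels, minlength=4))  # roughly 1000 per class
```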
Related papers
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain only a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
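The per-class estimation idea can be given a rough shape in a few lines. Note that ProCo itself works with a von Mises-Fisher model on normalized features and derives a closed-form loss; the diagonal Gaussian below is a simplification chosen purely for brevity, and all names are illustrative.

```python
import numpy as np

def fit_class_gaussians(features, labels, num_classes):
    """Estimate a per-class feature distribution (diagonal Gaussian here,
    standing in for the paper's von Mises-Fisher model)."""
    stats = []
    for c in range(num_classes):
        f = features[labels == c]
        stats.append((f.mean(axis=0), f.var(axis=0) + 1e-6))  # mean, variance per dim
    return stats

def sample_virtual_features(stats, per_class, rng=None):
    """Draw synthetic features for every class so that tail classes can
    contribute as many contrastive samples as head classes."""
    rng = np.random.default_rng(rng)
    feats, labs = [], []
    for c, (mu, var) in enumerate(stats):
        feats.append(rng.normal(mu, np.sqrt(var), size=(per_class, mu.shape[0])))
        labs.append(np.full(per_class, c))
    return np.concatenate(feats), np.concatenate(labs)
```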
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
- Progressive Feature Adjustment for Semi-supervised Learning from Pretrained Models [39.42802115580677]
Semi-supervised learning (SSL) can leverage both labeled and unlabeled data to build a predictive model.
Recent literature suggests that naively applying state-of-the-art SSL with a pretrained model fails to unleash the full potential of training data.
We propose to use pseudo-labels from the unlabelled data to update the feature extractor in a way that is less sensitive to incorrect labels.
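One common way to keep such updates insensitive to wrong labels is to retain only high-confidence predictions as pseudo-labels. The filter below is a generic sketch of that idea; the threshold value and function name are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.95):
    """Keep only predictions whose top softmax probability clears a
    confidence threshold; low-confidence (likely incorrect) pseudo-labels
    are discarded rather than used to update the feature extractor."""
    confidence = probs.max(axis=1)
    keep = confidence >= threshold
    return np.flatnonzero(keep), probs[keep].argmax(axis=1)

# Example: softmax outputs for four unlabeled samples over three classes.
probs = np.array([[0.97, 0.02, 0.01],
                  [0.40, 0.35, 0.25],
                  [0.05, 0.05, 0.90],
                  [0.98, 0.01, 0.01]])
idx, pseudo = select_pseudo_labels(probs, threshold=0.9)
print(idx, pseudo)  # [0 2 3] [0 2 0]
```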
arXiv Detail & Related papers (2023-09-09T01:57:14Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose ReScore, a model-agnostic framework that boosts causal discovery performance by dynamically learning adaptive weights for a reweighted score function.
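To make the reweighting concrete, here is a hedged sketch of a per-sample weighted least-squares score for a linear structural equation model, plus one simple rule that upweights poorly fit samples. ReScore learns its weights through a bilevel optimization that is omitted here; the softmax rule shown is an illustrative stand-in.

```python
import numpy as np

def reweighted_score(X, W, weights):
    """Per-sample weighted least-squares reconstruction score for a linear
    SEM X ~ X @ W (acyclicity and sparsity penalties omitted for brevity)."""
    residuals = X - X @ W
    per_sample = (residuals ** 2).sum(axis=1)
    return weights @ per_sample

def upweight_hard_samples(X, W, temperature=1.0):
    """Illustrative adaptive rule: a softmax over per-sample errors, so
    samples the current graph fits poorly receive larger weights."""
    per_sample = ((X - X @ W) ** 2).sum(axis=1)
    z = per_sample / temperature
    z -= z.max()                        # numerical stability
    w = np.exp(z)
    return w / w.sum()
```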
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Proposal Distribution Calibration for Few-Shot Object Detection [65.19808035019031]
In few-shot object detection (FSOD), the two-step training paradigm is widely adopted to mitigate the severe sample imbalance.
Unfortunately, the extreme data scarcity aggravates the proposal distribution bias, hindering the RoI head from evolving toward novel classes.
We introduce a simple yet effective proposal distribution calibration (PDC) approach to enhance the localization and classification abilities of the RoI head.
arXiv Detail & Related papers (2022-12-15T05:09:11Z)
- Imbalanced Classification via Explicit Gradient Learning From Augmented Data [0.0]
We propose a novel deep meta-learning technique to augment a given imbalanced dataset with new minority instances.
The advantage of the proposed method is demonstrated on synthetic and real-world datasets with various imbalance ratios.
arXiv Detail & Related papers (2022-02-21T22:16:50Z)
- Out-of-distribution Detection and Generation using Soft Brownian Offset Sampling and Autoencoders [1.313418334200599]
Deep neural networks often suffer from overconfidence, which can be partly remedied by improved out-of-distribution detection.
We propose a novel approach that allows for the generation of out-of-distribution datasets based on a given in-distribution dataset.
This new dataset can then be used to improve out-of-distribution detection for the given dataset and machine learning task at hand.
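A simplified latent-space version of that generation step can be sketched as follows: copies of in-distribution latent codes are nudged by small Gaussian steps until each sits at least a margin d_min from every original code, after which the autoencoder's decoder can map them back to input space. The step size, margin, and stopping rule are illustrative; the paper's Soft Brownian Offset sampler is more refined.

```python
import numpy as np

def soft_brownian_offset(latents, d_min=1.0, step=0.1, rng=None):
    """Push copies of in-distribution latent codes off the data manifold by
    repeated small Gaussian steps until every copy is at least `d_min` away
    from all original latents (simplified sketch of the SBO idea)."""
    rng = np.random.default_rng(rng)
    ood = latents.copy()
    for _ in range(1000):  # safety cap on the number of random-walk steps
        dists = np.linalg.norm(ood[:, None, :] - latents[None, :, :], axis=-1)
        too_close = dists.min(axis=1) < d_min   # copies still inside the margin
        if not too_close.any():
            break
        ood[too_close] += rng.normal(scale=step, size=ood[too_close].shape)
    return ood  # decode these with the autoencoder to get synthetic OOD inputs
```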
arXiv Detail & Related papers (2021-05-04T06:59:24Z)
- Semi-supervised Long-tailed Recognition using Alternate Sampling [95.93760490301395]
The main challenges in long-tailed recognition come from the imbalanced data distribution and the scarcity of samples in tail classes.
We propose a new recognition setting, namely semi-supervised long-tailed recognition.
We demonstrate significant accuracy improvements over other competitive methods on two datasets.
arXiv Detail & Related papers (2021-05-01T00:43:38Z)
- Improved Robustness to Open Set Inputs via Tempered Mixup [37.98372874213471]
We propose a simple regularization technique that improves open set robustness without a background dataset.
Our method achieves state-of-the-art results on open set classification baselines and easily scales to large-scale open set classification problems.
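The core mechanism can be sketched as a mixup variant whose labels are tempered toward the uniform distribution for heavily mixed pairs, so ambiguous blends train the network to be less confident. The specific tempering schedule below is an assumption for illustration, not the paper's exact formulation.

```python
import numpy as np

def tempered_mixup(x1, y1, x2, y2, num_classes, alpha=0.4, rng=None):
    """Interpolate inputs as in standard mixup, but pull the mixed label
    toward uniform in proportion to how ambiguous the blend is."""
    rng = np.random.default_rng(rng)
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1 - lam) * x2
    one_hot = np.eye(num_classes)
    y = lam * one_hot[y1] + (1 - lam) * one_hot[y2]
    ambiguity = 1 - abs(2 * lam - 1)    # 0 for pure samples, 1 at lam = 0.5
    uniform = np.full(num_classes, 1.0 / num_classes)
    return x, (1 - ambiguity) * y + ambiguity * uniform
```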
arXiv Detail & Related papers (2020-09-10T04:01:31Z)
- Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
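The moment-matching view is easy to make concrete with particles: represent each return distribution by a finite set of samples and minimize the squared maximum mean discrepancy (MMD) against the Bellman target particles. The Gaussian kernel and bandwidth below are common choices, not necessarily the paper's.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Pairwise Gaussian kernel between two 1-D particle sets."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bandwidth ** 2))

def mmd_squared(pred, target, bandwidth=1.0):
    """Squared MMD between predicted return particles and their targets;
    driving it to zero implicitly matches all moments the kernel can see."""
    k_pp = gaussian_kernel(pred, pred, bandwidth).mean()
    k_tt = gaussian_kernel(target, target, bandwidth).mean()
    k_pt = gaussian_kernel(pred, target, bandwidth).mean()
    return k_pp + k_tt - 2 * k_pt

# Bellman target: reward plus discounted particles of the next state's return.
pred = np.array([0.0, 1.0, 2.0])                 # predicted return samples
target = 0.5 + 0.99 * np.array([0.2, 1.1, 1.9])  # r + gamma * Z(s', a')
print(mmd_squared(pred, target))
```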
arXiv Detail & Related papers (2020-07-24T05:18:17Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We further propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- Domain Adaptive Bootstrap Aggregating [5.444459446244819]
Bootstrap aggregating, or bagging, is a popular method for improving the stability of predictive algorithms.
This article proposes a domain adaptive bagging method coupled with a new iterative nearest neighbor sampler.
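A minimal single-pass sketch of the sampler, assuming both domains share one feature space: each bag is filled with training points drawn from the k nearest neighbours of randomly chosen test points, so every bag mimics the test distribution. This stands in for the paper's iterative scheme, and all parameter names are illustrative.

```python
import numpy as np

def domain_adaptive_bags(X_train, X_test, n_bags=10, k=5, rng=None):
    """Build bootstrap bags biased toward the test domain: each bag member
    is one of the k training-set nearest neighbours of a random test point."""
    rng = np.random.default_rng(rng)
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=-1)
    knn = np.argsort(dists, axis=1)[:, :k]   # k nearest train points per test point
    bags = []
    for _ in range(n_bags):
        anchors = rng.integers(len(X_test), size=len(X_train))
        picks = knn[anchors, rng.integers(k, size=len(X_train))]
        bags.append(picks)                   # indices into X_train for this bag
    return bags
```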
arXiv Detail & Related papers (2020-01-12T20:02:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.