Survey on Imbalanced Data, Representation Learning and SEP Forecasting
- URL: http://arxiv.org/abs/2310.07598v1
- Date: Wed, 11 Oct 2023 15:38:53 GMT
- Title: Survey on Imbalanced Data, Representation Learning and SEP Forecasting
- Authors: Josias Moukpe
- Abstract summary: Deep Learning methods have significantly advanced various data-driven tasks such as regression, classification, and forecasting.
Much of this progress has been predicated on the strong but often unrealistic assumption that training datasets are balanced with respect to the targets they contain.
This misalignment with real-world conditions, where data is frequently imbalanced, hampers the effectiveness of such models in practical applications.
We present deep learning works that step away from the balanced-data assumption, employing strategies like representation learning to better approximate real-world imbalances.
- Score: 0.9065034043031668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Learning methods have significantly advanced various data-driven tasks
such as regression, classification, and forecasting. However, much of this
progress has been predicated on the strong but often unrealistic assumption
that training datasets are balanced with respect to the targets they contain.
This misalignment with real-world conditions, where data is frequently
imbalanced, hampers the effectiveness of such models in practical applications.
Methods that reconsider that assumption and tackle real-world imbalances have
begun to emerge and explore avenues to address this challenge. One such
promising avenue is representation learning, which enables models to capture
complex data characteristics and generalize better to minority classes. By
focusing on a richer representation of the feature space, these techniques hold
the potential to mitigate the impact of data imbalance. In this survey, we
present deep learning works that step away from the balanced-data assumption,
employing strategies like representation learning to better approximate
real-world imbalances. We also highlight a critical application in SEP
forecasting where addressing data imbalance is paramount for success.
 
      
Related papers
- Enhancing Classification with Semi-Supervised Deep Learning Using Distance-Based Sample Weights [0.0]
 This work proposes a semi-supervised framework that prioritizes training samples based on their proximity to test data.
Experiments on twelve benchmark datasets demonstrate significant improvements across key metrics, including accuracy, precision, and recall.
This framework provides a robust and practical solution for semi-supervised learning, with potential applications in domains such as healthcare and security.
 arXiv  Detail & Related papers  (2025-05-20T13:29:04Z)
- Restoring balance: principled under/oversampling of data for optimal classification [0.0]
 Class imbalance in real-world data poses a common bottleneck for machine learning tasks.
 Mitigation strategies, such as under- or oversampling the data depending on class abundances, are routinely proposed and tested empirically.
We provide a sharp prediction of the effects of under/oversampling strategies depending on class imbalance, first and second moments of the data, and the metrics of performance considered.
 arXiv  Detail & Related papers  (2024-05-15T17:45:34Z)
- Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data [17.991833729722288]
 We propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL).
Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function.
We provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
 arXiv  Detail & Related papers  (2024-03-18T14:51:19Z)
- Is augmentation effective to improve prediction in imbalanced text datasets? [3.1690891866882236]
 We argue that adjusting the cutoffs without data augmentation can produce similar results to oversampling techniques.
Our findings contribute to a better understanding of the strengths and limitations of different approaches to dealing with imbalanced data.
 arXiv  Detail & Related papers  (2023-04-20T13:07:31Z)
- CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning [55.733193075728096]
 Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
 arXiv  Detail & Related papers  (2022-02-11T13:49:51Z)
- Self-Damaging Contrastive Learning [92.34124578823977]
 Unlabeled data in reality is commonly imbalanced and shows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning to automatically balance the representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
 arXiv  Detail & Related papers  (2021-06-06T00:04:49Z)
- Supercharging Imbalanced Data Learning With Energy-based Contrastive Representation Transfer [72.5190560787569]
 In computer vision, learning from long tailed datasets is a recurring theme, especially for natural image datasets.
Our proposal posits a meta-distributional scenario, where the data generating mechanism is invariant across the label-conditional feature distributions.
This allows us to leverage a causal data inflation procedure to enlarge the representation of minority classes.
 arXiv  Detail & Related papers  (2020-11-25T00:13:11Z)
- Counterfactual Representation Learning with Balancing Weights [74.67296491574318]
 Key to causal inference with observational data is achieving balance in predictive features associated with each treatment type.
Recent literature has explored representation learning to achieve this goal.
We develop an algorithm for flexible, scalable and accurate estimation of causal effects.
 arXiv  Detail & Related papers  (2020-10-23T19:06:03Z)
- Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
 PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
 arXiv  Detail & Related papers  (2020-09-30T05:29:01Z)
- Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
 We propose an ensemble of class-balanced experts that combines the strength of diverse classifiers.
Our ensemble of class-balanced experts reaches results close to state-of-the-art and an extended ensemble establishes a new state-of-the-art on two benchmarks for long-tailed recognition.
 arXiv  Detail & Related papers  (2020-04-07T20:57:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     