Class Imbalance in Anomaly Detection: Learning from an Exactly Solvable Model
- URL: http://arxiv.org/abs/2501.11638v2
- Date: Tue, 05 Aug 2025 09:33:59 GMT
- Title: Class Imbalance in Anomaly Detection: Learning from an Exactly Solvable Model
- Authors: F. S. Pezzicoli, V. Ros, F. P. Landes, M. Baity-Jesi
- Abstract summary: Class imbalance (CI) is a longstanding problem in machine learning, slowing down training and reducing performance. We provide a theoretical framework to analyze, interpret and address CI. Within this framework, one can distinguish several sources of CI: intrinsic, train, or test imbalance.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Class imbalance (CI) is a longstanding problem in machine learning, slowing down training and reducing performance. Although empirical remedies exist, it is often unclear which ones work best and when, due to the lack of an overarching theory. We address a common case of imbalance, that of anomaly (or outlier) detection. We provide a theoretical framework to analyze, interpret and address CI. It is based on an exact solution of the teacher-student perceptron model, through replica theory. Within this framework, one can distinguish several sources of CI: intrinsic, train, or test imbalance. Our analysis reveals that the optimal train imbalance is generally different from 50%, with a nontrivial dependence on the intrinsic imbalance, the abundance of data, and the noise in the learning. Moreover, there is a crossover between a small-noise training regime, where results are independent of the noise level, and a high-noise regime, where performance quickly degrades with noise. Our results challenge some of the conventional wisdom on CI and offer practical guidelines to address it.
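The setup is easy to mimic numerically. Below is a minimal sketch, not the paper's replica-theory solution: a Gaussian teacher perceptron whose bias sets the intrinsic imbalance, a training set resampled to a chosen train imbalance (the anomaly fraction, named rho here purely for illustration), and a logistic-regression student scored by balanced test accuracy, so one can scan rho and check whether the best value sits at 50%.

```python
# Minimal teacher-student sketch (illustrative only; NOT the paper's
# replica-theory calculation). The teacher's bias sets the intrinsic
# imbalance; we resample the training set to a chosen train imbalance rho.
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test = 50, 2000, 20000
teacher = rng.standard_normal(d) / np.sqrt(d)
bias = 1.0  # nonzero bias -> intrinsic imbalance != 50%

def sample(n):
    X = rng.standard_normal((n, d))
    y = np.sign(X @ teacher + bias)  # +1 = majority, -1 = anomaly
    return X, y

def resample_to_imbalance(X, y, rho, n):
    """Keep n points, a fraction rho of them anomalies (label -1)."""
    neg, pos = np.where(y < 0)[0], np.where(y > 0)[0]
    k = int(rho * n)
    idx = np.concatenate([rng.choice(neg, k), rng.choice(pos, n - k)])
    return X[idx], y[idx]

def train_student(X, y, lr=0.5, steps=500):
    Xa = np.hstack([X, np.ones((len(X), 1))])  # intercept column
    w = np.zeros(d + 1)
    for _ in range(steps):
        p = 1 / (1 + np.exp(-Xa @ w))          # logistic regression via GD
        w -= lr * Xa.T @ (p - (y > 0)) / len(y)
    return w

X_test, y_test = sample(n_test)
Xa_test = np.hstack([X_test, np.ones((n_test, 1))])
for rho in [0.1, 0.3, 0.5]:
    Xb, yb = resample_to_imbalance(*sample(20 * n_train), rho, n_train)
    pred = np.sign(Xa_test @ train_student(Xb, yb))
    # balanced accuracy, so the test imbalance does not dominate the score
    bal = 0.5 * ((pred[y_test > 0] > 0).mean() + (pred[y_test < 0] < 0).mean())
    print(f"train imbalance rho={rho:.1f}  balanced test accuracy={bal:.3f}")
```

Scanning a finer grid of rho values, and varying the bias and n_train, is the natural way to probe how the optimal train imbalance moves with the intrinsic imbalance and the abundance of data.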
Related papers
- Rate or Fate? RLV$^\varepsilon$R: Reinforcement Learning with Verifiable Noisy Rewards [2.0987013818856877]
Reinforcement learning with verifiable rewards (RLVR) is a simple but powerful paradigm for training LLMs. In practice, however, the verifier is almost never clean: unit tests probe only limited corner cases. We ask a pragmatic question: does the verification noise merely slow down the learning (rate), or can it flip the outcome (fate)?
arXiv Detail & Related papers (2026-01-07T21:31:26Z) - Conformal-in-the-Loop for Learning with Imbalanced Noisy Data [5.69777817429044]
Class imbalance and label noise are pervasive in large-scale datasets.
Much of machine learning research assumes well-labeled, balanced data, which rarely reflects real-world conditions.
We propose Conformal-in-the-Loop (CitL), a novel training framework that addresses both challenges with a conformal prediction-based approach.
arXiv Detail & Related papers (2024-11-04T17:09:58Z) - Class-Imbalanced Graph Learning without Class Rebalancing [62.1368829847041]
Class imbalance is prevalent in real-world node classification tasks and poses great challenges for graph learning models.
In this work, we approach the root cause of class-imbalance bias from a topological paradigm.
We devise a lightweight topological augmentation framework BAT to mitigate the class-imbalance bias without class rebalancing.
arXiv Detail & Related papers (2023-08-27T19:01:29Z) - Learning Provably Robust Estimators for Inverse Problems via Jittering [51.467236126126366]
We investigate whether jittering, a simple regularization technique, is effective for learning worst-case robust estimators for inverse problems.
We show that jittering significantly enhances the worst-case robustness, but can be suboptimal for inverse problems beyond denoising.
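For intuition, here is a hedged toy version of jittering on a linear denoising problem; the noise levels and the random-perturbation proxy for worst-case error are my own illustrative choices, not the paper's protocol.

```python
# Toy jittering sketch (illustrative choices throughout, not the paper's
# setup): measurements are signals plus Gaussian noise, the estimator is
# linear, and extra "jitter" noise is injected into the training inputs.
import numpy as np

rng = np.random.default_rng(1)
d, n = 20, 5000
X = rng.standard_normal((n, d))            # clean signals
Y = X + 0.3 * rng.standard_normal((n, d))  # noisy measurements

def fit_linear_denoiser(X, Y, sigma_jit):
    Yj = Y + sigma_jit * rng.standard_normal(Y.shape)  # jittered inputs
    W, *_ = np.linalg.lstsq(Yj, X, rcond=None)         # min ||Yj W - X||_F^2
    return W

def worst_case_proxy(W, x, eps=0.5, trials=200):
    """Crude stand-in for worst-case robustness: the largest reconstruction
    error over random perturbations of norm eps around one measurement."""
    y = x + 0.3 * rng.standard_normal(d)
    errs = []
    for _ in range(trials):
        delta = rng.standard_normal(d)
        delta *= eps / np.linalg.norm(delta)
        errs.append(np.linalg.norm((y + delta) @ W - x))
    return max(errs)

x0 = rng.standard_normal(d)
for s in [0.0, 0.3, 0.6]:
    W = fit_linear_denoiser(X, Y, sigma_jit=s)
    print(f"jitter sigma={s:.1f}  worst-case proxy={worst_case_proxy(W, x0):.3f}")
```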
arXiv Detail & Related papers (2023-07-24T14:19:36Z) - Rethinking Class Imbalance in Machine Learning [1.4467794332678536]
Imbalance learning is a subfield of machine learning that focuses on learning tasks in the presence of class imbalance.
This study presents a new taxonomy of class imbalance in machine learning with a broader scope.
We propose a new logit perturbation-based imbalance learning loss when proportion, variance, and distance imbalances exist simultaneously.
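The paper's specific loss is not reproduced here, but a minimal sketch in the generic logit-perturbation flavor (prior-dependent logit adjustment) shows the mechanics: each class's logit is shifted before the softmax cross-entropy so the rare class is not drowned out. The temperature tau and the prior values below are illustrative.

```python
# Generic logit-perturbation loss in the logit-adjustment flavor (NOT the
# paper's exact loss): shift each class's logit by tau * log(prior) before
# the softmax cross-entropy.
import numpy as np

def logit_adjusted_ce(logits, labels, class_priors, tau=1.0):
    """Cross-entropy on logits perturbed by a per-class additive offset."""
    z = logits + tau * np.log(class_priors)  # rare class is shifted down more
    z = z - z.max(axis=1, keepdims=True)     # numerically stable softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# toy usage: two classes with priors 0.9 / 0.1
logits = np.array([[2.0, 0.5], [0.1, 1.2]])
labels = np.array([0, 1])
print(logit_adjusted_ce(logits, labels, np.array([0.9, 0.1])))
```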
arXiv Detail & Related papers (2023-05-06T02:36:39Z) - Identifying Hard Noise in Long-Tailed Sample Distribution [76.16113794808001]
We introduce Noisy Long-Tailed Classification (NLT).
Most de-noising methods fail to identify the hard noise.
We design an iterative noisy learning framework called Hard-to-Easy (H2E).
arXiv Detail & Related papers (2022-07-27T09:03:03Z) - A Theoretical Analysis of the Learning Dynamics under Class Imbalance [0.10231119246773925]
We show that the learning curves for minority and majority classes follow sub-optimal trajectories when training with a gradient-based optimizer.
This slowdown is related to the imbalance ratio and can be traced back to a competition between the optimization of different classes.
We find that GD is not guaranteed to decrease the loss for each class but that this problem can be addressed by performing a per-class normalization of the gradient.
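A minimal sketch of that per-class gradient normalization, on a toy linear softmax classifier, might look as follows; the data, learning rate, and normalization details are my own choices rather than the paper's.

```python
# Per-class gradient normalization sketch: each class's gradient is
# normalized before summing, so the majority class cannot dominate the
# update direction. (Illustrative choices, not the paper's setup.)
import numpy as np

rng = np.random.default_rng(2)
d, C, n = 10, 2, 2000
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = (X @ w_true / np.linalg.norm(w_true) > 1.64).astype(int)  # ~5% minority

def grad_ce(W, Xc, yc):
    """Softmax cross-entropy gradient restricted to one class's samples."""
    z = Xc @ W
    z = z - z.max(axis=1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(yc)), yc] -= 1.0
    return Xc.T @ p / len(yc)

W = np.zeros((d, C))
for _ in range(300):
    g = np.zeros_like(W)
    for c in range(C):
        gc = grad_ce(W, X[y == c], y[y == c])
        g += gc / (np.linalg.norm(gc) + 1e-12)  # per-class normalization
    W -= 0.1 * g / C
```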
arXiv Detail & Related papers (2022-07-01T12:54:38Z) - Neural Collapse Inspired Attraction-Repulsion-Balanced Loss for Imbalanced Learning [97.81549071978789]
We propose Attraction-Repulsion-Balanced Loss (ARB-Loss) to balance the different components of the gradients.
We perform experiments on large-scale classification and segmentation datasets, and our ARB-Loss can achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-04-19T08:23:23Z) - The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from the usual assumption, namely that the noise should match the data distribution, can actually lead to better statistical estimators.
In particular, the optimal noise distribution differs from the data distribution and can even belong to a different family.
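A minimal noise-contrastive estimation toy illustrates the point: we fit the mean and log-normalizer of an unnormalized 1-D Gaussian by classifying data against noise, once with Gaussian noise (same family as the data) and once with Laplace noise (a different family). Everything below is an illustrative sketch, not the paper's estimator.

```python
# Minimal NCE toy: fit the mean mu and log-normalizer c of an unnormalized
# model exp(-(x - mu)^2 / 2 + c) by logistic classification of data vs. noise.
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(2.0, 1.0, 5000)  # true mean 2.0; true c = -log sqrt(2*pi)

def nce_fit(data, noise, log_p_noise, steps=2000, lr=0.05):
    mu, c = 0.0, 0.0
    x = np.concatenate([data, noise])
    t = np.concatenate([np.ones(len(data)), np.zeros(len(noise))])
    for _ in range(steps):
        logit = (-(x - mu) ** 2 / 2 + c) - log_p_noise(x)
        p = 1 / (1 + np.exp(-logit))        # P(sample came from the data)
        err = p - t                         # gradient of the logistic loss
        mu -= lr * np.mean(err * (x - mu))  # d logit / d mu = x - mu
        c -= lr * np.mean(err)              # d logit / d c  = 1
    return mu, c

# Gaussian noise: same family as the data ...
g = rng.normal(0.0, 3.0, 5000)
print(nce_fit(data, g,
              lambda x: -0.5 * (x / 3.0) ** 2 - np.log(3.0 * np.sqrt(2 * np.pi))))
# ... versus Laplace noise: a different family entirely
l = rng.laplace(0.0, 2.0, 5000)
print(nce_fit(data, l, lambda x: -np.abs(x) / 2.0 - np.log(2 * 2.0)))
```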
arXiv Detail & Related papers (2022-03-02T13:59:20Z) - Learning From Long-Tailed Data With Noisy Labels [0.0]
Class imbalance and noisy labels are the norm in many large-scale classification datasets.
We present a simple two-stage approach based on recent advances in self-supervised learning.
We find that self-supervised learning approaches can effectively cope with severe class imbalance.
arXiv Detail & Related papers (2021-08-25T07:45:40Z) - Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
We propose an ensemble of class-balanced experts that combines the strengths of diverse classifiers.
Our ensemble of class-balanced experts reaches results close to the state of the art, and an extended ensemble establishes a new state of the art on two benchmarks for long-tailed recognition.
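As a hedged sketch of the general idea, with linear experts standing in for the paper's deep classifiers: train each expert on a class-balanced resample of the long-tailed data, then average their predicted probabilities.

```python
# Class-balanced experts sketch (illustrative; linear experts, toy data):
# each expert sees a balanced bootstrap resample, and predictions are
# averaged across the ensemble.
import numpy as np

rng = np.random.default_rng(4)
d, n = 5, 5000
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = (X @ w_true / np.linalg.norm(w_true) + 1.5 > 0).astype(int)  # class 0 rare

Xa = np.hstack([X, np.ones((n, 1))])  # intercept column

def balanced_resample(Xa, y):
    """Equal-size resample of both classes (bootstrap each down/up to k)."""
    idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]
    k = min(len(idx0), len(idx1))
    pick = np.concatenate([rng.choice(idx0, k), rng.choice(idx1, k)])
    return Xa[pick], y[pick]

def fit_logreg(Xb, yb, lr=0.3, steps=400):
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - yb) / len(yb)
    return w

experts = [fit_logreg(*balanced_resample(Xa, y)) for _ in range(5)]
probs = np.mean([1 / (1 + np.exp(-Xa @ w)) for w in experts], axis=0)
bal = 0.5 * ((probs[y == 1] > 0.5).mean() + (probs[y == 0] <= 0.5).mean())
print("balanced accuracy of the ensemble:", round(bal, 3))
```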
arXiv Detail & Related papers (2020-04-07T20:57:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.