Asymptotic Inference for Infinitely Imbalanced Logistic Regression
- URL: http://arxiv.org/abs/2204.13231v1
- Date: Wed, 27 Apr 2022 23:52:42 GMT
- Title: Asymptotic Inference for Infinitely Imbalanced Logistic Regression
- Authors: Dorian Goldman, Bo Zhang
- Abstract summary: We show that the variance of the limiting slope depends exponentially on the z-score of the average of the minority class's points with respect to the majority class's distribution.
We confirm our results by Monte Carlo simulations.
- Score: 4.981260380070016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we extend the work of Owen (2007) by deriving a second order
expansion for the slope parameter in logistic regression, when the size of the
majority class is unbounded and the minority class is finite. More precisely,
we demonstrate that the second order term converges to a normal distribution
and explicitly compute its variance, which, under mild regularity assumptions,
surprisingly once again depends only on the mean of the minority class points
and not on their arrangement. In the case that the majority class is normally
distributed, we illustrate that the variance of the limiting slope depends
exponentially on the z-score of the average of the minority class's points with
respect to the majority class's distribution. We confirm our results by Monte
Carlo simulations.
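To make the setup concrete, the following is a minimal Monte Carlo sketch in the spirit of the simulations mentioned in the abstract (not the authors' code): the majority class is drawn from a standard normal, the minority class is a small fixed set of points, and an essentially unregularized logistic fit is repeated over fresh draws of the majority sample to observe how the fitted slope fluctuates around its infinitely imbalanced limit. All names (N_MAJORITY, minority_x, n_reps) and numerical choices are illustrative assumptions.

```python
# Monte Carlo sketch of the infinitely imbalanced logistic regression setup:
# a large normal majority class (label 0) and a small fixed minority class (label 1).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

N_MAJORITY = 50_000                      # stand-in for the "unbounded" majority class
minority_x = np.array([1.5, 2.0, 2.5])   # small, fixed minority class
n_reps = 200                             # Monte Carlo repetitions

slopes = []
for _ in range(n_reps):
    x0 = rng.standard_normal(N_MAJORITY)              # majority ~ N(0, 1)
    X = np.concatenate([x0, minority_x]).reshape(-1, 1)
    y = np.concatenate([np.zeros(N_MAJORITY), np.ones(minority_x.size)])
    # A very large C effectively disables regularization, giving a plain MLE fit.
    clf = LogisticRegression(C=1e9, solver="lbfgs").fit(X, y)
    slopes.append(clf.coef_[0, 0])

slopes = np.asarray(slopes)
print("mean slope:", slopes.mean())
print("variance of slope across repetitions:", slopes.var(ddof=1))
# Owen (2007): as the majority size grows, the limiting slope depends on the
# minority points only through their mean; the paper above characterizes the
# fluctuations (the second-order term) around that limit, including how their
# variance scales with the z-score of the minority mean when the majority is normal.
```

Repeating the experiment with minority points that share the same mean but a different arrangement (e.g. np.array([1.0, 2.0, 3.0])) is one way to probe the claim that, to this order, only the mean matters.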
Related papers
- Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantee with explicit dimensional dependencies for general score-mismatched diffusion samplers.
We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions.
This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z) - Broadening Target Distributions for Accelerated Diffusion Models via a Novel Analysis Approach [49.97755400231656]
We show that a novel accelerated DDPM sampler achieves accelerated performance for three broad distribution classes not considered before.
Our results show an improved dependency on the data dimension $d$ among accelerated DDPM-type samplers.
arXiv Detail & Related papers (2024-02-21T16:11:47Z) - SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning [49.94607673097326]
We propose a highly adaptable framework, designated as SimPro, which does not rely on any predefined assumptions about the distribution of unlabeled data.
Our framework, grounded in a probabilistic model, innovatively refines the expectation-maximization algorithm.
Our method showcases consistent state-of-the-art performance across diverse benchmarks and data distribution scenarios.
arXiv Detail & Related papers (2024-02-21T03:39:04Z) - Divide-and-Conquer Hard-thresholding Rules in High-dimensional
Imbalanced Classification [1.0312968200748118]
We study the impact of imbalanced class sizes on linear discriminant analysis (LDA) in high dimensions.
We show that due to data scarcity in one class, referred to as the minority class, the LDA ignores the minority class, yielding a maximum misclassification rate.
We propose a new construction of hard-thresholding rules based on a divide-and-conquer technique that reduces the large difference between the misclassification rates.
arXiv Detail & Related papers (2021-11-05T07:44:28Z) - Shift Happens: Adjusting Classifiers [2.8682942808330703]
Minimizing expected loss measured by a proper scoring rule, such as Brier score or log-loss (cross-entropy), is a common objective while training a probabilistic classifier.
We propose methods that transform all predictions to (re)equalize the average prediction and the class distribution (a minimal prior-adjustment sketch in this spirit appears after this related-papers list).
We demonstrate experimentally that, when the class distribution is known only approximately in practice, there is often still a reduction in loss, depending on the amount of shift and the precision to which the class distribution is known.
arXiv Detail & Related papers (2021-11-03T21:27:27Z) - Linear Classifiers Under Infinite Imbalance [1.370633147306388]
We study the behavior of linear discriminant functions for binary classification in the infinite-imbalance limit.
We show that for a broad class of weight functions, the intercept diverges but the rest of the coefficient vector has a finite almost sure limit under infinite imbalance.
arXiv Detail & Related papers (2021-06-10T15:01:54Z) - Sharper Sub-Weibull Concentrations: Non-asymptotic Bai-Yin's Theorem [0.0]
Non-asymptotic concentration inequalities play an essential role in the finite-sample theory of machine learning and statistics.
We obtain a sharper concentration inequality, with explicit constants, for sums of independent sub-Weibull random variables.
In an application to negative binomial regressions, we give the $\ell$-error under sparse structures, which is a new result for negative binomial regressions.
arXiv Detail & Related papers (2021-02-04T07:16:27Z) - Moment Multicalibration for Uncertainty Estimation [11.734565447730501]
We show how to achieve the notion of "multicalibration" from Hébert-Johnson et al. not just for means, but also for variances and other higher moments.
We show that our moment estimates can be used to derive marginal prediction intervals that are simultaneously valid as averaged over all of the (sufficiently large) subgroups for which moment multicalibration has been obtained.
arXiv Detail & Related papers (2020-08-18T17:08:31Z) - Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
arXiv Detail & Related papers (2020-07-24T05:18:17Z) - On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration [115.1954841020189]
We study the asymptotic and non-asymptotic properties of linear stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z) - M2m: Imbalanced Classification via Major-to-minor Translation [79.09018382489506]
In most real-world scenarios, labeled training datasets are highly class-imbalanced, and deep neural networks trained on them struggle to generalize to a balanced testing criterion.
In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples from more-frequent classes.
Our experimental results on a variety of class-imbalanced datasets show that the proposed method improves the generalization on minority classes significantly compared to other existing re-sampling or re-weighting methods.
arXiv Detail & Related papers (2020-04-01T13:21:17Z)
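For the "Shift Happens: Adjusting Classifiers" entry above, here is a minimal sketch of one way to (re)equalize the average prediction with a known class prior: shift all predictions by a common additive constant on the logit scale, chosen by bisection so that the adjusted probabilities average to the target prior. This is a generic illustration under that assumption, not necessarily the paper's own adjustment procedure; the function name shift_to_prior and all numbers are hypothetical.

```python
# Generic prior-adjustment sketch: add one constant to every logit so that the
# mean of the adjusted probabilities equals a known target class prior.
import numpy as np

def shift_to_prior(p, target_prior, tol=1e-10):
    """Return probabilities whose mean matches target_prior, via a logit shift."""
    logits = np.log(p) - np.log1p(-p)        # convert probabilities to logits
    lo, hi = -30.0, 30.0                     # bracket for the additive shift
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        mean_adjusted = (1.0 / (1.0 + np.exp(-(logits + mid)))).mean()
        if mean_adjusted < target_prior:     # mean is increasing in the shift
            lo = mid
        else:
            hi = mid
    shift = 0.5 * (lo + hi)
    return 1.0 / (1.0 + np.exp(-(logits + shift)))

# Example: predictions whose average (0.55) overstates a test-time positive rate of 0.3.
p = np.array([0.2, 0.4, 0.7, 0.9])
adjusted = shift_to_prior(p, target_prior=0.3)
print(adjusted, adjusted.mean())             # mean is (approximately) 0.3
```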