The Kernel Density Integral Transformation
- URL: http://arxiv.org/abs/2309.10194v2
- Date: Thu, 19 Oct 2023 15:49:44 GMT
- Title: The Kernel Density Integral Transformation
- Authors: Calvin McCarter
- Abstract summary: We propose the use of the kernel density integral transformation as a feature preprocessing step.
Our approach subsumes the two leading feature preprocessing methods as limiting cases.
We show that the kernel density transformation can be profitably applied to statistical data analysis.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Feature preprocessing continues to play a critical role when applying machine
learning and statistical methods to tabular data. In this paper, we propose the
use of the kernel density integral transformation as a feature preprocessing
step. Our approach subsumes the two leading feature preprocessing methods as
limiting cases: linear min-max scaling and quantile transformation. We
demonstrate that, without hyperparameter tuning, the kernel density integral
transformation can be used as a simple drop-in replacement for either method,
offering protection from the weaknesses of each. Alternatively, with tuning of
a single continuous hyperparameter, we frequently outperform both of these
methods. Finally, we show that the kernel density transformation can be
profitably applied to statistical data analysis, particularly in correlation
analysis and univariate clustering.
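A minimal sketch of the idea, assuming a Gaussian kernel and a spread-scaled bandwidth (the paper's exact kernel, bandwidth parameterization, and output scaling may differ, and the names below are illustrative rather than the paper's API): each feature value is mapped through the CDF of a kernel density estimate fitted to that feature.

```python
import numpy as np
from scipy.stats import norm

def kdi_transform(x_train, x_eval=None, bandwidth=0.1):
    """Map values through the CDF of a Gaussian KDE fit on x_train.

    A 1-D sketch: a small bandwidth approaches the empirical quantile
    transform, while a very large bandwidth approaches linear scaling.
    """
    x_train = np.asarray(x_train, dtype=float)
    x_eval = x_train if x_eval is None else np.asarray(x_eval, dtype=float)
    # Scale the single hyperparameter by the feature's spread so it is
    # unit-free (an assumption of this sketch).
    h = bandwidth * (x_train.std() + 1e-12)
    # The CDF of a KDE is the average of the per-sample kernel CDFs.
    z = (x_eval[:, None] - x_train[None, :]) / h
    return norm.cdf(z).mean(axis=1)
```

One consequence relevant to the correlation-analysis application: in the small-bandwidth limit the transform reduces to the empirical quantile transform, which maps values to scaled ranks, so a Pearson correlation computed on the transformed features recovers Spearman's rank correlation; in the large-bandwidth limit the transform is approximately affine, and Pearson correlation is unchanged by affine maps, so intermediate bandwidths interpolate between the two.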
Related papers
- Variational Learning of Gaussian Process Latent Variable Models through Stochastic Gradient Annealed Importance Sampling [22.256068524699472]
In this work, we propose an Annealed Importance Sampling (AIS) approach to address the limitations of standard variational inference for these models.
We combine the strengths of Sequential Monte Carlo samplers and VI to explore a wider range of posterior distributions and gradually approach the target distribution.
Experimental results on both toy and image datasets demonstrate that our method outperforms state-of-the-art methods in terms of tighter variational bounds, higher log-likelihoods, and more robust convergence.
arXiv Detail & Related papers (2024-08-13T08:09:05Z)
- Fast Dual Subgradient Optimization of the Integrated Transportation Distance Between Stochastic Kernels [1.5229257192293204]
A generalization of the Wasserstein metric, the integrated transportation distance, establishes a novel distance between probability kernels of Markov systems.
This metric serves as the foundation for an efficient approximation technique, enabling the original system's kernel to be replaced by a kernel with discrete support of limited cardinality.
We present a specialized dual algorithm capable of constructing these approximate kernels quickly and efficiently, without requiring computationally expensive matrix operations.
arXiv Detail & Related papers (2023-12-03T15:44:17Z)
- Sobolev Space Regularised Pre Density Models [51.558848491038916]
We propose a new approach to non-parametric density estimation that is based on regularizing a Sobolev norm of the density.
This method is statistically consistent, and makes the inductive bias of the model clear and interpretable.
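For reference, one standard form of such a regularizer (a sketch of the general idea; the paper's exact norm, order, and domain may differ) penalizes the density's derivatives up to some integer order s:

```latex
% Squared Sobolev norm of integer order s on the real line
% (the paper's precise choice of norm may differ):
\| f \|_{H^s}^2 = \sum_{k=0}^{s} \int \left( f^{(k)}(x) \right)^2 \, dx
```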
arXiv Detail & Related papers (2023-07-25T18:47:53Z)
- Robust scalable initialization for Bayesian variational inference with multi-modal Laplace approximations [0.0]
Variational mixtures with full-covariance structures suffer from quadratic growth in the number of variational parameters as the number of model parameters increases.
We propose a method for constructing an initial Gaussian model approximation that can be used to warm-start variational inference.
arXiv Detail & Related papers (2023-07-12T19:30:04Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property because it would allow the marginal likelihood to be optimized directly as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
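For background on plain AIS (a textbook sketch, not the differentiable variant this paper proposes): samples are moved through a sequence of tempered distributions between a tractable base and the target, and the accumulated importance weights estimate the ratio of normalizing constants. The densities and step sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative unnormalized log-densities: standard normal base,
# shifted and narrowed normal target.
log_p0 = lambda x: -0.5 * x**2
log_p1 = lambda x: -0.5 * ((x - 3.0) / 0.5) ** 2

def ais_log_weights(n_samples=1000, n_steps=100, step=0.5):
    """Accumulate AIS log-weights along a geometric path from p0 to p1.

    The log-mean-exp of the returned weights estimates log(Z1 / Z0),
    the log-ratio of the two normalizing constants.
    """
    log_pb = lambda x, b: (1.0 - b) * log_p0(x) + b * log_p1(x)
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    x = rng.standard_normal(n_samples)  # exact draws from p0
    logw = np.zeros(n_samples)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        logw += log_pb(x, b) - log_pb(x, b_prev)
        # Random-walk Metropolis step targeting the current tempered
        # density (the step the differentiable variant above removes).
        prop = x + step * rng.standard_normal(n_samples)
        accept = np.log(rng.random(n_samples)) < log_pb(prop, b) - log_pb(x, b)
        x = np.where(accept, prop, x)
    return logw

lw = ais_log_weights()
log_ratio = lw.max() + np.log(np.exp(lw - lw.max()).mean())  # log(Z1/Z0)
```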
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
- Random extrapolation for primal-dual coordinate descent [61.55967255151027]
We introduce a randomly extrapolated primal-dual coordinate descent method that adapts to the sparsity of the data matrix and to favorable structures of the objective function.
We show almost sure convergence of the sequence and optimal sublinear convergence rates for the primal-dual gap and objective values, in the general convex-concave case.
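As background, the deterministic primal-dual method (Chambolle-Pock / PDHG) that such coordinate variants build on can be sketched for the LASSO problem; this instance is illustrative, and the paper's randomized coordinate updates and random extrapolation are not reproduced here.

```python
import numpy as np

def pdhg_lasso(A, b, lam, n_iters=2000):
    """Deterministic PDHG (Chambolle-Pock) baseline for the LASSO
    problem min_x 0.5*||Ax - b||^2 + lam*||x||_1.
    """
    m, n = A.shape
    L = np.linalg.norm(A, 2)      # operator norm of A
    tau = sigma = 0.99 / L        # step sizes with tau*sigma*L^2 < 1
    x = np.zeros(n); x_bar = x.copy(); y = np.zeros(m)
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
    for _ in range(n_iters):
        # Dual ascent step: prox of f*(y) for f(z) = 0.5*||z - b||^2.
        y = (y + sigma * (A @ x_bar - b)) / (1.0 + sigma)
        # Primal descent step: soft-thresholding, the prox of lam*||x||_1.
        x_new = soft(x - tau * (A.T @ y), tau * lam)
        # Extrapolation with theta = 1, which the paper above replaces
        # with a random extrapolation over coordinates.
        x_bar = 2.0 * x_new - x
        x = x_new
    return x
```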
arXiv Detail & Related papers (2020-07-13T17:39:35Z)
- Optimizing generalization on the train set: a novel gradient-based framework to train parameters and hyperparameters simultaneously [0.0]
Generalization is a central problem in Machine Learning.
We present an approach based on a new measure of risk that allows us to develop fully automatic procedures for generalization.
arXiv Detail & Related papers (2020-06-11T18:04:36Z)
- Inverse Learning of Symmetries [71.62109774068064]
We learn the symmetry transformation with a model consisting of two latent subspaces.
Our approach is based on the deep information bottleneck in combination with a continuous mutual information regulariser.
Our model outperforms state-of-the-art methods on artificial and molecular datasets.
arXiv Detail & Related papers (2020-02-07T13:48:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.