Redistributor: Transforming Empirical Data Distributions
- URL: http://arxiv.org/abs/2210.14219v2
- Date: Fri, 5 Jul 2024 22:18:53 GMT
- Title: Redistributor: Transforming Empirical Data Distributions
- Authors: Pavol Harar, Dennis Elbrächter, Monika Dörfler, Kory D. Johnson
- Abstract summary: Redistributor forces a collection of scalar samples to follow a desired distribution.
It produces a consistent estimator of the transformation $R$ which satisfies $R(S)=T$ in distribution.
The package is implemented in Python and is optimized to efficiently handle large datasets.
- Score: 1.4936946857731088
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an algorithm and package, Redistributor, which forces a collection of scalar samples to follow a desired distribution. When given independent and identically distributed samples of some random variable $S$ and the continuous cumulative distribution function of some desired target $T$, it provably produces a consistent estimator of the transformation $R$ which satisfies $R(S)=T$ in distribution. As the distribution of $S$ or $T$ may be unknown, we also include algorithms for efficiently estimating these distributions from samples. This allows for various interesting use cases in image processing, where Redistributor serves as a remarkably simple and easy-to-use tool that is capable of producing visually appealing results. For color correction it outperforms other model-based methods and excels in achieving photorealistic style transfer, surpassing deep learning methods in content preservation. The package is implemented in Python and is optimized to efficiently handle large datasets, making it also suitable as a preprocessing step in machine learning. The source code is available at https://github.com/paloha/redistributor.
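For intuition, the core construction is a probability-integral transform: compose an estimate of the CDF of $S$ with the inverse CDF (ppf) of $T$, so that $R = F_T^{-1} \circ F_S$. Below is a minimal sketch of this idea using numpy/scipy; `make_redistributor` is an illustrative name, not the package's actual API, and the real package adds careful interpolation, boundary handling, and options for discrete/lattice data.

```python
import numpy as np
from scipy import stats
from scipy.interpolate import interp1d

def make_redistributor(source_samples, target_ppf):
    """Estimate R = F_T^{-1} o F_S from i.i.d. samples of S.

    Sketch only: empirical CDF of the source composed with the
    target's inverse CDF. Names and details are illustrative.
    """
    xs = np.sort(source_samples)
    n = len(xs)
    # Empirical CDF values at the sorted samples, kept strictly inside
    # (0, 1) so the target ppf stays finite at the extremes.
    cdf_vals = (np.arange(1, n + 1) - 0.5) / n
    ecdf = interp1d(xs, cdf_vals, bounds_error=False,
                    fill_value=(cdf_vals[0], cdf_vals[-1]))
    return lambda s: target_ppf(ecdf(s))

# Example: force exponential samples onto a standard normal.
rng = np.random.default_rng(0)
s = rng.exponential(size=10_000)
R = make_redistributor(s, stats.norm.ppf)
t = R(s)  # t is approximately N(0, 1)-distributed
```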
Related papers
- Revisiting Score Function Estimators for $k$-Subset Sampling [5.464421236280698]
We show how to efficiently compute the $k$-subset distribution's score function using a discrete Fourier transform.
The resulting estimator provides both exact samples and unbiased gradient estimates.
Experiments in feature selection show results competitive with current methods, despite weaker assumptions.
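A key computational primitive in such score-function computations is evaluating elementary symmetric polynomials (the normalizers of the $k$-subset distribution), which a DFT handles in closed form: evaluate the generating polynomial $\prod_i (1 + w_i x)$ at roots of unity, then transform back to its coefficients. A minimal numpy sketch of that primitive only (the paper's full estimator involves more; the function name is illustrative):

```python
import numpy as np

def elem_sym_poly_dft(w):
    """All elementary symmetric polynomials e_0(w), ..., e_n(w) via a DFT.

    Evaluates prod_i (1 + w_i * x) at the (n+1)-th roots of unity and
    recovers the coefficients, which are exactly e_0, ..., e_n.
    """
    n = len(w)
    roots = np.exp(2j * np.pi * np.arange(n + 1) / (n + 1))
    evals = np.prod(1.0 + np.outer(roots, w), axis=1)
    return (np.fft.fft(evals) / (n + 1)).real

w = np.array([0.5, 1.0, 2.0])
print(elem_sym_poly_dft(w))  # [1.0, 3.5, 3.5, 1.0] = e_0, ..., e_3
```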
arXiv Detail & Related papers (2024-07-22T21:26:39Z)
- Idempotent Generative Network [61.78905138698094]
We propose a new approach for generative modeling based on training a neural network to be idempotent.
An idempotent operator is one that can be applied sequentially without changing the result beyond the initial application.
We find that by processing inputs from both target and source distributions, the model adeptly projects corrupted or modified data back to the target manifold.
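Concretely, training pushes data points to be fixed points of $f$ while making $f$ idempotent on source samples. A minimal PyTorch sketch under toy assumptions (a small MLP on vectors and Gaussian stand-ins for data; the paper's full recipe also uses a tightness term and a careful gradient split, both omitted here):

```python
import torch
from torch import nn

d = 16
f = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, d))
opt = torch.optim.Adam(f.parameters(), lr=1e-3)

for step in range(200):
    x = torch.randn(256, d)   # stand-in for target-distribution data
    z = torch.randn(256, d)   # source samples to be mapped onto the target
    rec = (f(x) - x).pow(2).mean()     # data points should be fixed points
    fz = f(z).detach()                 # gradient only through the outer call
    idem = (f(fz) - fz).pow(2).mean()  # f(f(z)) should equal f(z)
    loss = rec + idem
    opt.zero_grad()
    loss.backward()
    opt.step()
```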
arXiv Detail & Related papers (2023-11-02T17:59:55Z)
- Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond [89.72693227960274]
This paper investigates group distributionally robust optimization (GDRO) with the goal of learning a model that performs well over $m$ different distributions.
To reduce the number of samples in each round from $m$ to 1, we cast GDRO as a two-player game, where one player performs stochastic gradient descent and the other executes an online algorithm for non-oblivious multi-armed bandits.
In the second scenario, we propose to optimize the average top-$k$ risk instead of the maximum risk, thereby mitigating the impact of outlier distributions.
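The average top-$k$ risk interpolates between the worst-case objective ($k=1$) and the uniform average ($k=m$). A one-line sketch of the objective itself (not the paper's optimization algorithm):

```python
import torch

def average_top_k_risk(losses: torch.Tensor, k: int) -> torch.Tensor:
    """Mean of the k largest entries of a length-m vector of per-distribution risks."""
    return losses.topk(k).values.mean()

risks = torch.tensor([0.9, 0.1, 0.4, 0.7])
print(average_top_k_risk(risks, k=2))  # (0.9 + 0.7) / 2 = 0.8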
arXiv Detail & Related papers (2023-02-18T09:24:15Z)
- Unsupervised Learning of Sampling Distributions for Particle Filters [80.6716888175925]
We put forward four methods for learning sampling distributions from observed measurements.
Experiments demonstrate that learned sampling distributions exhibit better performance than designed, minimum-degeneracy sampling distributions.
arXiv Detail & Related papers (2023-02-02T15:50:21Z)
- Generalized Differentiable RANSAC [95.95627475224231]
$\nabla$-RANSAC is a differentiable RANSAC that allows learning the entire randomized robust estimation pipeline.
$\nabla$-RANSAC is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives.
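For orientation, the pipeline being relaxed is classical RANSAC. The sketch below is plain, non-differentiable RANSAC for 2-D line fitting with illustrative parameter names; $\nabla$-RANSAC replaces the hard sampling and scoring steps with differentiable relaxations, which this sketch does not show.

```python
import numpy as np

def ransac_line(points, iters=500, thresh=0.05, rng=None):
    """Classical RANSAC for fitting a line to an (N, 2) point array."""
    rng = np.random.default_rng(rng)
    best_inliers, best_model = 0, None
    for _ in range(iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        d = q - p
        normal = np.array([-d[1], d[0]])        # perpendicular to the segment
        norm = np.linalg.norm(normal)
        if norm == 0:
            continue
        normal /= norm
        dist = np.abs((points - p) @ normal)    # point-to-line distances
        inliers = int((dist < thresh).sum())
        if inliers > best_inliers:
            best_inliers, best_model = inliers, (p, normal)
    return best_model, best_inliers
```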
arXiv Detail & Related papers (2022-12-26T15:13:13Z)
- Perfect Sampling from Pairwise Comparisons [26.396901523831534]
We study how to efficiently obtain perfect samples from a discrete distribution $\mathcal{D}$ given access only to pairwise comparisons of elements of its support.
We design a Markov chain whose stationary distribution coincides with $\mathcal{D}$ and give an algorithm to obtain exact samples using the technique of Coupling from the Past.
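Coupling from the Past produces exact stationary samples by running coupled chains from ever further in the past with shared randomness until they coalesce by time 0. A minimal sketch on a toy reflected random walk (a stand-in chain to show the mechanics, not the paper's pairwise-comparison chain):

```python
import numpy as np

def cftp_random_walk(n=10, seed=0):
    """Coupling from the Past for a lazy reflected walk on {0, ..., n}.

    The same update and the same random draws are applied to chains
    started from the bottom (0) and top (n) states at time -T; if they
    have coalesced by time 0, the common value is an exact sample
    (uniform on {0, ..., n} for this toy chain).
    """
    rng = np.random.default_rng(seed)
    randomness = []                     # shared U(0,1) draws, reused across restarts
    T = 1
    while True:
        while len(randomness) < T:
            randomness.append(rng.random())
        lo, hi = 0, n
        for t in range(T - 1, -1, -1):  # simulate from time -T up to time 0
            step = 1 if randomness[t] < 0.5 else -1
            lo = min(max(lo + step, 0), n)
            hi = min(max(hi + step, 0), n)
        if lo == hi:
            return lo                   # coalesced: exact stationary sample
        T *= 2                          # go further into the past, reuse draws
```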
arXiv Detail & Related papers (2022-11-23T11:20:30Z)
- Unrolling Particles: Unsupervised Learning of Sampling Distributions [102.72972137287728]
Particle filtering is used to compute good nonlinear estimates of complex systems.
We show in simulations that the resulting particle filter yields good estimates in a wide range of scenarios.
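As a baseline for what is being learned, here is a bootstrap particle filter on a toy 1-D random-walk state-space model (the model and parameters are assumptions for illustration; the paper learns the proposal instead of using this fixed prior proposal):

```python
import numpy as np

def bootstrap_pf(ys, n_particles=1000, sigma_x=1.0, sigma_y=0.5, seed=0):
    """Bootstrap particle filter for x_t = x_{t-1} + N(0, sigma_x^2),
    y_t = x_t + N(0, sigma_y^2). Returns filtered posterior means."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n_particles)               # initial particles
    means = []
    for y in ys:
        x = x + rng.normal(0.0, sigma_x, n_particles)   # propagate (prior proposal)
        logw = -0.5 * ((y - x) / sigma_y) ** 2          # likelihood weights
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * x))                     # filtered estimate
        idx = rng.choice(n_particles, size=n_particles, p=w)  # resample
        x = x[idx]
    return np.array(means)
```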
arXiv Detail & Related papers (2021-10-06T16:58:34Z)
- Free Lunch for Few-shot Learning: Distribution Calibration [10.474018806591397]
We show that a simple logistic regression classifier trained using the features sampled from our calibrated distribution can outperform the state-of-the-art accuracy on two datasets.
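A heavily hedged sketch of that pipeline: model each novel class as a Gaussian whose mean comes from its few support features and whose covariance is borrowed (here a caller-supplied stand-in for the paper's calibration from base classes), sample synthetic features, and fit a logistic regression.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def classify_with_sampled_features(support_x, support_y, cov, n_aug=100, seed=0):
    """Train logistic regression on features sampled from per-class Gaussians.

    `cov` is an assumed, caller-provided covariance; the paper's actual
    calibration rule (borrowing statistics from similar base classes) is
    more involved than this sketch.
    """
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for c in np.unique(support_y):
        mu = support_x[support_y == c].mean(axis=0)
        xs.append(rng.multivariate_normal(mu, cov, size=n_aug))
        ys.append(np.full(n_aug, c))
    clf = LogisticRegression(max_iter=1000)
    clf.fit(np.vstack(xs), np.concatenate(ys))
    return clf
```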
arXiv Detail & Related papers (2021-01-16T07:58:40Z)
- Sampling from a $k$-DPP without looking at all items [58.30573872035083]
Given a kernel function and a subset size $k$, our goal is to sample $k$ out of $n$ items with probability proportional to the determinant of the kernel matrix induced by the subset (a.k.a. $k$-DPP).
Existing $k$-DPP sampling algorithms require an expensive preprocessing step which involves multiple passes over all $n$ items, making it infeasible for large datasets.
We develop an algorithm which adaptively builds a sufficiently large uniform sample of data that is then used to efficiently generate a smaller set of $k$ items.
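To make the target distribution concrete: a $k$-DPP assigns each size-$k$ subset $S$ probability proportional to $\det(L_S)$, the determinant of the kernel submatrix indexed by $S$. A brute-force sampler for tiny $n$ that illustrates the definition only; the paper's adaptive algorithm avoids even a single full pass over the $n$ items.

```python
import numpy as np
from itertools import combinations

def kdpp_brute_force(L, k, rng=None):
    """Exact k-DPP sampling by enumerating all size-k subsets.

    P(S) is proportional to det(L_S) for a PSD kernel matrix L; this is
    exponential in n and meant purely to illustrate the distribution.
    """
    rng = np.random.default_rng(rng)
    n = L.shape[0]
    subsets = list(combinations(range(n), k))
    dets = np.array([np.linalg.det(L[np.ix_(s, s)]) for s in subsets])
    probs = dets / dets.sum()
    return subsets[rng.choice(len(subsets), p=probs)]
```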
arXiv Detail & Related papers (2020-06-30T16:40:44Z)
- Estimates on Learning Rates for Multi-Penalty Distribution Regression [5.999239529678357]
We study a multi-penalty regularization algorithm for distribution regression under the framework of learning theory.
We embed the distributions into a reproducing kernel Hilbert space $\mathcal{H}_K$ associated with a Mercer kernel $K$ via the mean embedding technique.
The work also derives learning rates for distribution regression in the nonstandard setting $f_\rho \notin \mathcal{H}_K$, which has not been explored in the existing literature.
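For reference, the mean embedding maps a distribution $x$ on $X$ to an element of the RKHS via the standard formula $\mu_x = \int_X K(\cdot, s)\,\mathrm{d}x(s) \in \mathcal{H}_K$ (standard notation, not taken from the paper); regression is then carried out on these embeddings rather than on the distributions themselves.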
arXiv Detail & Related papers (2020-06-16T09:31:58Z)