Constrained Reweighting of Distributions: an Optimal Transport Approach
- URL: http://arxiv.org/abs/2310.12447v2
- Date: Tue, 16 Jan 2024 06:56:51 GMT
- Title: Constrained Reweighting of Distributions: an Optimal Transport Approach
- Authors: Abhisek Chakraborty, Anirban Bhattacharya, Debdeep Pati
- Abstract summary: We introduce nonparametrically imbued distributional constraints on the weights and develop a general framework leveraging the maximum entropy principle and tools from optimal transport.
The framework is demonstrated in the context of three disparate applications: portfolio allocation, semi-parametric inference for complex surveys, and ensuring algorithmic fairness in machine learning algorithms.
- Score: 8.461214317999321
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We commonly encounter the problem of identifying an optimally weight-adjusted version of the empirical distribution of observed data, adhering to predefined constraints on the weights. Such constraints often manifest as restrictions on the moments, tail behaviour, shapes, number of modes, etc., of the resulting weight-adjusted empirical distribution. In this article, we substantially enhance the flexibility of such methodology by introducing nonparametrically imbued distributional constraints on the weights and developing a general framework leveraging the maximum entropy principle and tools from optimal transport. The key idea is to ensure that the maximum entropy weight-adjusted empirical distribution of the observed data is close to a pre-specified probability distribution in terms of the optimal transport metric, while allowing for subtle departures. The versatility of the framework is demonstrated in the context of three disparate applications where data re-weighting is warranted to satisfy side constraints on the optimization problem at the heart of the statistical task: namely, portfolio allocation, semi-parametric inference for complex surveys, and ensuring algorithmic fairness in machine learning algorithms.
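To make the key idea concrete, below is a minimal sketch (not the authors' implementation) of maximum-entropy reweighting under an optimal-transport side constraint. It uses SciPy's 1-D 1-Wasserstein distance as a stand-in for the paper's OT metric, enforces closeness to the reference distribution through a soft penalty rather than a hard constraint, and the tolerance `eps`, penalty weight `lam`, and simulated data are illustrative assumptions.

```python
# Minimal sketch: choose data weights of maximum entropy while keeping the
# weighted empirical distribution close (in 1-D Wasserstein distance) to a
# pre-specified reference distribution. Illustrative only.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=1.0, size=60)       # observed data (skewed)
ref = rng.normal(loc=2.0, scale=0.8, size=500)     # draws from a pre-specified target
n, eps, lam = len(x), 0.05, 50.0                   # tolerance and penalty (illustrative)

def objective(w):
    # negative entropy of the weights plus a hinge penalty on the OT distance
    neg_entropy = np.sum(w * np.log(np.clip(w, 1e-12, None)))
    ot_gap = max(wasserstein_distance(x, ref, u_weights=w) - eps, 0.0)
    return neg_entropy + lam * ot_gap ** 2

w0 = np.full(n, 1.0 / n)                           # start from uniform weights
res = minimize(objective, w0, method="SLSQP",
               bounds=[(0.0, 1.0)] * n,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
w_hat = res.x
print("OT distance after reweighting:", wasserstein_distance(x, ref, u_weights=w_hat))
print("effective sample size:", 1.0 / np.sum(w_hat ** 2))
```

The sketch only conveys the trade-off between keeping the weights close to uniform (maximum entropy) and pulling the weighted empirical distribution toward the pre-specified target; the paper's framework handles richer constraint classes and multivariate settings.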
Related papers
- Generalization Bounds of Surrogate Policies for Combinatorial Optimization Problems [61.580419063416734]
A recent stream of structured learning approaches has improved the practical state of the art for a range of optimization problems.
The key idea is to exploit the statistical distribution over instances instead of dealing with instances separately.
In this article, we investigate methods that smooth the risk by perturbing the policy, which eases optimization and improves the generalization error.
arXiv Detail & Related papers (2024-07-24T12:00:30Z)
- OTClean: Data Cleaning for Conditional Independence Violations using Optimal Transport [51.6416022358349]
OTClean is a framework that harnesses optimal transport theory for data repair under Conditional Independence (CI) constraints.
We develop an iterative algorithm inspired by Sinkhorn's matrix scaling algorithm, which efficiently addresses high-dimensional and large-scale data (a generic version of this matrix-scaling procedure is sketched after this list).
arXiv Detail & Related papers (2024-03-04T18:23:55Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- Maximum Weight Entropy [6.821961232645206]
This paper deals with uncertainty quantification and out-of-distribution detection in deep learning using Bayesian and ensemble methods.
For neural networks, a practical optimization problem is derived to build such a weight distribution, defined as a trade-off between the average empirical risk and the entropy of the weight distribution (a toy version of this trade-off is sketched after this list).
arXiv Detail & Related papers (2023-09-27T14:46:10Z)
- Robust probabilistic inference via a constrained transport metric [8.85031165304586]
We offer a novel alternative by constructing an exponentially tilted empirical likelihood carefully designed to concentrate near a parametric family of distributions.
The proposed approach finds applications in a wide variety of robust inference problems, where we intend to perform inference on the parameters associated with the centering distribution.
We demonstrate superior performance of our methodology when compared against state-of-the-art robust Bayesian inference methods.
arXiv Detail & Related papers (2023-03-17T16:10:06Z)
- Information Theoretical Importance Sampling Clustering [18.248246885248733]
A current assumption of most clustering methods is that the training data and future data are taken from the same distribution.
We propose an information-theoretic, importance-sampling-based approach for clustering problems (ITISC).
Experiment results on synthetic datasets and a real-world load forecasting problem validate the effectiveness of the proposed model.
arXiv Detail & Related papers (2023-02-09T03:18:53Z)
- Optimal Regularization for a Data Source [8.38093977965175]
It is common to augment criteria that enforce data fidelity with a regularizer that promotes desired structure in the solution.
In this paper we seek a systematic understanding of the power and the limitations of convex regularization.
arXiv Detail & Related papers (2022-12-27T20:11:59Z)
- Categorical Distributions of Maximum Entropy under Marginal Constraints [0.0]
Estimation of categorical distributions under marginal constraints is key for many machine-learning and data-driven approaches.
We provide a parameter-agnostic theoretical framework that ensures that a categorical distribution of Maximum Entropy under marginal constraints always exists.
arXiv Detail & Related papers (2022-04-07T12:42:58Z)
- Non-Linear Spectral Dimensionality Reduction Under Uncertainty [107.01839211235583]
We propose a new dimensionality reduction framework, called NGEU, which leverages uncertainty information and directly extends several traditional approaches.
We show that the proposed NGEU formulation exhibits a global closed-form solution, and we analyze, based on the Rademacher complexity, how the underlying uncertainties theoretically affect the generalization ability of the framework.
arXiv Detail & Related papers (2022-02-09T19:01:33Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
- GenDICE: Generalized Offline Estimation of Stationary Values [108.17309783125398]
We show that effective offline estimation of stationary values can still be achieved in important applications.
Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions.
The resulting algorithm, GenDICE, is straightforward and effective.
arXiv Detail & Related papers (2020-02-21T00:27:52Z)
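The OTClean entry above mentions an iterative algorithm inspired by Sinkhorn's matrix scaling. The following is a generic, self-contained Sinkhorn sketch for entropically regularised optimal transport, not the OTClean code; the regularisation strength `reg` and the toy point clouds are illustrative assumptions.

```python
# Generic Sinkhorn matrix scaling: given marginals a, b and a cost matrix C,
# alternately rescale the rows and columns of the Gibbs kernel K = exp(-C/reg)
# until both marginals are (approximately) matched. Illustrative only.
import numpy as np

def sinkhorn(a, b, C, reg=0.1, n_iters=500, tol=1e-9):
    """Entropically regularised transport plan between histograms a and b."""
    K = np.exp(-C / reg)                  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u_prev = u
        v = b / (K.T @ u)                 # scale columns to match marginal b
        u = a / (K @ v)                   # scale rows to match marginal a
        if np.max(np.abs(u - u_prev)) < tol:
            break
    return u[:, None] * K * v[None, :]    # transport plan with marginals (a, b)

# Usage on two tiny 1-D point clouds with uniform marginals.
rng = np.random.default_rng(1)
x, y = rng.normal(size=8), rng.normal(loc=1.0, size=10)
a, b = np.full(8, 1 / 8), np.full(10, 1 / 10)
C = (x[:, None] - y[None, :]) ** 2        # squared-distance cost matrix
C = C / C.max()                           # normalise costs to avoid underflow in exp
P = sinkhorn(a, b, C)
print("row marginals match a:", np.allclose(P.sum(axis=1), a))
print("entropic transport cost:", float(np.sum(P * C)))
```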
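The Maximum Weight Entropy entry above describes an objective that trades off the average empirical risk against the entropy of a distribution over model weights. The toy sketch below illustrates that trade-off for a diagonal-Gaussian weight distribution on a linear-regression problem, where both terms have closed forms; it is a hedged illustration, not the paper's construction, and the entropy weight `beta` is an arbitrary choice.

```python
# Toy risk-versus-entropy trade-off: fit a diagonal-Gaussian distribution over
# linear-regression weights by minimising  E_q[MSE] - beta * H(q). Illustrative only.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
beta = 0.05                                        # entropy weight (illustrative)

def objective(params):
    mu, log_sigma = params[:3], params[3:]
    sigma2 = np.exp(2.0 * log_sigma)
    # closed-form expected MSE under w ~ N(mu, diag(sigma2))
    exp_risk = np.mean((y - X @ mu) ** 2) + np.mean(X ** 2 @ sigma2)
    entropy = np.sum(log_sigma)                    # diagonal-Gaussian entropy, up to a constant
    return exp_risk - beta * entropy

res = minimize(objective, np.zeros(6), method="L-BFGS-B")
mu_hat, sigma_hat = res.x[:3], np.exp(res.x[3:])
print("mean of the weight distribution:", np.round(mu_hat, 2))
print("weight standard deviations     :", np.round(sigma_hat, 3))
```

Larger values of beta keep the weight distribution more spread out (higher entropy) at the cost of a larger average empirical risk, which is the trade-off the entry describes.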