Nonlinear Sufficient Dimension Reduction for
Distribution-on-Distribution Regression
- URL: http://arxiv.org/abs/2207.04613v2
- Date: Tue, 25 Apr 2023 03:04:59 GMT
- Title: Nonlinear Sufficient Dimension Reduction for
Distribution-on-Distribution Regression
- Authors: Qi Zhang, Bing Li, and Lingzhou Xue
- Abstract summary: We introduce a new approach to nonlinear sufficient dimension reduction in cases where both the predictor and the response are distributional data.
Our key step is to build universal kernels (cc-universal) on the metric spaces.
- Score: 9.086237593805173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new approach to nonlinear sufficient dimension reduction in
cases where both the predictor and the response are distributional data,
modeled as members of a metric space. Our key step is to build universal
kernels (cc-universal) on the metric spaces, which results in reproducing
kernel Hilbert spaces for the predictor and response that are rich enough to
characterize the conditional independence that determines sufficient dimension
reduction. For univariate distributions, we construct the universal kernel
using the Wasserstein distance, while for multivariate distributions, we resort
to the sliced Wasserstein distance. The sliced Wasserstein distance ensures
that the metric space possesses similar topological properties to the
Wasserstein space while also offering significant computation benefits.
Numerical results based on synthetic data show that our method outperforms
possible competing methods. The method is also applied to several data sets,
including fertility and mortality data and Calgary temperature data.
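The kernel construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each distribution is observed through a finite sample, approximates the univariate 2-Wasserstein distance through empirical quantile functions on a grid, approximates the sliced Wasserstein distance by Monte Carlo over random directions, and plugs either distance into a Gaussian-type kernel exp(-gamma * d^2). The function names, the grid size, the number of projections, and the Gaussian form of the kernel are all illustrative assumptions.

```python
import numpy as np


def quantile_grid(sample, n_grid=100):
    """Empirical quantile function evaluated on a uniform grid in (0, 1)."""
    probs = (np.arange(n_grid) + 0.5) / n_grid
    return np.quantile(np.asarray(sample, dtype=float), probs)


def wasserstein2_1d(x, y, n_grid=100):
    """Approximate W_2 between two univariate samples.

    For univariate distributions, W_2 equals the L2 distance between
    quantile functions; the integral is approximated on a grid here.
    """
    qx, qy = quantile_grid(x, n_grid), quantile_grid(y, n_grid)
    return float(np.sqrt(np.mean((qx - qy) ** 2)))


def sliced_wasserstein2(X, Y, n_proj=50, n_grid=100, seed=0):
    """Monte Carlo sliced W_2 between two multivariate samples (rows = points).

    The same seed gives the same projection directions for every pair,
    which keeps the resulting Gram matrix consistent across pairs.
    """
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_proj, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    sq = [wasserstein2_1d(X @ u, Y @ u, n_grid) ** 2 for u in dirs]
    return float(np.sqrt(np.mean(sq)))


def gaussian_gram(samples, dist, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * dist(samples[i], samples[j])**2)."""
    n = len(samples)
    K = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            K[i, j] = K[j, i] = np.exp(-gamma * dist(samples[i], samples[j]) ** 2)
    return K


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three univariate distributions, each observed through 200 draws.
    uni = [rng.normal(loc=m, size=200) for m in (0.0, 0.5, 2.0)]
    print(gaussian_gram(uni, wasserstein2_1d, gamma=0.5))
    # Three bivariate distributions, compared with the sliced distance.
    multi = [rng.normal(loc=m, size=(200, 2)) for m in (0.0, 0.5, 2.0)]
    print(gaussian_gram(multi, sliced_wasserstein2, gamma=0.5))
```

A Gram matrix of this kind, computed separately for the predictor and response distributions, is the sort of input a kernel-based dimension reduction estimator would consume. The estimator itself is not sketched here.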
Related papers
- Total Uncertainty Quantification in Inverse PDE Solutions Obtained with Reduced-Order Deep Learning Surrogate Models [50.90868087591973]
We propose an approximate Bayesian method for quantifying the total uncertainty in inverse PDE solutions obtained with machine learning surrogate models.
We test the proposed framework by comparing it with the iterative ensemble smoother and deep ensembling methods for a non-linear diffusion equation.
arXiv Detail & Related papers (2024-08-20T19:06:02Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- Conformal inference for regression on Riemannian Manifolds [49.7719149179179]
We investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides in a manifold, and the covariate, denoted by $X$, lies in Euclidean space.
We prove the almost sure convergence of the empirical version of these regions on the manifold to their population counterparts.
arXiv Detail & Related papers (2023-10-12T10:56:25Z)
- Computing the Distance between unbalanced Distributions -- The flat Metric [0.0]
The flat metric generalizes the well-known Wasserstein distance W1 to the case where the distributions have unequal total mass.
The core of the method is a neural network that determines an optimal test function realizing the distance between two measures.
arXiv Detail & Related papers (2023-08-02T09:30:22Z)
- On the Size and Approximation Error of Distilled Sets [57.61696480305911]
We take a theoretical view on kernel ridge regression based methods of dataset distillation such as Kernel Inducing Points.
We prove that a small set of instances exists in the original input space such that its solution in the RFF space coincides with the solution of the original data.
A KRR solution can be generated from this distilled set of instances that approximates the KRR solution optimized on the full input data.
arXiv Detail & Related papers (2023-05-23T14:37:43Z)
- Wasserstein Archetypal Analysis [9.54262011088777]
Archetypal analysis is an unsupervised machine learning method that summarizes data using a convex polytope.
We consider an alternative formulation of archetypal analysis based on the Wasserstein metric.
arXiv Detail & Related papers (2022-10-25T19:50:09Z)
- Tangent Space and Dimension Estimation with the Wasserstein Distance [10.118241139691952]
Consider a set of points sampled independently near a smooth compact submanifold of Euclidean space.
We provide mathematically rigorous bounds on the number of sample points required to estimate both the dimension and the tangent spaces of that manifold.
arXiv Detail & Related papers (2021-10-12T21:02:06Z)
- Dimension Reduction and Data Visualization for Fréchet Regression [8.713190936209156]
The Fréchet regression model provides a promising framework for regression analysis with metric space-valued responses.
We introduce a flexible sufficient dimension reduction (SDR) method for Fréchet regression to achieve two purposes.
arXiv Detail & Related papers (2021-10-01T15:01:32Z)
- A Note on Optimizing Distributions using Kernel Mean Embeddings [94.96262888797257]
Kernel mean embeddings represent probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space.
We show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense.
We provide algorithms to optimize such distributions in the finite-sample setting.
arXiv Detail & Related papers (2021-06-18T08:33:45Z)
- Depth-based pseudo-metrics between probability distributions [1.1470070927586016]
We propose two new pseudo-metrics between continuous probability measures based on data depth and its associated central regions.
In contrast to the Wasserstein distance, the proposed pseudo-metrics do not suffer from the curse of dimensionality.
The regions-based pseudo-metric appears to be robust w.r.t. both outliers and heavy tails.
arXiv Detail & Related papers (2021-03-23T17:33:18Z)
- Distribution Approximation and Statistical Estimation Guarantees of Generative Adversarial Networks [82.61546580149427]
Generative Adversarial Networks (GANs) have achieved great success in unsupervised learning.
This paper provides approximation and statistical guarantees of GANs for the estimation of data distributions with densities in a Hölder space.
arXiv Detail & Related papers (2020-02-10T16:47:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.