Sample Complexity of Probability Divergences under Group Symmetry
- URL: http://arxiv.org/abs/2302.01915v3
- Date: Fri, 22 Nov 2024 19:10:46 GMT
- Title: Sample Complexity of Probability Divergences under Group Symmetry
- Authors: Ziyu Chen, Markos A. Katsoulakis, Luc Rey-Bellet, Wei Zhu
- Abstract summary: We rigorously quantify the improvement in the sample complexity of variational divergence estimations for group-invariant distributions.
In the cases of the Wasserstein-1 metric and the Lipschitz-regularized $\alpha$-divergences, the reduction of sample complexity is proportional to the group size if the group is finite.
- Score: 13.084804346845816
- License:
- Abstract: We rigorously quantify the improvement in the sample complexity of variational divergence estimations for group-invariant distributions. In the cases of the Wasserstein-1 metric and the Lipschitz-regularized $\alpha$-divergences, the reduction of sample complexity is proportional to the group size if the group is finite. In addition to the published version at ICML 2023, our proof also covers the case of infinite groups, such as compact Lie groups, where the convergence rate can be further improved and depends on the intrinsic dimension of the fundamental domain, characterized by the scaling of its covering number. Our approach is different from that in [Tahmasebi & Jegelka, ICML 2024], and our work also applies to asymmetric divergences, such as the Lipschitz-regularized $\alpha$-divergences. For the maximum mean discrepancy (MMD), the improvement in sample complexity is more nuanced, as it depends not only on the group size but also on the choice of kernel. Numerical simulations verify our theories.
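To make the finite-group claim concrete, below is a minimal numerical sketch, not the authors' code: for a one-dimensional distribution invariant under the reflection group {x -> x, x -> -x}, augmenting each empirical sample with its full group orbit behaves like having |G| = 2 times as many samples, so the plug-in Wasserstein-1 estimate between two independent samples of the same distribution concentrates toward zero faster. The test distribution, sample sizes, and repetition count are illustrative assumptions.

```python
# Illustrative sketch only: compares plug-in W1 estimates with and without
# orbit augmentation for a reflection-invariant 1D distribution.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

def symmetric_sample(n):
    """Draw n points from the reflection-invariant mixture 0.5*N(-2,1) + 0.5*N(2,1)."""
    signs = rng.choice([-1.0, 1.0], size=n)
    return signs * rng.normal(2.0, 1.0, size=n)

def orbit_augment(x):
    """Replace each sample by its full orbit under the group {id, x -> -x}."""
    return np.concatenate([x, -x])

for n in [100, 1000, 10000]:
    plain, sym = [], []
    for _ in range(50):
        x, y = symmetric_sample(n), symmetric_sample(n)
        plain.append(wasserstein_distance(x, y))            # vanilla plug-in W1
        sym.append(wasserstein_distance(orbit_augment(x),   # orbit-augmented W1
                                        orbit_augment(y)))
    print(f"n={n:6d}  mean plug-in W1: plain={np.mean(plain):.4f}  "
          f"orbit-augmented={np.mean(sym):.4f}")
```

Since both samples come from the same distribution, the true W1 distance is zero; the orbit-augmented estimates typically decay faster, consistent with the group-size reduction described in the abstract.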
Related papers
- Precise Asymptotics of Bagging Regularized M-estimators [5.165142221427928]
We characterize the squared prediction risk of ensemble estimators obtained through subagging (subsample bootstrap aggregating) regularized M-estimators.
Key to our analysis is a new result on the joint behavior of correlations between the estimator and residual errors on overlapping subsamples.
Joint optimization of subsample size, ensemble size, and regularization can significantly outperform regularizer optimization alone on the full data.
arXiv Detail & Related papers (2024-09-23T17:48:28Z) - Pseudorandomness from Subset States [0.34476492531570746]
We show it is possible to obtain quantum pseudorandomness and pseudoentanglement from random subset states.
We show that the trace distance is negligibly small, as long as the subsets are of an appropriate size.
arXiv Detail & Related papers (2023-12-14T18:36:16Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Sample Complexity Bounds for Estimating Probability Divergences under Invariances [31.946304450935628]
Group-invariant probability distributions appear in many data-generative models in machine learning.
In this work, we study how the inherent invariances, with respect to any smooth action of a Lie group on a manifold, improve sample complexity.
Results are completely new for groups of positive dimension and extend recent bounds for finite group actions.
arXiv Detail & Related papers (2023-11-06T04:45:21Z) - The Exact Sample Complexity Gain from Invariances for Kernel Regression [37.74032673086741]
In practice, encoding invariances into models improves sample complexity.
We provide minimax optimal rates for kernel ridge regression on compact manifolds.
Our results hold for any smooth compact Lie group action, even groups of positive dimension.
arXiv Detail & Related papers (2023-03-24T20:47:31Z) - Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
arXiv Detail & Related papers (2022-02-23T06:11:49Z) - Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z) - Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement (a minimal sketch of the reshuffling idea appears after this list).
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z) - Local versions of sum-of-norms clustering [77.34726150561087]
We show that our method can separate arbitrarily close balls in the ball model.
We prove a quantitative bound on the error incurred in the clustering of disjoint connected sets.
arXiv Detail & Related papers (2021-09-20T14:45:29Z) - On the Sample Complexity of Learning with Geometric Stability [42.813141600050166]
We study the sample complexity of learning problems where the target function presents such invariance and stability properties.
We provide non-parametric rates of convergence for kernel methods, and improvements in sample complexity by a factor equal to the size of the group.
arXiv Detail & Related papers (2021-06-14T03:51:16Z) - Distributed, partially collapsed MCMC for Bayesian Nonparametrics [68.5279360794418]
We exploit the fact that completely random measures, which underlie commonly used models such as the Dirichlet process and the beta-Bernoulli process, are decomposable into independent sub-measures.
We use this decomposition to partition the latent measure into a finite measure containing only instantiated components, and an infinite measure containing all other components.
The resulting hybrid algorithm can be applied to allow scalable inference without sacrificing convergence guarantees.
arXiv Detail & Related papers (2020-01-15T23:10:13Z)
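As referenced in the Minibatch vs Local SGD with Shuffling entry above, here is a minimal sketch, not that paper's code, contrasting with-replacement SGD with Random Reshuffling, which visits every data point exactly once per epoch in a freshly shuffled order. The toy least-squares problem, step size, and epoch count are illustrative assumptions.

```python
# Illustrative sketch only: with-replacement SGD vs Random Reshuffling (RR)
# on a toy least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.01 * rng.normal(size=n)
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)   # reference solution

def grad(x, i):
    """Gradient of the i-th squared residual 0.5*(a_i . x - b_i)^2."""
    return (A[i] @ x - b[i]) * A[i]

def run(reshuffle, epochs=50, lr=0.01):
    x = np.zeros(d)
    for _ in range(epochs):
        # RR: a fresh permutation each epoch; otherwise i.i.d. indices with replacement.
        order = rng.permutation(n) if reshuffle else rng.integers(0, n, size=n)
        for i in order:
            x -= lr * grad(x, i)
    return np.linalg.norm(x - x_star)

print("with replacement  :", run(reshuffle=False))
print("random reshuffling:", run(reshuffle=True))
```

On typical runs the reshuffled variant ends closer to the least-squares solution, which is the qualitative behavior the convergence bounds in that entry formalize.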