Statistical optimality conditions for compressive ensembles
- URL: http://arxiv.org/abs/2106.01092v1
- Date: Wed, 2 Jun 2021 11:52:31 GMT
- Title: Statistical optimality conditions for compressive ensembles
- Authors: Henry W. J. Reeve, Ata Kaban
- Abstract summary: We present a framework for the theoretical analysis of ensembles of low-complexity empirical risk minimisers trained on independent random compressions of high-dimensional data.
We introduce a general distribution-dependent upper-bound on the excess risk, framed in terms of a natural notion of compressibility.
We then instantiate this general bound to classification and regression tasks, considering Johnson-Lindenstrauss mappings as the compression scheme.
- Score: 7.766921168069532
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a framework for the theoretical analysis of ensembles of
low-complexity empirical risk minimisers trained on independent random
compressions of high-dimensional data. First we introduce a general
distribution-dependent upper-bound on the excess risk, framed in terms of a
natural notion of compressibility. This bound is independent of the dimension
of the original data representation, and explains the in-built regularisation
effect of the compressive approach. We then instantiate this general bound to
classification and regression tasks, considering Johnson-Lindenstrauss mappings
as the compression scheme. For each of these tasks, our strategy is to develop
a tight upper bound on the compressibility function, and by doing so we
discover distributional conditions of geometric nature under which the
compressive algorithm attains minimax-optimal rates up to at most
poly-logarithmic factors. In the case of compressive classification, this is
achieved with a mild geometric margin condition along with a flexible moment
condition that is significantly more general than the assumption of bounded
domain. In the case of regression with strongly convex smooth loss functions we
find that compressive regression is capable of exploiting spectral decay with
near-optimal guarantees. In addition, a key ingredient for our central upper
bound is a high probability uniform upper bound on the integrated deviation of
dependent empirical processes, which may be of independent interest.
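To make the compressive-ensemble idea concrete, the following is a minimal sketch, not the authors' exact algorithm: it draws independent Gaussian Johnson-Lindenstrauss projections, fits a low-complexity empirical risk minimiser on each compressed view (here closed-form ridge regression, a strongly convex smooth loss), and averages the members' predictions. The projection dimension k, ensemble size M and ridge parameter lam are illustrative choices, not values prescribed by the paper.

```python
# A minimal sketch (assumed instantiation, not the paper's exact algorithm) of a
# compressive ensemble: each member sees an independent Johnson-Lindenstrauss
# compression of the data, fits a low-dimensional ridge ERM, and the ensemble
# averages the member predictions.
import numpy as np

rng = np.random.default_rng(0)

def jl_matrix(k, d, rng):
    """Gaussian Johnson-Lindenstrauss map from R^d to R^k."""
    return rng.standard_normal((k, d)) / np.sqrt(k)

def fit_ridge(Z, y, lam):
    """Closed-form ridge ERM in the compressed space."""
    k = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ y)

def compressive_ensemble_fit(X, y, k=20, M=50, lam=1.0, rng=rng):
    members = []
    for _ in range(M):
        R = jl_matrix(k, X.shape[1], rng)   # independent random compression
        w = fit_ridge(X @ R.T, y, lam)      # low-complexity ERM on compressed data
        members.append((R, w))
    return members

def compressive_ensemble_predict(members, X):
    """Average the members' predictions (the ensemble step)."""
    preds = [X @ R.T @ w for R, w in members]
    return np.mean(preds, axis=0)

# Toy usage: high-dimensional regression with fast spectral decay.
d, n = 500, 200
X = rng.standard_normal((n, d)) * (1.0 / np.arange(1, d + 1))  # decaying feature scales
w_star = rng.standard_normal(d)
y = X @ w_star + 0.1 * rng.standard_normal(n)
members = compressive_ensemble_fit(X, y, k=20, M=50)
y_hat = compressive_ensemble_predict(members, X)
```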
Related papers
- Problem-dependent convergence bounds for randomized linear gradient compression [4.656302602746228]
In distributed optimization, the communication of model updates can be a performance bottleneck.
Gradient compression has been proposed as a means of increasing optimization throughput.
We study how the impact of compression on throughput can be characterised in terms of the norm of the Hessian of the objective (a minimal sketch of random linear gradient compression appears after this list).
arXiv Detail & Related papers (2024-11-19T22:26:42Z)
- Blessing of Dimensionality for Approximating Sobolev Classes on Manifolds [14.183849746284816]
The manifold hypothesis says that natural high-dimensional data is supported on or around a low-dimensional manifold.
Recent success of statistical and learning-based methods empirically supports this hypothesis.
We provide theoretical statistical complexity results, which directly relate to generalization properties.
arXiv Detail & Related papers (2024-08-13T15:56:42Z)
- A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z)
- Curvature-Independent Last-Iterate Convergence for Games on Riemannian Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z)
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Statistical Optimality of Divide and Conquer Kernel-based Functional Linear Regression [1.7227952883644062]
This paper studies the convergence performance of divide-and-conquer estimators in the scenario that the target function does not reside in the underlying kernel space.
As a decomposition-based scalable approach, divide-and-conquer estimators for functional linear regression can substantially reduce algorithmic complexity in time and memory.
arXiv Detail & Related papers (2022-11-20T12:29:06Z)
- Distribution Regression with Sliced Wasserstein Kernels [45.916342378789174]
We propose the first optimal-transport-based estimator for distribution regression.
We study the theoretical properties of a kernel ridge regression estimator based on such a representation.
arXiv Detail & Related papers (2022-02-08T15:21:56Z)
- Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
- A general sample complexity analysis of vanilla policy gradient [101.16957584135767]
Policy gradient (PG) is one of the most popular methods for solving reinforcement learning (RL) problems.
We provide a general sample complexity analysis of the "vanilla" PG method.
arXiv Detail & Related papers (2021-07-23T19:38:17Z)
- Online and Distribution-Free Robustness: Regression and Contextual Bandits with Huber Contamination [29.85468294601847]
We revisit two classic high-dimensional online learning problems, namely linear regression and contextual bandits.
We show that our algorithms succeed where conventional methods fail.
arXiv Detail & Related papers (2020-10-08T17:59:05Z)
- On dissipative symplectic integration with applications to gradient-based optimization [77.34726150561087]
We propose a geometric framework in which discretizations can be realized systematically.
We show that a generalization of symplectic integration to nonconservative and, in particular, dissipative Hamiltonian systems is able to preserve rates of convergence up to a controlled error.
arXiv Detail & Related papers (2020-04-15T00:36:49Z)
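As promised in the first related-papers entry above, here is a minimal sketch of random linear gradient compression. It is an assumed, generic instantiation rather than that paper's specific scheme: a gradient is sketched with a Gaussian random matrix before communication, and the receiver forms an unbiased reconstruction. The dimensions and the Gaussian sketch are illustrative choices.

```python
# A minimal sketch (assumed instantiation, not the cited paper's method) of
# randomized linear gradient compression with unbiased reconstruction.
import numpy as np

rng = np.random.default_rng(1)

def compress(grad, k, rng):
    """Sketch a d-dimensional gradient down to k numbers with a Gaussian map."""
    d = grad.shape[0]
    S = rng.standard_normal((k, d)) / np.sqrt(k)   # E[S.T @ S] = I_d
    return S, S @ grad                             # communicate only k floats (plus the seed of S)

def decompress(S, sketched):
    """Unbiased reconstruction: E[S.T @ S @ g] = g."""
    return S.T @ sketched

# Toy usage: compress a 1000-dimensional gradient to 100 numbers.
g = rng.standard_normal(1000)
S, c = compress(g, k=100, rng=rng)
g_hat = decompress(S, c)
```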