Related papers: Statistical optimality conditions for compressive ensembles

Statistical optimality conditions for compressive ensembles

URL: http://arxiv.org/abs/2106.01092v1
Date: Wed, 2 Jun 2021 11:52:31 GMT
Title: Statistical optimality conditions for compressive ensembles
Authors: Henry W. J. Reeve, Ata Kaban
Abstract summary: We present a framework for the theoretical analysis of ensembles of low-complexity empirical risk minimisers trained on independent random compressions of high-dimensional data. We introduce a general distribution-dependent upper-bound on the excess risk, framed in terms of a natural notion of compressibility. We then instantiate this general bound to classification and regression tasks, considering Johnson-Lindenstrauss mappings as the compression scheme.
Score: 7.766921168069532
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present a framework for the theoretical analysis of ensembles of low-complexity empirical risk minimisers trained on independent random compressions of high-dimensional data. First we introduce a general distribution-dependent upper-bound on the excess risk, framed in terms of a natural notion of compressibility. This bound is independent of the dimension of the original data representation, and explains the in-built regularisation effect of the compressive approach. We then instantiate this general bound to classification and regression tasks, considering Johnson-Lindenstrauss mappings as the compression scheme. For each of these tasks, our strategy is to develop a tight upper bound on the compressibility function, and by doing so we discover distributional conditions of geometric nature under which the compressive algorithm attains minimax-optimal rates up to at most poly-logarithmic factors. In the case of compressive classification, this is achieved with a mild geometric margin condition along with a flexible moment condition that is significantly more general than the assumption of bounded domain. In the case of regression with strongly convex smooth loss functions we find that compressive regression is capable of exploiting spectral decay with near-optimal guarantees. In addition, a key ingredient for our central upper bound is a high probability uniform upper bound on the integrated deviation of dependent empirical processes, which may be of independent interest.

Related papers

Low-Rank Implicit Neural Representation via Schatten-p Quasi-Norm and Jacobian Regularization [49.158601255093416]
We propose a CP-based low-rank tensor function parameterized by neural networks for implicit neural representation.<n>For smoothness, we propose a regularization term based on the spectral norm of the Jacobian and Hutchinson's trace estimator.<n>Our proposed smoothness regularization is SVD-free and avoids explicit chain rule derivations.
arXiv Detail & Related papers (2025-06-27T11:23:10Z)
Problem-dependent convergence bounds for randomized linear gradient compression [4.656302602746228]
In distributed optimization, the communication model updates can be a performance bottleneck. gradient compression has been proposed as a means of increasing optimization. We study how the impact of compression on throughput can be in terms of the norm of the Hessian objective.
arXiv Detail & Related papers (2024-11-19T22:26:42Z)
Blessing of Dimensionality for Approximating Sobolev Classes on Manifolds [14.183849746284816]
The manifold hypothesis says that natural high-dimensional data is supported on or around a low-dimensional manifold. Recent success of statistical and learning-based methods empirically supports this hypothesis. We provide theoretical statistical complexity results, which directly relates to generalization properties.
arXiv Detail & Related papers (2024-08-13T15:56:42Z)
A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning. This paper presents a comprehensive analysis of a broad range of variations of the proximal point method (SPPM)
arXiv Detail & Related papers (2024-05-24T21:09:19Z)
Curvature-Independent Last-Iterate Convergence for Games on Riemannian Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate. To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z)
Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data. For any convex and symmetric function class $mathcalF$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
Statistical Optimality of Divide and Conquer Kernel-based Functional Linear Regression [1.7227952883644062]
This paper studies the convergence performance of divide-and-conquer estimators in the scenario that the target function does not reside in the underlying kernel space. As a decomposition-based scalable approach, the divide-and-conquer estimators of functional linear regression can substantially reduce the algorithmic complexities in time and memory.
arXiv Detail & Related papers (2022-11-20T12:29:06Z)
Distribution Regression with Sliced Wasserstein Kernels [45.916342378789174]
We propose the first OT-based estimator for distribution regression. We study the theoretical properties of a kernel ridge regression estimator based on such representation.
arXiv Detail & Related papers (2022-02-08T15:21:56Z)
Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the emphStochastic Scaled-Gradient Descent (SSD) algorithm. We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
A general sample complexity analysis of vanilla policy gradient [101.16957584135767]
Policy gradient (PG) is one of the most popular reinforcement learning (RL) problems. "vanilla" theoretical understanding of PG trajectory is one of the most popular methods for solving RL problems.
arXiv Detail & Related papers (2021-07-23T19:38:17Z)
Online and Distribution-Free Robustness: Regression and Contextual Bandits with Huber Contamination [29.85468294601847]
We revisit two classic high-dimensional online learning problems, namely linear regression and contextual bandits. We show that our algorithms succeed where conventional methods fail.
arXiv Detail & Related papers (2020-10-08T17:59:05Z)
On dissipative symplectic integration with applications to gradient-based optimization [77.34726150561087]
We propose a geometric framework in which discretizations can be realized systematically. We show that a generalization of symplectic to nonconservative and in particular dissipative Hamiltonian systems is able to preserve rates of convergence up to a controlled error.
arXiv Detail & Related papers (2020-04-15T00:36:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.