Near-optimal delta-convex estimation of Lipschitz functions
- URL: http://arxiv.org/abs/2511.15615v1
- Date: Wed, 19 Nov 2025 17:02:30 GMT
- Title: Near-optimal delta-convex estimation of Lipschitz functions
- Authors: Gábor Balázs
- Abstract summary: This paper presents a tractable algorithm for estimating an unknown Lipschitz function from noisy observations. The framework is also straightforward to adapt to convex shape-restricted regression.
- Score: 0.913755431537592
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a tractable algorithm for estimating an unknown Lipschitz function from noisy observations and establishes an upper bound on its convergence rate. The approach extends max-affine methods from convex shape-restricted regression to the more general Lipschitz setting. A key component is a nonlinear feature expansion that maps max-affine functions into a subclass of delta-convex functions, which act as universal approximators of Lipschitz functions while preserving their Lipschitz constants. Leveraging this property, the estimator attains the minimax convergence rate (up to logarithmic factors) with respect to the intrinsic dimension of the data under squared loss and subgaussian distributions in the random design setting. The algorithm integrates adaptive partitioning to capture intrinsic dimension, a penalty-based regularization mechanism that removes the need to know the true Lipschitz constant, and a two-stage optimization procedure combining a convex initialization with local refinement. The framework is also straightforward to adapt to convex shape-restricted regression. Experiments demonstrate competitive performance relative to other theoretically justified methods, including nearest-neighbor and kernel-based regressors.
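To make the construction concrete, below is a minimal sketch, not the paper's algorithm, of fitting a delta-convex model, i.e. a difference of two max-affine functions, by plain subgradient descent on the squared loss. The paper's estimator additionally uses a nonlinear feature expansion, adaptive partitioning, a Lipschitz penalty, and a two-stage optimization with a convex initialization, none of which are reproduced here; all names and hyperparameters below are illustrative.

```python
# Minimal sketch (not the paper's algorithm): least-squares fitting of a
# delta-convex model f(x) = max_k (A x + a)_k - max_j (B x + b)_j,
# i.e. a difference of two max-affine functions, by subgradient descent.
import numpy as np

rng = np.random.default_rng(0)

def dc_predict(X, A, a, B, b):
    """Evaluate f(x) = max_k (A x + a)_k - max_j (B x + b)_j on the rows of X."""
    return np.max(X @ A.T + a, axis=1) - np.max(X @ B.T + b, axis=1)

def fit_dc(X, y, n_pieces=8, lr=0.05, n_iter=2000):
    """Fit the delta-convex model by subgradient descent on the squared loss."""
    n, d = X.shape
    A = 0.1 * rng.standard_normal((n_pieces, d)); a = np.zeros(n_pieces)
    B = 0.1 * rng.standard_normal((n_pieces, d)); b = np.zeros(n_pieces)
    for _ in range(n_iter):
        pos = X @ A.T + a                     # affine pieces of the convex part, shape (n, K)
        neg = X @ B.T + b                     # affine pieces of the concave part
        ip, im = pos.argmax(axis=1), neg.argmax(axis=1)   # active piece per sample
        resid = pos[np.arange(n), ip] - neg[np.arange(n), im] - y
        for k in range(n_pieces):             # subgradients flow only into active pieces
            mp, mm = ip == k, im == k
            A[k] -= lr * (resid[mp][:, None] * X[mp]).sum(axis=0) / n
            a[k] -= lr * resid[mp].sum() / n
            B[k] += lr * (resid[mm][:, None] * X[mm]).sum(axis=0) / n
            b[k] += lr * resid[mm].sum() / n
    return A, a, B, b

# Toy usage: a Lipschitz but non-convex 1-D target with Gaussian noise.
X = rng.uniform(-2.0, 2.0, size=(400, 1))
y = np.abs(X[:, 0]) - np.abs(X[:, 0] - 1.0) + 0.1 * rng.standard_normal(400)
A, a, B, b = fit_dc(X, y)
print("train RMSE:", np.sqrt(np.mean((dc_predict(X, A, a, B, b) - y) ** 2)))
```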
Related papers
- Revisiting Convergence: Shuffling Complexity Beyond Lipschitz Smoothness [50.78508362183774]
Shuffling-type gradient methods are favored in practice for their simplicity and rapid empirical performance. However, most convergence analyses require the Lipschitz smoothness condition, which is often not met in common machine learning schemes.
arXiv Detail & Related papers (2025-07-11T15:36:48Z) - Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Stochastic Weakly Convex Optimization Beyond Lipschitz Continuity [5.866816093792934]
We show that a wide class of stochastic algorithms, including the subgradient method, preserve the $\mathcal{O}$ convergence rate with constant failure rate.
Our analyses rest on rather weak assumptions: the Lipschitz parameter can be either bounded by a general growth function of $\|x\|$ or locally estimated through independent random samples.
arXiv Detail & Related papers (2024-01-25T06:06:31Z) - Resampled Confidence Regions with Exponential Shrinkage for the Regression Function of Binary Classification [0.0]
We build distribution-free confidence regions for the regression function for any user-chosen confidence level and any finite sample size based on a resampling test. We prove the strong uniform consistency of a new empirical risk based approach for model classes with finite pseudo-dimensions and inverse Lipschitz parameterizations. We also consider a k-nearest neighbors based method, for which we prove strong pointwise bounds on the probability of exclusion.
arXiv Detail & Related papers (2023-08-03T15:52:27Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Statistical Optimality of Divide and Conquer Kernel-based Functional
Linear Regression [1.7227952883644062]
This paper studies the convergence performance of divide-and-conquer estimators in the scenario that the target function does not reside in the underlying kernel space.
As a decomposition-based scalable approach, the divide-and-conquer estimators of functional linear regression can substantially reduce the algorithmic complexities in time and memory.
arXiv Detail & Related papers (2022-11-20T12:29:06Z) - Convergence rate of the (1+1)-evolution strategy on locally strongly convex functions with Lipschitz continuous gradient [10.31411804947731]
Evolution strategy (ES) is one of the promising classes of algorithms for black-box continuous optimization. In this study, an upper bound and a lower bound of the rate of linear convergence of the (1+1)-ES on locally $L$-strongly convex functions with $U$-Lipschitz continuous gradient are derived.
arXiv Detail & Related papers (2022-09-26T07:16:50Z) - Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
They can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z) - Relative Lipschitzness in Extragradient Methods and a Direct Recipe for
Acceleration [30.369542816161378]
We show that standard extragradient methods (i.e. mirror prox and dual extrapolation) recover optimal accelerated rates for first-order minimization of smooth convex functions.
We generalize this framework to handle local and randomized notions of relative Lipschitzness.
arXiv Detail & Related papers (2020-11-12T18:43:40Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for
Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z) - Exactly Computing the Local Lipschitz Constant of ReLU Networks [98.43114280459271]
The local Lipschitz constant of a neural network is a useful metric for robustness, generalization, and fairness evaluation.
We show strong inapproximability results for estimating Lipschitz constants of ReLU networks.
We leverage our exact computation algorithm to evaluate the tightness of competing Lipschitz estimators and the effects of regularized training on the Lipschitz constant.
arXiv Detail & Related papers (2020-03-02T22:15:54Z)
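Loosely related to the last entry above, the following sketch lower-bounds the local Lipschitz constant of a toy ReLU network by sampling gradient norms over a box around a reference input. This is a generic sampling heuristic for intuition only, not the exact-computation algorithm of that paper, and every name in it is made up for the example.

```python
# Illustrative sketch only: a sampled lower bound on the local Lipschitz
# constant of a small random two-layer ReLU network f(x) = w2 . relu(W1 x + b1).
import numpy as np

rng = np.random.default_rng(0)

d, h = 3, 16
W1, b1, w2 = rng.standard_normal((h, d)), rng.standard_normal(h), rng.standard_normal(h)

def grad_norm(x):
    """Euclidean norm of the (almost-everywhere) gradient of f at x."""
    active = (W1 @ x + b1 > 0).astype(float)   # ReLU activation pattern at x
    return np.linalg.norm((w2 * active) @ W1)  # chain rule through the active units

def local_lipschitz_lower_bound(center, radius, n_samples=10_000):
    """Largest sampled gradient norm over a box around `center` (a lower bound)."""
    xs = center + radius * rng.uniform(-1.0, 1.0, size=(n_samples, d))
    return max(grad_norm(x) for x in xs)

print("sampled lower bound:", local_lipschitz_lower_bound(np.zeros(d), radius=0.5))
```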
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.