Related papers: A Refinement of Vapnik--Chervonenkis' Theorem

Related papers

Online Prediction of Stochastic Sequences with High Probability Regret Bounds [16.68585810113338]
We revisit the classical problem of universal prediction of sequences with a finite time horizon $T$ known to the learner.<n>We propose vanishing regret bounds that hold with high probability, complementing existing bounds from the literature that hold in expectation.<n>For the case of universal prediction of a process over a countable alphabet, our bound states a convergence rate of $mathcalO(T-1/2 -1/2)$ with probability as least $1-$ compared to prior known in-expectation bounds of the order $mathcalO(T-1/2)$.
arXiv Detail & Related papers (2026-02-18T07:26:37Z)
Generalizability vs. Counterfactual Explainability Trade-Off [6.3107782051840555]
We introduce the notion of $varepsilon$-valid counterfactual probability ($varepsilon$-VCP)<n>We show that $varepsilon$-VCP tends to increase with model overfitting.
arXiv Detail & Related papers (2025-05-29T08:17:59Z)
Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond [40.79840141270367]
Given an unnormalized probability density $piproptomathrme-V$, estimating its normalizing constant $Z=int_mathbbRdmathrme-V(x)mathrmdx$ or free energy $F=-log Z$ is a crucial problem in Bayesian statistics, statistical mechanics, and machine learning.<n>We propose a new algorithm based on reverse diffusion samplers, establish a framework for analyzing its complexity, and empirically demonstrate its efficiency in tackling multimodality.
arXiv Detail & Related papers (2025-02-07T00:05:28Z)
A Unified Analysis for Finite Weight Averaging [50.75116992029417]
Averaging iterations of Gradient Descent (SGD) have achieved empirical success in training deep learning models, such as Weight Averaging (SWA), Exponential Moving Average (EMA), and LAtest Weight Averaging (LAWA) In this paper, we generalize LAWA as Finite Weight Averaging (FWA) and explain their advantages compared to SGD from the perspective of optimization and generalization.
arXiv Detail & Related papers (2024-11-20T10:08:22Z)
Beyond likelihood ratio bias: Nested multi-time-scale stochastic approximation for likelihood-free parameter estimation [49.78792404811239]
We study inference in simulation-based models where the analytical form of the likelihood is unknown.<n>We use a ratio-free nested multi-time-scale approximation (SA) method that simultaneously tracks the score and drives the parameter update.<n>We show that our algorithm can eliminate the original bias $Obig(sqrtfrac1Nbig)$ and accelerate the convergence rate from $Obig(beta_k+sqrtfracalpha_kNbig)$.
arXiv Detail & Related papers (2024-11-20T02:46:15Z)
Convergence Rate Analysis of LION [54.28350823319057]
LION converges iterations of $cal(sqrtdK-)$ measured by gradient Karush-Kuhn-T (sqrtdK-)$. We show that LION can achieve lower loss and higher performance compared to standard SGD.
arXiv Detail & Related papers (2024-11-12T11:30:53Z)
Robust computation of optimal transport by $\beta$-potential regularization [79.24513412588745]
Optimal transport (OT) has become a widely used tool in the machine learning field to measure the discrepancy between probability distributions. We propose regularizing OT with the beta-potential term associated with the so-called $beta$-divergence. We experimentally demonstrate that the transport matrix computed with our algorithm helps estimate a probability distribution robustly even in the presence of outliers.
arXiv Detail & Related papers (2022-12-26T18:37:28Z)
High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad Stepsize [55.0090961425708]
We propose a new, simplified high probability analysis of AdaGrad for smooth, non- probability problems. We present our analysis in a modular way and obtain a complementary $mathcal O (1 / TT)$ convergence rate in the deterministic setting. To the best of our knowledge, this is the first high probability for AdaGrad with a truly adaptive scheme, i.e., completely oblivious to the knowledge of smoothness.
arXiv Detail & Related papers (2022-04-06T13:50:33Z)
Robust Linear Predictions: Analyses of Uniform Concentration, Fast Rates and Model Misspecification [16.0817847880416]
We offer a unified framework that includes a broad variety of linear prediction problems on a Hilbert space. We show that for misspecification level $epsilon$, these estimators achieve an error rate of $O(maxleft|mathcalO|1/2n-1/2, |mathcalI|1/2n-1 right+epsilon)$, matching the best-known rates in literature.
arXiv Detail & Related papers (2022-01-06T08:51:08Z)
Nearest neighbor empirical processes [7.034466417392574]
An empirical measure based on the responses from the nearest neighbors to a given point $x$ is introduced and studied as a central statistical quantity. A uniform non-asymptotic bound is established under a well-known condition, often referred to as Vapnik-Chervonenkis, on the uniform entropy numbers. This suggests the possibility of using standard formulas to estimate the variance by using only the nearest neighbors instead of the full data.
arXiv Detail & Related papers (2021-10-27T08:15:20Z)
Super fast rates in structured prediction [88.99819200562784]
We show how we can leverage the fact that discrete problems are essentially predicting a discrete output when continuous problems are predicting a continuous value. We first illustrate it for predictors based on nearest neighbors, generalizing rates known for binary classification to any discrete problem within the framework of structured prediction. We then consider kernel ridge regression where we improve known rates in $n-1/4$ to arbitrarily fast rates, depending on a parameter characterizing the hardness of the problem.
arXiv Detail & Related papers (2021-02-01T10:50:04Z)
Sequential prediction under log-loss and misspecification [47.66467420098395]
We consider the question of sequential prediction under the log-loss in terms of cumulative regret. We show that cumulative regrets in the well-specified and misspecified cases coincideally. We provide an $o(1)$ characterization of the distribution-free or PAC regret.
arXiv Detail & Related papers (2021-01-29T20:28:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.