Derivatives and residual distribution of regularized M-estimators with application to adaptive tuning
- URL: http://arxiv.org/abs/2107.05143v2
- Date: Thu, 02 Jan 2025 15:19:28 GMT
- Title: Derivatives and residual distribution of regularized M-estimators with application to adaptive tuning
- Authors: Pierre C Bellec, Yiwei Shen
- Abstract summary: We study M-estimators with a gradient-Lipschitz loss function regularized with a convex penalty. A practical example is the robust M-estimator constructed with the Huber loss and the Elastic-Net penalty.
- Score: 5.064404027153094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies M-estimators with a gradient-Lipschitz loss function regularized with a convex penalty in linear models with Gaussian design matrix and arbitrary noise distribution. A practical example is the robust M-estimator constructed with the Huber loss and the Elastic-Net penalty when the noise distribution has heavy tails. Our main contributions are three-fold. (i) We provide general formulae for the derivatives of regularized M-estimators $\hat\beta(y,X)$, where differentiation is taken with respect to both $y$ and $X$; this reveals a simple differentiability structure shared by all convex regularized M-estimators. (ii) Using these derivatives, we characterize the distribution of the residual $r_i = y_i-x_i^\top\hat\beta$ in the intermediate high-dimensional regime where dimension and sample size are of the same order. (iii) Motivated by the distribution of the residuals, we propose a novel adaptive criterion to select tuning parameters of regularized M-estimators. The criterion approximates the out-of-sample error up to an additive constant independent of the estimator, so that minimizing the criterion provides a proxy for minimizing the out-of-sample error. The proposed adaptive criterion requires knowledge of neither the noise distribution nor the covariance of the design. Simulated data confirm the theoretical findings, regarding both the distribution of the residuals and the success of the criterion as a proxy for the out-of-sample error. Finally, our results reveal new relationships between the derivatives of $\hat\beta(y,X)$ and the effective degrees of freedom of the M-estimator, which are of independent interest.
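The Huber-loss, Elastic-Net-penalized M-estimator named in the abstract can be sketched with a short proximal-gradient (ISTA) solver. This is an illustrative numpy sketch, not the authors' implementation; the function names and tuning values (`delta`, `lam1`, `lam2`) are arbitrary choices:

```python
import numpy as np

def huber_psi(r, delta):
    # derivative of the Huber loss: identity inside [-delta, delta], clipped outside
    return np.clip(r, -delta, delta)

def soft_threshold(v, t):
    # proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def huber_enet(X, y, delta=1.345, lam1=0.1, lam2=0.1, n_iter=500):
    """Huber loss + Elastic-Net penalty via proximal gradient descent."""
    n, p = X.shape
    beta = np.zeros(p)
    # step size from a Lipschitz constant of the smooth part (Huber loss + ridge)
    L = np.linalg.norm(X, 2) ** 2 / n + lam2
    for _ in range(n_iter):
        r = y - X @ beta
        grad = -X.T @ huber_psi(r, delta) / n + lam2 * beta
        beta = soft_threshold(beta - grad / L, lam1 / L)
    return beta
```

The residuals $r_i = y_i - x_i^\top\hat\beta$ whose distribution the paper characterizes are then simply `y - X @ beta`.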
Related papers
- Divergence-Minimization for Latent-Structure Models: Monotone Operators, Contraction Guarantees, and Robust Inference [5.373905622325275]
We develop a divergence-minimization (DM) framework for robust and efficient inference in latent-mixture models. By optimizing a residual-adjusted divergence, the DM approach recovers EM as a special case and yields robust alternatives.
arXiv Detail & Related papers (2025-11-22T08:25:29Z)
- Lower Bounds on the MMSE of Adversarially Inferring Sensitive Features [4.8998185508205765]
We propose an adversarial evaluation framework for sensitive feature inference based on a finite sample size and linear predictive models. Our approach establishes theoretical lower bounds on the true MMSE of inferring sensitive features from noisy observations of other correlated features.
arXiv Detail & Related papers (2025-05-13T22:39:24Z)
- Minimax Optimality of the Probability Flow ODE for Diffusion Models [8.15094483029656]
This work develops the first end-to-end theoretical framework for deterministic ODE-based samplers.
We propose a smooth regularized score estimator that simultaneously controls both the $L^2$ score error and the associated mean Jacobian error.
We demonstrate that the resulting sampler achieves the minimax rate in total variation distance, modulo logarithmic factors.
arXiv Detail & Related papers (2025-03-12T17:51:29Z)
- Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantee with explicit dimensional dependence for general score-mismatched diffusion samplers.
We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions.
This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z)
- A Geometric Unification of Distributionally Robust Covariance Estimators: Shrinking the Spectrum by Inflating the Ambiguity Set [20.166217494056916]
We propose a principled approach to construct covariance estimators without imposing restrictive assumptions.
We show that our robust estimators are efficiently computable and consistent.
Numerical experiments based on synthetic and real data show that our robust estimators are competitive with state-of-the-art estimators.
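The spectrum-shrinkage idea above can be illustrated with a simple linear shrinkage of the sample covariance toward a scaled identity. The fixed intensity `rho` below is a hypothetical stand-in for the ambiguity-set-driven choice made in the paper; this sketch is not the paper's estimator:

```python
import numpy as np

def shrink_covariance(X, rho=0.3):
    """Linear shrinkage of the sample covariance toward a scaled identity."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    mu = np.trace(S) / p  # average eigenvalue of S, scale of the identity target
    # convex combination pulls every eigenvalue toward mu, shrinking the spectrum
    return (1 - rho) * S + rho * mu * np.eye(p)
```

Because each eigenvalue moves toward the average eigenvalue, the shrunk estimator is better conditioned than the raw sample covariance, which is the practical benefit when `p` is comparable to `n`.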
arXiv Detail & Related papers (2024-05-30T15:01:18Z)
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Intrinsic Bayesian Cramér-Rao Bound with an Application to Covariance Matrix Estimation [49.67011673289242]
This paper presents a new performance bound for estimation problems where the parameter to estimate lies in a smooth manifold.
It induces a geometry for the parameter manifold, as well as an intrinsic notion of the estimation error measure.
arXiv Detail & Related papers (2023-11-08T15:17:13Z)
- On the Variance, Admissibility, and Stability of Empirical Risk Minimization [80.26309576810844]
Empirical Risk Minimization (ERM) with squared loss may attain minimax suboptimal error rates.
We show that under mild assumptions, the suboptimality of ERM must be due to large bias rather than variance.
We also show that our estimates imply stability of ERM, complementing the main result of Caponnetto and Rakhlin (2006) for non-Donsker classes.
arXiv Detail & Related papers (2023-05-29T15:25:48Z)
- Generalizing Gaussian Smoothing for Random Search [23.381986209234164]
Gaussian smoothing (GS) is a derivative-free optimization algorithm that estimates the gradient of an objective using perturbations of the current iterate.
We propose choosing the perturbation distribution so as to minimize the mean squared error (MSE) of the gradient estimate, yielding estimators with provably smaller MSE.
arXiv Detail & Related papers (2022-11-27T04:42:05Z)
- Ensemble Multi-Quantiles: Adaptively Flexible Distribution Prediction for Uncertainty Quantification [4.728311759896569]
We propose a novel, succinct, and effective approach for distribution prediction to quantify uncertainty in machine learning.
It incorporates adaptively flexible distribution prediction of $\mathbb{P}(\mathbf{y}\mid\mathbf{X}=\mathbf{x})$ in regression tasks.
On extensive regression tasks from UCI datasets, we show that EMQ achieves state-of-the-art performance.
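Quantile-based distribution prediction of the kind described above rests on the pinball (quantile) loss. The following is a minimal subgradient-descent sketch for a single linear conditional quantile, purely illustrative and not the EMQ algorithm itself:

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    # pinball loss: weights under-prediction by tau, over-prediction by 1 - tau
    diff = y - q_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

def fit_linear_quantile(x, y, tau, lr=0.05, n_iter=2000):
    """Fit the tau-th conditional quantile q(x) = a*x + b by subgradient descent."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        diff = y - (a * x + b)
        # subgradient of the pinball loss with respect to the prediction
        g = np.where(diff > 0, -tau, 1 - tau)
        a -= lr * np.mean(g * x)
        b -= lr * np.mean(g)
    return a, b
```

Fitting several values of `tau` on the same data yields a discretized predictive distribution, the basic object that multi-quantile approaches ensemble.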
arXiv Detail & Related papers (2022-11-26T11:45:32Z)
- Targeted Separation and Convergence with Kernel Discrepancies [61.973643031360254]
Kernel-based discrepancy measures are required to (i) separate a target P from other probability measures or (ii) control weak convergence to P.
In this article we derive new sufficient and necessary conditions to ensure (i) and (ii).
For MMDs on separable metric spaces, we characterize those kernels that separate Bochner embeddable measures and introduce simple conditions for separating all measures with unbounded kernels.
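The MMD referred to above is computable from kernel evaluations alone. Below is a small numpy sketch of the unbiased MMD$^2$ U-statistic with an RBF kernel; the bandwidth `gamma` is an arbitrary illustrative choice:

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Gaussian RBF kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2_unbiased(X, Y, gamma=1.0):
    """Unbiased estimate of MMD^2 between samples X and Y."""
    m, n = len(X), len(Y)
    Kxx = rbf_kernel(X, X, gamma)
    Kyy = rbf_kernel(Y, Y, gamma)
    Kxy = rbf_kernel(X, Y, gamma)
    # drop diagonal terms to obtain the unbiased U-statistic
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * Kxy.mean()
```

When the kernel separates the two distributions, the estimate concentrates away from zero; for two samples from the same distribution it fluctuates around zero (and can be slightly negative, since the U-statistic is unbiased rather than nonnegative).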
arXiv Detail & Related papers (2022-09-26T16:41:16Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Achieving Efficiency in Black Box Simulation of Distribution Tails with Self-structuring Importance Samplers [1.6114012813668934]
The paper presents a novel Importance Sampling (IS) scheme for estimating distribution tails of performance measures modeled with a rich set of tools such as linear programs, integer linear programs, piecewise linear/quadratic objectives, and feature maps specified with deep neural networks.
arXiv Detail & Related papers (2021-02-14T03:37:22Z)
- M-estimators of scatter with eigenvalue shrinkage [19.82023576081279]
In this paper, a more general approach is considered in which the sample covariance matrix (SCM) is replaced by an M-estimator of scatter matrix.
Our approach permits the use of any weight function such as Gaussian, Huber's, or $t$ weight functions.
Our simulation examples illustrate that shrinkage M-estimators based on the proposed optimal tuning combined with a robust weight function do not lose in performance to the shrinkage SCM estimator when the data is Gaussian.
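A classical example of the M-estimators of scatter discussed above is Tyler's estimator, computed by fixed-point iteration. The numpy sketch below assumes zero-mean data and is illustrative only, not the paper's tuned shrinkage estimator:

```python
import numpy as np

def tyler_scatter(X, n_iter=100, tol=1e-8):
    """Tyler's M-estimator of scatter via fixed-point iteration (zero-mean data)."""
    n, p = X.shape
    sigma = np.eye(p)
    for _ in range(n_iter):
        inv = np.linalg.inv(sigma)
        # weights p / (x_i^T Sigma^{-1} x_i): downweight outlying observations
        w = p / np.einsum('ij,jk,ik->i', X, inv, X)
        new = (X * w[:, None]).T @ X / n
        new *= p / np.trace(new)  # fix the (arbitrary) scale: trace normalized to p
        if np.max(np.abs(new - sigma)) < tol:
            sigma = new
            break
        sigma = new
    return sigma
```

The trace normalization reflects that Tyler's estimator is defined only up to scale; shrinkage variants like those in the paper then regularize its eigenvalues.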
arXiv Detail & Related papers (2020-02-12T13:47:58Z)
- Distributionally Robust Bayesian Quadrature Optimization [60.383252534861136]
We study BQO under distributional uncertainty in which the underlying probability distribution is unknown except for a limited set of its i.i.d. samples.
A standard BQO approach maximizes the Monte Carlo estimate of the true expected objective given the fixed sample set.
We propose a novel posterior sampling based algorithm, namely distributionally robust BQO (DRBQO) for this purpose.
arXiv Detail & Related papers (2020-01-19T12:00:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.