Kernel Ridge Riesz Representers: Generalization, Mis-specification, and the Counterfactual Effective Dimension
- URL: http://arxiv.org/abs/2102.11076v4
- Date: Thu, 4 Jul 2024 20:09:15 GMT
- Title: Kernel Ridge Riesz Representers: Generalization, Mis-specification, and the Counterfactual Effective Dimension
- Authors: Rahul Singh
- Abstract summary: I interpret kernel balancing weights as kernel ridge Riesz representers (KRRR).
KRRR is an exact generalization of kernel ridge regression.
I use KRRR to quantify uncertainty for heterogeneous treatment effects, by age, of 401(k) eligibility on assets.
- Score: 2.7152798636894193
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Kernel balancing weights provide confidence intervals for average treatment effects, based on the idea of balancing covariates for the treated group and untreated group in feature space, often with ridge regularization. Previous works on the classical kernel ridge balancing weights have certain limitations: (i) not articulating generalization error for the balancing weights, (ii) typically requiring correct specification of features, and (iii) justifying Gaussian approximation for only average effects. I interpret kernel balancing weights as kernel ridge Riesz representers (KRRR) and address these limitations via a new characterization of the counterfactual effective dimension. KRRR is an exact generalization of kernel ridge regression and kernel ridge balancing weights. I prove strong properties similar to kernel ridge regression: population $L_2$ rates controlling generalization error, and a standalone closed form solution that can interpolate. The framework relaxes the stringent assumption that the underlying regression model is correctly specified by the features. It extends Gaussian approximation beyond average effects to heterogeneous effects, justifying confidence sets for causal functions. I use KRRR to quantify uncertainty for heterogeneous treatment effects, by age, of 401(k) eligibility on assets.
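The closed-form solution mentioned in the abstract can be made concrete with a short numerical sketch. The code below is a minimal illustration, assuming an RBF kernel on the pair W = (D, X) of a binary treatment and covariates and the average-effect functional m(w; alpha) = alpha(1, x) - alpha(0, x); the weights come from the first-order condition of the penalized Riesz loss combined with a Woodbury-type identity. The kernel, regularization level, simulated data, and all variable names are illustrative assumptions rather than the paper's exact notation or tuning.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * length_scale ** 2))

def krrr_weights(D, X, lam=1e-2, length_scale=1.0):
    """Kernel ridge Riesz representer (balancing weights) for the average effect.

    Minimizes (1/n) sum_i alpha(W_i)^2 - (2/n) sum_i [alpha(1, X_i) - alpha(0, X_i)]
    + lam * ||alpha||_H^2 over an RKHS on W = (D, X), and returns alpha_hat
    evaluated at the observed points W_i = (D_i, X_i).
    """
    n = len(D)
    W = np.column_stack([D, X])             # observed (treatment, covariates)
    W1 = np.column_stack([np.ones(n), X])   # counterfactual treated points (1, X_i)
    W0 = np.column_stack([np.zeros(n), X])  # counterfactual untreated points (0, X_i)

    K = rbf_kernel(W, W, length_scale)      # K[i, j] = k(W_i, W_j)
    # v_i = (1/n) sum_j [k(W_i, (1, X_j)) - k(W_i, (0, X_j))]
    v = (rbf_kernel(W, W1, length_scale) - rbf_kernel(W, W0, length_scale)).mean(axis=1)
    # Closed form: alpha_hat(W_i) = (1/lam) * [v - K (K + n*lam*I)^{-1} v]_i
    correction = K @ np.linalg.solve(K + n * lam * np.eye(n), v)
    return (v - correction) / lam

# Toy usage on simulated data with a known average effect of 2.0.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 1))
D = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X[:, 0]))).astype(float)
Y = 2.0 * D + X[:, 0] + rng.normal(scale=0.5, size=n)
alpha_hat = krrr_weights(D, X)
print("weighted estimate of the average effect:", np.mean(alpha_hat * Y))
```

The final line is a simple plug-in weighted mean, since E[alpha_0(W) Y] equals the average effect when alpha_0 is the true Riesz representer; the paper's confidence intervals instead combine the representer with an outcome regression in a debiased estimator.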
Related papers
- Revisiting Essential and Nonessential Settings of Evidential Deep Learning [70.82728812001807]
Evidential Deep Learning (EDL) is an emerging method for uncertainty estimation.
We propose Re-EDL, a simplified yet more effective variant of EDL.
arXiv Detail & Related papers (2024-10-01T04:27:07Z)
- High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization [83.06112052443233]
This paper studies kernel ridge regression in high dimensions under covariate shifts.
By a bias-variance decomposition, we theoretically demonstrate that the re-weighting strategy allows for decreasing the variance.
For bias, we analyze the regularization of the arbitrary or well-chosen scale, showing that the bias can behave very differently under different regularization scales.
arXiv Detail & Related papers (2024-06-05T12:03:27Z)
- Learning Analysis of Kernel Ridgeless Regression with Asymmetric Kernel Learning [33.34053480377887]
This paper enhances kernel ridgeless regression with Locally-Adaptive-Bandwidths (LAB) RBF kernels.
For the first time, we demonstrate that functions learned from LAB RBF kernels belong to an integral space of Reproducing Kernel Hilbert Spaces (RKHSs).
arXiv Detail & Related papers (2024-06-03T15:28:12Z)
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Generalization in Kernel Regression Under Realistic Assumptions [41.345620270267446]
We provide rigorous bounds for common kernels and for any amount of regularization, noise, any input dimension, and any number of samples.
Our results imply benign overfitting in high input dimensions, nearly tempered overfitting in fixed dimensions, and explicit convergence rates for regularized regression.
As a by-product, we obtain time-dependent bounds for neural networks trained in the kernel regime.
arXiv Detail & Related papers (2023-12-26T10:55:20Z)
- Optimally tackling covariate shift in RKHS-based nonparametric regression [43.457497490211985]
We show that a kernel ridge regression estimator with a carefully chosen regularization parameter is minimax rate-optimal.
We also show that a naive estimator, which minimizes the empirical risk over the function class, is strictly sub-optimal.
We propose a reweighted KRR estimator that weights samples based on a careful truncation of the likelihood ratios (a generic sketch of this reweighting device appears after this list).
arXiv Detail & Related papers (2022-05-06T02:33:24Z)
- On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- How rotational invariance of common kernels prevents generalization in high dimensions [8.508198765617196]
Kernel ridge regression is well-known to achieve minimax optimal rates in low-dimensional settings.
Recent work establishes consistency for kernel regression under certain assumptions on the ground truth function and the distribution of the input data.
arXiv Detail & Related papers (2021-04-09T08:27:37Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
- Kernel Methods for Causal Functions: Dose, Heterogeneous, and Incremental Response Curves [26.880628841819004]
We prove uniform consistency with improved finite sample rates via original analysis of generalized kernel ridge regression.
We extend our main results to counterfactual distributions and to causal functions identified by front and back door criteria.
arXiv Detail & Related papers (2020-10-10T00:53:11Z)
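Several of the covariate-shift entries above rely on the same reweighting device: multiply each training loss by a (possibly truncated) likelihood ratio between target and source covariate densities before running kernel ridge regression. The sketch below illustrates that generic construction under assumed Gaussian source and target densities; the truncation level, kernel, and regularization are illustrative choices of mine, not the specific estimators analyzed in those papers.

```python
import numpy as np
from scipy.stats import norm

def rbf_kernel(A, B, length_scale=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * length_scale ** 2))

def weighted_krr(X, y, weights, lam=1e-2, length_scale=0.5):
    """Reweighted kernel ridge regression.

    Minimizes sum_i w_i (y_i - f(x_i))^2 + lam * ||f||_H^2; the first-order
    condition gives coefficients c solving (W K + lam I) c = W y, with
    f(x) = sum_i c_i k(x_i, x).
    """
    n = len(y)
    K = rbf_kernel(X, X, length_scale)
    Wm = np.diag(weights)
    coef = np.linalg.solve(Wm @ K + lam * np.eye(n), Wm @ y)
    return lambda X_new: rbf_kernel(X_new, X, length_scale) @ coef

# Source covariates ~ N(0, 1), target covariates ~ N(1, 1), regression sin(2x).
rng = np.random.default_rng(1)
n = 300
X_src = rng.normal(0.0, 1.0, size=(n, 1))
y_src = np.sin(2.0 * X_src[:, 0]) + 0.3 * rng.normal(size=n)
X_tgt = rng.normal(1.0, 1.0, size=(500, 1))

# Truncated likelihood-ratio weights: w_i = min(p_target(x_i) / p_source(x_i), tau).
ratio = norm.pdf(X_src[:, 0], loc=1.0) / norm.pdf(X_src[:, 0], loc=0.0)
w_trunc = np.minimum(ratio, 10.0)

f_hat = weighted_krr(X_src, y_src, w_trunc)
target_mse = np.mean((f_hat(X_tgt) - np.sin(2.0 * X_tgt[:, 0])) ** 2)
print("target-domain MSE of reweighted kernel ridge regression:", target_mse)
```

Truncating the likelihood ratio at a finite level trades a small amount of bias for lower variance when the ratio is heavy-tailed, which is the kind of bias-variance trade-off those entries study.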