Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency
- URL: http://arxiv.org/abs/2301.06240v1
- Date: Mon, 16 Jan 2023 02:57:37 GMT
- Title: Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency
- Authors: Wenlong Mou, Peng Ding, Martin J. Wainwright, Peter L. Bartlett
- Abstract summary: We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
- Score: 53.90687548731265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study optimal procedures for estimating a linear functional based on
observational data. In many problems of this kind, a widely used assumption is
strict overlap, i.e., uniform boundedness of the importance ratio, which
measures how well the observational data covers the directions of interest.
When it is violated, the classical semi-parametric efficiency bound can easily
become infinite, so that the instance-optimal risk depends on the function
class used to model the regression function. For any convex and symmetric
function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on
the mean-squared error in estimating a broad class of linear functionals. This
lower bound refines the classical semi-parametric one, and makes connections to
moduli of continuity in functional estimation. When $\mathcal{F}$ is a
reproducing kernel Hilbert space, we prove that this lower bound can be
achieved up to a constant factor by analyzing a computationally simple
regression estimator. We apply our general results to various families of
examples, thereby uncovering a spectrum of rates that interpolate between the
classical theories of semi-parametric efficiency (with $\sqrt{n}$-consistency)
and the slower minimax rates associated with non-parametric function
estimation.
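The paper's estimator itself is not reproduced here; as a hedged illustration of the general plug-in idea the abstract alludes to (fit the regression function by a computationally simple kernel ridge regression, then average it against importance weights to estimate a linear functional), one might sketch the following. All names, the Gaussian kernel choice, and the synthetic data are illustrative assumptions, not the authors' construction:

```python
import numpy as np

def rbf_kernel(A, B, bandwidth=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * bandwidth ** 2))

def krr_plug_in_estimate(X, y, weights, reg=1e-2, bandwidth=1.0):
    """Plug-in estimate of a linear functional tau = E[w(X) f(X)]:
    fit f by kernel ridge regression, then average w(X_i) * f_hat(X_i).
    (Illustrative sketch only; `weights` holds evaluations w(X_i).)"""
    n = len(y)
    K = rbf_kernel(X, X, bandwidth)
    alpha = np.linalg.solve(K + reg * n * np.eye(n), y)  # KRR coefficients
    f_hat = K @ alpha                                    # fitted values at the X_i
    return float(np.mean(weights * f_hat))

# Toy data: f(x) = sin(pi x) on [-1, 1] with small noise, trivial weights
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(200)
w = np.ones(200)  # w ≡ 1: the functional reduces to E[f(X)]
tau_hat = krr_plug_in_estimate(X, y, w, reg=1e-3, bandwidth=0.3)
```

With these trivial weights the estimand is E[sin(pi X)] = 0 for X uniform on [-1, 1], so `tau_hat` should be close to zero; nontrivial off-policy weights would simply replace `w`.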
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and, in particular, do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - Inference on Time Series Nonparametric Conditional Moment Restrictions
Using General Sieves [4.065100518793487]
This paper considers general nonlinear sieve quasi-likelihood ratio (GN-QLR) based inference on expectation functionals of time series data.
While the normality of the estimated functionals depends on some unknown Riesz representer of the functional space, we show that the optimally weighted GN-QLR statistic is Chi-square distributed.
arXiv Detail & Related papers (2022-12-31T01:44:17Z) - Statistical Optimality of Divide and Conquer Kernel-based Functional
Linear Regression [1.7227952883644062]
This paper studies the convergence performance of divide-and-conquer estimators in the scenario that the target function does not reside in the underlying kernel space.
As a decomposition-based scalable approach, the divide-and-conquer estimators of functional linear regression can substantially reduce the algorithmic complexities in time and memory.
arXiv Detail & Related papers (2022-11-20T12:29:06Z) - Off-policy estimation of linear functionals: Non-asymptotic theory for
semi-parametric efficiency [59.48096489854697]
The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures.
We prove non-asymptotic upper bounds on the mean-squared error of such procedures.
We establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds.
arXiv Detail & Related papers (2022-09-26T23:50:55Z) - Experimental Design for Linear Functionals in Reproducing Kernel Hilbert
Spaces [102.08678737900541]
We provide algorithms for constructing bias-aware designs for linear functionals.
We derive non-asymptotic confidence sets for fixed and adaptive designs under sub-Gaussian noise.
arXiv Detail & Related papers (2022-05-26T20:56:25Z) - Optimal prediction for kernel-based semi-functional linear regression [5.827901300943599]
We establish minimax optimal rates of convergence for prediction in a semi-functional linear model.
Our results reveal that the smoother functional component can be learned with the minimax rate as if the nonparametric component were known.
arXiv Detail & Related papers (2021-10-29T04:55:44Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Optimal oracle inequalities for solving projected fixed-point equations [53.31620399640334]
We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space.
We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation.
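The "temporal difference learning methods for the policy evaluation problem with linear function approximation" mentioned above solve a projected fixed-point equation of the form $A\theta = b$. As a hedged, minimal sketch of the standard LSTD(0) instance of this idea (a generic illustration, not the paper's method; all names and the toy chain are hypothetical):

```python
import numpy as np

def lstd(phi, phi_next, rewards, gamma=0.9, ridge=1e-8):
    # LSTD(0): solve A theta = b with
    #   A = Phi^T (Phi - gamma * Phi'),  b = Phi^T r,
    # where rows of phi / phi_next are features of states and successor states.
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    # tiny ridge term keeps the linear solve well-posed on finite samples
    return np.linalg.solve(A + ridge * np.eye(A.shape[1]), b)

# Toy deterministic 2-state chain s0 -> s1 -> s0 with one-hot features,
# reward 1 in s0 and 0 in s1, discount gamma = 0.5
phi      = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
phi_next = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
rewards  = np.array([1.0, 0.0, 1.0, 0.0])
theta = lstd(phi, phi_next, rewards, gamma=0.5)
# recovers the exact values V(s0) = 4/3, V(s1) = 2/3
```

With one-hot features the linear architecture is exact, so the projected fixed point coincides with the true value function; the cited work analyzes the error when the subspace is only approximate.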
arXiv Detail & Related papers (2020-12-09T20:19:32Z) - Equivalence of Convergence Rates of Posterior Distributions and Bayes
Estimators for Functions and Nonparametric Functionals [4.375582647111708]
We study the posterior contraction rates of a Bayesian method with Gaussian process priors in nonparametric regression.
For a general class of kernels, we establish convergence rates of the posterior measure of the regression function and its derivatives.
Our proof shows that, under certain conditions, to any convergence rate of Bayes estimators there corresponds the same convergence rate of the posterior distributions.
arXiv Detail & Related papers (2020-11-27T19:11:56Z) - Tight Nonparametric Convergence Rates for Stochastic Gradient Descent
under the Noiseless Linear Model [0.0]
We analyze the convergence of single-pass, fixed step-size gradient descent on the least-square risk under this model.
As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points.
arXiv Detail & Related papers (2020-06-15T08:25:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.