Minimax Estimation of Conditional Moment Models
- URL: http://arxiv.org/abs/2006.07201v1
- Date: Fri, 12 Jun 2020 14:02:38 GMT
- Title: Minimax Estimation of Conditional Moment Models
- Authors: Nishanth Dikkala, Greg Lewis, Lester Mackey, Vasilis Syrgkanis
- Abstract summary: We introduce a min-max criterion function, under which the estimation problem can be thought of as solving a zero-sum game.
We analyze the statistical estimation rate of the resulting estimator for arbitrary hypothesis spaces.
We show how our modified mean squared error rate, combined with conditions that bound the ill-posedness of the inverse problem, leads to mean squared error rates.
- Score: 40.95498063465325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop an approach for estimating models described via conditional moment
restrictions, with a prototypical application being non-parametric instrumental
variable regression. We introduce a min-max criterion function, under which the
estimation problem can be thought of as solving a zero-sum game between a
modeler who is optimizing over the hypothesis space of the target model and an
adversary who identifies violating moments over a test function space. We
analyze the statistical estimation rate of the resulting estimator for
arbitrary hypothesis spaces, with respect to an appropriate analogue of the
mean squared error metric, for ill-posed inverse problems. We show that when
the minimax criterion is regularized with a second moment penalty on the test
function and the test function space is sufficiently rich, then the estimation
rate scales with the critical radius of the hypothesis and test function
spaces, a quantity which typically gives tight fast rates. Our main result
follows from a novel localized Rademacher analysis of statistical learning
problems defined via minimax objectives. We provide applications of our main
results for several hypothesis spaces used in practice such as: reproducing
kernel Hilbert spaces, high dimensional sparse linear functions, spaces defined
via shape constraints, ensemble estimators such as random forests, and neural
networks. For each of these applications we provide computationally efficient
optimization methods for solving the corresponding minimax problem (e.g.
stochastic first-order heuristics for neural networks). In several
applications, we show how our modified mean squared error rate, combined with
conditions that bound the ill-posedness of the inverse problem, leads to mean
squared error rates. We conclude with an extensive experimental analysis of the
proposed methods.
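For linear hypothesis and test-function classes, the inner maximization of the penalized minimax criterion has a closed form, and the outer minimization reduces to a 2SLS/GMM-style estimator. The sketch below illustrates this special case only (the data-generating process, coefficients, and variable names are made up for illustration, not taken from the paper):

```python
import numpy as np

# Illustrative IV setup: x is endogenous (confounded by u), z is an instrument.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                             # instrument
u = rng.normal(size=n)                             # unobserved confounder
x = 0.8 * z + u + 0.3 * rng.normal(size=n)         # endogenous regressor
y = 1.5 * x - 2.0 * u + 0.3 * rng.normal(size=n)   # structural slope: 1.5

X = np.column_stack([np.ones(n), x])               # hypothesis-space features
Psi = np.column_stack([np.ones(n), z])             # test-function features

# For test functions f(z) = beta' psi(z) with a second-moment penalty, the
# inner sup is quadratic in beta with a closed-form maximizer; plugging it
# back in, the outer min over theta becomes a 2SLS-type problem.
first_stage = np.linalg.solve(Psi.T @ Psi, Psi.T @ X)  # project X onto test features
X_hat = Psi @ first_stage
theta = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

theta_ols = np.linalg.solve(X.T @ X, X.T @ y)      # plain OLS, biased by the confounder
```

Here `theta[1]` recovers the structural slope near 1.5, while `theta_ols[1]` is pulled toward the confounded association; for richer hypothesis spaces (RKHS, neural networks) the paper instead solves the minimax problem directly, e.g. with stochastic first-order methods.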
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size dependent smoothing parameters.
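The matching idea can be illustrated with a naive 1-nearest-neighbour estimator of an average treatment effect: each unit's missing potential outcome is imputed from its closest match in the opposite treatment arm. This is a textbook sketch, not the paper's modified root-n-consistent estimator, and the data-generating process is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
x = rng.uniform(-1, 1, size=n)                                   # covariate
t = (rng.uniform(size=n) < 1 / (1 + np.exp(-2 * x))).astype(int) # confounded treatment
y = x + 2.0 * t + 0.2 * rng.normal(size=n)                       # true ATE is 2.0

treated = np.where(t == 1)[0]
control = np.where(t == 0)[0]

def match(i, pool):
    # 1-nearest-neighbour match on the covariate
    j = pool[np.argmin(np.abs(x[pool] - x[i]))]
    return y[j]

# Impute each unit's missing potential outcome with its nearest match.
y1 = np.array([y[i] if t[i] == 1 else match(i, treated) for i in range(n)])
y0 = np.array([y[i] if t[i] == 0 else match(i, control) for i in range(n)])
ate = float(np.mean(y1 - y0))

naive = y[t == 1].mean() - y[t == 0].mean()  # biased: treatment depends on x
```

Because treatment assignment depends on the covariate, the naive difference in means overstates the effect, while the matched estimate stays close to the true ATE of 2.0.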
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - On High dimensional Poisson models with measurement error: hypothesis testing for nonlinear nonconvex optimization [13.369004892264146]
We study estimation and testing of high-dimensional Poisson regression models with measurement error, which have wide applications in data analysis.
We propose to estimate the regression parameters by minimizing a penalized criterion.
The proposed method is applied to the Alzheimer's Disease Initiative.
arXiv Detail & Related papers (2022-12-31T06:58:42Z) - Functional Linear Regression of Cumulative Distribution Functions [20.96177061945288]
We propose functional ridge-regression-based estimation methods that estimate CDFs accurately everywhere.
We show estimation error upper bounds of $\widetilde{O}(\sqrt{d/n})$ for fixed design, random design, and adversarial context cases.
We formalize infinite dimensional models where the parameter space is an infinite dimensional Hilbert space, and establish a self-normalized estimation error upper bound for this setting.
arXiv Detail & Related papers (2022-05-28T23:59:50Z) - Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z) - Optimal oracle inequalities for solving projected fixed-point equations [53.31620399640334]
We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space.
We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation.
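A concrete instance of such a projected fixed-point equation is policy evaluation by LSTD with linear function approximation. Below is a minimal sketch on a hypothetical 3-state cyclic chain where tabular (one-hot) features make the projection exact, so the solution matches the true value function (the chain, rewards, and discount are invented for illustration):

```python
import numpy as np

# 3-state cyclic chain under a fixed policy: 0 -> 1 -> 2 -> 0.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
r = np.array([1.0, 0.0, 2.0])
gamma = 0.9
v_true = np.linalg.solve(np.eye(3) - gamma * P, r)  # Bellman equation solution

# LSTD: solve the projected fixed point  Phi theta = Proj(r + gamma P Phi theta).
Phi = np.eye(3)                        # tabular features: projection is exact
D = np.diag(np.full(3, 1.0 / 3.0))     # stationary distribution (uniform on the cycle)
A = Phi.T @ D @ (Phi - gamma * P @ Phi)
b = Phi.T @ D @ r
theta = np.linalg.solve(A, b)          # coincides with v_true here
```

With non-tabular features `Phi`, `theta` instead solves the projected equation in the feature subspace, and the oracle inequalities in the paper bound the resulting approximation error.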
arXiv Detail & Related papers (2020-12-09T20:19:32Z) - Nearest Neighbour Based Estimates of Gradients: Sharp Nonasymptotic Bounds and Applications [0.6445605125467573]
Gradient estimation is of crucial importance in statistics and learning theory.
We consider here the classic regression setup, where a real-valued square-integrable r.v. $Y$ is to be predicted.
We prove nonasymptotic bounds improving upon those obtained for alternative estimation methods.
arXiv Detail & Related papers (2020-06-26T15:19:43Z) - Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.