Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression
- URL: http://arxiv.org/abs/2501.04898v1
- Date: Thu, 09 Jan 2025 01:22:22 GMT
- Title: Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression
- Authors: Juno Kim, Dimitri Meunier, Arthur Gretton, Taiji Suzuki, Zhu Li
- Abstract summary: Deep feature instrumental variable (DFIV) regression is a nonparametric approach to IV regression using data-adaptive features learned by deep neural networks.
We prove that the DFIV algorithm achieves the minimax optimal learning rate when the target structural function lies in a Besov space.
- Score: 57.40108516085593
- Abstract: We provide a convergence analysis of deep feature instrumental variable (DFIV) regression (Xu et al., 2021), a nonparametric approach to IV regression using data-adaptive features learned by deep neural networks in two stages. We prove that the DFIV algorithm achieves the minimax optimal learning rate when the target structural function lies in a Besov space. This is shown under standard nonparametric IV assumptions, and an additional smoothness assumption on the regularity of the conditional distribution of the covariate given the instrument, which controls the difficulty of Stage 1. We further demonstrate that DFIV, as a data-adaptive algorithm, is superior to fixed-feature (kernel or sieve) IV methods in two ways. First, when the target function possesses low spatial homogeneity (i.e., it has both smooth and spiky/discontinuous regions), DFIV still achieves the optimal rate, while fixed-feature methods are shown to be strictly suboptimal. Second, comparing with kernel-based two-stage regression estimators, DFIV is provably more data efficient in the Stage 1 samples.
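For orientation, the nonparametric IV problem analyzed here can be summarized as follows; the notation ($Y$, $X$, $Z$, $f_0$, $\phi$) is generic and chosen for illustration rather than copied from the paper.

```latex
% Structural equation with outcome Y, endogenous treatment X, instrument Z:
\[
  Y = f_0(X) + \varepsilon, \qquad \mathbb{E}[\varepsilon \mid Z] = 0
  \quad\Longrightarrow\quad
  \mathbb{E}[Y \mid Z] = \mathbb{E}[f_0(X) \mid Z].
\]
% Two-stage estimation with (deep) treatment features \phi:
\[
  \text{Stage 1: estimate } z \mapsto \mathbb{E}[\phi(X) \mid Z = z],
  \qquad
  \text{Stage 2: } \min_{w}\; \mathbb{E}\Big[\big(Y - w^\top \widehat{\mathbb{E}}[\phi(X) \mid Z]\big)^2\Big].
\]
```

In DFIV the treatment and instrument feature maps are themselves neural networks, trained in alternation with the two linear stages; this data-adaptivity is what the optimality and adaptivity results above concern.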
Related papers
- Tensor-Var: Variational Data Assimilation in Tensor Product Feature Space [30.63086465547801]
Variational data assimilation estimates the dynamical system states by minimizing a cost function that fits the numerical models with observational data.
The widely used method, four-dimensional variational data assimilation (4D-Var), has two primary challenges: (1) it is computationally demanding for complex nonlinear systems and (2) it relies on state-observation mappings, which are often not perfectly known.
Deep learning (DL) has been used as a more expressive class of efficient model approximators to address these challenges.
In this paper, we propose Tensor-Var to address these challenges using kernel conditional mean embeddings (CME).
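For context, the strong-constraint 4D-Var cost function referred to above has the standard form below; the notation follows the usual data-assimilation convention and is not taken from the paper.

```latex
\[
  J(x_0) = \tfrac{1}{2}\,(x_0 - x_b)^\top B^{-1}(x_0 - x_b)
  + \tfrac{1}{2} \sum_{k=0}^{K} \big(y_k - H_k(x_k)\big)^\top R_k^{-1} \big(y_k - H_k(x_k)\big),
  \qquad x_{k+1} = M_k(x_k),
\]
```

where $x_b$ is the background state, $B$ and $R_k$ are background and observation error covariances, $H_k$ is the observation operator, and $M_k$ the (possibly nonlinear) model dynamics; the two challenges above correspond to the cost of propagating through $M_k$ and to imperfect knowledge of $H_k$.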
arXiv Detail & Related papers (2025-01-23T01:43:31Z) - Nonparametric Instrumental Regression via Kernel Methods is Minimax Optimal [28.361133177290657]
We study the kernel instrumental variable algorithm of Singh et al. (2019).
We show that the kernel NPIV estimator converges to the IV solution with minimum norm.
We also improve the original kernel NPIV algorithm by adopting a general spectral regularization in stage 1 regression.
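As a hedged illustration of what "general spectral regularization in stage 1" can mean, the stage-1 conditional mean embedding estimate can be written with a filter function $g_\lambda$ applied to the instrument covariance operator (generic notation, not necessarily the paper's):

```latex
\[
  \widehat{\mu}_{X \mid Z = z} \;=\; \widehat{C}_{XZ}\, g_\lambda\!\big(\widehat{C}_{ZZ}\big)\, \psi(z),
  \qquad
  g_\lambda(\sigma) = \frac{1}{\sigma + \lambda} \;\;\text{(Tikhonov/ridge)},
  \qquad
  g_\lambda(\sigma) = \frac{\mathbf{1}\{\sigma \ge \lambda\}}{\sigma} \;\;\text{(spectral cut-off)}.
\]
```

Here $\psi$ is the instrument feature map and $\widehat{C}_{XZ}$, $\widehat{C}_{ZZ}$ are empirical (cross-)covariance operators; the Tikhonov filter recovers the original kernel IV stage-1 regression.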
arXiv Detail & Related papers (2024-11-29T12:18:01Z) - Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find solutions accessible via our training procedure, including the choice of optimizer and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z) - Learning Decision Policies with Instrumental Variables through Double Machine Learning [16.842233444365764]
A common issue in learning decision-making policies in data-rich settings is spurious correlations in the offline dataset.
We propose DML-IV, a non-linear IV regression method that reduces the bias in two-stage IV regressions.
It outperforms state-of-the-art methods on IV regression benchmarks and learns high-performing policies in the presence of instruments.
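As a rough sketch of the cross-fitting idea behind DML-style two-stage estimators (generic helper names and models; this is not the DML-IV algorithm itself):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

def cross_fitted_two_stage_iv(Z, X, Y, n_splits=2):
    """Fit the first stage on one fold and use out-of-fold predictions in the
    second stage, so the second-stage regression never reuses the data that
    trained its own nuisance model (the bias-reduction idea in DML)."""
    X_hat = np.zeros_like(X, dtype=float)
    for train_idx, test_idx in KFold(n_splits, shuffle=True, random_state=0).split(Z):
        stage1 = GradientBoostingRegressor().fit(Z[train_idx], X[train_idx])
        X_hat[test_idx] = stage1.predict(Z[test_idx])        # out-of-fold first-stage predictions
    return LinearRegression().fit(X_hat.reshape(-1, 1), Y)   # second stage on predicted treatment
```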
arXiv Detail & Related papers (2024-05-14T10:55:04Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametricized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
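For intuition, a bare-bones simultaneous gradient descent-ascent loop on a toy finite-dimensional min-max problem (purely illustrative; the paper's setting is infinite-dimensional and analyzed in the mean-field regime):

```python
import numpy as np

# Toy objective L(u, v) = u*v + 0.1*u**2 - 0.1*v**2: minimize over u, maximize over v.
def grad_u(u, v): return v + 0.2 * u
def grad_v(u, v): return u - 0.2 * v

u, v, lr = 1.0, 1.0, 0.05
for _ in range(2000):
    u, v = u - lr * grad_u(u, v), v + lr * grad_v(u, v)  # simultaneous descent-ascent step
print(u, v)  # approaches the saddle point (0, 0) of this convex-concave toy problem
```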
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Regularized DeepIV with Model Selection [72.17508967124081]
Regularized DeepIV (RDIV) regression can converge to the least-norm IV solution.
Our method matches the current state-of-the-art convergence rate.
arXiv Detail & Related papers (2024-03-07T05:38:56Z) - Nonparametric Instrumental Variable Regression through Stochastic Approximate Gradients [0.3277163122167434]
We show how to formulate a functional gradient descent algorithm to tackle NPIV regression by directly minimizing the population risk.
We provide theoretical support in the form of bounds on the excess risk, and conduct numerical experiments showcasing our method's superior stability and competitive performance.
This algorithm enables flexible estimator choices, such as neural networks or kernel based methods, as well as non-quadratic loss functions.
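For orientation, the population risk typically targeted in this projected-residual formulation of NPIV is written below (generic notation; the paper's exact objective may differ in details such as regularization):

```latex
\[
  \mathcal{R}(f) \;=\; \mathbb{E}_{Z}\!\Big[\big(\mathbb{E}[\,Y - f(X) \mid Z\,]\big)^{2}\Big],
\]
```

and a stochastic approximate (functional) gradient step replaces the inner conditional expectation with an estimate built from data, which is what makes flexible estimators such as neural networks or kernel machines straightforward to plug in.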
arXiv Detail & Related papers (2024-02-08T12:50:38Z) - Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient for Out-of-Distribution Generalization [52.7137956951533]
We argue that devising simpler methods for learning predictors on existing features is a promising direction for future research.
We introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift.
Under a natural model, we prove that the DARE solution is the minimax-optimal predictor for a constrained set of test distributions.
arXiv Detail & Related papers (2022-02-14T16:42:16Z) - Efficient Semi-Implicit Variational Inference [65.07058307271329]
We propose an efficient and scalable semi-implicit variational inference (SIVI) method.
Our method optimizes a rigorous lower bound on the evidence with low-variance gradient estimates.
arXiv Detail & Related papers (2021-01-15T11:39:09Z) - Learning Deep Features in Instrumental Variable Regression [42.085253974990046]
In IV regression, learning proceeds in two stages: stage 1 performs linear regression from the instrument to the treatment; and stage 2 performs linear regression from the treatment to the outcome, conditioned on the instrument.
We propose a novel method, deep feature instrumental variable regression (DFIV), to address the case where relations between instruments, treatments, and outcomes may be nonlinear.
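Following the two-stage description above, here is a minimal numerical sketch of the two linear stages on top of fixed feature matrices; in DFIV the feature maps are neural networks retrained in alternation with these stages, so this illustrates only the linear-algebra core, not the full training procedure (variable names are made up):

```python
import numpy as np

def two_stage_iv(psi_z, phi_x, y, lam1=1e-3, lam2=1e-3):
    """Stage 1: ridge regression from instrument features psi(Z) to treatment
    features phi(X), i.e. a linear estimate of E[phi(X) | Z].
    Stage 2: ridge regression from the predicted treatment features to Y."""
    n, d_z = psi_z.shape
    d_x = phi_x.shape[1]
    V = np.linalg.solve(psi_z.T @ psi_z + n * lam1 * np.eye(d_z), psi_z.T @ phi_x)
    phi_hat = psi_z @ V                                   # predicted E[phi(X) | Z]
    w = np.linalg.solve(phi_hat.T @ phi_hat + n * lam2 * np.eye(d_x), phi_hat.T @ y)
    return w, V                                           # structural estimate: f(x) = phi(x) @ w

# Illustrative run with random feature matrices standing in for learned deep features:
rng = np.random.default_rng(0)
psi_z = rng.normal(size=(500, 3))
phi_x = psi_z @ rng.normal(size=(3, 4)) + 0.1 * rng.normal(size=(500, 4))
y = phi_x @ rng.normal(size=4) + 0.1 * rng.normal(size=500)
w, V = two_stage_iv(psi_z, phi_x, y)
```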
arXiv Detail & Related papers (2020-10-14T15:14:49Z)