Towards a Unified Analysis of Neural Networks in Nonparametric Instrumental Variable Regression: Optimization and Generalization
- URL: http://arxiv.org/abs/2511.14710v1
- Date: Tue, 18 Nov 2025 17:51:17 GMT
- Title: Towards a Unified Analysis of Neural Networks in Nonparametric Instrumental Variable Regression: Optimization and Generalization
- Authors: Zonghao Chen, Atsushi Nitanda, Arthur Gretton, Taiji Suzuki
- Abstract summary: We establish the first global convergence result of neural networks for the two-stage least squares (2SLS) approach in nonparametric instrumental variable regression (NPIV). This is achieved by adopting a lifted perspective through mean-field Langevin dynamics (MFLD).
- Score: 66.08522228989634
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We establish the first global convergence result of neural networks for the two-stage least squares (2SLS) approach in nonparametric instrumental variable regression (NPIV). This is achieved by adopting a lifted perspective through mean-field Langevin dynamics (MFLD). Unlike standard MFLD, however, our 2SLS setting entails a \emph{bilevel} optimization problem in the space of probability measures. To address this challenge, we leverage the penalty gradient approach recently developed for bilevel optimization, which formulates bilevel optimization as a Lagrangian problem. This leads to a novel fully first-order algorithm, termed \texttt{F$^2$BMLD}. Beyond the convergence bound, we further provide a generalization bound, revealing an inherent trade-off in the choice of the Lagrange multiplier between optimization and statistical guarantees. Finally, we empirically validate the effectiveness of the proposed method on an offline reinforcement learning benchmark.
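The penalty-gradient idea the abstract refers to can be illustrated on a toy problem. The sketch below is a hypothetical scalar analogue of my own construction, not the paper's \texttt{F$^2$BMLD}: it omits the lifting to probability measures and the Langevin noise, and the objectives f, g and the names `lam`, `step` are illustrative choices. The inner minimum is replaced by a penalty term, with an auxiliary variable v tracking the inner minimizer by plain gradient descent, so only first-order information is ever used.

```python
# Hypothetical scalar sketch of the penalty-gradient approach to bilevel
# optimization -- NOT the paper's F^2BMLD, which works over probability
# measures and adds Langevin noise.
#
# Bilevel problem:  min_x f(x, y*(x))  with  y*(x) = argmin_y g(x, y)
# Penalty form:     min_{x,y} f(x, y) + lam * (g(x, y) - min_v g(x, v))
# Here f(x, y) = (x - 1)^2 + y^2 and g(x, y) = (y - x)^2.

lam, step = 2.0, 0.01
x, y, v = 0.0, 0.0, 0.0
for _ in range(10_000):
    x, y, v = (
        # d/dx [ f + lam * (g(x, y) - g(x, v)) ]
        x - step * (2 * (x - 1) + lam * (-2 * (y - x) + 2 * (v - x))),
        # d/dy [ f + lam * g(x, y) ]
        y - step * (2 * y + lam * 2 * (y - x)),
        # inner gradient descent: v tracks argmin_v g(x, v) = x
        v - step * 2 * (v - x),
    )

# Stationary point for lam = 2: x = 3/5, y = 2/5, v = x.  As lam grows,
# the iterate approaches the true bilevel solution x = y = 1/2 at the
# price of stiffer dynamics -- a toy echo of the multiplier trade-off
# between optimization and statistical guarantees mentioned above.
print(round(x, 3), round(y, 3))
```

Larger `lam` tightens the penalty but slows convergence, which is why the choice of the Lagrange multiplier appears as a trade-off in the abstract.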
Related papers
- Training Dynamics of Softmax Self-Attention: Fast Global Convergence via Preconditioning [17.65459083031186]
We study the training dynamics of gradient descent in a softmax self-attention layer trained to perform linear regression. We show that simple first-order gradient descent can converge to the globally optimal self-attention parameters.
arXiv Detail & Related papers (2026-03-02T06:44:54Z)
- Variational Entropic Optimal Transport [67.76725267984578]
We propose Variational Entropic Optimal Transport (VarEOT) for domain translation problems. VarEOT is based on an exact variational reformulation of the log-partition $\log \mathbb{E}[\exp(\cdot)]$ as a tractable optimization over an auxiliary positive normalizer. Experiments on synthetic data and unpaired image-to-image translation demonstrate competitive or improved translation quality.
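Variational reformulations of the log-partition rest on a classical identity: log E_p[exp(h)] = sup_q { E_q[h] - KL(q || p) }, with the supremum attained by the tilted distribution q* proportional to p * exp(h). The check below is a toy numeric verification of that identity on a distribution of my own choosing, not VarEOT itself.

```python
import math

# Classical Gibbs / Donsker-Varadhan identity (toy check, not VarEOT):
#   log E_p[exp(h)] = sup_q  E_q[h] - KL(q || p),
# attained at the tilted distribution q* proportional to p * exp(h).
p = [0.5, 0.3, 0.2]     # base distribution on three points
h = [1.0, -0.5, 2.0]    # arbitrary test function values

log_Z = math.log(sum(pi * math.exp(hi) for pi, hi in zip(p, h)))

def dv(q):
    """Donsker-Varadhan objective E_q[h] - KL(q || p)."""
    return sum(qi * hi for qi, hi in zip(q, h)) - sum(
        qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0
    )

# optimal tilted distribution q*_i = p_i * exp(h_i) / Z
q_star = [pi * math.exp(hi - log_Z) for pi, hi in zip(p, h)]

# dv(q_star) equals log_Z exactly; any other q gives a smaller value
print(log_Z, dv(q_star))
```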
arXiv Detail & Related papers (2026-02-02T15:48:44Z)
- A Regularized Actor-Critic Algorithm for Bi-Level Reinforcement Learning [24.969317765059174]
We study a structured bi-level optimization problem where the upper-level objective is a smooth function and the lower-level problem is policy optimization in a Markov decision process (MDP). Existing methods for bi-level optimization and RL often require second-order information, impose strong regularization at the lower level, or use samples inefficiently through nested-loop procedures. We propose a single-loop, first-order actor-critic algorithm that optimizes the bi-level objective via a penalty-based reformulation.
arXiv Detail & Related papers (2026-01-23T02:12:24Z)
- A Kernel Approach for Semi-implicit Variational Inference [21.789560144560127]
Semi-implicit variational inference (SIVI) enhances the expressiveness of variational families through hierarchical semi-implicit distributions. Recent score-matching approaches to SIVI (SIVI-SM) address the intractability of the semi-implicit variational density via a minimax formulation. We propose kernel semi-implicit variational inference (KSIVI), a principled and tractable alternative.
arXiv Detail & Related papers (2026-01-17T12:06:12Z)
- Neural Optimal Transport Meets Multivariate Conformal Prediction [58.43397908730771]
We propose a framework for conditional vector quantile regression (CVQR). CVQR combines neural optimal transport with quantile optimization and applies it to multivariate conformal prediction.
arXiv Detail & Related papers (2025-09-29T19:50:19Z)
- A Gradient Meta-Learning Joint Optimization for Beamforming and Antenna Position in Pinching-Antenna Systems [63.213207442368294]
We consider a novel optimization design for multi-waveguide pinching-antenna systems. The proposed GML-JO algorithm is robust to different parameter choices and achieves better performance compared with existing optimization methods.
arXiv Detail & Related papers (2025-06-14T17:35:27Z)
- Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks [4.554284689395686]
Implicit gradient descent (IGD) outperforms the common gradient descent (GD) algorithm in handling certain multi-scale problems. We show that IGD converges to a globally optimal solution at a linear convergence rate.
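The advantage of implicit over explicit gradient descent on multi-scale problems can be seen on a stiff quadratic. The snippet below is a toy illustration of my own, not the two-layer PINN setting of the paper: the implicit update theta_{k+1} = theta_k - eta * grad L(theta_{k+1}) stays stable for any step size, while the explicit update diverges once eta exceeds 2 divided by the largest curvature.

```python
# Explicit vs. implicit gradient descent on a stiff quadratic
# L(u, v) = 0.5 * (u**2 + 1000 * v**2) -- a toy stand-in for the
# multi-scale behaviour discussed above (not the PINN setting itself).
eta = 0.01
u_e = v_e = 1.0   # explicit GD iterates
u_i = v_i = 1.0   # implicit GD iterates
for _ in range(100):
    # explicit: theta <- theta - eta * grad L(theta)
    u_e, v_e = u_e - eta * u_e, v_e - eta * 1000 * v_e
    # implicit: theta <- theta - eta * grad L(theta_next); for a
    # quadratic, the implicit equation solves in closed form
    u_i, v_i = u_i / (1 + eta), v_i / (1 + eta * 1000)

# explicit GD diverges on the stiff coordinate (factor |1 - 10| = 9
# per step), while implicit GD contracts both coordinates for any eta
```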
arXiv Detail & Related papers (2024-07-03T06:10:41Z)
- Mean-Field Langevin Dynamics for Signed Measures via a Bilevel Approach [4.577104493960515]
Mean-field Langevin dynamics (MFLD) is a class of interacting particle methods that tackle convex optimization over probability measures on a manifold. We show how to extend the MFLD framework to convex optimization problems over signed measures.
arXiv Detail & Related papers (2024-06-24T18:15:12Z)
- Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning [132.7040981721302]
We study the constrained convex Markov decision process (MDP), where the goal is to minimize a convex functional of the visitation measure. Designing algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training. We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
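The nonexpansive-operator view can be made concrete with a toy Krasnosel'skii-Mann iteration (an example of my own, not the paper's training scheme): a 90-degree rotation is nonexpansive and its plain iterates circle forever, yet linearly interpolating each step with the current point contracts to the fixed point.

```python
import math

# Krasnosel'skii-Mann averaging with a nonexpansive map T -- a toy
# illustration of stabilization via linear interpolation (my own
# example, not the paper's scheme).  T is a 90-degree rotation:
# nonexpansive, with unique fixed point (0, 0).
def T(x, y):
    return -y, x

alpha = 0.5            # interpolation weight toward T
x, y = 1.0, 0.0        # averaged iterates
px, py = 1.0, 0.0      # plain iterates, for comparison
for _ in range(200):
    tx, ty = T(x, y)
    x, y = (1 - alpha) * x + alpha * tx, (1 - alpha) * y + alpha * ty
    px, py = T(px, py)

# plain iteration keeps circling at radius 1, while the averaged
# (linearly interpolated) iteration contracts to the fixed point
```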
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints. A new primal-dual convex approach is proposed with a novel integration of three ingredients: a regularized policy optimizer, a dual regularizer, and Nesterov's accelerated gradient descent on the dual. This is the first demonstration that nonconcave CMDP problems can attain the $\mathcal{O}(1/\epsilon)$ complexity lower bound for optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z)
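The Lagrangian primal-dual idea shared by the constrained-MDP papers above can be sketched on a two-armed bandit with a cost budget. The rewards, costs, and budget below are hypothetical choices of my own, and the scheme (best-response primal step, subgradient dual ascent) is a generic sketch rather than either paper's algorithm.

```python
# Toy Lagrangian dual ascent for a two-armed constrained bandit --
# hypothetical numbers, sketching the primal-dual idea shared by the
# constrained-MDP papers above.
#   max_p  p * r[0] + (1 - p) * r[1]
#   s.t.   p * c[0] + (1 - p) * c[1] <= b
r = [1.0, 0.5]      # per-arm rewards
c = [1.0, 0.0]      # per-arm costs
b = 0.5             # cost budget

eta, lam = 0.05, 0.0
steps, avg_p = 2000, 0.0
for _ in range(steps):
    # primal best response to the Lagrangian reward r[i] - lam * c[i]
    p = 1.0 if r[0] - lam * c[0] > r[1] - lam * c[1] else 0.0
    # dual ascent on the constraint violation
    lam = max(0.0, lam + eta * (p * c[0] + (1 - p) * c[1] - b))
    avg_p += p / steps

# the instantaneous policy oscillates, but the time-averaged policy
# saturates the budget: avg_p close to 0.5, expected reward near 0.75
print(round(avg_p, 3), round(lam, 3))
```

The oscillation of the instantaneous best response is why such analyses typically state guarantees for the averaged policy.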
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.