On the Convergence of Continuous Constrained Optimization for Structure
Learning
- URL: http://arxiv.org/abs/2011.11150v4
- Date: Sun, 10 Apr 2022 18:31:45 GMT
- Title: On the Convergence of Continuous Constrained Optimization for Structure
Learning
- Authors: Ignavier Ng, S\'ebastien Lachapelle, Nan Rosemary Ke, Simon
Lacoste-Julien, Kun Zhang
- Abstract summary: We show the convergence of the augmented Lagrangian method (ALM) and the quadratic penalty method (QPM) for structure learning in the linear, nonlinear, and confounded cases.
We further establish the convergence guarantee of QPM to a DAG solution, under mild conditions.
- Score: 30.279796192573805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, structure learning of directed acyclic graphs (DAGs) has been
formulated as a continuous optimization problem by leveraging an algebraic
characterization of acyclicity. The constrained problem is solved using the
augmented Lagrangian method (ALM) which is often preferred to the quadratic
penalty method (QPM) by virtue of its standard convergence result that does not
require the penalty coefficient to go to infinity, hence avoiding
ill-conditioning. However, the convergence properties of these methods for
structure learning, including whether they are guaranteed to return a DAG
solution, remain unclear, which might limit their practical applications. In
this work, we examine the convergence of ALM and QPM for structure learning in
the linear, nonlinear, and confounded cases. We show that the standard
convergence result of ALM does not hold in these settings, and demonstrate
empirically that its behavior is akin to that of the QPM which is prone to
ill-conditioning. We further establish the convergence guarantee of QPM to a
DAG solution, under mild conditions. Lastly, we connect our theoretical results
with existing approaches to help resolve the convergence issue, and verify our
findings in light of an empirical comparison of them.
Related papers
- Non-negative Weighted DAG Structure Learning [12.139158398361868]
We address the problem of learning the true DAGs from nodal observations.
We propose a DAG recovery algorithm based on the method that is guaranteed to return ar.
arXiv Detail & Related papers (2024-09-12T09:41:29Z) - Kernel-Based Differentiable Learning of Non-Parametric Directed Acyclic Graphical Models [17.52142371968811]
Causal discovery amounts to learning a directed acyclic graph (DAG) that encodes a causal model.
Recent research has sought to bypass the search by reformulating causal discovery as a continuous optimization problem.
arXiv Detail & Related papers (2024-08-20T16:09:40Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z) - On the Sparse DAG Structure Learning Based on Adaptive Lasso [39.31370830038554]
We develop a data-driven DAG structure learning method without the predefined threshold, called adaptive NOTEARS [30]
We show that adaptive NOTEARS enjoys the oracle properties under some specific conditions. Furthermore, simulation results validate the effectiveness of our method, without setting any gap of edges around zero.
arXiv Detail & Related papers (2022-09-07T05:47:59Z) - Instance-Dependent Confidence and Early Stopping for Reinforcement
Learning [99.57168572237421]
Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure.
This research provides guarantees that explain textitex post the performance differences observed.
A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice.
arXiv Detail & Related papers (2022-01-21T04:25:35Z) - Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth
Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
arXiv Detail & Related papers (2021-06-30T18:32:46Z) - Efficient Methods for Structured Nonconvex-Nonconcave Min-Max
Optimization [98.0595480384208]
We propose a generalization extraient spaces which converges to a stationary point.
The algorithm applies not only to general $p$-normed spaces, but also to general $p$-dimensional vector spaces.
arXiv Detail & Related papers (2020-10-31T21:35:42Z) - Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave
Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-concave min-max problems, which have received increasing attention due to deep learning (e.g., deep AUC)
arXiv Detail & Related papers (2020-06-12T00:32:21Z) - The Strength of Nesterov's Extrapolation in the Individual Convergence
of Nonsmooth Optimization [0.0]
We prove that Nesterov's extrapolation has the strength to make the individual convergence of gradient descent methods optimal for nonsmooth problems.
We give an extension of the derived algorithms to solve regularized learning tasks with nonsmooth losses in settings.
Our method is applicable as an efficient tool for solving large-scale $l$1-regularized hinge-loss learning problems.
arXiv Detail & Related papers (2020-06-08T03:35:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.