Related papers: $ψ$DAG: Projected Stochastic Approximation Iteration for DAG Structure Learning

$ψ$DAG: Projected Stochastic Approximation Iteration for DAG Structure Learning

URL: http://arxiv.org/abs/2410.23862v1
Date: Thu, 31 Oct 2024 12:13:11 GMT
Title: $ψ$DAG: Projected Stochastic Approximation Iteration for DAG Structure Learning
Authors: Klea Ziu, Slavomír Hanzely, Loka Li, Kun Zhang, Martin Takáč, Dmitry Kamzolov,
Abstract summary: Learning the structure of Directed A Graphs (DAGs) presents a significant challenge due to the vast search space of possible graphs, which scales with the number of nodes. Recent advancements have redefined this problem as a continuous optimization task by incorporating differentiable a exponentiallyity constraints. We present a novel framework for learning DAGs, employing a Approximation approach integrated with Gradient Descent (SGD)-based optimization techniques.
Score: 6.612096312467342
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Learning the structure of Directed Acyclic Graphs (DAGs) presents a significant challenge due to the vast combinatorial search space of possible graphs, which scales exponentially with the number of nodes. Recent advancements have redefined this problem as a continuous optimization task by incorporating differentiable acyclicity constraints. These methods commonly rely on algebraic characterizations of DAGs, such as matrix exponentials, to enable the use of gradient-based optimization techniques. Despite these innovations, existing methods often face optimization difficulties due to the highly non-convex nature of DAG constraints and the per-iteration computational complexity. In this work, we present a novel framework for learning DAGs, employing a Stochastic Approximation approach integrated with Stochastic Gradient Descent (SGD)-based optimization techniques. Our framework introduces new projection methods tailored to efficiently enforce DAG constraints, ensuring that the algorithm converges to a feasible local minimum. With its low iteration complexity, the proposed method is well-suited for handling large-scale problems with improved computational efficiency. We demonstrate the effectiveness and scalability of our framework through comprehensive experimental evaluations, which confirm its superior performance across various settings.

Related papers

A Triple-Inertial Accelerated Alternating Optimization Method for Deep Learning Training [3.246129789918632]
gradient descent (SGD) algorithm has achieved remarkable success in training deep learning models. alternating minimization (AM) methods have emerged as a promising alternative for model training. We propose a novel Triple-Inertial Accelerated Alternating Minimization (TIAM) framework for neural network training.
arXiv Detail & Related papers (2025-03-11T14:42:17Z)
qNBO: quasi-Newton Meets Bilevel Optimization [26.0555315825777]
Bilevel optimization, addressing challenges in hierarchical learning tasks, has gained significant interest in machine learning. We introduce a general framework to address these computational challenges in a coordinated manner. Specifically, we leverage quasi-Newton algorithms to accelerate the resolution of the lower-level problem while efficiently approximating the inverse Hessian-vector product.
arXiv Detail & Related papers (2025-02-03T05:36:45Z)
Non-negative Weighted DAG Structure Learning [12.139158398361868]
We address the problem of learning the true DAGs from nodal observations. We propose a DAG recovery algorithm based on the method that is guaranteed to return ar.
arXiv Detail & Related papers (2024-09-12T09:41:29Z)
A Full Adagrad algorithm with O(Nd) operations [4.389938747401259]
The study offers efficient and practical algorithms for large-scale applications. This innovative strategy significantly reduces the complexity and resource demands typically associated with full-matrix methods.
arXiv Detail & Related papers (2024-05-03T08:02:08Z)
Optimizing NOTEARS Objectives via Topological Swaps [41.18829644248979]
In this paper, we develop an effective method for a set of candidate algorithms. At the inner level, given a subject, we utilize off-the-the-art constraints. The proposed method significantly improves the score of other algorithms.
arXiv Detail & Related papers (2023-05-26T21:49:37Z)
An Accelerated Doubly Stochastic Gradient Method with Faster Explicit Model Identification [97.28167655721766]
We propose a novel doubly accelerated gradient descent (ADSGD) method for sparsity regularized loss minimization problems. We first prove that ADSGD can achieve a linear convergence rate and lower overall computational complexity.
arXiv Detail & Related papers (2022-08-11T22:27:22Z)
Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and deterministic settings. We exploit a warm-start strategy to amortize the estimation of the exact gradient. By using this framework, our analysis shows these algorithms to match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z)
DAGs with No Curl: An Efficient DAG Structure Learning Approach [62.885572432958504]
Recently directed acyclic graph (DAG) structure learning is formulated as a constrained continuous optimization problem with continuous acyclicity constraints. We propose a novel learning framework to model and learn the weighted adjacency matrices in the DAG space directly. We show that our method provides comparable accuracy but better efficiency than baseline DAG structure learning methods on both linear and generalized structural equation models.
arXiv Detail & Related papers (2021-06-14T07:11:36Z)
Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded on the complexity' of the fractal structure that underlies its generalization measure. We further specialize our results to specific problems (e.g., linear/logistic regression, one hidden/layered neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems. We propose a novel stoc-efficientgradient estimator named stoc-BiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z)
On the Role of Sparsity and DAG Constraints for Learning Linear DAGs [16.97675762810828]
We study the role of sparsity and DAG constraints for learning DAG models in the linear Gaussian and non-Gaussian cases. We propose a likelihood-based score function, and show that one only has to apply soft sparsity and DAG constraints to learn a DAG equivalent to the ground truth DAG.
arXiv Detail & Related papers (2020-06-17T23:43:23Z)
Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory. Our algorithm is proved to achieve the best-available convergence for non-PL objectives simultaneously while outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.