DAG Learning on the Permutahedron
- URL: http://arxiv.org/abs/2301.11898v1
- Date: Fri, 27 Jan 2023 18:22:25 GMT
- Title: DAG Learning on the Permutahedron
- Authors: Valentina Zantedeschi, Luca Franceschi, Jean Kaddour, Matt J. Kusner,
Vlad Niculae
- Abstract summary: We propose a continuous optimization framework for discovering a latent directed acyclic graph (DAG) from observational data.
Our approach optimizes over the polytope of permutation vectors, the so-called Permutahedron, to learn a topological ordering.
- Score: 33.523216907730216
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a continuous optimization framework for discovering a latent
directed acyclic graph (DAG) from observational data. Our approach optimizes
over the polytope of permutation vectors, the so-called Permutahedron, to learn
a topological ordering. Edges can be optimized jointly, or learned conditional
on the ordering via a non-differentiable subroutine. Compared to existing
continuous optimization approaches, our formulation has a number of advantages
including: 1. validity: optimizes over exact DAGs as opposed to other
relaxations optimizing approximate DAGs; 2. modularity: accommodates any
edge-optimization procedure, edge structural parameterization, and optimization
loss; 3. end-to-end: either alternately iterates between node-ordering and
edge-optimization, or optimizes them jointly. We demonstrate, on real-world
data problems in protein-signaling and transcriptional network discovery, that
our approach lies on the Pareto frontier of two key metrics, the SID and SHD.
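For intuition, here is a minimal NumPy sketch of the ordering-based view the abstract describes: once a topological ordering is fixed, restricting edges to run from earlier to later nodes yields an exact DAG, and edge weights can then be fit conditional on that ordering (here by per-node ridge regression for a linear SEM). The helpers `ordering_mask` and `fit_edges_given_ordering` are illustrative names, and this omits the paper's actual relaxation over the Permutahedron.

```python
import numpy as np

def ordering_mask(order):
    """order[k] = node placed at position k; mask[i, j] = 1 iff i precedes j."""
    d = len(order)
    pos = np.empty(d, dtype=int)
    pos[order] = np.arange(d)            # pos[i] = position of node i in the ordering
    return (pos[:, None] < pos[None, :]).astype(float)

def fit_edges_given_ordering(X, order, ridge=1e-3):
    """Fit a weighted adjacency W (parent -> child) restricted to the ordering."""
    n, d = X.shape
    mask = ordering_mask(order)
    W = np.zeros((d, d))
    for j in range(d):
        parents = np.where(mask[:, j] > 0)[0]
        if len(parents) == 0:
            continue
        A = X[:, parents]
        # ridge-regularized least squares: X[:, j] ~ A @ w
        w = np.linalg.solve(A.T @ A + ridge * np.eye(len(parents)), A.T @ X[:, j])
        W[parents, j] = w
    return W

# toy usage: data from the 3-node chain 0 -> 1 -> 2
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = 2.0 * x0 + 0.1 * rng.normal(size=500)
x2 = -1.5 * x1 + 0.1 * rng.normal(size=500)
X = np.stack([x0, x1, x2], axis=1)
print(np.round(fit_edges_given_ordering(X, order=[0, 1, 2]), 2))
```

Because the mask only allows edges consistent with the ordering, every adjacency produced this way is a valid DAG, which is the "validity" advantage listed above.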
Related papers
- A Continuous Relaxation for Discrete Bayesian Optimization [17.312618575552]
We show that inference and optimization can be computationally tractable.
We focus in particular on the setting where very few observations can be made and budgets are strict.
We show that the resulting acquisition function can be optimized with both continuous and discrete optimization algorithms.
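A hedged sketch of the generic relax-and-round pattern this summary evokes (not necessarily the paper's construction): extend the acquisition function from {0,1}^d to the box [0,1]^d, maximize it with a continuous optimizer, then round the maximizer back to a binary point. The `acquisition` below is a toy placeholder, not a real Bayesian-optimization acquisition.

```python
import numpy as np
from scipy.optimize import minimize

def acquisition(x):
    # toy stand-in: prefers points near a hidden binary optimum
    target = np.array([1.0, 0.0, 1.0, 1.0])
    return -np.sum((x - target) ** 2)          # higher is better

def optimize_relaxed(d, restarts=5, seed=0):
    rng = np.random.default_rng(seed)
    best_x, best_val = None, -np.inf
    for _ in range(restarts):
        x0 = rng.uniform(size=d)
        res = minimize(lambda x: -acquisition(x), x0,
                       method="L-BFGS-B", bounds=[(0.0, 1.0)] * d)
        if -res.fun > best_val:
            best_val, best_x = -res.fun, res.x
    return (best_x > 0.5).astype(int)          # round the relaxed maximizer to {0,1}^d

print(optimize_relaxed(d=4))                   # -> [1 0 1 1]
```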
arXiv Detail & Related papers (2024-04-26T14:47:40Z) - SGD with Partial Hessian for Deep Neural Networks Optimization [18.78728272603732]
We propose a compound optimizer that combines a second-order method, using a precise partial Hessian matrix to update channel-wise parameters, with first-order stochastic gradient descent (SGD) to update the remaining parameters.
Compared with first-order methods, it uses a certain amount of Hessian information to assist optimization; compared with existing second-order methods, it keeps the good generalization performance of first-order methods.
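A hedged toy sketch of the general recipe described above (not the paper's exact optimizer): parameters are split into a small "channel-wise" block that receives a damped diagonal-Newton step from exact second derivatives, and a remaining block that receives a plain first-order step standing in for SGD. All names and the quadratic toy loss are illustrative.

```python
import numpy as np

def loss_and_derivs(c, o, A_c, A_o, b):
    """Quadratic loss 0.5*||A_c c + A_o o - b||^2, its gradients w.r.t. both
    blocks, and the diagonal of the Hessian block for the channel-wise params c."""
    r = A_c @ c + A_o @ o - b
    return 0.5 * r @ r, A_c.T @ r, A_o.T @ r, np.sum(A_c ** 2, axis=0)

rng = np.random.default_rng(0)
A_c, A_o, b = rng.normal(size=(50, 3)), rng.normal(size=(50, 5)), rng.normal(size=50)
c, o = np.zeros(3), np.zeros(5)
lr, damping, eps = 0.01, 0.5, 1e-8
for _ in range(300):
    loss, g_c, g_o, h_c = loss_and_derivs(c, o, A_c, A_o, b)
    c -= damping * g_c / (h_c + eps)   # second-order (damped diagonal-Newton) step
    o -= lr * g_o                      # first-order step (full-batch here, standing in for SGD)
print(round(loss, 4))
```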
arXiv Detail & Related papers (2024-03-05T06:10:21Z) - SequentialAttention++ for Block Sparsification: Differentiable Pruning
Meets Combinatorial Optimization [24.55623897747344]
Neural network pruning is a key technique towards engineering large yet scalable, interpretable, and generalizable models.
We show how many existing differentiable pruning techniques can be understood as nonconvex regularization for group-sparse optimization.
We propose SequentialAttention++, which advances the state of the art on large-scale neural network block-wise pruning tasks on the ImageNet and Criteo datasets.
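One concrete way to see block-wise pruning as group-sparse optimization (a hedged sketch, not SequentialAttention++ itself) is the proximal step of the group-lasso penalty, which zeroes out entire weight blocks whose norm falls below a threshold:

```python
import numpy as np

def group_soft_threshold(W, groups, lam):
    """Proximal operator of lam * sum_g ||W[g]||_2, applied block by block."""
    W = W.copy()
    for g in groups:
        norm = np.linalg.norm(W[g])
        W[g] = 0.0 if norm <= lam else W[g] * (1.0 - lam / norm)
    return W

rng = np.random.default_rng(0)
W = rng.normal(size=12)
groups = [slice(0, 4), slice(4, 8), slice(8, 12)]   # three blocks of four weights
W[4:8] *= 0.05                                      # one block is nearly dead
print(np.round(group_soft_threshold(W, groups, lam=0.5), 2))   # that block is pruned to zeros
```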
arXiv Detail & Related papers (2024-02-27T21:42:18Z) - ALEXR: An Optimal Single-Loop Algorithm for Convex Finite-Sum Coupled Compositional Stochastic Optimization [53.14532968909759]
We introduce an efficient single-loop primal-dual block-coordinate algorithm, dubbed ALEXR.
We establish the convergence rates of ALEXR in both convex and strongly convex cases under smoothness and non-smoothness conditions.
We present lower complexity bounds to demonstrate that the convergence rates of ALEXR are optimal among first-order block-coordinate algorithms for the considered class of cFCCO problems.
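For orientation, finite-sum coupled compositional problems are, roughly, nested objectives of the form below (my paraphrase of the problem class; the notation is illustrative, not taken from the paper, and in the convex case each outer $f_i$ is convex):

```latex
\min_{w}\; \frac{1}{n}\sum_{i=1}^{n} f_i\Big( \mathbb{E}_{\xi \sim \mathcal{D}_i}\big[\, g_i(w;\xi) \,\big] \Big),
```

where the expectation sitting inside each $f_i$ is what makes unbiased stochastic gradients unavailable and motivates specialized algorithms such as ALEXR.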
arXiv Detail & Related papers (2023-12-04T19:00:07Z) - Accelerating Cutting-Plane Algorithms via Reinforcement Learning
Surrogates [49.84541884653309]
A current standard approach to solving convex discrete optimization problems is the use of cutting-plane algorithms.
Despite the existence of a number of general-purpose cut-generating algorithms, large-scale discrete optimization problems continue to suffer from intractability.
We propose a method for accelerating cutting-plane algorithms via reinforcement learning.
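For context, a plain cutting-plane (Kelley-style) loop for a one-dimensional convex problem is sketched below; the paper's contribution, a reinforcement-learning surrogate that decides which cuts to generate, is only indicated by a comment, and the example is illustrative rather than a discrete problem.

```python
import numpy as np
from scipy.optimize import linprog

f = lambda x: (x - 1.0) ** 2          # convex objective
df = lambda x: 2.0 * (x - 1.0)        # its derivative

lo, hi = -2.0, 3.0
cuts = []                              # list of (slope, intercept) linear lower bounds
x = lo                                 # initial query point
for _ in range(15):
    # An accelerated variant would let a learned surrogate choose the next cut;
    # here we simply cut at the current minimizer of the piecewise-linear model.
    g = df(x)
    cuts.append((g, f(x) - g * x))     # cut: f(y) >= g*y + (f(x) - g*x) for all y
    # LP over (y, t): minimize t  s.t.  t >= slope*y + intercept  for every cut
    A_ub = [[slope, -1.0] for slope, _ in cuts]
    b_ub = [-intercept for _, intercept in cuts]
    res = linprog(c=[0.0, 1.0], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(lo, hi), (None, None)])
    x = res.x[0]
print(round(x, 3))                     # approaches the true minimizer at x = 1
```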
arXiv Detail & Related papers (2023-07-17T20:11:56Z) - n-Step Temporal Difference Learning with Optimal n [5.945710235932345]
We consider the problem of finding the optimal value of n in n-step temporal difference (TD) learning.
Our objective function for the optimization problem is the average root mean squared error (RMSE).
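As a reminder of the quantity being tuned, here is a hedged tabular sketch of the n-step TD update on the classic five-state random walk, with the error against the known true values printed at the end; the paper's actual contribution, choosing n by optimization, is not implemented here.

```python
import numpy as np

def n_step_td(episodes, n, alpha=0.1, gamma=1.0):
    """Simplified per-episode sweep of the n-step TD update V(S_t) += alpha*(G - V(S_t))."""
    V = np.zeros(7)                              # states 0..6; 0 and 6 are terminal
    for states, rewards in episodes:
        T = len(rewards)
        for t in range(T):
            end = min(t + n, T)
            G = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
            if end < T:                          # bootstrap only if the episode continues
                G += gamma ** n * V[states[end]]
            V[states[t]] += alpha * (G - V[states[t]])
    return V

rng = np.random.default_rng(0)
def episode():                                   # random walk starting from the middle state
    s, states, rewards = 3, [3], []
    while s not in (0, 6):
        s += rng.choice([-1, 1])
        states.append(s)
        rewards.append(1.0 if s == 6 else 0.0)   # reward only on the right exit
    return states, rewards

V = n_step_td([episode() for _ in range(2000)], n=3)
true_V = np.arange(7) / 6.0                      # known values for this chain
print(round(float(np.sqrt(np.mean((V[1:6] - true_V[1:6]) ** 2))), 3))   # RMSE over non-terminal states
```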
arXiv Detail & Related papers (2023-03-13T12:44:32Z) - Extrinsic Bayesian Optimizations on Manifolds [1.3477333339913569]
We propose an extrinsic Bayesian optimization (eBO) framework for general optimization problems on manifolds.
Our approach is to employ extrinsic Gaussian processes by first embedding the manifold into some higher-dimensional Euclidean space.
This leads to efficient and scalable algorithms for optimization over complex manifolds.
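A hedged miniature of the extrinsic idea (not the eBO algorithm itself): embed the circle S^1 into R^2, fit an ordinary Euclidean Gaussian process on the embedded points, and pick the next query with a simple UCB rule over a grid of candidates. The objective and all constants are illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(theta):                            # black-box function defined on S^1
    return np.sin(3 * theta) + 0.5 * np.cos(theta)

embed = lambda theta: np.stack([np.cos(theta), np.sin(theta)], axis=-1)   # S^1 -> R^2

rng = np.random.default_rng(0)
thetas = rng.uniform(0, 2 * np.pi, size=5)       # initial design on the manifold
values = objective(thetas)
grid = np.linspace(0, 2 * np.pi, 400)            # candidate queries

for _ in range(15):
    gp = GaussianProcessRegressor(normalize_y=True).fit(embed(thetas), values)
    mu, sigma = gp.predict(embed(grid), return_std=True)
    nxt = grid[np.argmax(mu + 2.0 * sigma)]      # upper-confidence-bound acquisition
    thetas = np.append(thetas, nxt)
    values = np.append(values, objective(nxt))

print(round(float(thetas[np.argmax(values)]), 2), round(float(values.max()), 3))
```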
arXiv Detail & Related papers (2022-12-21T06:10:12Z) - An Empirical Evaluation of Zeroth-Order Optimization Methods on
AI-driven Molecule Optimization [78.36413169647408]
We study the effectiveness of various ZO optimization methods for optimizing molecular objectives.
We show the advantages of ZO sign-based gradient descent (ZO-signGD).
We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
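For readers unfamiliar with ZO-signGD, a hedged toy sketch on a smooth test function is given below; the molecular objectives in the paper are black boxes queried the same way, but everything here (step sizes, query counts, the quadratic objective) is illustrative.

```python
import numpy as np

def zo_signgd(f, x0, lr=0.05, mu=1e-2, n_queries=20, steps=200, seed=0):
    """Estimate the gradient from random central differences, then step along its sign."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    for _ in range(steps):
        g = np.zeros_like(x)
        for _ in range(n_queries):
            u = rng.normal(size=x.shape)                       # random probe direction
            g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
        x -= lr * np.sign(g / n_queries)                       # sign-based descent step
    return x

f = lambda x: np.sum((x - np.arange(len(x))) ** 2)             # minimum at (0, 1, 2, 3)
print(np.round(zo_signgd(f, x0=np.zeros(4)), 1))
```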
arXiv Detail & Related papers (2022-10-27T01:58:10Z) - DAGs with No Curl: An Efficient DAG Structure Learning Approach [62.885572432958504]
Recently, directed acyclic graph (DAG) structure learning has been formulated as a constrained continuous optimization problem with continuous acyclicity constraints.
We propose a novel learning framework to model and learn the weighted adjacency matrices in the DAG space directly.
We show that our method provides comparable accuracy but better efficiency than baseline DAG structure learning methods on both linear and generalized structural equation models.
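Two ingredients from this line of work can be sketched together (a hedged illustration in the spirit of potential-based parameterizations, not the exact No-Curl construction): give each node a scalar potential and keep only edges that go from lower to higher potential, which is acyclic by construction, then verify acyclicity with the standard trace-of-matrix-exponential function referenced above.

```python
import numpy as np
from scipy.linalg import expm

def dag_from_potential(p, M):
    """Edge i -> j survives only if p[j] > p[i]; such a graph can contain no cycle."""
    direction = np.maximum(p[None, :] - p[:, None], 0.0)   # ReLU(p_j - p_i)
    return M * direction

def acyclicity(W):
    """Continuous acyclicity function h(W) = tr(exp(W*W)) - d, zero iff W is a DAG."""
    return np.trace(expm(W * W)) - W.shape[0]

rng = np.random.default_rng(0)
d = 4
M = rng.normal(size=(d, d)) * (1 - np.eye(d))   # unconstrained edge magnitudes
p = rng.normal(size=d)                          # one scalar potential per node
W = dag_from_potential(p, M)
print(round(acyclicity(W), 8), round(acyclicity(M), 3))   # ~0.0 (acyclic) vs. > 0 (cyclic)
```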
arXiv Detail & Related papers (2021-06-14T07:11:36Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel bilevel optimizer featuring a sample-efficient hypergradient estimator, named stoc-BiO.
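To make the bilevel structure concrete, here is a hedged sketch of hyperparameter optimization with a ridge-regression lower level, where the hypergradient follows from the implicit function theorem; this is not the stoc-BiO estimator, just the deterministic problem such estimators approximate stochastically.

```python
import numpy as np

def hypergradient(lam, X_tr, y_tr, X_val, y_val):
    """Validation loss F(lam) = ||X_val w*(lam) - y_val||^2 and its derivative dF/dlam."""
    d = X_tr.shape[1]
    H = X_tr.T @ X_tr + lam * np.eye(d)
    w = np.linalg.solve(H, X_tr.T @ y_tr)        # lower-level solution w*(lam)
    dw = -np.linalg.solve(H, w)                  # dw*/dlam via the implicit function theorem
    r = X_val @ w - y_val
    return float(r @ r), float(2.0 * (r @ X_val) @ dw)

rng = np.random.default_rng(0)
X_tr, X_val = rng.normal(size=(80, 5)), rng.normal(size=(40, 5))
w_true = rng.normal(size=5)
y_tr = X_tr @ w_true + 0.5 * rng.normal(size=80)
y_val = X_val @ w_true + 0.5 * rng.normal(size=40)

lam = 1.0
for _ in range(100):                             # upper-level gradient descent on lam
    loss, g = hypergradient(lam, X_tr, y_tr, X_val, y_val)
    lam = max(lam - 0.05 * g, 1e-6)              # keep the regularizer non-negative
print(round(lam, 3), round(loss, 3))
```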
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly unbounded optimization domains.
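For context, the stationarity measure used here is the standard one for weakly convex problems; a brief reminder of the Moreau envelope and the identity linking its gradient to the proximal point:

```latex
f_{\lambda}(x) \;=\; \min_{y}\;\Big\{\, f(y) + \tfrac{1}{2\lambda}\,\|y - x\|^{2} \,\Big\},
\qquad
\nabla f_{\lambda}(x) \;=\; \tfrac{1}{\lambda}\,\big(x - \operatorname{prox}_{\lambda f}(x)\big),
```

so a small $\|\nabla f_{\lambda}(x)\|$ certifies that $x$ is close to a near-stationary point of $f$.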
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.