Identifiability of the minimum-trace directed acyclic graph and hill climbing algorithms without strict local optima under weakly increasing error variances
- URL: http://arxiv.org/abs/2508.05706v1
- Date: Thu, 07 Aug 2025 04:01:02 GMT
- Title: Identifiability of the minimum-trace directed acyclic graph and hill climbing algorithms without strict local optima under weakly increasing error variances
- Authors: Hyunwoong Chang, Jaehoan Kim
- Abstract summary: We prove that the true underlying directed acyclic graph (DAG) in Gaussian linear structural equation models is identifiable as the minimum-trace DAG. On the computational side, we prove that the hill climbing algorithm with a random-to-random (R2R) neighborhood does not admit any strict local optima.
- Score: 1.1279808969568255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove that the true underlying directed acyclic graph (DAG) in Gaussian linear structural equation models is identifiable as the minimum-trace DAG when the error variances are weakly increasing with respect to the true causal ordering. This result bridges two existing frameworks as it extends the identifiable cases within the minimum-trace DAG method and provides a principled interpretation of the algorithmic ordering search approach, revealing that its objective is actually to minimize the total residual sum of squares. On the computational side, we prove that the hill climbing algorithm with a random-to-random (R2R) neighborhood does not admit any strict local optima. Under standard settings, we confirm the result through extensive simulations, observing only a few weak local optima. Interestingly, algorithms using other neighborhoods of equal size exhibit suboptimal behavior, having strict local optima and a substantial number of weak local optima.
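The abstract's two claims can be made concrete in a short sketch. Below is a minimal Python illustration, not the authors' implementation: `total_rss` scores a candidate ordering by regressing each variable on its predecessors and summing the residual sums of squares (the quantity the paper identifies with the minimum-trace objective), and `hill_climb` proposes random-to-random (R2R) moves that delete one element of the ordering and reinsert it at a random position. The toy three-variable SEM, all function names, and the choice to sample single R2R moves rather than scan the full R2R neighborhood are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the minimum-trace / total-RSS
# objective and a hill climb over orderings using sampled R2R moves.
import numpy as np

rng = np.random.default_rng(0)

def total_rss(X, order):
    """Total residual sum of squares when each variable is regressed on
    all variables preceding it in `order` (the trace objective)."""
    rss = 0.0
    for i, j in enumerate(order):
        parents = list(order[:i])
        y = X[:, j]
        if parents:
            beta, *_ = np.linalg.lstsq(X[:, parents], y, rcond=None)
            y = y - X[:, parents] @ beta
        rss += y @ y
    return rss

def r2r_move(order, rng):
    """Random-to-random move: delete a random element of the ordering
    and reinsert it at a random position."""
    order = list(order)
    v = order.pop(rng.integers(len(order)))
    order.insert(rng.integers(len(order) + 1), v)
    return tuple(order)

def hill_climb(X, rng, n_steps=2000):
    """Hill climb accepting strict improvements from sampled R2R moves.
    (The paper's analysis examines the full R2R neighborhood.)"""
    order = tuple(rng.permutation(X.shape[1]))
    score = total_rss(X, order)
    for _ in range(n_steps):
        cand = r2r_move(order, rng)
        s = total_rss(X, cand)
        if s < score:
            order, score = cand, s
    return order, score

# Toy Gaussian linear SEM x1 -> x2 -> x3 with weakly increasing
# error variances (1.0 <= 1.0 <= 1.5) along the true causal ordering.
n = 5000
e = rng.normal(size=(n, 3)) * np.sqrt([1.0, 1.0, 1.5])
x1 = e[:, 0]
x2 = 0.8 * x1 + e[:, 1]
x3 = -0.5 * x2 + e[:, 2]
X = np.column_stack([x1, x2, x3])

order, score = hill_climb(X, rng)
print("recovered ordering:", order, "total RSS:", round(score, 1))
```

Under these assumptions the recovered ordering should match the true causal ordering (0, 1, 2), consistent with the identifiability result that the total RSS is minimized at the true ordering when error variances are weakly increasing.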
Related papers
- Zeroth-Order Optimization Finds Flat Minima [51.41529512093436]
We show that zeroth-order optimization with the standard two-point estimator favors solutions with small trace of Hessian. We further provide convergence rates of zeroth-order optimization to approximate flat minima for convex and sufficiently smooth functions.
arXiv Detail & Related papers (2025-06-05T17:59:09Z)
- Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator [49.87315310656657]
We introduce a new adaptive $k$-nearest neighbours ($kK$-NN) algorithm that explores the local curvature at a sample to adaptively define the neighborhood size.
Results on many real-world datasets indicate that the new $kK$-NN algorithm yields superior balanced accuracy compared to the established $k$-NN method.
arXiv Detail & Related papers (2024-09-08T13:08:45Z)
- Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov decision processes (MDPs) in the fixed-confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z)
- Noisy Low-rank Matrix Optimization: Geometry of Local Minima and Convergence Rate [14.191310794366075]
This paper is concerned with low-rank matrix optimization, which has found a wide range of applications in machine learning.
We develop a framework that can deal with random corruptions to general objective functions, where the noise model is arbitrary.
We characterize the geometry of the spurious local minima of the problem in a local region around ground truth in the case when the RIP constant is greater than $1/3$.
arXiv Detail & Related papers (2022-03-08T07:44:47Z)
- Efficiently Escaping Saddle Points in Bilevel Optimization [48.925688192913]
Bilevel optimization is one of the fundamental problems in machine learning.
Recent developments in bilevel optimization center on the first nonasymptotic multi-step analysis.
arXiv Detail & Related papers (2022-02-08T07:10:06Z)
- The Minimax Complexity of Distributed Optimization [0.0]
I present the "graph oracle model", an extension of the classic oracle framework that can be applied to distributed optimization.
I focus on the specific case of the "intermittent communication setting".
I analyze the theoretical properties of the popular Local Stochastic Gradient Descent (SGD) algorithm in the convex setting.
arXiv Detail & Related papers (2021-09-01T15:18:33Z)
- Wide flat minima and optimal generalization in classifying high-dimensional Gaussian mixtures [8.556763944288116]
We show that there exist configurations that achieve the Bayes-optimal generalization error, even in the case of unbalanced clusters.
We also consider the algorithmically relevant case of targeting wide flat minima of the mean squared error loss.
arXiv Detail & Related papers (2020-10-27T01:32:03Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- DAGs with No Fears: A Closer Look at Continuous Optimization for Learning Bayesian Networks [45.3591788771536]
We re-examine a continuous optimization framework dubbed NOTEARS for learning Bayesian networks.
We show that the Karush-Kuhn-Tucker (KKT) optimality conditions for the NOTEARS formulation cannot be satisfied except in a trivial case.
Some combinations with local search are both more accurate and more efficient than the original NOTEARS.
arXiv Detail & Related papers (2020-10-18T22:59:37Z)
- Entropic gradient descent algorithms and wide flat minima [6.485776570966397]
We show analytically that there exist Bayes optimal pointwise estimators which correspond to minimizers belonging to wide flat regions.
We extend the analysis to the deep learning scenario by extensive numerical validations.
An easy to compute flatness measure shows a clear correlation with test accuracy.
arXiv Detail & Related papers (2020-06-14T13:22:19Z)
- Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove an $\widetilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
- Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts.
Several works have pursued rigorous analytical justification for this behavior in nonconvex optimization problems.
A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms to escape saddle points.
arXiv Detail & Related papers (2020-03-31T16:54:22Z)