DCDILP: a distributed learning method for large-scale causal structure learning
- URL: http://arxiv.org/abs/2406.10481v1
- Date: Sat, 15 Jun 2024 03:17:48 GMT
- Title: DCDILP: a distributed learning method for large-scale causal structure learning
- Authors: Shuyu Dong, Michèle Sebag, Kento Uemura, Akito Fujii, Shuang Chang, Yusuke Koyanagi, Koji Maruhashi,
- Abstract summary: This paper presents a novel approach to causal discovery through a divide-and-conquer framework.
By decomposing the problem into smaller subproblems defined on Markov blankets, the proposed DCDILP method first explores in parallel the local causal graphs of these subproblems.
- Score: 4.551283926017506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a novel approach to causal discovery through a divide-and-conquer framework. By decomposing the problem into smaller subproblems defined on Markov blankets, the proposed DCDILP method first explores in parallel the local causal graphs of these subproblems. However, this local discovery phase encounters systematic challenges due to the presence of hidden confounders (variables within each Markov blanket may be influenced by external variables). Moreover, aggregating these local causal graphs in a consistent global graph defines a large size combinatorial optimization problem. DCDILP addresses these challenges by: i) restricting the local subgraphs to causal links only related with the central variable of the Markov blanket; ii) formulating the reconciliation of local causal graphs as an integer linear programming method. The merits of the approach, in both terms of causal discovery accuracy and scalability in the size of the problem, are showcased by experiments and comparisons with the state of the art.
Related papers
- Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized server (DFL) eliminates reliance on client-client architecture.
Non-smooth regularization is often incorporated into machine learning tasks.
We propose a novel novel DNCFL algorithm to solve these problems.
arXiv Detail & Related papers (2025-04-17T08:32:25Z) - Learning Directed Acyclic Graphs from Partial Orderings [9.387234607473054]
directed acyclic graphs (DAGs) are commonly used to model causal relationships among random variables.
In this paper, we consider the intermediate problem of learning DAGs when a partial causal ordering of variables is available.
We propose a general estimation framework for leveraging the partial ordering and present efficient estimation algorithms for low- and high-dimensional problems.
arXiv Detail & Related papers (2024-03-24T06:14:50Z) - Stability and Generalization of the Decentralized Stochastic Gradient
Descent Ascent Algorithm [80.94861441583275]
We investigate the complexity of the generalization bound of the decentralized gradient descent (D-SGDA) algorithm.
Our results analyze the impact of different top factors on the generalization of D-SGDA.
We also balance it with the generalization to obtain the optimal convex-concave setting.
arXiv Detail & Related papers (2023-10-31T11:27:01Z) - Gauge-optimal approximate learning for small data classification
problems [0.0]
Small data learning problems are characterized by a discrepancy between the limited amount of response variable observations and the large feature space dimension.
We propose the Gauge- Optimal Approximate Learning (GOAL) algorithm, which provides an analytically tractable joint solution to the reduction dimension, feature segmentation and classification problems.
GOAL has been compared to other state-of-the-art machine learning (ML) tools on both synthetic data and challenging real-world applications from climate science and bioinformatics.
arXiv Detail & Related papers (2023-10-29T16:46:05Z) - Joint Learning of Label and Environment Causal Independence for Graph
Out-of-Distribution Generalization [60.4169201192582]
We propose to incorporate label and environment causal independence (LECI) to fully make use of label and environment information.
LECI significantly outperforms prior methods on both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-06-01T19:33:30Z) - Self-Supervised Learning with an Information Maximization Criterion [5.214806886230471]
We argue that a straightforward application of information among alternative representations of the same input naturally solves the collapse problem.
We propose a self-supervised learning method, CorInfoMax, that uses a second-order statistics-based mutual information measure.
Numerical experiments demonstrate that CorInfoMax achieves better or competitive performance results relative to the state-of-the-art SSL approaches.
arXiv Detail & Related papers (2022-09-16T15:26:19Z) - On the Generalization for Transfer Learning: An Information-Theoretic Analysis [8.102199960821165]
We give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergenceD(mu|mu')$ plays an important role in the characterizations.
We then generalize the mutual information bound with other divergences such as $phi$-divergence and Wasserstein distance.
arXiv Detail & Related papers (2022-07-12T08:20:41Z) - Relation Matters: Foreground-aware Graph-based Relational Reasoning for
Domain Adaptive Object Detection [81.07378219410182]
We propose a new and general framework for DomainD, named Foreground-aware Graph-based Reasoning (FGRR)
FGRR incorporates graph structures into the detection pipeline to explicitly model the intra- and inter-domain foreground object relations.
Empirical results demonstrate that the proposed FGRR exceeds the state-of-the-art on four DomainD benchmarks.
arXiv Detail & Related papers (2022-06-06T05:12:48Z) - A Probabilistic Graph Coupling View of Dimension Reduction [5.35952718937799]
We introduce a unifying statistical framework based on the coupling of hidden graphs using cross entropy.
We show that existing pairwise similarity DR methods can be retrieved from our framework with particular choices of priors for the graphs.
Our model is leveraged and extended to address the issue while new links are drawn with Laplacian eigenmaps and PCA.
arXiv Detail & Related papers (2022-01-31T08:21:55Z) - Simple Stochastic and Online Gradient DescentAlgorithms for Pairwise
Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair instances.
Online descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple and online descent to methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z) - Multi-task Learning of Order-Consistent Causal Graphs [59.9575145128345]
We consider the problem of discovering $K related Gaussian acyclic graphs (DAGs)
Under multi-task learning setting, we propose a $l_1/l$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models.
We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order.
arXiv Detail & Related papers (2021-11-03T22:10:18Z) - On the Difficulty of Generalizing Reinforcement Learning Framework for
Combinatorial Optimization [6.935838847004389]
Combinatorial optimization problems (COPs) on the graph with real-life applications are canonical challenges in Computer Science.
The underlying principle of this approach is to deploy a graph neural network (GNN) for encoding both the local information of the nodes and the graph-structured data.
We use the security-aware phone clone allocation in the cloud as a classical quadratic assignment problem (QAP) to investigate whether or not deep RL-based model is generally applicable to solve other classes of such hard problems.
arXiv Detail & Related papers (2021-08-08T19:12:04Z) - Instance-optimality in optimal value estimation: Adaptivity via
variance-reduced Q-learning [99.34907092347733]
We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that this functional arises in lower bounds on the accuracy on any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
arXiv Detail & Related papers (2021-06-28T00:38:54Z) - A Local Method for Identifying Causal Relations under Markov Equivalence [7.904790547594697]
Causality is important for designing interpretable and robust methods in artificial intelligence research.
We propose a local approach to identify whether a variable is a cause of a given target based on causal graphical models of directed acyclic graphs (DAGs)
arXiv Detail & Related papers (2021-02-25T05:01:44Z) - Disentangling Observed Causal Effects from Latent Confounders using
Method of Moments [67.27068846108047]
We provide guarantees on identifiability and learnability under mild assumptions.
We develop efficient algorithms based on coupled tensor decomposition with linear constraints to obtain scalable and guaranteed solutions.
arXiv Detail & Related papers (2021-01-17T07:48:45Z) - Efficient Consensus Model based on Proximal Gradient Method applied to
Convolutional Sparse Problems [2.335152769484957]
We derive and detail a theoretical analysis of an efficient consensus algorithm based on gradient proximal (PG) approach.
The proposed algorithm is also applied to another particular convolutional problem for the anomaly detection task.
arXiv Detail & Related papers (2020-11-19T20:52:48Z) - Learning Invariant Representations and Risks for Semi-supervised Domain
Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA)
We introduce the LIRR algorithm for jointly textbfLearning textbfInvariant textbfRepresentations and textbfRisks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z) - Differentiable Causal Discovery from Interventional Data [141.41931444927184]
We propose a theoretically-grounded method based on neural networks that can leverage interventional data.
We show that our approach compares favorably to the state of the art in a variety of settings.
arXiv Detail & Related papers (2020-07-03T15:19:17Z) - Sample-Efficient Reinforcement Learning of Undercomplete POMDPs [91.40308354344505]
This work shows that these hardness barriers do not preclude efficient reinforcement learning for rich and interesting subclasses of Partially Observable Decision Processes (POMDPs)
We present a sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs, where the number of observations is larger than the number of latent states and where exploration is essential for learning, thus distinguishing our results from prior works.
arXiv Detail & Related papers (2020-06-22T17:58:54Z) - Information-theoretic analysis for transfer learning [5.081241420920605]
We give an information-theoretic analysis on the generalization error and the excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(mu||mu')$ plays an important role in characterizing the generalization error.
arXiv Detail & Related papers (2020-05-18T13:23:20Z) - Stochastic gradient algorithms from ODE splitting perspective [0.0]
We present a different view on optimization, which goes back to the splitting schemes for approximate solutions of ODE.
In this work, we provide a connection between descent approach and gradient first-order splitting scheme for ODE.
We consider the special case of splitting, which is inspired by machine learning applications and derive a new upper bound on the global splitting error for it.
arXiv Detail & Related papers (2020-04-19T22:45:32Z) - Revisiting SGD with Increasingly Weighted Averaging: Optimization and
Generalization Perspectives [50.12802772165797]
The averaging technique combines all iterative solutions into a single solution.
Experiments have demonstrated trade-off and the effectiveness of averaging compared with other averaging schemes.
arXiv Detail & Related papers (2020-03-09T18:14:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.