Shaky Structures: The Wobbly World of Causal Graphs in Software Analytics
- URL: http://arxiv.org/abs/2505.12554v1
- Date: Sun, 18 May 2025 21:56:42 GMT
- Title: Shaky Structures: The Wobbly World of Causal Graphs in Software Analytics
- Authors: Jeremy Hulse, Nasir U. Eisty, Tim Menzies
- Abstract summary: Causal graphs are widely used in software engineering to document and explore causal relationships. Though widely used, they may also be wildly misleading. This paper examines causal graphs found by four causal graph generators when applied to 23 data sets.
- Score: 9.935721360249014
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Causal graphs are widely used in software engineering to document and explore causal relationships. Though widely used, they may also be wildly misleading. Causal structures generated from SE data can be highly variable. This instability is so significant that conclusions drawn from one graph may be totally reversed in another, even when both graphs are learned from the same or very similar project data. To document this problem, this paper examines causal graphs found by four causal graph generators (PC, FCI, GES, and LiNGAM) when applied to 23 data sets, relating to three different SE tasks: (a) learning how configuration options are selected for different properties; (b) understanding how management choices affect software projects; and (c) defect prediction. Graphs were compared between (a) different projects exploring the same task; (b) version i and i + 1 of a system; (c) different 90% samples of the data; and (d) small variations in the causal graph generator. Measured in terms of the Jaccard index of the number of edges shared by two different graphs, over half the edges were changed by these treatments. Hence, we conclude two things. Firstly, specific conclusions found by causal graph generators about how two specific variables affect each other may not generalize since those conclusions could be reversed by minor changes in how those graphs are generated. Secondly, before researchers can report supposedly general conclusions from causal graphs (e.g., "long functions cause more defects"), they should test that such conclusions hold over the numerous causal graphs that might be generated from the same data.
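The paper's stability metric is the Jaccard index over the edge sets of two learned graphs. The sketch below is a minimal, hypothetical illustration, not the authors' code: it uses thresholded partial correlations as a crude stand-in for PC/FCI/GES/LiNGAM, learns a skeleton from two different 90% samples of the same synthetic data (treatment (c) above), and reports the edge overlap. The `learn_skeleton` helper and its threshold are illustrative assumptions.

```python
import numpy as np

def learn_skeleton(X, thresh=0.1):
    """Crude graph learner: undirected edges wherever the partial
    correlation exceeds a threshold (illustrative stand-in for
    PC/FCI/GES/LiNGAM, not the paper's generators)."""
    prec = np.linalg.inv(np.cov(X, rowvar=False))   # precision matrix
    d = np.sqrt(np.diag(prec))
    pcorr = -prec / np.outer(d, d)                  # partial correlations
    p = X.shape[1]
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(pcorr[i, j]) > thresh}

def jaccard(e1, e2):
    """The paper's stability metric: shared edges / all edges."""
    return len(e1 & e2) / len(e1 | e2) if (e1 | e2) else 1.0

rng = np.random.default_rng(0)
n, p = 500, 8
X = rng.normal(size=(n, p))
X[:, 1] += 0.9 * X[:, 0]        # strong edge 0 -> 1
X[:, 2] += 0.8 * X[:, 1]        # strong edge 1 -> 2
X[:, 3] += 0.15 * X[:, 2]       # weak edge, near the detection threshold

# Treatment (c): two different 90% samples of the same data.
a = rng.choice(n, int(0.9 * n), replace=False)
b = rng.choice(n, int(0.9 * n), replace=False)
g1, g2 = learn_skeleton(X[a]), learn_skeleton(X[b])
print(f"Jaccard edge overlap: {jaccard(g1, g2):.2f}")  # values below 1.0 signal instability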
Related papers
- Causality and Interpretability for Electrical Distribution System faults [0.0]
We present a new method that combines causal inference with machine learning to classify faults in electrical distribution systems. Our experiments show high accuracy: 99.44% on the EDS fault dataset, which is better than state-of-the-art models.
arXiv Detail & Related papers (2025-08-04T15:35:08Z)
- Measuring Similarity in Causal Graphs: A Framework for Semantic and Structural Analysis [0.7373617024876725]
Causal graphs are commonly used to understand and model complex systems. Researchers often construct these graphs from different perspectives, leading to significant variations for the same problem. Despite its importance, research on causal graph comparison remains scarce.
arXiv Detail & Related papers (2025-03-14T03:29:26Z)
- Scalable and Flexible Causal Discovery with an Efficient Test for Adjacency [48.769884734826974]
We build a scalable and flexible method to evaluate whether two variables are adjacent in a causal graph.
The Differentiable Adjacency Test (DAT) replaces an exponential number of tests with a provably equivalent relaxed problem.
We also build DAT-Graph, a graph learning method based on DAT that can learn from data with interventions.
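DAT itself is the paper's contribution and is not reproduced here. For context, here is a hedged sketch of the classical alternative it relaxes: a Fisher-z conditional independence test, which decides adjacency only after testing a pair against, in the worst case, exponentially many conditioning sets. The function name and the toy chain data are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def fisher_z_ci_test(X, i, j, cond, alpha=0.05):
    """Classical Fisher-z test of X_i independent of X_j given X_cond,
    via partial correlation. This is the per-conditioning-set baseline a
    differentiable adjacency test aims to avoid, not DAT itself."""
    idx = [i, j] + list(cond)
    prec = np.linalg.inv(np.cov(X[:, idx], rowvar=False))
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])  # partial correlation
    z = 0.5 * np.log((1 + r) / (1 - r))                 # Fisher transform
    stat = abs(z) * np.sqrt(X.shape[0] - len(cond) - 3)
    p_value = 2 * (1 - stats.norm.cdf(stat))
    return p_value > alpha                              # True => independent

rng = np.random.default_rng(1)
n = 1000
x0 = rng.normal(size=n)
x1 = x0 + 0.1 * rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)          # chain 0 -> 1 -> 2
X = np.column_stack([x0, x1, x2])
print(fisher_z_ci_test(X, 0, 2, []))        # False: 0 and 2 are dependent
print(fisher_z_ci_test(X, 0, 2, [1]))       # True: independent given node 1
```

On the chain 0 -> 1 -> 2, the pair (0, 2) looks dependent marginally but independent given node 1, which is exactly the signal a constraint-based learner uses to drop the 0-2 edge.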
arXiv Detail & Related papers (2024-06-13T14:39:40Z)
- Causal Discovery with Fewer Conditional Independence Tests [15.876392307650248]
Our work focuses on characterizing what can be learned about the underlying causal graph with a reduced number of conditional independence tests.
We show that it is possible to learn a coarser representation of the hidden causal graph with fewer conditional independence tests.
As a consequence, our results offer the first efficient algorithm for recovering the true causal graph with a reduced number of tests.
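A hypothetical back-of-envelope count shows why reducing tests matters: a constraint-based learner such as PC may, in the worst case, test one candidate edge against every subset of its neighbours.

```python
from math import comb

# Worst-case number of conditioning sets a constraint-based learner
# may examine for ONE candidate edge whose endpoints have d neighbours:
# every subset of the neighbours, i.e. 2**d conditioning sets.
d = 20
total = sum(comb(d, k) for k in range(d + 1))  # equals 2**20
print(total)  # 1048576 CI tests for a single pair in the worst case
```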
arXiv Detail & Related papers (2024-06-03T22:27:09Z)
- DAGAD: Data Augmentation for Graph Anomaly Detection [57.92471847260541]
This paper devises a novel Data Augmentation-based Graph Anomaly Detection (DAGAD) framework for attributed graphs.
A series of experiments on three datasets shows that DAGAD outperforms ten state-of-the-art baseline detectors on widely used metrics.
arXiv Detail & Related papers (2022-10-18T11:28:21Z)
- CLEAR: Generative Counterfactual Explanations on Graphs [60.30009215290265]
We study the problem of counterfactual explanation generation on graphs.
A few studies have explored counterfactual explanations on graphs, but many challenges of this problem are still not well-addressed.
We propose CLEAR, a novel framework that aims to generate counterfactual explanations on graphs for graph-level prediction models.
arXiv Detail & Related papers (2022-10-16T04:35:32Z)
- Variational Graph Generator for Multi-View Graph Clustering [51.89092260088973]
We propose the Variational Graph Generator for Multi-View Graph Clustering (VGMGC). This generator infers a reliable variational consensus graph based on an a priori assumption over multiple graphs. It embeds the inferred view-common graph and view-specific graphs together with features.
arXiv Detail & Related papers (2022-10-13T13:19:51Z)
- Graph Self-supervised Learning with Accurate Discrepancy Learning [64.69095775258164]
We propose a framework that aims to learn the exact discrepancy between the original and the perturbed graphs, coined Discrepancy-based Self-supervised LeArning (D-SLA).
We validate our method on various graph-related downstream tasks, including molecular property prediction, protein function prediction, and link prediction tasks, on which our model largely outperforms relevant baselines.
arXiv Detail & Related papers (2022-02-07T08:04:59Z)
- Inference Attacks Against Graph Neural Networks [33.19531086886817]
Graph embedding is a powerful tool for solving graph analytics problems.
While sharing graph embeddings is attractive, the associated privacy risks remain unexplored.
We systematically investigate the information leakage of the graph embedding by mounting three inference attacks.
arXiv Detail & Related papers (2021-10-06T10:08:11Z)
- ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning [65.15423587105472]
We present a new generative and structured commonsense-reasoning task (and an associated dataset) of explanation graph generation for stance prediction.
Specifically, given a belief and an argument, a model has to predict whether the argument supports or counters the belief and also generate a commonsense-augmented graph that serves as a non-trivial, complete, and unambiguous explanation for the predicted stance.
A significant 83% of our graphs contain external commonsense nodes with diverse structures and reasoning depths.
arXiv Detail & Related papers (2021-04-15T17:51:36Z)