Can Causality Cure Confusion Caused By Correlation (in Software Analytics)?
- URL: http://arxiv.org/abs/2602.16091v1
- Date: Tue, 17 Feb 2026 23:35:50 GMT
- Title: Can Causality Cure Confusion Caused By Correlation (in Software Analytics)?
- Authors: Amirali Rayegan, Tim Menzies
- Abstract summary: Symbolic models, particularly decision trees, are widely used in software engineering for explainable analytics. Recent studies in software engineering show that both correlational models and causal discovery algorithms suffer from pronounced instability. This study investigates whether incorporating causality-aware split criteria into symbolic models can improve their stability and robustness.
- Score: 4.082216579462797
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Background: Symbolic models, particularly decision trees, are widely used in software engineering for explainable analytics in defect prediction, configuration tuning, and software quality assessment. Most of these models rely on correlational split criteria, such as variance reduction or information gain, which identify statistical associations but cannot establish causation between X and Y. Recent empirical studies in software engineering show that both correlational models and causal discovery algorithms suffer from pronounced instability. This instability arises from two complementary issues: (1) correlation-based methods conflate association with causation; (2) causal discovery algorithms rely on heuristic approximations to cope with the NP-hard nature of structure learning, causing their inferred graphs to vary widely under minor input perturbations. Together, these issues undermine trust, reproducibility, and the reliability of explanations in real-world SE tasks. Objective: This study investigates whether incorporating causality-aware split criteria into symbolic models can improve their stability and robustness, and whether such gains come at the cost of predictive or optimization performance. We additionally examine how the stability of human expert judgments compares to that of automated models. Method: Using 120+ multi-objective optimization tasks from the MOOT repository, we evaluate stability through a preregistered bootstrap-ensemble protocol that measures variance with win-score assignments. We compare the stability of human causal assessments with that of correlation-based decision trees (EZR). We also compare causality-aware trees, which leverage conditional-entropy split criteria and confounder filtering. Stability and performance differences are analyzed using statistical methods (variance, Gini impurity, KS test, Cliff's delta).
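To make the ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It scores candidate splits by conditional entropy H(Y | X), estimates how often bootstrap resamples agree on the root split, and computes Cliff's delta. The function names, the toy row format (a list of dicts with discrete values), and the agreement-fraction stability measure are all assumptions made for illustration. The paper's win-score assignments and confounder filtering are not modeled here.

```python
import math
import random
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (bits) of a sequence of discrete labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def conditional_entropy(xs, ys):
    """H(Y | X) for paired discrete sequences xs and ys."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    n = len(ys)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def best_feature(rows, target):
    """Feature whose values leave the least uncertainty about the target.
    Ties go to the feature listed first (hypothetical tie-breaking rule)."""
    ys = [r[target] for r in rows]
    features = [k for k in rows[0] if k != target]
    return min(features,
               key=lambda f: conditional_entropy([r[f] for r in rows], ys))

def split_stability(rows, target, n_boot=200, seed=0):
    """Most frequent root split over bootstrap resamples, and the
    fraction of resamples that chose it (assumed stability measure)."""
    rng = random.Random(seed)
    picks = Counter(
        best_feature([rng.choice(rows) for _ in rows], target)
        for _ in range(n_boot)
    )
    feature, count = picks.most_common(1)[0]
    return feature, count / n_boot

def cliffs_delta(a, b):
    """Cliff's delta effect size: P(a > b) - P(a < b) over all pairs."""
    gt = sum(1 for x in a for y in b if x > y)
    lt = sum(1 for x in a for y in b if x < y)
    return (gt - lt) / (len(a) * len(b))
```

A call such as split_stability(rows, "y") returns the most frequently chosen feature together with its agreement fraction. A fraction close to 1.0 suggests the root split is insensitive to resampling, while lower values indicate the kind of instability the abstract describes.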
Related papers
- STAR : Bridging Statistical and Agentic Reasoning for Large Model Performance Prediction [78.0692157478247]
We propose STAR, a framework that bridges data-driven STatistical expectations with knowledge-driven Agentic Reasoning. We show that STAR consistently outperforms all baselines on both score-based and rank-based metrics.
arXiv Detail & Related papers (2026-02-12T16:30:07Z)
- ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models [102.4511331368587]
ARISE (Adaptive Resolution-aware Scaling Evaluation) is a novel metric designed to assess the test-time scaling effectiveness of large reasoning models. We conduct comprehensive experiments evaluating state-of-the-art reasoning models across diverse domains.
arXiv Detail & Related papers (2025-10-07T15:10:51Z)
- Probabilistic causal graphs as categorical data synthesizers: Do they do better than Gaussian Copulas and Conditional Tabular GANs? [0.0]
This study investigates the generation of high-quality synthetic categorical data, such as survey data, using causal graph models. We used the categorical data that are based on the survey of accessibility to services for people with disabilities. We created both SEM and BN models to represent causal relationships and to capture joint distributions between variables.
arXiv Detail & Related papers (2025-04-15T18:41:54Z)
- An AI-powered Bayesian generative modeling approach for causal inference in observational studies [4.4876925770439415]
CausalBGM is an AI-powered Bayesian generative modeling approach. It estimates the individual treatment effect (ITE) by learning individual-specific distributions of a low-dimensional latent feature set.
arXiv Detail & Related papers (2025-01-01T06:52:45Z)
- A Neural Framework for Generalized Causal Sensitivity Analysis [78.71545648682705]
We propose NeuralCSA, a neural framework for causal sensitivity analysis.
We provide theoretical guarantees that NeuralCSA is able to infer valid bounds on the causal query of interest.
arXiv Detail & Related papers (2023-11-27T17:40:02Z)
- Towards Causal Analysis of Empirical Software Engineering Data: The Impact of Programming Languages on Coding Competitions [10.51554436183424]
This paper discusses some novel techniques based on structural causal models.
We apply these ideas to analyzing public data about programmer performance in Code Jam.
We find considerable differences between a purely associational and a causal analysis of the very same data.
arXiv Detail & Related papers (2023-01-18T13:46:16Z)
- Causality and Generalizability: Identifiability and Learning Methods [0.0]
This thesis contributes to the research areas concerning the estimation of causal effects, causal structure learning, and distributionally robust prediction methods.
We present novel and consistent linear and non-linear causal effects estimators in instrumental variable settings that employ data-dependent mean squared prediction error regularization.
We propose a general framework for distributional robustness with respect to intervention-induced distributions.
arXiv Detail & Related papers (2021-10-04T13:12:11Z)
- Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z)
- Disentangling Observed Causal Effects from Latent Confounders using Method of Moments [67.27068846108047]
We provide guarantees on identifiability and learnability under mild assumptions.
We develop efficient algorithms based on coupled tensor decomposition with linear constraints to obtain scalable and guaranteed solutions.
arXiv Detail & Related papers (2021-01-17T07:48:45Z)
- Latent Causal Invariant Model [128.7508609492542]
Current supervised learning can learn spurious correlation during the data-fitting process.
We propose a Latent Causal Invariance Model (LaCIM) which pursues causal prediction.
arXiv Detail & Related papers (2020-11-04T10:00:27Z)
- The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions [93.62888099134028]
We find that the performance of state-of-the-art models on Natural Language Inference (NLI) and Reading Comprehension (RC) analysis/stress sets can be highly unstable.
This raises three questions: (1) How will the instability affect the reliability of the conclusions drawn based on these analysis sets?
We give both theoretical explanations and empirical evidence regarding the source of the instability.
arXiv Detail & Related papers (2020-04-28T15:41:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.