Ignoring Time Dependence in Software Engineering Data. A Mistake
- URL: http://arxiv.org/abs/2311.03114v2
- Date: Sun, 12 Nov 2023 13:51:26 GMT
- Title: Ignoring Time Dependence in Software Engineering Data. A Mistake
- Authors: Mikel Robredo and Nyyti Saarimaki and Rafael Penaloza and Valentina Lenarduzzi
- Abstract summary: We aim to highlight the consequences of neglecting time dependence during data analysis in current research.
We point out potential problems that arise when the temporal aspect of the data is disregarded, and support our argument with both theoretical and real examples.
- Score: 4.49517541590633
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Researchers often delve into the connections between different factors
derived from the historical data of software projects. For example, scholars
have devoted their endeavors to the exploration of associations among these
factors. However, a significant portion of these studies has failed to consider
the limitations posed by the temporal interdependencies among these variables
and the potential risks associated with the use of statistical methods
ill-suited for analyzing data with temporal connections. Our goal is to
highlight the consequences of neglecting time dependence during data analysis
in current research. We point out potential problems that arise
when the temporal aspect of the data is disregarded, and support our argument
with both theoretical and real examples.
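One well-known way the risk described in the abstract shows up is spurious correlation between trending series. The sketch below is an illustration only and is not taken from the paper: it simulates pairs of independent random walks (a hypothetical stand-in for project metrics that accumulate over time) and compares the Pearson correlation computed on the raw levels with the one computed on first differences. The function names, series length, and number of simulations are assumptions chosen for the example.

```python
import numpy as np


def abs_corr(x, y):
    """Absolute Pearson correlation between two 1-D arrays."""
    return abs(np.corrcoef(x, y)[0, 1])


def simulate(n_series=500, length=200, seed=0):
    """Average |r| for independent random walks, on levels and on differences."""
    rng = np.random.default_rng(seed)
    levels, diffs = [], []
    for _ in range(n_series):
        # Two random walks built from independent noise: no true relationship.
        a = np.cumsum(rng.standard_normal(length))
        b = np.cumsum(rng.standard_normal(length))
        levels.append(abs_corr(a, b))
        # First differences recover the independent increments.
        diffs.append(abs_corr(np.diff(a), np.diff(b)))
    return float(np.mean(levels)), float(np.mean(diffs))


if __name__ == "__main__":
    mean_levels, mean_diffs = simulate()
    print(f"mean |r| on raw levels:        {mean_levels:.2f}")
    print(f"mean |r| on first differences: {mean_diffs:.2f}")
```

Under these assumptions, the correlation on raw levels tends to be far from zero even though the series are unrelated by construction, while the correlation on first differences stays close to zero. Differencing is only one possible remedy; which method suits a given dataset depends on its temporal structure.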
Related papers
- Do-PFN: In-Context Learning for Causal Effect Estimation [75.62771416172109]
We show that Prior-data fitted networks (PFNs) can be pre-trained on synthetic data to predict outcomes. Our approach allows for the accurate estimation of causal effects without knowledge of the underlying causal graph.
arXiv Detail & Related papers (2025-06-06T12:43:57Z) - Data Fusion for Partial Identification of Causal Effects [62.56890808004615]
We propose a novel partial identification framework that enables researchers to answer key questions: Is the causal effect positive or negative? How severe must assumption violations be to overturn this conclusion? We apply our framework to the Project STAR study, which investigates the effect of classroom size on students' third-grade standardized test performance.
arXiv Detail & Related papers (2025-05-30T07:13:01Z) - MVICAD2: Multi-View Independent Component Analysis with Delays and Dilations [61.59658203704757]
We propose Multi-View Independent Component Analysis with Delays and Dilations (MVICAD2), which allows sources to differ across subjects in both temporal delays and dilations.
We present a model with identifiable sources, derive an approximation of its likelihood in closed form, and use regularization and optimization techniques to enhance performance.
arXiv Detail & Related papers (2025-01-13T15:47:02Z) - On the Identification of Temporally Causal Representation with Instantaneous Dependence [50.14432597910128]
Temporally causal representation learning aims to identify the latent causal process from time series observations.
Most methods require the assumption that the latent causal processes do not have instantaneous relations.
We propose an IDentification framework for instantaneOus Latent dynamics.
arXiv Detail & Related papers (2024-05-24T08:08:05Z) - Causal discovery for time series with constraint-based model and PMIME
measure [0.0]
We present a novel approach for discovering causality in time series data that combines a causal discovery algorithm with an information theoretic-based measure.
We evaluate the performance of our approach on several simulated data sets, showing promising results.
arXiv Detail & Related papers (2023-05-31T09:38:50Z) - A Survey on Causal Discovery Methods for I.I.D. and Time Series Data [4.57769506869942]
Causal Discovery (CD) algorithms can identify the cause-effect relationships among the variables of a system from related observational data.
We present an extensive discussion on the methods designed to perform causal discovery from both independent and identically distributed (I.I.D.) data and time series data.
arXiv Detail & Related papers (2023-03-27T09:21:41Z) - Causal Discovery from Temporal Data: An Overview and New Perspectives [6.251443497694126]
Analyzing temporal data is extremely valuable for various applications.
Causal discovery, learning the causal relations from temporal data, is considered an interesting yet critical task.
In this paper, we specify the correlation between the two categories and provide a systematic overview of existing solutions.
arXiv Detail & Related papers (2023-03-17T16:45:01Z) - DOMINO: Visual Causal Reasoning with Time-Dependent Phenomena [59.291745595756346]
We propose a set of visual analytics methods that allow humans to participate in the discovery of causal relations associated with windows of time delay.
Specifically, we leverage a well-established method, logic-based causality, to enable analysts to test the significance of potential causes.
Since an effect can be a cause of other effects, we allow users to aggregate different temporal cause-effect relations found with our method into a visual flow diagram.
arXiv Detail & Related papers (2023-03-12T03:40:21Z) - Understanding the Impact of Competing Events on Heterogeneous Treatment
Effect Estimation from Time-to-Event Data [92.51773744318119]
We study the problem of inferring heterogeneous treatment effects (HTEs) from time-to-event data in the presence of competing events.
We take an outcome modeling approach to estimating HTEs, and consider how and when existing prediction models for time-to-event data can be used as plug-in estimators for potential outcomes.
We theoretically analyze and empirically illustrate when and how these challenges play a role when using generic machine learning prediction models for the estimation of HTEs.
arXiv Detail & Related papers (2023-02-23T14:28:55Z) - Towards Causal Analysis of Empirical Software Engineering Data: The
Impact of Programming Languages on Coding Competitions [10.51554436183424]
This paper discusses some novel techniques based on structural causal models.
We apply these ideas to analyzing public data about programmer performance in Code Jam.
We find considerable differences between a purely associational and a causal analysis of the very same data.
arXiv Detail & Related papers (2023-01-18T13:46:16Z) - Temporal Relevance Analysis for Video Action Models [70.39411261685963]
We first propose a new approach to quantify the temporal relationships between frames captured by CNN-based action models.
We then conduct comprehensive experiments and in-depth analysis to provide a better understanding of how temporal modeling is affected.
arXiv Detail & Related papers (2022-04-25T19:06:48Z) - Amortized Causal Discovery: Learning to Infer Causal Graphs from
Time-Series Data [63.15776078733762]
We propose Amortized Causal Discovery, a novel framework to learn to infer causal relations from time-series data.
We demonstrate experimentally that this approach, implemented as a variational model, leads to significant improvements in causal discovery performance.
arXiv Detail & Related papers (2020-06-18T19:59:12Z) - On Disentangled Representations Learned From Correlated Data [59.41587388303554]
We bridge the gap to real-world scenarios by analyzing the behavior of the most prominent disentanglement approaches on correlated data.
We show that systematically induced correlations in the dataset are being learned and reflected in the latent representations.
We also demonstrate how to resolve these latent correlations, either using weak supervision during training or by post-hoc correcting a pre-trained model with a small number of labels.
arXiv Detail & Related papers (2020-06-14T12:47:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.