Applying Bayesian Data Analysis for Causal Inference about Requirements Quality: A Controlled Experiment
- URL: http://arxiv.org/abs/2401.01154v3
- Date: Fri, 13 Sep 2024 11:46:33 GMT
- Title: Applying Bayesian Data Analysis for Causal Inference about Requirements Quality: A Controlled Experiment
- Authors: Julian Frattini, Davide Fucci, Richard Torkar, Lloyd Montgomery, Michael Unterkalmsteiner, Jannik Fischbach, Daniel Mendez
- Abstract summary: It is commonly accepted that the quality of requirements specifications impacts subsequent software engineering activities.
We aim to contribute empirical evidence on the effect that requirements quality defects have on a software engineering activity.
- Score: 4.6068376339651635
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: It is commonly accepted that the quality of requirements specifications impacts subsequent software engineering activities. However, we still lack empirical evidence to support organizations in deciding whether their requirements are good enough or impede subsequent activities. We aim to contribute empirical evidence on the effect that requirements quality defects have on a software engineering activity that depends on the affected requirement. We conduct a controlled experiment in which 25 participants from industry and university generate domain models from four natural language requirements containing different quality defects. We evaluate the resulting models using both frequentist and Bayesian data analysis. Contrary to our expectations, our results show that the use of passive voice has only a minor impact on the resulting domain models. The use of ambiguous pronouns, however, has a strong effect on various properties of the resulting domain models. Most notably, ambiguous pronouns lead to incorrect associations in domain models. Although the literature and frequentist methods advise equally against both defects, the Bayesian data analysis shows that the two investigated quality defects have vastly different impacts on software engineering activities and, hence, deserve different levels of attention. Researchers can further utilize our method to produce reliable, detailed empirical evidence on requirements quality.
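To make the contrast between the two analysis approaches concrete, here is a minimal sketch in Python. All counts below are invented for illustration, and the simple Beta-Binomial model stands in for the (likely richer) models the paper actually fits; the sketch only shows why a posterior distribution over effect sizes can separate a minor defect from a strong one where a lone p-value cannot.

```python
import numpy as np
from scipy import stats

# Invented counts (NOT the paper's data): number of domain models containing
# at least one incorrect association, per requirement condition.
k_ambiguous, n_ambiguous = 18, 50   # requirements with ambiguous pronouns
k_control, n_control = 6, 50        # defect-free requirements

# Frequentist analysis: a chi-squared test on the 2x2 contingency table
# answers only "is there a detectable effect?" with a single p-value.
table = np.array([[k_ambiguous, n_ambiguous - k_ambiguous],
                  [k_control, n_control - k_control]])
chi2, p_value, _, _ = stats.chi2_contingency(table, correction=False)
print(f"frequentist p-value: {p_value:.4f}")

# Bayesian analysis: with a Beta(1, 1) prior, the posterior over each
# condition's defect rate is Beta(1 + k, 1 + n - k). Sampling from both
# posteriors yields a full distribution over the *size* of the effect.
rng = np.random.default_rng(seed=0)
rate_ambiguous = rng.beta(1 + k_ambiguous, 1 + n_ambiguous - k_ambiguous, 100_000)
rate_control = rng.beta(1 + k_control, 1 + n_control - k_control, 100_000)
effect = rate_ambiguous - rate_control

lo, hi = np.percentile(effect, [2.5, 97.5])
print(f"P(ambiguous pronouns increase the error rate) = {(effect > 0).mean():.3f}")
print(f"95% credible interval for the rate difference: [{lo:.3f}, {hi:.3f}]")
```

An effect-size posterior like this is what lets a Bayesian analysis distinguish a defect with a minor impact (passive voice) from one with a strong impact (ambiguous pronouns), even when both look similar through the binary lens of significance testing.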
Related papers
- Causal Fine-Tuning and Effect Calibration of Non-Causal Predictive Models [1.3124513975412255]
This paper proposes techniques to enhance the performance of non-causal models for causal inference using data from randomized experiments.
In domains like advertising, customer retention, and precision medicine, non-causal models that predict outcomes under no intervention are often used to score individuals and rank them according to the expected effectiveness of an intervention.
arXiv Detail & Related papers (2024-06-13T20:18:16Z)
- A Second Look at the Impact of Passive Voice Requirements on Domain Modeling: Bayesian Reanalysis of an Experiment [4.649794383775257]
We reanalyze the only known controlled experiment investigating the impact of passive voice on the subsequent activity of domain modeling.
Our results reveal that the effects observed by the original authors are much less significant than previously assumed.
arXiv Detail & Related papers (2024-02-16T16:24:00Z)
- Identifying relevant Factors of Requirements Quality: an industrial Case Study [0.5603839226601395]
We conduct a case study considering data from both interview transcripts and issue reports to identify relevant factors of requirements quality.
The results contribute empirical evidence that (1) strengthens existing requirements engineering theories and (2) advances industry-relevant requirements quality research.
arXiv Detail & Related papers (2024-02-01T13:45:06Z)
- Large Language Models are Few-Shot Training Example Generators: A Case Study in Fallacy Recognition [49.38757847011105]
Computational fallacy recognition faces challenges due to the diverse genres, domains, and types of fallacies found in datasets.
We aim to enhance existing models for fallacy recognition by incorporating additional context and by leveraging large language models to generate synthetic data.
Our evaluation results demonstrate consistent improvements across fallacy types, datasets, and generators.
arXiv Detail & Related papers (2023-11-16T04:17:47Z)
- In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv Detail & Related papers (2023-02-06T16:55:37Z)
- A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models [81.15974174627785]
We study the behavior of language models in terms of robustness and sensitivity to direct interventions in the input space.
Our analysis shows that robustness does not appear to continuously improve as a function of size, but the GPT-3 Davinci models (175B) achieve a dramatic improvement in both robustness and sensitivity compared to all other GPT variants.
arXiv Detail & Related papers (2022-10-21T15:12:37Z)
- Testing Causality in Scientific Modelling Software [0.26388783516590225]
The Causal Testing Framework uses causal inference techniques to establish causal effects from existing data.
We present three case studies covering real-world scientific models, demonstrating how the Causal Testing Framework can infer metamorphic test outcomes.
arXiv Detail & Related papers (2022-09-01T10:57:54Z)
- Towards a Fair Comparison and Realistic Design and Evaluation Framework of Android Malware Detectors [63.75363908696257]
We analyze 10 influential research works on Android malware detection using a common evaluation framework.
We identify five factors that, if not taken into account when creating datasets and designing detectors, significantly affect the trained ML models.
We conclude that the studied ML-based detectors have been evaluated optimistically, which explains the good published results.
arXiv Detail & Related papers (2022-05-25T08:28:08Z)
- Empirical Estimates on Hand Manipulation are Recoverable: A Step Towards Individualized and Explainable Robotic Support in Everyday Activities [80.37857025201036]
A key challenge for robotic systems is to figure out the behavior of another agent.
Drawing correct inferences is especially challenging when (confounding) factors are not controlled experimentally.
We propose equipping robots with the necessary tools to conduct observational studies on people.
arXiv Detail & Related papers (2022-01-27T22:15:56Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect oversensitivity-causing and overstability-causing samples with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- MASSIVE: Tractable and Robust Bayesian Learning of Many-Dimensional Instrumental Variable Models [8.271859911016719]
We propose a general and efficient causal inference algorithm that accounts for model uncertainty.
We show that, as long as some of the candidates are (close to) valid, without knowing a priori which ones, they collectively still pose enough restrictions on the target interaction to obtain a reliable causal effect estimate.
arXiv Detail & Related papers (2020-12-18T10:06:55Z)
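For readers unfamiliar with instrumental variables, the following sketch illustrates the basic single-instrument estimator that MASSIVE generalizes to many candidate instruments of unknown validity. The simulated data and coefficients are invented for illustration; this is the textbook Wald estimator, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 100_000

U = rng.normal(size=n)                        # unobserved confounder
Z = rng.normal(size=n)                        # instrument: affects Y only via X
X = 0.8 * Z + U + rng.normal(size=n)          # exposure/treatment
Y = 1.5 * X + 2.0 * U + rng.normal(size=n)    # outcome; true causal effect = 1.5

# Naive regression of Y on X is biased because U drives both X and Y.
ols_estimate = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)

# Wald (instrumental variable) estimator: because Z is independent of U,
# Cov(Z, Y) / Cov(Z, X) isolates the causal effect of X on Y.
iv_estimate = np.cov(Z, Y, ddof=1)[0, 1] / np.cov(Z, X, ddof=1)[0, 1]

print(f"OLS estimate (confounded): {ols_estimate:.3f}")
print(f"IV estimate:               {iv_estimate:.3f}   (true effect: 1.5)")
```

Per the summary above, MASSIVE's contribution is to handle many such candidate instruments at once, accounting for uncertainty about which of them actually satisfy the exclusion assumption hard-coded in this sketch.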
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.