Tutorial Debriefing: Applied Statistical Causal Inference in Requirements Engineering
- URL: http://arxiv.org/abs/2511.03875v1
- Date: Wed, 05 Nov 2025 21:43:53 GMT
- Title: Tutorial Debriefing: Applied Statistical Causal Inference in Requirements Engineering
- Authors: Julian Frattini, Hans-Martin Heyn, Robert Feldt, Richard Torkar,
- Abstract summary: The software engineering (SE) research community strives to contribute to the betterment of the target population of our research: software producers and consumers. We will only achieve this betterment if we manage to transfer the knowledge acquired during research into practice. The value of these contributions hinges on the assumption that applying them causes an improvement of the development process, user experience, or other performance metrics.
- Score: 4.29699238971962
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: As any scientific discipline, the software engineering (SE) research community strives to contribute to the betterment of the target population of our research: software producers and consumers. We will only achieve this betterment if we manage to transfer the knowledge acquired during research into practice. This transfer of knowledge may come in the form of tools, processes, and guidelines for software developers. However, the value of these contributions hinges on the assumption that applying them causes an improvement of the development process, user experience, or other performance metrics. Such a promise requires evidence of causal relationships between an exposure or intervention (i.e., the contributed tool, process, or guideline) and an outcome (i.e., performance metrics). A straightforward approach to obtaining this evidence is via controlled experiments in which a sample of a population is randomly divided into a group exposed to the new tool, process, or guideline, and a control group. However, such randomized controlled trials may not be legally, ethically, or logistically feasible. In these cases, we need a reliable process for statistical causal inference (SCI) from observational data.
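The contrast between a naive observational comparison and a confounder-adjusted estimate can be illustrated with a minimal sketch. This is not the paper's method; it assumes a single observed confounder and a linear outcome model, and all variable names (a tool-adoption exposure `t`, an experience-like confounder `z`, a performance outcome `y`) are illustrative. The adjustment shown is a simple backdoor adjustment via least squares.

```python
import numpy as np

# Simulate observational data in which a confounder z (e.g., developer
# experience) influences both the exposure t (tool adoption) and the
# outcome y (a performance metric). The true causal effect of t is 2.0.
rng = np.random.default_rng(0)
n = 50_000

z = rng.normal(size=n)                           # confounder
t = (z + rng.normal(size=n) > 0).astype(float)   # exposure depends on z
y = 2.0 * t + 3.0 * z + rng.normal(size=n)       # outcome depends on both

# Naive comparison of group means ignores z and is biased upward,
# because adopters tend to have higher z.
naive = y[t == 1].mean() - y[t == 0].mean()

# Backdoor adjustment: regress y on (t, z); under these assumptions,
# the coefficient on t estimates the causal effect.
X = np.column_stack([np.ones(n), t, z])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
adjusted = beta[1]

print(f"naive estimate:    {naive:.2f}")     # substantially above 2.0
print(f"adjusted estimate: {adjusted:.2f}")  # close to 2.0
```

In practice the hard part is justifying which variables to adjust for; conditioning on the wrong variables (e.g., colliders) can introduce bias rather than remove it, which is why a principled SCI process matters.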
Related papers
- Oops!... I did it again. Conclusion (In-)Stability in Quantitative Empirical Software Engineering: A Large-Scale Analysis [5.94721915761333]
Mining software repositories is a popular means to gain insights into a software project's evolution. This study investigates some threats to validity in complex tool pipelines for evolutionary software analyses.
arXiv Detail & Related papers (2025-10-08T10:11:39Z)
- Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning [68.89572566071575]
Tool-Integrated Reasoning (TIR) enables large language models (LLMs) to improve their internal reasoning ability by integrating external tools. We propose Tool-Light, a framework designed to encourage LLMs to perform TIR efficiently and accurately. Experimental results on 10 datasets demonstrate the effectiveness of Tool-Light.
arXiv Detail & Related papers (2025-09-27T12:53:37Z)
- A Dataset For Computational Reproducibility [2.147712260420443]
This article introduces a dataset of computational experiments covering a broad spectrum of scientific fields. It incorporates details about software dependencies, execution steps, and configurations necessary for accurate reproduction. It provides a universal benchmark by establishing a standardized dataset for objectively evaluating and comparing the effectiveness of tools.
arXiv Detail & Related papers (2025-04-11T16:45:10Z)
- Do We Need to Verify Step by Step? Rethinking Process Supervision from a Theoretical Perspective [59.61868506896214]
We show that under standard data coverage assumptions, reinforcement learning is no more statistically difficult than learning through process supervision. We prove that any policy's advantage function can serve as an optimal process reward model.
arXiv Detail & Related papers (2025-02-14T22:21:56Z)
- Does the Tool Matter? Exploring Some Causes of Threats to Validity in Mining Software Repositories [9.539825294372786]
We use two tools to extract and analyse ten large software projects. Despite similar trends, even simple metrics such as the numbers of commits and developers may differ by up to 500%. We find that such substantial differences are often caused by minor technical details.
arXiv Detail & Related papers (2025-01-25T07:42:56Z)
- LLM-based Interaction for Content Generation: A Case Study on the Perception of Employees in an IT department [85.1523466539595]
This paper presents a questionnaire survey to identify the intention to use generative tools by employees of an IT company.
Our results indicate a rather average acceptability of generative tools, although the more useful the tool is perceived to be, the higher the intention seems to be.
Our analyses suggest that the frequency of use of generative tools is likely to be a key factor in understanding how employees perceive these tools in the context of their work.
arXiv Detail & Related papers (2023-04-18T15:35:43Z)
- Empirical Estimates on Hand Manipulation are Recoverable: A Step Towards Individualized and Explainable Robotic Support in Everyday Activities [80.37857025201036]
A key challenge for robotic systems is to figure out the behavior of another agent. Drawing correct inferences is especially challenging when (confounding) factors are not controlled experimentally.
We propose equipping robots with the necessary tools to conduct observational studies on people.
arXiv Detail & Related papers (2022-01-27T22:15:56Z)
- Scaling up Search Engine Audits: Practical Insights for Algorithm Auditing [68.8204255655161]
We set up experiments for eight search engines with hundreds of virtual agents placed in different regions.
We demonstrate the successful performance of our research infrastructure across multiple data collections.
We conclude that virtual agents are a promising avenue for monitoring the performance of algorithms across long periods of time.
arXiv Detail & Related papers (2021-06-10T15:49:58Z)
- Detecting discriminatory risk through data annotation based on Bayesian inferences [5.017973966200985]
We propose a method of data annotation that aims to warn about the risk of discriminatory results of a given data set.
We empirically test our system on three datasets commonly accessed by the machine learning community.
arXiv Detail & Related papers (2021-01-27T12:43:42Z)
- Open Source Software for Efficient and Transparent Reviews [0.11179881480027788]
ASReview is an open source machine learning-aided pipeline applying active learning.
We demonstrate by means of simulation studies that ASReview can yield far more efficient reviewing than manual reviewing.
arXiv Detail & Related papers (2020-06-22T11:57:10Z)
- A Survey on Causal Inference [64.45536158710014]
Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics.
Various causal effect estimation methods for observational data have sprung up.
arXiv Detail & Related papers (2020-02-05T21:35:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.