A Call for Critically Rethinking and Reforming Data Analysis in Empirical Software Engineering
- URL: http://arxiv.org/abs/2501.12728v1
- Date: Wed, 22 Jan 2025 09:05:01 GMT
- Title: A Call for Critically Rethinking and Reforming Data Analysis in Empirical Software Engineering
- Authors: Matteo Esposito, Mikel Robredo, Murali Sridharan, Guilherme Horta Travassos, Rafael Peñaloza, Valentina Lenarduzzi
- Abstract summary: Concerns about the correct application of empirical methodologies have existed since the 2006 Dagstuhl seminar on Empirical Software Engineering.
We conducted a literature survey of ~27,000 empirical studies, using LLMs to classify statistical methodologies as adequate or inadequate.
We selected 30 primary studies and held a workshop with 33 ESE experts to assess their ability to identify and resolve statistical issues.
- Score: 5.687882380471718
- Abstract: Context: Empirical Software Engineering (ESE) drives innovation in SE through qualitative and quantitative studies. However, concerns about the correct application of empirical methodologies have existed since the 2006 Dagstuhl seminar on Empirical Software Engineering. Objective: To analyze three decades of SE research, identify mistakes in statistical methods, and evaluate experts' ability to detect and address these issues. Methods: We conducted a literature survey of ~27,000 empirical studies, using LLMs to classify statistical methodologies as adequate or inadequate. Additionally, we selected 30 primary studies and held a workshop with 33 ESE experts to assess their ability to identify and resolve statistical issues. Results: Significant statistical issues were found in the primary studies, and experts showed limited ability to detect and correct these methodological problems, raising concerns about the broader ESE community's proficiency in this area. Conclusions: Despite our study's potential limitations, its results shed light on recurring issues: the copy-and-pasting of statistical procedures from past authors' works and the continued publication of inadequate approaches, which promote dubious results and hinder the spread of correct statistical practice among researchers. Moreover, these findings justify further investigation into empirical rigor in software engineering to expose these recurring issues and to establish a framework for reassessing the statistical-methodology foundations of our field. This work therefore calls for critically rethinking and reforming data analysis in empirical software engineering, paving the way for our future work.
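The Methods step above mentions using LLMs to label the statistical methodology of each surveyed study as adequate or inadequate. The sketch below is a minimal, hypothetical illustration of what such a classification step could look like; it is not the authors' pipeline, and `query_llm`, `PROMPT_TEMPLATE`, and the adequacy criteria listed in the prompt are assumptions made only for illustration.

```python
# Minimal illustrative sketch (not the paper's actual pipeline): classify the
# statistical methodology described in a study excerpt as adequate/inadequate
# with an LLM. `query_llm` is a hypothetical stand-in for whatever model or
# API a replication would use.
from dataclasses import dataclass

PROMPT_TEMPLATE = (
    "You are reviewing the statistical methodology of an empirical software "
    "engineering study. Based on the excerpt below, answer with exactly one "
    "word, ADEQUATE or INADEQUATE, judging whether the statistical methods "
    "are applied correctly (e.g., assumptions checked, appropriate tests "
    "chosen, effect sizes and multiple-comparison corrections reported).\n\n"
    "Excerpt:\n{excerpt}\n"
)

@dataclass
class Verdict:
    paper_id: str
    label: str  # "adequate", "inadequate", or "unclear"

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a concrete client in practice."""
    raise NotImplementedError

def classify_methodology(paper_id: str, excerpt: str) -> Verdict:
    raw = query_llm(PROMPT_TEMPLATE.format(excerpt=excerpt)).strip().lower()
    # Check "inadequate" first, since the string also contains "adequate".
    if "inadequate" in raw:
        label = "inadequate"
    elif "adequate" in raw:
        label = "adequate"
    else:
        label = "unclear"  # flag for manual review instead of guessing
    return Verdict(paper_id=paper_id, label=label)
```

In this sketch, responses that do not clearly match either label are flagged as "unclear" and left for manual review rather than being forced into a binary verdict.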
Related papers
- Applications and Implications of Large Language Models in Qualitative Analysis: A New Frontier for Empirical Software Engineering [0.46426852157920906]
The study emphasizes the need for structured strategies and guidelines to optimize LLM use in qualitative research within software engineering.
While LLMs show promise in supporting qualitative analysis, human expertise remains crucial for interpreting data, and ongoing exploration of best practices will be vital for their successful integration into empirical software engineering research.
arXiv Detail & Related papers (2024-12-09T15:17:36Z) - A Comprehensive Survey on Evidential Deep Learning and Its Applications [64.83473301188138]
Evidential Deep Learning (EDL) provides reliable uncertainty estimation with minimal additional computation in a single forward pass.
We first delve into the theoretical foundation of EDL, the subjective logic theory, and discuss its distinctions from other uncertainty estimation frameworks.
We elaborate on its extensive applications across various machine learning paradigms and downstream tasks.
arXiv Detail & Related papers (2024-09-07T05:55:06Z) - Teaching Software Metrology: The Science of Measurement for Software Engineering [10.23712090082156]
This chapter reviews key concepts in the science of measurement and applies them to software engineering research.
A series of exercises for applying important measurement concepts to the reader's research are included.
arXiv Detail & Related papers (2024-06-20T16:57:23Z) - Lazy Data Practices Harm Fairness Research [49.02318458244464]
We present a comprehensive analysis of fair ML datasets, demonstrating how unreflective practices hinder the reach and reliability of algorithmic fairness findings.
Our analyses identify three main areas of concern: (1) a lack of representation for certain protected attributes in both data and evaluations; (2) the widespread exclusion of minorities during data preprocessing; and (3) opaque data processing threatening the generalization of fairness research.
This study underscores the need for a critical reevaluation of data practices in fair ML and offers directions to improve both the sourcing and usage of datasets.
arXiv Detail & Related papers (2024-04-26T09:51:24Z) - A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [55.33653554387953]
Pattern Analysis and Machine Intelligence (PAMI) has led to numerous literature reviews aimed at collecting fragmented information.
This paper presents a thorough analysis of these literature reviews within the PAMI field.
We try to address three core research questions: (1) What are the prevalent structural and statistical characteristics of PAMI literature reviews; (2) What strategies can researchers employ to efficiently navigate the growing corpus of reviews; and (3) What are the advantages and limitations of AI-generated reviews compared to human-authored ones.
arXiv Detail & Related papers (2024-02-20T11:28:50Z) - Privacy Impact Assessments in the Wild: A Scoping Review [1.7677916783208343]
Privacy Impact Assessments (PIAs) offer a systematic process for assessing the privacy impacts of a project or system.
PIAs are heralded as one of the main approaches to privacy by design, supporting the early identification of threats and controls.
There is still a significant need for more primary research on the topic, both qualitative and quantitative.
arXiv Detail & Related papers (2024-02-17T05:07:10Z) - A Diachronic Analysis of Paradigm Shifts in NLP Research: When, How, and Why? [84.46288849132634]
We propose a systematic framework for analyzing the evolution of research topics in a scientific field using causal discovery and inference techniques.
We define three variables to encompass diverse facets of the evolution of research topics within NLP.
We utilize a causal discovery algorithm to unveil the causal connections among these variables using observational data.
arXiv Detail & Related papers (2023-05-22T11:08:00Z) - Applications of statistical causal inference in software engineering [2.969705152497174]
This paper reviews existing work in software engineering that applies statistical causal inference methods.
Our results show that the application of statistical causal inference methods is relatively recent and that the corresponding research community remains relatively fragmented.
arXiv Detail & Related papers (2022-11-21T14:16:55Z) - Differential privacy and robust statistics in high dimensions [49.50869296871643]
High-dimensional Propose-Test-Release (HPTR) builds upon three crucial components: the exponential mechanism, robust statistics, and the Propose-Test-Release mechanism.
We show that HPTR nearly achieves the optimal sample complexity under several scenarios studied in the literature.
arXiv Detail & Related papers (2021-11-12T06:36:40Z) - Targeting Learning: Robust Statistics for Reproducible Research [1.1455937444848387]
Targeted Learning is a subfield of statistics that unifies advances in causal inference, machine learning and statistical theory to help answer scientifically impactful questions with statistical confidence.
The roadmap of Targeted Learning emphasizes tailoring statistical procedures so as to minimize their assumptions, carefully grounding them only in the scientific knowledge available.
arXiv Detail & Related papers (2020-06-12T17:17:01Z) - A Survey on Causal Inference [64.45536158710014]
Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics.
Various causal effect estimation methods for observational data have sprung up.
arXiv Detail & Related papers (2020-02-05T21:35:29Z)