Does the Tool Matter? Exploring Some Causes of Threats to Validity in Mining Software Repositories
- URL: http://arxiv.org/abs/2501.15114v1
- Date: Sat, 25 Jan 2025 07:42:56 GMT
- Title: Does the Tool Matter? Exploring Some Causes of Threats to Validity in Mining Software Repositories
- Authors: Nicole Hoess, Carlos Paradis, Rick Kazman, Wolfgang Mauerer
- Abstract summary: We use two tools to extract and analyse ten large software projects.
Despite similar trends, even simple metrics such as the numbers of commits and developers may differ by up to 500%.
We find that such substantial differences are often caused by minor technical details.
- Abstract: Software repositories are an essential source of information for software engineering research on topics such as project evolution and developer collaboration. Appropriate mining tools and analysis pipelines are therefore an indispensable precondition for many research activities. Ideally, valid results should not depend on technical details of data collection and processing. It is, however, widely acknowledged that mining pipelines are complex, with a multitude of implementation decisions made by tool authors based on their interests and assumptions. This raises the questions if (and to what extent) tools agree on their results and are interchangeable. In this study, we use two tools to extract and analyse ten large software projects, quantitatively and qualitatively comparing results and derived data to better understand this concern. We analyse discrepancies from a technical point of view, and adjust code and parametrisation to minimise replication differences. Our results indicate that despite similar trends, even simple metrics such as the numbers of commits and developers may differ by up to 500%. We find that such substantial differences are often caused by minor technical details. We show how tool-level and data post-processing changes can overcome these issues, but find they may require considerable efforts. We summarise identified causes in our lessons learned to help researchers and practitioners avoid common pitfalls, and reflect on implementation decisions and their influence in ensuring obtained data meets explicit and implicit expectations. Our findings lead us to hypothesise that similar uncertainties exist in other analysis tools, which may limit the validity of conclusions drawn in tool-centric research.
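The kind of discrepancy the abstract describes can be illustrated with a toy example (the commit records and identity-resolution rules below are hypothetical, not taken from the paper): depending on whether a mining tool deduplicates developers by raw name, by email, or by a normalised identity, the very same history yields different developer counts.

```python
# Hypothetical commit author records; shows how identity-resolution choices
# change the "number of developers" metric between mining tools.
commits = [
    {"name": "Jane Doe", "email": "jane@corp.com"},
    {"name": "Jane Doe", "email": "jdoe@users.noreply.github.com"},
    {"name": "jane doe", "email": "jane@corp.com"},
    {"name": "Bob", "email": "bob@corp.com"},
]

devs_by_name = {c["name"] for c in commits}         # case-sensitive names
devs_by_email = {c["email"] for c in commits}       # raw email addresses
devs_merged = {c["name"].lower() for c in commits}  # normalised names

print(len(devs_by_name), len(devs_by_email), len(devs_merged))  # → 3 3 2
```

Each counting rule is defensible in isolation, which is exactly why two mining tools can disagree substantially while both appearing to report "the" number of developers.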
Related papers
- Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use.
MeCo captures high-level cognitive signals in the representation space, guiding when to invoke tools.
Our experiments show that MeCo accurately detects LLMs' internal cognitive signals and significantly improves tool-use decision-making.
arXiv Detail & Related papers (2025-02-18T15:45:01Z) - Rethinking Software Misconfigurations in the Real World: An Empirical Study and Literature Analysis [9.88064494257381]
We conduct an empirical study on 823 real-world misconfiguration issues, based on which we propose a novel classification of the root causes of software misconfigurations.
We find that the research targets have changed from fundamental software to advanced applications.
Meanwhile, research on non-crash misconfigurations such as performance degradation and security risks has also grown significantly.
arXiv Detail & Related papers (2024-12-15T08:53:41Z) - Towards Completeness-Oriented Tool Retrieval for Large Language Models [60.733557487886635]
Real-world systems often incorporate a wide array of tools, making it impractical to input all tools into Large Language Models.
Existing tool retrieval methods primarily focus on semantic matching between user queries and tool descriptions.
We propose COLT, a novel model-agnostic COllaborative Learning-based Tool retrieval approach that captures not only the semantic similarities between user queries and tool descriptions but also the collaborative information among tools.
arXiv Detail & Related papers (2024-05-25T06:41:23Z) - Efficacy of static analysis tools for software defect detection on open-source projects [0.0]
The study used popular analysis tools such as SonarQube, PMD, Checkstyle, and FindBugs to perform the comparison.
The study results show that SonarQube performs considerably better than all the other tools in terms of defect detection.
arXiv Detail & Related papers (2024-05-20T19:05:32Z) - What Are Tools Anyway? A Survey from the Language Model Perspective [67.18843218893416]
Language models (LMs) are powerful yet mostly for text generation tasks.
We provide a unified definition of tools as external programs used by LMs.
We empirically study the efficiency of various tooling methods.
arXiv Detail & Related papers (2024-03-18T17:20:07Z) - Fingerprinting and Building Large Reproducible Datasets [3.2873782624127843]
We propose a tool-supported approach facilitating the creation of large tailored datasets while ensuring their provenance.
We propose a way to define a unique fingerprint to characterize a dataset which, when provided to the extraction process, ensures that the same dataset will be extracted.
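A dataset fingerprint of this kind can be sketched as a hash over the sorted selection criteria and artifact identifiers, so that identical inputs always reproduce an identical extraction; the field names and hashing scheme below are illustrative assumptions, not the paper's actual design.

```python
import hashlib
import json

def fingerprint(criteria: dict, artifact_ids: list) -> str:
    """Hash the selection criteria and sorted artifact IDs into a
    stable, order-independent dataset fingerprint (illustrative only)."""
    payload = json.dumps(
        {"criteria": criteria, "artifacts": sorted(artifact_ids)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

fp1 = fingerprint({"lang": "Java", "min_stars": 100}, ["repoB", "repoA"])
fp2 = fingerprint({"lang": "Java", "min_stars": 100}, ["repoA", "repoB"])
print(fp1 == fp2)  # → True: same selection, same fingerprint
```

Sorting keys and artifact lists before hashing is what makes the fingerprint deterministic regardless of the order in which artifacts were enumerated.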
arXiv Detail & Related papers (2023-06-20T08:59:33Z) - LLM-based Interaction for Content Generation: A Case Study on the Perception of Employees in an IT department [85.1523466539595]
This paper presents a questionnaire survey to identify the intention to use generative tools by employees of an IT company.
Our results indicate a rather average acceptability of generative tools, although the more useful the tool is perceived to be, the higher the intention seems to be.
Our analyses suggest that the frequency of use of generative tools is likely to be a key factor in understanding how employees perceive these tools in the context of their work.
arXiv Detail & Related papers (2023-04-18T15:35:43Z) - Regressing Relative Fine-Grained Change for Sub-Groups in Unreliable Heterogeneous Data Through Deep Multi-Task Metric Learning [0.5999777817331317]
We investigate how techniques in multi-task metric learning can be applied for the regression of fine-grained change in real data.
The techniques investigated are specifically tailored for handling heterogeneous data sources.
arXiv Detail & Related papers (2022-08-11T12:57:11Z) - A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z) - Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z) - Occam's Razor for Big Data? On Detecting Quality in Large Unstructured Datasets [0.0]
The new trend towards analytic complexity represents a severe challenge for the principle of parsimony, or Occam's Razor, in science.
Computational building block approaches for data clustering can help to deal with large unstructured datasets in minimized computation time.
The review concludes on how cultural differences between East and West are likely to affect the course of big data analytics.
arXiv Detail & Related papers (2020-11-12T16:06:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.