Related papers: How is Testing Related to Single Statement Bugs?

How is Testing Related to Single Statement Bugs?

URL: http://arxiv.org/abs/2403.18226v1
Date: Wed, 27 Mar 2024 03:31:00 GMT
Title: How is Testing Related to Single Statement Bugs?
Authors: Habibur Rahman, Saqib Ameen,
Abstract summary: We analyzed data from the top 100 Maven-based projects on GitHub. Our preliminary findings suggest a weak to moderate correlation, indicating that increased test coverage is somewhat reduce the occurrence of SSBs.
Score: 0.25782420501870285
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this study, we analyzed the correlation between unit test coverage and the occurrence of Single Statement Bugs (SSBs) in open-source Java projects. We analyzed data from the top 100 Maven-based projects on GitHub, which includes 7824 SSBs. Our preliminary findings suggest a weak to moderate correlation, indicating that increased test coverage is somewhat reduce the occurrence of SSBs. However, this relationship is not very strong, emphasizing the need for better tests. Our study contributes to the ongoing discussion on enhancing software quality and provides a basis for future research into effective testing practices aimed at mitigating SSBs.

Related papers

Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness [53.96714099151378]
We propose a three-step approach for parameter-efficient fine-tuning of image-text foundation models. Our method improves its two key components: minority samples identification and the robust training algorithm. Our theoretical analysis shows that our PPA enhances minority group identification and is Bayes optimal for minimizing the balanced group error.
arXiv Detail & Related papers (2025-03-12T15:46:12Z)
Leveraging Large Language Models for Enhancing the Understandability of Generated Unit Tests [4.574205608859157]
We introduce UTGen, which combines search-based software testing and large language models to enhance the understandability of automatically generated test cases. We observe that participants working on assignments with UTGen test cases fix up to 33% more bugs and use up to 20% less time when compared to baseline test cases.
arXiv Detail & Related papers (2024-08-21T15:35:34Z)
Beyond Accuracy: An Empirical Study on Unit Testing in Open-source Deep Learning Projects [24.712437703214547]
Deep Learning (DL) models have rapidly advanced, focusing on achieving high performance through testing model accuracy and robustness. It is unclear whether DL projects, as software systems, are tested thoroughly or functionally correct when there is a need to treat and test them like other software systems. We empirically study the unit tests in open-source DL projects, analyzing 9,129 projects from GitHub.
arXiv Detail & Related papers (2024-02-26T13:08:44Z)
REST: Enhancing Group Robustness in DNNs through Reweighted Sparse Training [49.581884130880944]
Deep neural network (DNN) has been proven effective in various domains. However, they often struggle to perform well on certain minority groups during inference.
arXiv Detail & Related papers (2023-12-05T16:27:54Z)
Automatic Generation of Test Cases based on Bug Reports: a Feasibility Study with Large Language Models [4.318319522015101]
Existing approaches produce test cases that either can be qualified as simple (e.g. unit tests) or that require precise specifications. Most testing procedures still rely on test cases written by humans to form test suites. We investigate the feasibility of performing this generation by leveraging large language models (LLMs) and using bug reports as inputs.
arXiv Detail & Related papers (2023-10-10T05:30:12Z)
A Comparative Study of Text Embedding Models for Semantic Text Similarity in Bug Reports [0.0]
Retrieving similar bug reports from an existing database can help reduce the time and effort required to resolve bugs. We explored several embedding models such as TF-IDF (Baseline), FastText, Gensim, BERT, and ADA. Our study provides insights into the effectiveness of different embedding methods for retrieving similar bug reports and highlights the impact of selecting the appropriate one for this task.
arXiv Detail & Related papers (2023-08-17T21:36:56Z)
How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench [52.11481619456093]
We study the performance prediction problem on experiment records from BIG-bench. An $R2$ score greater than 95% indicates the presence of learnable patterns within the experiment records. We find a subset as informative as BIG-bench Hard for evaluating new model families, while being $3times$ smaller.
arXiv Detail & Related papers (2023-05-24T09:35:34Z)
Large Language Models are Few-shot Testers: Exploring LLM-based General Bug Reproduction [14.444294152595429]
The number of tests added in open source repositories due to issues was about 28% of the corresponding project test suite size. We propose LIBRO, a framework that uses Large Language Models (LLMs), which have been shown to be capable of performing code-related tasks. Our evaluation of LIBRO shows that, on the widely studied Defects4J benchmark, LIBRO can generate failure reproducing test cases for 33% of all studied cases.
arXiv Detail & Related papers (2022-09-23T10:50:47Z)
Assaying Out-Of-Distribution Generalization in Transfer Learning [103.57862972967273]
We take a unified view of previous work, highlighting message discrepancies that we address empirically. We fine-tune over 31k networks, from nine different architectures in the many- and few-shot setting.
arXiv Detail & Related papers (2022-07-19T12:52:33Z)
Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases [55.45617404586874]
We propose a few-shot instruction-based method for prompting pre-trained language models (LMs) We show that large LMs can detect different types of fine-grained biases with similar and sometimes superior accuracy to fine-tuned models.
arXiv Detail & Related papers (2021-12-15T04:19:52Z)
Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation [109.06060143938052]
We propose a "double perturbation" framework to uncover model weaknesses beyond the test dataset. We apply this framework to study two perturbation-based approaches that are used to analyze models' robustness and counterfactual bias in English.
arXiv Detail & Related papers (2021-04-12T06:57:36Z)
Tasty Burgers, Soggy Fries: Probing Aspect Robustness in Aspect-Based Sentiment Analysis [71.40390724765903]
Aspect-based sentiment analysis (ABSA) aims to predict the sentiment towards a specific aspect in the text. Existing ABSA test sets cannot be used to probe whether a model can distinguish the sentiment of the target aspect from the non-target aspects. We generate new examples to disentangle the confounding sentiments of the non-target aspects from the target aspect's sentiment.
arXiv Detail & Related papers (2020-09-16T22:38:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.