Which Combination of Test Metrics Can Predict Success of a Software Project? A Case Study in a Year-Long Project Course
- URL: http://arxiv.org/abs/2408.12120v1
- Date: Thu, 22 Aug 2024 04:23:51 GMT
- Title: Which Combination of Test Metrics Can Predict Success of a Software Project? A Case Study in a Year-Long Project Course
- Authors: Marina Filipovic, Fabian Gilson
- Abstract summary: Testing plays an important role in securing the success of a software development project.
We investigate whether we can quantify the effects various types of testing have on functional suitability.
- Score: 1.553083901660282
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Testing plays an important role in securing the success of a software development project. Prior studies have demonstrated beneficial effects of applying acceptance testing within a Behaviour-Driven Development method. In this research, we investigate whether we can quantify the effects various types of testing have on functional suitability, i.e. the software's conformance to users' functional expectations. We explore which combination of software testing (automated and manual, including acceptance testing) should be applied to ensure the expected functional requirements are met, as well as whether a lack of testing during a development iteration causes a significant increase in the effort spent fixing the project later on. To answer those questions, we collected and analysed data from a year-long software engineering project course. We combined manual observations and statistical methods, namely Linear Mixed-Effects Modelling, to evaluate the effects of coverage metrics as well as time effort on passed stories over 5 Scrum sprints. The results suggest that a combination of high code coverage across automated unit, acceptance, and manual testing has a significant impact on functional suitability. Similarly, but to a lower extent, front-end unit testing and manual testing can predict the success of a software delivery when taken independently. We observed a close-to-significant association between low back-end testing and the deferral (i.e. postponement) of user stories.
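The core of the analysis is a Linear Mixed-Effects Model relating per-sprint test metrics to delivery outcomes. Below is a minimal sketch of that kind of analysis in Python with statsmodels, assuming a hypothetical per-team, per-sprint dataset; the file name and column names (team, sprint, unit_cov, acceptance_cov, manual_effort, passed_stories) are illustrative and not taken from the paper.

```python
# Minimal sketch of a Linear Mixed-Effects analysis of sprint-level test metrics.
# All column names and the CSV file are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one row per team per sprint with coverage metrics,
# manual-testing effort, and the number of user stories that passed.
df = pd.read_csv("sprint_metrics.csv")

# Fixed effects: coverage and effort predictors plus the sprint index;
# random intercept per team to account for repeated measures across sprints.
model = smf.mixedlm(
    "passed_stories ~ unit_cov + acceptance_cov + manual_effort + sprint",
    data=df,
    groups=df["team"],
)
result = model.fit()
print(result.summary())
```

The random intercept per team mirrors the repeated-measures structure of the course, where the same teams are observed across five sprints.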
Related papers
- Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
arXiv Detail & Related papers (2024-06-26T05:30:21Z) - Leveraging Large Language Models for Efficient Failure Analysis in Game Development [47.618236610219554]
This paper proposes a new approach to automatically identify which change in the code caused a test to fail.
The method leverages Large Language Models (LLMs) to associate error messages with the corresponding code changes causing the failure.
Our approach reaches an accuracy of 71% in our newly created dataset, which comprises issues reported by developers at EA over a period of one year.
arXiv Detail & Related papers (2024-06-11T09:21:50Z) - A Comprehensive Study on Automated Testing with the Software Lifecycle [0.6144680854063939]
The research examines how automated testing makes it easier to evaluate software quality, how it saves time compared to manual testing, and how the two approaches differ in terms of benefits and drawbacks.
Automated testing tools simplify the process of testing software applications, can be customized to specific testing situations, and allow testing to be carried out successfully.
arXiv Detail & Related papers (2024-05-02T06:30:37Z) - Towards Reliable AI: Adequacy Metrics for Ensuring the Quality of System-level Testing of Autonomous Vehicles [5.634825161148484]
We introduce a set of black-box test adequacy metrics called "Test suite Instance Space Adequacy" (TISA) metrics.
The TISA metrics offer a way to assess both the diversity and coverage of the test suite and the range of bugs detected during testing.
We evaluate the efficacy of the TISA metrics by examining their correlation with the number of bugs detected in system-level simulation testing of AVs.
arXiv Detail & Related papers (2023-11-14T10:16:05Z) - Measuring Software Testability via Automatically Generated Test Cases [8.17364116624769]
We propose a new approach to pursuing testability measurements based on software metrics.
Our approach exploits automatic test generation and mutation analysis to quantify the evidence about the relative hardness of developing effective test cases.
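As one way to picture the mutation-analysis side of such a measurement, here is a small, hypothetical Python sketch of a mutation score (killed mutants over total mutants), a common proxy for how hard it is to write effective test cases; it is not the paper's actual metric or tooling.

```python
# Hypothetical illustration of a mutation score: the fraction of injected
# faults (mutants) that the generated test suite detects ("kills").
from dataclasses import dataclass

@dataclass
class MutantResult:
    mutant_id: str
    killed: bool  # True if at least one generated test failed on this mutant

def mutation_score(results: list[MutantResult]) -> float:
    """Fraction of mutants detected by the test suite."""
    if not results:
        return 0.0
    return sum(r.killed for r in results) / len(results)

# Example: 3 of 4 injected faults were detected by the generated tests.
results = [
    MutantResult("m1", True),
    MutantResult("m2", True),
    MutantResult("m3", False),
    MutantResult("m4", True),
]
print(f"mutation score: {mutation_score(results):.2f}")  # 0.75
```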
arXiv Detail & Related papers (2023-07-30T09:48:51Z) - Towards Informed Design and Validation Assistance in Computer Games Using Imitation Learning [65.12226891589592]
This paper proposes a new approach to automated game validation and testing.
Our method leverages a data-driven imitation learning technique, which requires little effort and time and no knowledge of machine learning or programming.
arXiv Detail & Related papers (2022-08-15T11:08:44Z) - Comparative Study of Machine Learning Test Case Prioritization for Continuous Integration Testing [3.8073142980733]
We show that different machine learning models perform differently depending on the size of the test history used for model training and on the time budget available for test case execution.
Our results imply that machine learning approaches for test prioritization in continuous integration testing should be carefully configured to achieve optimal performance.
arXiv Detail & Related papers (2022-04-22T19:20:49Z) - SUPERNOVA: Automating Test Selection and Defect Prevention in AAA Video Games Using Risk Based Testing and Machine Learning [62.997667081978825]
Testing video games is an increasingly difficult task as traditional methods fail to scale with growing software systems.
We present SUPERNOVA, a system responsible for test selection and defect prevention while also functioning as an automation hub.
The direct impact has been an observed reduction of 55% or more in testing hours for an undisclosed sports game title.
arXiv Detail & Related papers (2022-03-10T00:47:46Z) - Machine Learning Techniques for Software Quality Assurance: A Survey [5.33024001730262]
We discuss various approaches in both fault prediction and test case prioritization.
Recent studies show that deep learning algorithms for fault prediction help to bridge the gap between programs' semantics and fault prediction features.
arXiv Detail & Related papers (2021-04-29T00:37:27Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model under test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z) - Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework [68.96770035057716]
A/B testing is a business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries.
This paper introduces a reinforcement learning framework for carrying out A/B testing in online experiments.
arXiv Detail & Related papers (2020-02-05T10:25:02Z)