Related papers: Size biased Multinomial Modelling of detection data in Software testing

URL: http://arxiv.org/abs/2406.04360v1
Date: Fri, 24 May 2024 17:57:34 GMT
Title: Size biased Multinomial Modelling of detection data in Software testing
Authors: Pallabi Ghosh, Ashis Kr. Chakraborty, Soumen Dey
Abstract summary: We make use of the bug size or the eventual bug size, which helps us to determine the reliability of software more precisely. The model has been validated through simulation and subsequently applied to testing data from critical space application software.
Score: 1.7532822703595772
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Estimation of software reliability often poses a considerable challenge, particularly for critical software. Several methods for estimating software reliability are already available in the literature. So far, however, almost nobody has used the concept of the size of a bug for estimating software reliability. In this article we make use of the bug size or the eventual bug size, which helps us to determine the reliability of software more precisely. The size-biased model developed here can also be used in similar fields such as hydrocarbon exploration. The model has been validated through simulation and subsequently applied to testing data from critical space application software. The estimated results match the actual observations to a large extent.

Related papers

Do Large Language Model Benchmarks Test Reliability? [66.1783478365998]
We investigate how well current benchmarks quantify model reliability. Motivated by this gap in the evaluation of reliability, we propose the concept of so-called platinum benchmarks. We evaluate a wide range of models on these platinum benchmarks and find that, indeed, frontier LLMs still exhibit failures on simple tasks.
arXiv Detail & Related papers (2025-02-05T18:58:19Z)
Are Large Language Models Memorizing Bug Benchmarks? [6.640077652362016]
Large Language Models (LLMs) have become integral to various software engineering tasks, including code generation, bug detection, and repair. A growing concern within the software engineering community is that benchmarks may not reliably reflect true LLM performance due to the risk of data leakage. We systematically evaluate popular LLMs to assess their susceptibility to data leakage from widely used bug benchmarks.
arXiv Detail & Related papers (2024-11-20T13:46:04Z)
Large Language Models Must Be Taught to Know What They Don't Know [97.90008709512921]
We show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead. We also investigate the mechanisms that enable reliable uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators.
arXiv Detail & Related papers (2024-06-12T16:41:31Z)
Demonstration of a Response Time Based Remaining Useful Life (RUL) Prediction for Software Systems [0.966840768820136]
Prognostic and Health Management (PHM) has been widely applied to hardware systems in the electronics and non-electronics domains. This paper addresses the application of PHM concepts to software systems for fault predictions and RUL estimation.
arXiv Detail & Related papers (2023-07-23T06:06:38Z)
Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world. We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique. By further elaborating the robustness metric, a model is judged to be robust if its performance is consistently accurate on the overall cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
Applying Machine Learning Analysis for Software Quality Test [0.0]
It is critical to comprehend what triggers maintenance and whether it can be predicted. Numerous methods of assessing the complexity of created programs may produce useful prediction models. In this paper, machine learning is applied to the available data to calculate the cumulative software failure levels.
arXiv Detail & Related papers (2023-05-16T06:10:54Z)
Towards a Fair Comparison and Realistic Design and Evaluation Framework of Android Malware Detectors [63.75363908696257]
We analyze 10 influential research works on Android malware detection using a common evaluation framework. We identify five factors that, if not taken into account when creating datasets and designing detectors, significantly affect the trained ML models. We conclude that the studied ML-based detectors have been evaluated optimistically, which justifies the good published results.
arXiv Detail & Related papers (2022-05-25T08:28:08Z)
ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data. The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users. We propose a framework for anomaly detection in log data, which is a major source of system information for troubleshooting.
arXiv Detail & Related papers (2021-02-23T09:17:05Z)
Learning Accurate Dense Correspondences and When to Trust Them [161.76275845530964]
We aim to estimate a dense flow field relating two images, coupled with a robust pixel-wise confidence map. We develop a flexible probabilistic approach that jointly learns the flow prediction and its uncertainty. Our approach obtains state-of-the-art results on challenging geometric matching and optical flow datasets.
arXiv Detail & Related papers (2021-01-05T18:54:11Z)
Software Effort Estimation using parameter tuned Models [1.9336815376402716]
The imprecision of the estimation is the reason for project failure. The greatest pitfall of the software industry has been the fast-changing nature of software development. We need to develop useful models that accurately predict the cost of developing a software product.
arXiv Detail & Related papers (2020-08-25T15:18:59Z)
Software Defect Prediction Based On Deep Learning Models: Performance Study [0.5735035463793008]
Two deep learning models, Stack Sparse Auto-Encoder (SSAE) and Deep Belief Network (DBN), are deployed to classify NASA datasets. According to the conducted experiment, the accuracy for the datasets with sufficient samples is enhanced.
arXiv Detail & Related papers (2020-04-02T06:02:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.