State Field Coverage: A Metric for Oracle Quality
- URL: http://arxiv.org/abs/2510.03071v1
- Date: Fri, 03 Oct 2025 14:57:37 GMT
- Title: State Field Coverage: A Metric for Oracle Quality
- Authors: Facundo Molina, Nazareno Aguirre, Alessandra Gorla,
- Abstract summary: We introduce state field coverage, a novel metric for assessing oracle quality.<n>The main intuition of our metric is that oracles with a higher state field coverage are more likely to detect faults in the software under analysis.<n>Being statically computed, the metric is efficient and provides direct guidance for improving test oracles.
- Score: 45.805303746415944
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The effectiveness of testing in uncovering software defects depends not only on the characteristics of the test inputs and how thoroughly they exercise the software, but also on the quality of the oracles used to determine whether the software behaves as expected. Therefore, assessing the quality of oracles is crucial to improve the overall effectiveness of the testing process. Existing metrics have been used for this purpose, but they either fail to provide a comprehensive basis for guiding oracle improvement, or they are tailored to specific types of oracles, thus limiting their generality. In this paper, we introduce state field coverage, a novel metric for assessing oracle quality. This metric measures the proportion of an object's state, as statically defined by its class fields, that an oracle may access during test execution. The main intuition of our metric is that oracles with a higher state field coverage are more likely to detect faults in the software under analysis, as they inspect a larger portion of the object states to determine whether tests pass or not. We implement a mechanism to statically compute the state field coverage metric. Being statically computed, the metric is efficient and provides direct guidance for improving test oracles by identifying state fields that remain unexamined. We evaluate state field coverage through experiments involving 273 representation invariants and 249,027 test assertions. The results show that state field coverage is a well-suited metric for assessing oracle quality, as it strongly correlates with the oracles' fault-detection ability, measured by mutation score.
Related papers
- A Problem-Oriented Taxonomy of Evaluation Metrics for Time Series Anomaly Detection [11.114573387946498]
Time series anomaly detection is widely used in IoT and cyber-physical systems.<n>This study introduces a problem-oriented framework that reinterprets existing metrics based on the specific evaluation challenges they are designed to address.
arXiv Detail & Related papers (2025-11-24T04:09:04Z) - LaajMeter: A Framework for LaaJ Evaluation [1.8583060903632522]
Large Language Models (LLMs) are increasingly used as evaluators in natural language processing tasks.<n>LaaJMeter is a simulation-based framework for controlled meta-evaluation of LaaJs.
arXiv Detail & Related papers (2025-08-13T19:51:05Z) - Optimizing Metamorphic Testing: Prioritizing Relations Through Execution Profile Dissimilarity [2.6749261270690434]
An oracle determines whether the output of a program for executed test cases is correct.
For machine learning programs, such an oracle is often unavailable or impractical to apply.
Prioritizing MRs enhances fault detection effectiveness and improves testing efficiency.
arXiv Detail & Related papers (2024-11-14T04:14:30Z) - Is Reference Necessary in the Evaluation of NLG Systems? When and Where? [58.52957222172377]
We show that reference-free metrics exhibit a higher correlation with human judgment and greater sensitivity to deficiencies in language quality.
Our study can provide insight into the appropriate application of automatic metrics and the impact of metric choice on evaluation performance.
arXiv Detail & Related papers (2024-03-21T10:31:11Z) - Towards Reliable AI: Adequacy Metrics for Ensuring the Quality of
System-level Testing of Autonomous Vehicles [5.634825161148484]
We introduce a set of black-box test adequacy metrics called "Test suite Instance Space Adequacy" (TISA) metrics.
The TISA metrics offer a way to assess both the diversity and coverage of the test suite and the range of bugs detected during testing.
We evaluate the efficacy of the TISA metrics by examining their correlation with the number of bugs detected in system-level simulation testing of AVs.
arXiv Detail & Related papers (2023-11-14T10:16:05Z) - Mind the Gap: The Difference Between Coverage and Mutation Score Can
Guide Testing Efforts [8.128730027609471]
An "adequate" test suite should effectively find all inconsistencies between a system's requirements/specifications and its implementation.
Practitioners frequently use code coverage to approximate adequacy, while academics argue that mutation score may better approximate true (oracular) adequacy coverage.
We propose a new framework for reasoning about the extent, limits, and nature of a given testing effort based on an idea we call the oracle gap.
arXiv Detail & Related papers (2023-09-05T17:05:52Z) - Towards Automatic Generation of Amplified Regression Test Oracles [44.45138073080198]
We propose a test oracle derivation approach to amplify regression test oracles.
The approach monitors the object state during test execution and compares it to the previous version to detect any changes in relation to the SUT's intended behaviour.
arXiv Detail & Related papers (2023-07-28T12:38:44Z) - DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection [55.70982767084996]
A critical yet frequently overlooked challenge in the field of deepfake detection is the lack of a standardized, unified, comprehensive benchmark.
We present the first comprehensive benchmark for deepfake detection, called DeepfakeBench, which offers three key contributions.
DeepfakeBench contains 15 state-of-the-art detection methods, 9CL datasets, a series of deepfake detection evaluation protocols and analysis tools, as well as comprehensive evaluations.
arXiv Detail & Related papers (2023-07-04T01:34:41Z) - Position: AI Evaluation Should Learn from How We Test Humans [65.36614996495983]
We argue that psychometrics, a theory originating in the 20th century for human assessment, could be a powerful solution to the challenges in today's AI evaluations.
arXiv Detail & Related papers (2023-06-18T09:54:33Z) - Exploring validation metrics for offline model-based optimisation with
diffusion models [50.404829846182764]
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle.
While an approximation to the ground oracle can be trained and used in place of it during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples.
This is encapsulated under our proposed evaluation framework which is also designed to measure extrapolation.
arXiv Detail & Related papers (2022-11-19T16:57:37Z) - Metrics reloaded: Recommendations for image analysis validation [59.60445111432934]
Metrics Reloaded is a comprehensive framework guiding researchers in the problem-aware selection of metrics.
The framework was developed in a multi-stage Delphi process and is based on the novel concept of a problem fingerprint.
Based on the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics.
arXiv Detail & Related papers (2022-06-03T15:56:51Z) - GO FIGURE: A Meta Evaluation of Factuality in Summarization [131.1087461486504]
We introduce GO FIGURE, a meta-evaluation framework for evaluating factuality evaluation metrics.
Our benchmark analysis on ten factuality metrics reveals that our framework provides a robust and efficient evaluation.
It also reveals that while QA metrics generally improve over standard metrics that measure factuality across domains, performance is highly dependent on the way in which questions are generated.
arXiv Detail & Related papers (2020-10-24T08:30:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.