Related papers: Enhancing NeuroEvolution-Based Game Testing: A Branch Coverage Approach for Scratch Programs

URL: http://arxiv.org/abs/2507.09414v1
Date: Sat, 12 Jul 2025 22:36:47 GMT
Title: Enhancing NeuroEvolution-Based Game Testing: A Branch Coverage Approach for Scratch Programs
Authors: Khizra Sohail, Atif Aftab Ahmed Jilani, Nigar Azhar Butt
Abstract summary: This paper introduces a branch coverage-based fitness function to enhance test effectiveness in automated game testing. We extend NEATEST by integrating a branch fitness function that prioritizes control-dependent branches, guiding the neuroevolution process to maximize branch exploration.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Automated test generation for game-like programs presents unique challenges due to their non-deterministic behavior and complex control structures. The NEATEST framework has been used for automated testing in Scratch games, employing neuroevolution-based test generation optimized for statement coverage. However, statement coverage alone is often insufficient for fault detection, as it does not guarantee execution of all logical branches. This paper introduces a branch coverage-based fitness function to enhance test effectiveness in automated game testing. We extend NEATEST by integrating a branch fitness function that prioritizes control-dependent branches, guiding the neuroevolution process to maximize branch exploration. To evaluate the effectiveness of this approach, empirical experiments were conducted on 25 Scratch games, comparing Neatest with Statement Coverage (NSC) against Neatest with Branch Coverage (NBC). A mutation analysis was also performed to assess the fault detection capabilities of both techniques. The results demonstrate that NBC achieves higher branch coverage than NSC in 13 out of 25 games, particularly in programs with complex conditional structures. Moreover, NBC achieves a lower false positive rate in mutation testing, making it a more reliable approach for identifying faulty behavior in game programs. These findings confirm that branch coverage-based test generation improves test coverage and fault detection in Scratch programs.
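The abstract does not give the fitness formula. A common formulation in search-based software testing scores an execution against a target branch by its approach level (how many control-dependent branches lie between where execution diverged and the target) plus a normalized branch distance (how close the diverging predicate came to taking the desired outcome). The sketch below assumes that formulation; whether NBC uses exactly this is an assumption, and `game_tick`, `fitness_b2_true`, and the branch labels are hypothetical stand-ins for a Scratch script, not part of NEATEST.

```python
def normalize(distance: float) -> float:
    """Map a raw branch distance in [0, inf) into [0, 1)."""
    return distance / (distance + 1.0)


def distance_greater(lhs: float, rhs: float, k: float = 1.0) -> float:
    """Branch distance for making the predicate `lhs > rhs` evaluate to True.

    0 if it is already True; otherwise how far lhs is from exceeding rhs,
    plus a small constant k so that a near miss still scores above 0."""
    return 0.0 if lhs > rhs else (rhs - lhs) + k


def branch_fitness(approach_level: int, distance: float) -> float:
    """Approach level plus normalized branch distance; lower is better."""
    return approach_level + normalize(distance)


def game_tick(score: int, lives: int) -> set[str]:
    """Hypothetical stand-in for a Scratch script; returns the branch outcomes taken."""
    taken = set()
    if score > 10:                 # B1
        taken.add("B1:true")
        if lives > 0:              # B2, control-dependent on B1 being true
            taken.add("B2:true")   # target branch, e.g. "advance to next level"
        else:
            taken.add("B2:false")
    else:
        taken.add("B1:false")
    return taken


def fitness_b2_true(score: int, lives: int) -> float:
    """Fitness of one execution with respect to the target branch B2:true."""
    taken = game_tick(score, lives)
    if "B2:true" in taken:
        return 0.0  # target branch covered
    if "B1:false" in taken:
        # Diverged one control-dependency level above the target.
        return branch_fitness(1, distance_greater(score, 10))
    # Reached B2 but took its false side.
    return branch_fitness(0, distance_greater(lives, 0))


for score, lives in [(5, 3), (20, 0), (20, 3)]:
    print((score, lives), round(fitness_b2_true(score, lives), 3))
```

The three runs print fitness values of 1.857, 0.5, and 0.0, so an execution that reaches the nested predicate is rewarded over one that fails at the outer condition. An `if` without an `else` is where the two criteria differ: statement coverage is satisfied once the body runs, while branch coverage also requires an execution in which the condition is false.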

Related papers

Studying the Impact of Early Test Termination Due to Assertion Failure on Code Coverage and Spectrum-based Fault Localization
This is the first empirical study on early test termination due to assertion failure. We investigated 207 versions of 6 open-source projects. Our findings indicate that early test termination harms both code coverage and the effectiveness of spectrum-based fault localization.
arXiv (2025-04-06T17:14:09Z)
On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning Implementations
Deep Reinforcement Learning (DRL) is a paradigm of artificial intelligence where an agent uses a neural network to learn which actions to take in a given environment. DRL has recently gained traction from being able to solve complex environments like driving simulators, 3D robotic control, and multiplayer-online-battle-arena video games. Numerous implementations of the state-of-the-art algorithms responsible for training these agents, like the Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) algorithms, currently exist.
arXiv (2025-03-28T16:25:06Z)
Many-Objective Neuroevolution for Testing Games
The test generator NEATEST tackles the challenges of testing games by combining search-based software testing principles with neuroevolution. We transform NEATEST into a many-objective search algorithm that targets several program states simultaneously. Our experiments show that extending NEATEST to target several objectives simultaneously increases the average branch coverage from 75.88% to 81.33%.
arXiv (2025-01-14T09:18:34Z)
Precise Error Rates for Computationally Efficient Testing
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity. An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv (2023-11-01T04:41:16Z)
Contextual Predictive Mutation Testing
We introduce MutationBERT, an approach for predictive mutation testing that simultaneously encodes the source method mutation and the test method. Thanks to its higher precision, MutationBERT saves 33% of the time spent by a prior approach on checking/verifying live mutants. We validate our input representation and our aggregation approaches for lifting predictions from the test matrix level to the test suite level, finding similar improvements in performance.
arXiv (2023-09-05T17:00:15Z)
Effective Test Generation Using Pre-trained Large Language Models and Mutation Testing
We introduce MuTAP for improving the effectiveness of test cases generated by Large Language Models (LLMs) in terms of revealing bugs. MuTAP is capable of generating effective test cases in the absence of natural language descriptions of the Programs Under Test (PUTs). Our results show that our proposed method is able to detect up to 28% more faulty human-written code snippets.
arXiv (2023-08-31T08:48:31Z)
Fuzzing for CPS Mutation Testing
We propose a mutation testing approach that leverages fuzz testing, which has proved effective with C and C++ software. Our empirical evaluation shows that mutation testing based on fuzz testing kills a significantly higher proportion of live mutants than symbolic execution.
arXiv (2023-08-15T16:35:31Z)
Using Neural Networks for Novelty-based Test Selection to Accelerate Functional Coverage Closure
This paper presents a highly automated framework for novelty-based test selection using neural networks. Three configurations of this framework are tested with a commercial signal processing unit. All three convincingly outperform random test selection, with the largest saving in simulation being 49.37% to reach 99.5% coverage.
arXiv (2022-07-01T14:11:08Z)
Is Neuron Coverage Needed to Make Person Detection More Robust?
In this work, we apply coverage-guided testing (CGT) to the task of person detection in crowded scenes. The proposed pipeline uses YOLOv3 for person detection and includes finding bugs via sampling and mutation. We have found no evidence that the investigated coverage metrics can be advantageously used to improve robustness.
arXiv (2022-04-21T11:23:33Z)
SUPERNOVA: Automating Test Selection and Defect Prevention in AAA Video Games Using Risk Based Testing and Machine Learning
Testing video games is an increasingly difficult task as traditional methods fail to scale with growing software systems. We present SUPERNOVA, a system responsible for test selection and defect prevention while also functioning as an automation hub. The direct impact of this has been observed to be a reduction of 55% or more in testing hours for an undisclosed sports game title.
arXiv (2022-03-10T00:47:46Z)
Global Optimization of Objective Functions Represented by ReLU Networks
Neural networks can learn complex, non-convex functions, and it is challenging to guarantee their correct behavior in safety-critical contexts. Many approaches exist to find failures in networks (e.g., adversarial examples), but these cannot guarantee the absence of failures. We propose an approach that integrates the optimization process into the verification procedure, achieving better performance than the naive approach.
arXiv (2020-10-07T08:19:48Z)
Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually. Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv (2020-04-26T23:41:33Z)
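Why pooling helps at low prevalence can be seen from the classical two-stage Dorfman scheme. The sketch below assumes error-free tests and independent infections, so it covers only the noiseless baseline the summary refers to, not the paper's noisy Bayesian algorithms; the function names are illustrative.

```python
def dorfman_tests_per_person(p: float, k: int) -> float:
    """Expected number of tests per person under two-stage Dorfman pooling.

    Assumes error-free tests and independent infections with prevalence p.
    Stage 1: one pooled test shared by k people (1/k tests each).
    Stage 2: if the pool is positive, probability 1 - (1 - p)**k,
    every member is retested individually (1 more test each)."""
    if k == 1:
        return 1.0  # no pooling: exactly one test per person
    return 1.0 / k + 1.0 - (1.0 - p) ** k


def best_pool_size(p: float, max_k: int = 100) -> int:
    """Pool size in [1, max_k] with the fewest expected tests per person."""
    return min(range(1, max_k + 1), key=lambda k: dorfman_tests_per_person(p, k))


for p in (0.01, 0.05, 0.4):
    k = best_pool_size(p)
    print(p, k, round(dorfman_tests_per_person(p, k), 4))
```

At 1% prevalence the sketch picks pools of 11, needing about 0.196 tests per person instead of 1; at 40% prevalence no pool size beats individual testing, which matches the summary's condition that the disease prevalence be low.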