SUPERNOVA: Automating Test Selection and Defect Prevention in AAA Video
Games Using Risk Based Testing and Machine Learning
- URL: http://arxiv.org/abs/2203.05566v2
- Date: Wed, 28 Jun 2023 16:35:23 GMT
- Title: SUPERNOVA: Automating Test Selection and Defect Prevention in AAA Video
Games Using Risk Based Testing and Machine Learning
- Authors: Alexander Senchenko, Naomi Patterson, Hamman Samuel, Dan Isper
- Abstract summary: Testing video games is an increasingly difficult task as traditional methods fail to scale with growing software systems.
We present SUPERNOVA, a system responsible for test selection and defect prevention while also functioning as an automation hub.
The direct impact of this has been observed to be a reduction of 55% or more in testing hours for an undisclosed sports game title.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Testing video games is an increasingly difficult task as traditional
methods fail to scale with growing software systems. Manual testing is a
labor-intensive process and therefore quickly becomes cost prohibitive.
Scripted automated testing is affordable; however, scripts are ineffective in
non-deterministic environments, and knowing when to run each test is another
problem altogether. The complexity and scope of modern games, and the
expectations players have of them, are rapidly increasing, and quality control
now accounts for a large portion of production cost and delivery risk.
Reducing this risk while keeping production on schedule is a major challenge
for the industry. To keep production costs realistic up to and after release,
we focus on preventive quality assurance tactics alongside testing and data
analysis automation. We present SUPERNOVA (Selection of tests and Universal
defect Prevention in External Repositories for Novel Objective Verification of
software Anomalies), a system responsible for test selection and defect
prevention that also functions as an automation hub. By integrating data
analysis functionality with machine and deep learning capability, SUPERNOVA
assists quality assurance testers in finding bugs and developers in reducing
defects, which improves stability during the production cycle and keeps
testing costs under control. The direct impact of these test selection
optimizations has been a reduction of 55% or more in testing hours for an
undisclosed sports game title that has shipped. Furthermore, using risk scores
generated by a semi-supervised machine learning model, we can detect with 71%
precision and 77% recall whether a change-list is bug inducing, and provide a
detailed breakdown of this inference to developers. These efforts improve
workflow and reduce the testing hours required on game titles in development.
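The two quantitative claims above suggest a concrete pattern: train a
semi-supervised classifier on partially labeled change-lists, read its
predicted probability as a bug-inducing risk score, evaluate it with precision
and recall, and let the scores drive test selection. The Python sketch below
illustrates that pattern with scikit-learn's self-training wrapper; the
features, synthetic data, thresholds, and test-to-change-list mapping are
illustrative assumptions, not SUPERNOVA's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)

# Hypothetical per-change-list features, e.g. lines changed, files touched,
# author's recent defect rate, age of the touched code.
X = rng.normal(size=(1000, 4))
y_true = (X @ np.array([1.5, 1.0, 2.0, 0.5]) + rng.normal(size=1000) > 0).astype(int)

# Only a small fraction of change-lists carry a bug-inducing label; the rest
# are marked -1, scikit-learn's convention for "unlabeled".
y_train = y_true.copy()
unlabeled = rng.random(1000) > 0.2
y_train[unlabeled] = -1

# Self-training: fit on the labeled subset, then iteratively pseudo-label
# high-confidence unlabeled change-lists and refit.
model = SelfTrainingClassifier(LogisticRegression(), threshold=0.9)
model.fit(X, y_train)

# Risk score = predicted probability that a change-list is bug inducing.
risk = model.predict_proba(X)[:, 1]
pred = (risk >= 0.5).astype(int)
print(f"precision={precision_score(y_true[unlabeled], pred[unlabeled]):.2f} "
      f"recall={recall_score(y_true[unlabeled], pred[unlabeled]):.2f}")

# Test selection sketch: prioritize tests by the highest risk score among the
# change-lists they exercise (the coverage mapping here is made up).
test_coverage = {
    "test_physics": [3, 17, 42],
    "test_rendering": [8, 99],
    "test_matchmaking": [5, 42, 250],
}
run_order = sorted(test_coverage,
                   key=lambda t: max(risk[i] for i in test_coverage[t]),
                   reverse=True)
print("run order:", run_order)
```

In a setup like this, the per-feature contributions of a linear model are one
plausible way to produce the "detailed breakdown" of the inference that the
abstract says is shown to developers.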
Related papers
- AutoPT: How Far Are We from the End2End Automated Web Penetration Testing?
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs.
Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z)
- The Future of Software Testing: AI-Powered Test Case Generation and Validation
This paper explores the transformative potential of AI in improving test case generation and validation.
It focuses on its ability to enhance efficiency, accuracy, and scalability in testing processes.
It also addresses key challenges associated with adapting AI for testing, including the need for high-quality training data.
arXiv Detail & Related papers (2024-09-09T17:12:40Z)
- Leveraging Large Language Models for Efficient Failure Analysis in Game Development
This paper proposes a new approach to automatically identify which change in the code caused a test to fail.
The method leverages Large Language Models (LLMs) to associate error messages with the corresponding code changes causing the failure.
Our approach reaches an accuracy of 71% in our newly created dataset, which comprises issues reported by developers at EA over a period of one year.
arXiv Detail & Related papers (2024-06-11T09:21:50Z)
- Automated Test Case Repair Using Language Models
Unrepaired broken test cases can degrade test suite quality and disrupt the software development process.
We present TaRGet, a novel approach leveraging pre-trained code language models for automated test case repair.
TaRGet treats test repair as a language translation task, employing a two-step process to fine-tune a language model.
arXiv Detail & Related papers (2024-01-12T18:56:57Z)
- Identifying the Risks of LM Agents with an LM-Emulated Sandbox
Language Model (LM) agents and tools enable a rich set of capabilities but also amplify potential risks.
The high cost of testing these agents makes it increasingly difficult to find high-stakes, long-tailed risks.
We introduce ToolEmu: a framework that uses an LM to emulate tool execution and enables the testing of LM agents against a diverse range of tools and scenarios.
arXiv Detail & Related papers (2023-09-25T17:08:02Z)
- Technical Challenges of Deploying Reinforcement Learning Agents for Game Testing in AAA Games
We describe an effort to add an experimental reinforcement learning system to an existing automated game testing solution based on scripted bots.
We show a use-case of leveraging reinforcement learning in game production and cover some of the largest time sinks that anyone attempting the same journey for their game may encounter.
We propose a few research directions that we believe will be valuable and necessary for making machine learning, and especially reinforcement learning, an effective tool in game production.
arXiv Detail & Related papers (2023-07-19T18:19:23Z)
- Distribution Awareness for AI System Testing
We propose a new OOD-guided testing technique which aims to generate new unseen test cases relevant to the underlying DL system task.
Our results show that this technique is able to filter up to 55.44% of erroneous test cases on CIFAR-10 and is 10.05% more effective in enhancing robustness.
arXiv Detail & Related papers (2021-05-06T09:24:06Z)
- Anomaly Detection Based on Selection and Weighting in Latent Space
We propose a novel selection-and-weighting-based anomaly detection framework called SWAD.
Experiments on both benchmark and real-world datasets have shown the effectiveness and superiority of SWAD.
arXiv Detail & Related papers (2021-03-08T10:56:38Z)
- Reinforcement Learning for Test Case Prioritization
This paper extends recent studies on applying Reinforcement Learning to optimize testing strategies.
We test its ability to adapt to new environments by evaluating it on novel data extracted from a financial institution.
We also study the impact of using a Decision Tree (DT) approximator as a model for memory representation.
arXiv Detail & Related papers (2020-12-18T11:08:20Z)