End-to-End Test Coverage Metrics in Microservice Systems: An Automated
Approach
- URL: http://arxiv.org/abs/2308.09257v1
- Date: Fri, 18 Aug 2023 02:30:19 GMT
- Title: End-to-End Test Coverage Metrics in Microservice Systems: An Automated
Approach
- Authors: Amr Elsayed, Tomas Cerny, Jorge Yero Salazar, Austin Lehman, Joshua
Hunter, Ashley Bickham and Davide Taibi
- Abstract summary: This paper introduces test coverage metrics for evaluating the extent of E2E test suite coverage for microservice endpoints.
Next, it presents an automated approach to compute these metrics to provide feedback on the completeness of E2E test suites.
- Score: 2.6245844272542027
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Microservice architecture gains momentum by fueling systems with cloud-native
benefits, scalability, and decentralized evolution. However, new challenges
emerge for end-to-end (E2E) testing. Testers who see the decentralized system
through the user interface might assume their tests are comprehensive, covering
all middleware endpoints scattered across microservices. However, they do not
have instruments to verify such assumptions. This paper introduces test
coverage metrics for evaluating the extent of E2E test suite coverage for
microservice endpoints. Next, it presents an automated approach to compute
these metrics to provide feedback on the completeness of E2E test suites.
Furthermore, a visual perspective is provided to highlight test coverage across
the system's microservices and to guide testers toward gaps in test suites. We implement a
proof-of-concept tool and perform a case study on a well-established system
benchmark showing it can generate conclusive feedback on test suite coverage
over system endpoints.
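As an illustration of the kind of metric the paper proposes, the sketch below computes per-microservice endpoint coverage as the fraction of declared endpoints that the E2E test suite exercises. This is a minimal sketch inferred from the abstract, not the authors' tool; the service names, endpoint paths, and the exact metric definition are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def endpoint_coverage(
    declared: Dict[str, Set[str]],         # microservice -> endpoints it exposes (assumed inventory)
    exercised: Iterable[Tuple[str, str]],  # (microservice, endpoint) pairs hit by the E2E suite
) -> Dict[str, float]:
    """Fraction of each microservice's declared endpoints reached by the E2E tests."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    for service, endpoint in exercised:
        # Count a hit only if the service actually declares that endpoint.
        if endpoint in declared.get(service, set()):
            hits[service].add(endpoint)
    return {
        service: len(hits[service]) / len(endpoints) if endpoints else 1.0
        for service, endpoints in declared.items()
    }


if __name__ == "__main__":
    # Hypothetical system: two services with a handful of REST endpoints.
    declared = {
        "orders":  {"GET /orders", "POST /orders", "GET /orders/{id}"},
        "billing": {"POST /invoices", "GET /invoices/{id}"},
    }
    exercised = [
        ("orders", "GET /orders"),
        ("orders", "POST /orders"),
        ("billing", "POST /invoices"),
    ]
    for service, ratio in endpoint_coverage(declared, exercised).items():
        print(f"{service}: {ratio:.0%} of endpoints covered")  # orders: 67%, billing: 50%
```

Per-service ratios like these are what a gap-highlighting visualization would plot; low-ratio services point to endpoints the E2E suite never reaches.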
Related papers
- AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an LLM-driven automated penetration testing agent based on the principle of a PSM.
Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z)
- A System for Automated Unit Test Generation Using Large Language Models and Assessment of Generated Test Suites [1.4563527353943984]
Large Language Models (LLMs) have been applied to various aspects of software development.
We present AgoneTest: an automated system for generating test suites for Java projects.
arXiv Detail & Related papers (2024-08-14T23:02:16Z)
- A Feature-Based Approach to Generating Comprehensive End-to-End Tests [5.7340627516257525]
AUTOE2E is a novel approach to automate the generation of semantically meaningful feature-driven E2E test cases for web applications.
We introduce E2EBENCH, a new benchmark for automatically assessing the feature coverage of E2E test suites.
arXiv Detail & Related papers (2024-08-04T01:16:04Z)
- Selene: Pioneering Automated Proof in Software Verification [62.09555413263788]
We introduce Selene, which is the first project-level automated proof benchmark constructed based on the real-world industrial-level operating system microkernel, seL4.
Our experimental results with advanced large language models (LLMs), such as GPT-3.5-turbo and GPT-4, highlight the capabilities of LLMs in the domain of automated proof generation.
arXiv Detail & Related papers (2024-01-15T13:08:38Z)
- Towards Reliable AI: Adequacy Metrics for Ensuring the Quality of System-level Testing of Autonomous Vehicles [5.634825161148484]
We introduce a set of black-box test adequacy metrics called "Test suite Instance Space Adequacy" (TISA) metrics.
The TISA metrics offer a way to assess both the diversity and coverage of the test suite and the range of bugs detected during testing.
We evaluate the efficacy of the TISA metrics by examining their correlation with the number of bugs detected in system-level simulation testing of AVs.
arXiv Detail & Related papers (2023-11-14T10:16:05Z)
- DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection [55.70982767084996]
A critical yet frequently overlooked challenge in the field of deepfake detection is the lack of a standardized, unified, comprehensive benchmark.
We present the first comprehensive benchmark for deepfake detection, called DeepfakeBench, which offers three key contributions.
DeepfakeBench contains 15 state-of-the-art detection methods, 9 deepfake datasets, a series of deepfake detection evaluation protocols and analysis tools, as well as comprehensive evaluations.
arXiv Detail & Related papers (2023-07-04T01:34:41Z)
- Generalizable Metric Network for Cross-domain Person Re-identification [55.71632958027289]
The cross-domain (i.e., domain generalization) scenario presents a challenge in Re-ID tasks.
Most existing methods aim to learn domain-invariant or robust features for all domains.
We propose a Generalizable Metric Network (GMN) to explore sample similarity in the sample-pair space.
arXiv Detail & Related papers (2023-06-21T03:05:25Z)
- Benchmarks for End-to-End Microservices Testing [2.6245844272542027]
We created a test benchmark containing full functional testing coverage for two well-established open-source microservice systems.
We also conducted a case study to identify the best approaches for validating full test coverage.
arXiv Detail & Related papers (2023-06-09T13:42:53Z)
- Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation [160.07938471250048]
Interpretability and efficiency are two important considerations for the adoption of neural automatic metrics.
We develop strong-performing automatic metrics for reference-based summarization evaluation.
arXiv Detail & Related papers (2023-03-07T02:49:50Z)
- Overview of Test Coverage Criteria for Test Case Generation from Finite State Machines Modelled as Directed Graphs [0.12891210250935145]
Test coverage criteria are an essential concept for test engineers when generating test cases from a System Under Test model; they define the number of actions or action combinations by which a system is tested.
This study summarizes all commonly used test coverage criteria for Finite State Machines and discusses them regarding their subsumption, equivalence, or non-comparability; a minimal transition-coverage sketch follows this list.
arXiv Detail & Related papers (2022-03-17T20:30:14Z)
- QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization [116.56171113972944]
We show that carefully choosing the components of a QA-based metric is critical to performance.
Our solution improves upon the best-performing entailment-based metric and achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-12-16T00:38:35Z)
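The FSM coverage-criteria survey above lends itself to a small worked example. The sketch below computes transition (all-edges) coverage for a Finite State Machine modelled as a directed graph, i.e., the fraction of transitions exercised by a set of test paths. The FSM, action names, and test paths are hypothetical, and this is only one of the criteria the survey discusses, not its formalization.

```python
from typing import Dict, List, Set, Tuple

# An FSM as a directed graph: state -> list of (action, next_state) transitions.
FSM = Dict[str, List[Tuple[str, str]]]


def transition_coverage(fsm: FSM, test_paths: List[List[str]], start: str) -> float:
    """Fraction of FSM transitions (edges) exercised by the given test paths.

    Each test path is a sequence of actions applied from the start state;
    a step counts only if the action is defined in the current state.
    """
    all_edges: Set[Tuple[str, str, str]] = {
        (state, action, nxt) for state, outs in fsm.items() for action, nxt in outs
    }
    covered: Set[Tuple[str, str, str]] = set()
    for path in test_paths:
        state = start
        for action in path:
            nxt = dict(fsm.get(state, [])).get(action)
            if nxt is None:
                break  # the path diverges from the model; stop following it
            covered.add((state, action, nxt))
            state = nxt
    return len(covered) / len(all_edges) if all_edges else 1.0


if __name__ == "__main__":
    # Hypothetical three-state machine and two abstract test paths.
    fsm = {
        "idle":    [("start", "running")],
        "running": [("pause", "paused"), ("stop", "idle")],
        "paused":  [("resume", "running")],
    }
    tests = [["start", "pause", "resume"], ["start", "stop"]]
    print(f"transition coverage: {transition_coverage(fsm, tests, 'idle'):.0%}")  # 100%
```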
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.