Related papers: A Feature-Based Approach to Generating Comprehensive End-to-End Tests

Related papers

E2Edev: Benchmarking Large Language Models in End-to-End Software Development Task [40.46045741731215]
We present E2EDev, a novel benchmark grounded in the principles of Behavior-Driven Development (BDD). E2EDev comprises (i) a fine-grained set of user requirements, (ii) multiple BDD test scenarios with corresponding Python step implementations for each requirement, and (iii) a fully automated testing pipeline built on the Behave framework.
arXiv (2025-10-16T09:54:26Z)
GenIA-E2ETest: A Generative AI-Based Approach for End-to-End Test Automation [0.3499870393443268]
This paper introduces GenIA-E2ETest, which leverages generative AI to automatically generate E2E test scripts from natural-language descriptions. We evaluated the approach on two web applications, assessing completeness, correctness, adaptation effort, and robustness.
arXiv (2025-10-01T15:30:24Z)
Automatic Proficiency Assessment in L2 English Learners [51.652753736780205]
Second-language (L2) proficiency in English is usually evaluated perceptually by English teachers or expert evaluators. This paper explores deep learning techniques for comprehensive L2 proficiency assessment, addressing both the speech signal and its corresponding transcription.
arXiv (2025-05-05T12:36:03Z)
Acceptance Test Generation with Large Language Models: An Industrial Case Study [0.7874708385247353]
Large language model (LLM)-powered assistants are increasingly used for generating program code and unit tests. This paper explores the use of LLMs for generating executable acceptance tests for web applications through a two-step process. This two-step approach supports acceptance test-driven development, enhances tester control, and improves test quality.
arXiv (2025-04-09T19:33:38Z)
Automatic High-Level Test Case Generation using Large Language Models [1.8136446064778242]
The primary challenge is not writing test scripts but aligning testing efforts with business requirements. We constructed a use-case dataset to train and fine-tune models for generating high-level test cases. Our proactive approach strengthens requirement-testing alignment and facilitates early test case generation.
arXiv (2025-03-23T09:14:41Z)
A Study on the Improvement of Code Generation Quality Using Large Language Models Leveraging Product Documentation [0.0]
This study proposes a method for automatically generating E2E test code from product documentation. Tests generated from product documentation achieved high compilation success and functional coverage, outperforming tests generated from requirement specifications and user stories.
arXiv (2025-03-22T18:42:05Z)
The BrowserGym Ecosystem for Web Agent Research [151.90034093362343]
The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents. We propose an extended BrowserGym-based ecosystem for web agent research, which unifies existing benchmarks from the literature. We conduct the first large-scale, multi-benchmark web agent experiment and compare the performance of 6 state-of-the-art LLMs across 6 popular web agent benchmarks.
arXiv (2024-12-06T23:43:59Z)
AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs. Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv (2024-11-02T13:24:30Z)
AI-powered test automation tools: A systematic review and empirical evaluation [1.3490988186255937]
We investigate the features provided by existing AI-based test automation tools. We empirically evaluate how the AI features can improve the effectiveness and efficiency of testing. We also study the limitations of the AI features in AI-based test tools.
arXiv (2024-08-31T10:10:45Z)
A System for Automated Unit Test Generation Using Large Language Models and Assessment of Generated Test Suites [1.4563527353943984]
Large Language Models (LLMs) have been applied to various aspects of software development. We present AgoneTest: an automated system for generating test suites for Java projects.
arXiv (2024-08-14T23:02:16Z)
Selene: Pioneering Automated Proof in Software Verification [62.09555413263788]
We introduce Selene, which is the first project-level automated proof benchmark constructed based on the real-world industrial-level operating system microkernel, seL4. Our experimental results with advanced large language models (LLMs), such as GPT-3.5-turbo and GPT-4, highlight the capabilities of LLMs in the domain of automated proof generation.
arXiv (2024-01-15T13:08:38Z)
End-to-End Test Coverage Metrics in Microservice Systems: An Automated Approach [2.6245844272542027]
This paper introduces test coverage metrics for evaluating the extent of E2E test suite coverage for microservice endpoints. Next, it presents an automated approach to compute these metrics to provide feedback on the completeness of E2E test suites.
arXiv (2023-08-18T02:30:19Z)
From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing. This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time. We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv (2023-06-18T09:54:33Z)
Neural Embeddings for Web Testing [49.66745368789056]
Existing crawlers rely on app-specific, threshold-based algorithms to assess state equivalence. We propose WEBEMBED, a novel abstraction function based on neural network embeddings and threshold-free classifiers. Our evaluation on nine web apps shows that WEBEMBED outperforms state-of-the-art techniques by detecting near-duplicates more accurately.
arXiv (2023-06-12T19:59:36Z)
E-Valuating Classifier Two-Sample Tests [11.248868528186332]
Our test combines ideas from existing work: valid split likelihood ratio tests and predictive independence tests. The resulting E-values are suitable for anytime sequential two-sample tests.
arXiv (2022-10-24T08:18:36Z)
Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition [65.84978547406753]
Test-time adaptation (TTA) aims to adapt a model trained on source domains so that it yields better predictions on test samples. To the best of our knowledge, Single-Utterance Test-time Adaptation (SUTA) is the first TTA study in the speech domain.
arXiv (2022-03-27T06:38:39Z)
Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI [67.13999010060057]
We propose a novel approach to integrate LF-MMI criterion into E2E ASR frameworks in both training and decoding stages. Experiments suggest that the introduction of the LF-MMI criterion consistently leads to significant performance improvements.
arXiv (2021-12-05T07:30:17Z)
Guiding Generative Language Models for Data Augmentation in Few-Shot Text Classification [59.698811329287174]
We leverage GPT-2 to generate artificial training instances in order to improve classification performance. Our results show that fine-tuning GPT-2 on a handful of labeled instances leads to consistent classification improvements.
arXiv (2021-11-17T12:10:03Z)
On Introducing Automatic Test Case Generation in Practice: A Success Story and Lessons Learned [7.717446055777458]
This paper reports our experience in introducing techniques for automatically generating system test suites in a medium-sized company. We describe the technical and organisational obstacles that we faced when introducing automatic test case generation. We present ABT2.0, the test case generator that we developed.
arXiv (2021-02-28T11:31:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.