Related papers: Disrupting Test Development with AI Assistants

Disrupting Test Development with AI Assistants

URL: http://arxiv.org/abs/2411.02328v1
Date: Mon, 04 Nov 2024 17:52:40 GMT
Title: Disrupting Test Development with AI Assistants
Authors: Vijay Joshi, Iver Band,
Abstract summary: Generative AI-assisted coding tools like GitHub Copilot, ChatGPT, and Tabnine have significantly transformed software development. This paper analyzes how these innovations impact productivity and software test development metrics.
Score: 1.024113475677323
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advancements in large language models, including GPT-4 and its variants, and Generative AI-assisted coding tools like GitHub Copilot, ChatGPT, and Tabnine, have significantly transformed software development. This paper analyzes how these innovations impact productivity and software test development metrics. These tools enable developers to generate complete software programs with minimal human intervention before deployment. However, thorough review and testing by developers are still crucial. Utilizing the Test Pyramid concept, which categorizes tests into unit, integration, and end-to-end tests, we evaluate three popular AI coding assistants by generating and comparing unit tests for opensource modules. Our findings show that AI-generated tests are of equivalent quality to original tests, highlighting differences in usage and results among the tools. This research enhances the understanding and capabilities of AI-assistant tools in automated testing.

Related papers

Generative Artificial Intelligence-Supported Pentesting: A Comparison between Claude Opus, GPT-4, and Copilot [44.99833362998488]
GenAI can be applied across numerous fields, with particular relevance in cybersecurity. In this paper, we have analyzed the potential of leading generic-purpose GenAI tools. Claude Opus, GPT-4 from ChatGPT, and Copilot-in augmenting the penetration testing process as defined by the Penetration Testing Execution Standard (PTES)
arXiv Detail & Related papers (2025-01-12T22:48:37Z)
Automated Soap Opera Testing Directed by LLMs and Scenario Knowledge: Feasibility, Challenges, and Road Ahead [43.15092098658384]
Exploratory testing (ET) harnesses tester's knowledge, creativity, and experience to create varying tests that uncover unexpected bugs from the end-user's perspective. We explore the feasibility, challenges and road ahead of automated scenario-based ET (a.k.a soap opera testing)
arXiv Detail & Related papers (2024-12-11T17:57:23Z)
AutoPenBench: Benchmarking Generative Agents for Penetration Testing [42.681170697805726]
This paper introduces AutoPenBench, an open benchmark for evaluating generative agents in automated penetration testing. We present a comprehensive framework that includes 33 tasks, each representing a vulnerable system that the agent has to attack. We show the benefits of AutoPenBench by testing two agent architectures: a fully autonomous and a semi-autonomous supporting human interaction.
arXiv Detail & Related papers (2024-10-04T08:24:15Z)
Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework. Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z)
Multi-language Unit Test Generation using LLMs [6.259245181881262]
We describe a generic pipeline that incorporates static analysis to guide LLMs in generating compilable and high-coverage test cases. We show how the pipeline can be applied to different programming languages, specifically Java and Python, and to complex software requiring environment mocking. Our results demonstrate that LLM-based test generation, when guided by static analysis, can be competitive with, and even outperform, state-of-the-art test-generation techniques in coverage achieved.
arXiv Detail & Related papers (2024-09-04T21:46:18Z)
The Role of Artificial Intelligence and Machine Learning in Software Testing [0.14896196009851972]
Artificial Intelligence (AI) and Machine Learning (ML) have significantly impacted various industries. Software testing, a crucial part of the software development lifecycle (SDLC), ensures the quality and reliability of software products. This paper explores the role of AI and ML in software testing by reviewing existing literature, analyzing current tools and techniques, and presenting case studies.
arXiv Detail & Related papers (2024-09-04T13:25:13Z)
AI-powered test automation tools: A systematic review and empirical evaluation [1.3490988186255937]
We investigate the features provided by existing AI-based test automation tools. We empirically evaluate how the AI features can be helpful for effectiveness and efficiency of testing. We also study the limitations of the AI features in AI-based test tools.
arXiv Detail & Related papers (2024-08-31T10:10:45Z)
Leveraging Large Language Models for Enhancing the Understandability of Generated Unit Tests [4.574205608859157]
We introduce UTGen, which combines search-based software testing and large language models to enhance the understandability of automatically generated test cases. We observe that participants working on assignments with UTGen test cases fix up to 33% more bugs and use up to 20% less time when compared to baseline test cases.
arXiv Detail & Related papers (2024-08-21T15:35:34Z)
A System for Automated Unit Test Generation Using Large Language Models and Assessment of Generated Test Suites [1.4563527353943984]
Large Language Models (LLMs) have been applied to various aspects of software development. We present AgoneTest: an automated system for generating test suites for Java projects.
arXiv Detail & Related papers (2024-08-14T23:02:16Z)
A Multi-Year Grey Literature Review on AI-assisted Test Automation [46.97326049485643]
Test Automation (TA) techniques are crucial for quality assurance in software engineering. TA techniques face limitations such as high test suite maintenance costs and the need for extensive programming skills. Artificial Intelligence (AI) offers new opportunities to address these issues through automation and improved practices.
arXiv Detail & Related papers (2024-08-12T15:26:36Z)
Impact of the Availability of ChatGPT on Software Development: A Synthetic Difference in Differences Estimation using GitHub Data [49.1574468325115]
ChatGPT is an AI tool that enhances software production efficiency. We estimate ChatGPT's effects on the number of git pushes, repositories, and unique developers per 100,000 people. These results suggest that AI tools like ChatGPT can substantially boost developer productivity, though further analysis is needed to address potential downsides such as low quality code and privacy concerns.
arXiv Detail & Related papers (2024-06-16T19:11:15Z)
Towards Automatic Generation of Amplified Regression Test Oracles [44.45138073080198]
We propose a test oracle derivation approach to amplify regression test oracles. The approach monitors the object state during test execution and compares it to the previous version to detect any changes in relation to the SUT's intended behaviour.
arXiv Detail & Related papers (2023-07-28T12:38:44Z)
From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing. This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time. We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
Integrated Benchmarking and Design for Reproducible and Accessible Evaluation of Robotic Agents [61.36681529571202]
We describe a new concept for reproducible robotics research that integrates development and benchmarking. One of the central components of this setup is the Duckietown Autolab, a standardized setup that is itself relatively low-cost and reproducible. We validate the system by analyzing the repeatability of experiments conducted using the infrastructure and show that there is low variance across different robot hardware and across different remote labs.
arXiv Detail & Related papers (2020-09-09T15:31:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.