Automatic Unit Test Generation for Deep Learning Frameworks based on API
Knowledge
- URL: http://arxiv.org/abs/2307.00404v1
- Date: Sat, 1 Jul 2023 18:34:56 GMT
- Title: Automatic Unit Test Generation for Deep Learning Frameworks based on API
Knowledge
- Authors: Arunkaleeshwaran Narayanan, Nima Shiri harzevili, Junjie Wang, Lin
Shi, Moshi Wei, Song Wang
- Abstract summary: We propose MUTester to generate unit test cases for APIs of deep learning frameworks.
We first propose a set of 18 rules for mining API constraints from the API documents.
We then use the frequent itemset mining technique to mine the API usage patterns from a large corpus of machine learning API related code fragments.
- Score: 11.523398693942413
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many automatic unit test generation tools that can generate unit test cases
with high coverage over a program have been proposed. However, most of these
tools are ineffective on deep learning (DL) frameworks because many DL APIs
expect inputs that satisfy API-specific constraints. To
fill this gap, we propose MUTester to generate unit test cases for APIs of deep
learning frameworks by leveraging the API constraints mined from the
corresponding API documentation and the API usage patterns mined from code
fragments in Stack Overflow (SO). Particularly, we first propose a set of 18
rules for mining API constraints from the API documents. We then use the
frequent itemset mining technique to mine the API usage patterns from a large
corpus of machine learning API related code fragments collected from SO.
Finally, we use the above two types of API knowledge to guide the test
generation of existing test generators for deep learning frameworks. To
evaluate the performance of MUTester, we first collect 1,971 APIs from four
widely-used deep learning frameworks (i.e., Scikit-learn, PyTorch, TensorFlow,
and CNTK) and for each API, we further extract its API knowledge, i.e., API
constraints and API usage. Given an API, MUTester combines its API knowledge
with existing test generators (e.g., search-based test generator PyEvosuite and
random test generator PyRandoop) to generate test cases to test the API.
Results of our experiments show that MUTester can significantly improve the
corresponding test generation methods, with an average code coverage
improvement of 15.7% to 27.0%. In addition, it reduces the number of invalid
tests generated by the existing test generators by around 19.0%. Our user study with 16
developers further demonstrates the practicality of MUTester in generating test
cases for deep learning frameworks.
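As a concrete illustration of the usage-pattern step described in the abstract, the sketch below mines frequently co-occurring API sets from a toy corpus with a brute-force, Apriori-style count. The snippets, the support threshold, and the size cap are assumptions made for illustration; they are not data or code from the paper.

```python
from collections import Counter
from itertools import combinations

# Toy corpus standing in for Stack Overflow code fragments: each entry is the
# set of framework APIs called in one snippet (illustrative, hand-picked).
snippets = [
    {"torch.tensor", "torch.nn.Linear", "torch.nn.ReLU"},
    {"torch.tensor", "torch.nn.Linear", "torch.optim.SGD"},
    {"torch.tensor", "torch.nn.Linear"},
    {"torch.randn", "torch.nn.Conv2d", "torch.nn.ReLU"},
]

def frequent_itemsets(transactions, min_support=0.5, max_size=3):
    """Brute-force enumeration of API sets whose support meets the threshold.

    A real miner would prune candidates Apriori-style; this version simply
    counts every combination, which is enough for a small corpus.
    """
    n = len(transactions)
    patterns = {}
    for size in range(1, max_size + 1):
        counts = Counter()
        for apis in transactions:
            for combo in combinations(sorted(apis), size):
                counts[combo] += 1
        frequent = {c: k / n for c, k in counts.items() if k / n >= min_support}
        if not frequent:
            break  # no frequent sets of this size, so none larger either
        patterns.update(frequent)
    return patterns

for pattern, support in sorted(frequent_itemsets(snippets).items(), key=lambda kv: -kv[1]):
    print(f"{support:.2f}  {' + '.join(pattern)}")
```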
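The other ingredient, constraint-guided test generation, can be sketched in the same spirit: a hand-written knowledge entry (standing in for what MUTester mines from API docs and SO snippets) constrains parameter sampling, and a mined usage template is filled in to form a unit-test body. The Conv2d entry, the rule fields, and the template are illustrative assumptions, not the paper's 18 rules, and the integration with PyEvosuite or PyRandoop is not shown.

```python
import random

# Hand-written knowledge entry for one API, standing in for mined knowledge.
API_KNOWLEDGE = {
    "torch.nn.Conv2d": {
        "constraints": {          # would be mined from the API documentation
            "in_channels":  {"type": int, "min": 1, "max": 8},
            "out_channels": {"type": int, "min": 1, "max": 8},
            "kernel_size":  {"type": int, "min": 1, "max": 8},
        },
        # A usage pattern as it might be mined from SO snippets: build the
        # layer, then call it on a tensor whose channel dim matches in_channels.
        "usage_template": (
            "import torch\n"
            "layer = torch.nn.Conv2d({in_channels}, {out_channels}, {kernel_size})\n"
            "out = layer(torch.randn(1, {in_channels}, 8, 8))\n"
            "assert out.shape[1] == {out_channels}\n"
        ),
    }
}

def generate_test(api_name, seed=0):
    """Sample parameter values satisfying the mined constraints and fill in
    the mined usage template, producing one runnable unit-test body."""
    random.seed(seed)
    knowledge = API_KNOWLEDGE[api_name]
    values = {}
    for param, rule in knowledge["constraints"].items():
        if rule["type"] is int:
            values[param] = random.randint(rule["min"], rule["max"])
    return knowledge["usage_template"].format(**values)

if __name__ == "__main__":
    print(generate_test("torch.nn.Conv2d"))
```

In this sketch the constraints keep the sampled arguments inside the API's valid domain, which is exactly the kind of knowledge the paper credits for reducing invalid tests produced by unguided generators.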
Related papers
- KAT: Dependency-aware Automated API Testing with Large Language Models [1.7264233311359707]
KAT (Katalon API Testing) is a novel AI-driven approach that autonomously generates test cases to validate APIs.
Our evaluation of KAT using 12 real-world services shows that it can improve validation coverage, detect more undocumented status codes, and reduce false positives in these services.
arXiv Detail & Related papers (2024-07-14T14:48:18Z)
- WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment [49.00213183302225]
We propose a framework to induce new APIs by grounding wikiHow instruction to situated agent policies.
Inspired by recent successes in large language models (LLMs) for embodied planning, we propose a few-shot prompting to steer GPT-4.
arXiv Detail & Related papers (2024-07-10T15:52:44Z)
- DLLens: Testing Deep Learning Libraries via LLM-aided Synthesis [8.779035160734523]
Testing is a major approach to ensuring the quality of deep learning (DL) libraries.
Existing testing techniques commonly adopt differential testing to relieve the need for test oracle construction.
This paper introduces DLLens, a novel differential testing technique for DL library testing.
arXiv Detail & Related papers (2024-06-12T07:06:38Z)
- A Solution-based LLM API-using Methodology for Academic Information Seeking [49.096714812902576]
SoAy is a solution-based LLM API-using methodology for academic information seeking.
It uses code with a solution as the reasoning method, where a solution is a pre-constructed API calling sequence.
Results show a 34.58-75.99% performance improvement compared to state-of-the-art LLM API-based baselines.
arXiv Detail & Related papers (2024-05-24T02:44:14Z)
- Leveraging Large Language Models to Improve REST API Testing [51.284096009803406]
RESTGPT takes as input an API specification, extracts machine-interpretable rules, and generates example parameter values from natural-language descriptions in the specification.
Our evaluations indicate that RESTGPT outperforms existing techniques in both rule extraction and value generation.
arXiv Detail & Related papers (2023-12-01T19:53:23Z)
- Exploring Behaviours of RESTful APIs in an Industrial Setting [0.43012765978447565]
We propose a set of behavioural properties, common to REST APIs, which are used to generate examples of behaviours that these APIs exhibit.
These examples can be used both (i) to further the understanding of the API and (ii) as a source of automatic test cases.
Our approach can generate examples that practitioners deem relevant both for understanding the system and as a source of test cases.
arXiv Detail & Related papers (2023-10-26T11:33:11Z)
- APICom: Automatic API Completion via Prompt Learning and Adversarial Training-based Data Augmentation [6.029137544885093]
API recommendation is the process of assisting developers in finding the required API among numerous candidate APIs.
Previous studies mainly modeled API recommendation as a recommendation task, under which developers may still not find the API they need.
Motivated by the neural machine translation research domain, this problem can instead be modeled as a generation task.
We propose a novel approach APICom based on prompt learning, which can generate API related to the query according to the prompts.
arXiv Detail & Related papers (2023-09-13T15:31:50Z)
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs [104.37772295581088]
Open-source large language models (LLMs), e.g., LLaMA, remain significantly limited in tool-use capabilities.
We introduce ToolLLM, a general tool-use framework encompassing data construction, model training, and evaluation.
We first present ToolBench, an instruction-tuning dataset for tool use, which is constructed automatically using ChatGPT.
arXiv Detail & Related papers (2023-07-31T15:56:53Z)
- Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries.
We propose a novel framework that emulates the process of programmers writing private code.
We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z) - Carving UI Tests to Generate API Tests and API Specification [8.743426215048451]
API-level testing can play an important role, in-between unit-level testing and UI-level (or end-to-end) testing.
Existing API testing tools require API specifications, which often may not be available or, when available, be inconsistent with the API implementation.
We present an approach that leverages UI testing to enable API-level testing for web applications.
arXiv Detail & Related papers (2023-05-24T03:53:34Z) - torchgfn: A PyTorch GFlowNet library [56.071033896777784]
torchgfn is a PyTorch library for generative flow networks (GFlowNets).
It provides users with a simple API for environments and useful abstractions for samplers and losses.
arXiv Detail & Related papers (2023-05-24T00:20:59Z)