DocTer: Documentation Guided Fuzzing for Testing Deep Learning API
Functions
- URL: http://arxiv.org/abs/2109.01002v4
- Date: Wed, 6 Mar 2024 01:51:29 GMT
- Title: DocTer: Documentation Guided Fuzzing for Testing Deep Learning API
Functions
- Authors: Danning Xie, Yitong Li, Mijung Kim, Hung Viet Pham, Lin Tan, Xiangyu
Zhang, Michael W. Godfrey
- Abstract summary: We use DocTer to analyze API documentation to extract input constraints for API functions of deep learning (DL) libraries.
DocTer features a novel algorithm that automatically constructs rules to extract API parameter constraints from syntactic patterns in the form of dependency parse trees of API descriptions.
Our evaluation on three popular DL libraries shows that the precision of DocTer in extracting input constraints is 85.4%.
- Score: 16.62942039883249
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Input constraints are useful for many software development tasks. For
example, input constraints of a function enable the generation of valid inputs,
i.e., inputs that follow these constraints, to test the function more deeply. API
functions of deep learning (DL) libraries have DL specific input constraints,
which are described informally in the free form API documentation. Existing
constraint extraction techniques are ineffective for extracting DL specific
input constraints.
To fill this gap, we design and implement a new technique, DocTer, to analyze
API documentation to extract DL specific input constraints for DL API
functions. DocTer features a novel algorithm that automatically constructs
rules to extract API parameter constraints from syntactic patterns in the form
of dependency parse trees of API descriptions. These rules are then applied to
a large volume of API documents in popular DL libraries to extract their input
parameter constraints. To demonstrate the effectiveness of the extracted
constraints, DocTer uses the constraints to enable the automatic generation of
valid and invalid inputs to test DL API functions.
Our evaluation on three popular DL libraries (TensorFlow, PyTorch, and MXNet)
shows that the precision of DocTer in extracting input constraints is 85.4%.
DocTer detects 94 bugs from 174 API functions, including one previously unknown
security vulnerability that is now documented in the CVE database, while a
baseline technique without input constraints detects only 59 bugs. Most (63) of
the 94 bugs are previously unknown, 54 of which have been fixed or confirmed by
developers after we report them. In addition, DocTer detects 43 inconsistencies
in documents, 39 of which are fixed or confirmed.
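To make the constraint-guided generation step concrete, below is a minimal, hypothetical sketch. The constraint record, its field names, and the choice of tf.nn.conv2d as the target API are illustrative assumptions only; they do not reflect DocTer's actual rule or constraint format. The sketch shows how a dtype/rank constraint of the kind the abstract describes could drive generation of a conforming input (to reach the core logic) and a violating input (to probe error handling).

```python
# Hypothetical sketch of constraint-guided input generation in the spirit of
# DocTer. The constraint record below is NOT DocTer's actual format; it only
# illustrates how a dtype/rank constraint extracted from an API description
# might drive generation of valid and invalid test inputs.
import numpy as np
import tensorflow as tf

# Hypothetical constraint for the "input" parameter of tf.nn.conv2d, as it
# might be extracted from a doc sentence like "A 4-D Tensor of type float32."
INPUT_SPEC = {"dtype": "float32", "ndim": 4}

rng = np.random.default_rng(0)

def gen_valid(spec):
    # Conforming input: required rank and dtype; the channel dimension is
    # fixed to 3 so it stays compatible with the kernel defined below.
    shape = (1, int(rng.integers(4, 8)), int(rng.integers(4, 8)), 3)
    assert len(shape) == spec["ndim"]
    return tf.constant(rng.random(shape), dtype=spec["dtype"])

def gen_invalid(spec):
    # Violating input: drop one dimension so the rank constraint is broken.
    shape = (int(rng.integers(4, 8)), int(rng.integers(4, 8)), 3)
    return tf.constant(rng.random(shape), dtype=spec["dtype"])

kernel = tf.constant(rng.random((1, 1, 3, 3)), dtype="float32")

# A valid input should pass the validity checks and exercise the core logic.
out = tf.nn.conv2d(gen_valid(INPUT_SPEC), kernel, strides=1, padding="SAME")
print("valid input ->", out.shape)

# An invalid input probes error handling: a clean, documented error is fine;
# a crash (e.g., a segfault) would indicate a bug.
try:
    tf.nn.conv2d(gen_invalid(INPUT_SPEC), kernel, strides=1, padding="SAME")
except (tf.errors.InvalidArgumentError, ValueError) as err:
    print("invalid input rejected:", type(err).__name__)
```

In DocTer itself, such constraints are not written by hand: they are extracted automatically from the documentation using rules constructed from dependency parse trees of the API descriptions.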
Related papers
- DeepREST: Automated Test Case Generation for REST APIs Exploiting Deep Reinforcement Learning [5.756036843502232]
This paper introduces DeepREST, a novel black-box approach for automatically testing REST APIs.
It leverages deep reinforcement learning to uncover implicit API constraints, that is, constraints hidden from API documentation.
Our empirical validation suggests that the proposed approach is very effective in achieving high test coverage and fault detection.
arXiv Detail & Related papers (2024-08-16T08:03:55Z) - FuzzTheREST: An Intelligent Automated Black-box RESTful API Fuzzer [0.0]
This work introduces a black-box RESTful API fuzzing tool that employs Reinforcement Learning (RL) for vulnerability detection.
The tool found a total of six unique vulnerabilities and achieved 55% code coverage.
arXiv Detail & Related papers (2024-07-19T14:43:35Z) - FANTAstic SEquences and Where to Find Them: Faithful and Efficient API Call Generation through State-tracked Constrained Decoding and Reranking [57.53742155914176]
API call generation is the cornerstone of large language models' tool-using ability.
Existing supervised and in-context learning approaches suffer from high training costs, poor data efficiency, and generated API calls that can be unfaithful to the API documentation and the user's request.
We propose an output-side optimization approach called FANTASE to address these limitations.
arXiv Detail & Related papers (2024-07-18T23:44:02Z) - KAT: Dependency-aware Automated API Testing with Large Language Models [1.7264233311359707]
KAT (Katalon API Testing) is a novel AI-driven approach that autonomously generates test cases to validate APIs.
Our evaluation of KAT using 12 real-world services shows that it can improve validation coverage, detect more undocumented status codes, and reduce false positives in these services.
arXiv Detail & Related papers (2024-07-14T14:48:18Z) - WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment [49.00213183302225]
We propose a framework to induce new APIs by grounding wikiHow instructions in situated agent policies.
Inspired by recent successes in large language models (LLMs) for embodied planning, we propose a few-shot prompting approach to steer GPT-4.
arXiv Detail & Related papers (2024-07-10T15:52:44Z) - DLLens: Testing Deep Learning Libraries via LLM-aided Synthesis [8.779035160734523]
Testing is a major approach to ensuring the quality of deep learning (DL) libraries.
Existing testing techniques commonly adopt differential testing to relieve the need for test oracle construction.
This paper introduces DLLens, a novel differential testing technique for DL library testing.
arXiv Detail & Related papers (2024-06-12T07:06:38Z) - ConstraintChecker: A Plugin for Large Language Models to Reason on
Commonsense Knowledge Bases [53.29427395419317]
Reasoning over Commonsense Knowledge Bases (CSKB) has been explored as a way to acquire new commonsense knowledge.
We propose ConstraintChecker, a plugin over prompting techniques to provide and check explicit constraints.
arXiv Detail & Related papers (2024-01-25T08:03:38Z) - Leveraging Large Language Models to Improve REST API Testing [51.284096009803406]
RESTGPT takes as input an API specification, extracts machine-interpretable rules, and generates example parameter values from natural-language descriptions in the specification.
Our evaluations indicate that RESTGPT outperforms existing techniques in both rule extraction and value generation.
arXiv Detail & Related papers (2023-12-01T19:53:23Z) - ACETest: Automated Constraint Extraction for Testing Deep Learning
Operators [23.129431525952263]
It is essential that the test cases pass the input validity check and are able to reach the core function logic of the operators.
Existing techniques rely on either human effort or documentation of DL library APIs to extract the constraints.
We propose ACETest, a technique to automatically extract input validation constraints from the code to build valid yet diverse test cases.
arXiv Detail & Related papers (2023-05-29T06:49:40Z) - Binding Language Models in Symbolic Languages [146.3027328556881]
Binder is a training-free neural-symbolic framework that maps the task input to a program.
In the parsing stage, Codex is able to identify the part of the task input that cannot be answered by the original programming language.
In the execution stage, Codex can perform versatile functionalities given proper prompts in the API calls.
arXiv Detail & Related papers (2022-10-06T12:55:17Z) - DocCoder: Generating Code by Retrieving and Reading Docs [87.88474546826913]
We introduce DocCoder, an approach that explicitly leverages code manuals and documentation.
Our approach is general, can be applied to any programming language, and is agnostic to the underlying neural model.
arXiv Detail & Related papers (2022-07-13T06:47:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.