ACETest: Automated Constraint Extraction for Testing Deep Learning
Operators
- URL: http://arxiv.org/abs/2305.17914v2
- Date: Sun, 4 Jun 2023 04:01:26 GMT
- Title: ACETest: Automated Constraint Extraction for Testing Deep Learning
Operators
- Authors: Jingyi Shi, Yang Xiao, Yuekang Li, Yeting Li, Dongsong Yu, Chendong
Yu, Hui Su, Yufeng Chen, Wei Huo
- Abstract summary: It is essential that the test cases pass the input validity check and are able to reach the core function logic of the operators.
Existing techniques rely on either human effort or documentation of DL library APIs to extract the constraints.
We propose ACETest, a technique to automatically extract input validation constraints from the code to build valid yet diverse test cases.
- Score: 23.129431525952263
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning (DL) applications are prevalent nowadays as they can help with
multiple tasks. DL libraries are essential for building DL applications.
Furthermore, DL operators are important building blocks of DL libraries, as
they compute multi-dimensional data (tensors). Therefore, bugs in DL operators
can have a significant impact. Testing is a practical approach for
detecting bugs in DL operators. In order to test DL operators effectively, it
is essential that the test cases pass the input validity check and are able to
reach the core function logic of the operators. Hence, extracting the input
validation constraints is required for generating high-quality test cases.
Existing techniques rely on either human effort or documentation of DL library
APIs to extract the constraints. They cannot extract complex constraints and
the extracted constraints may differ from the actual code implementation.
To address the challenge, we propose ACETest, a technique to automatically
extract input validation constraints from the code to build valid yet diverse
test cases which can effectively unveil bugs in the core function logic of DL
operators. For this purpose, ACETest can automatically identify the input
validation code in DL operators, extract the related constraints and generate
test cases according to the constraints. The experimental results on popular DL
libraries, TensorFlow and PyTorch, demonstrate that ACETest can extract
constraints with higher quality than state-of-the-art (SOTA) techniques.
Moreover, ACETest is capable of extracting 96.4% more constraints and detecting
1.95 to 55 times more bugs than SOTA techniques. In total, we have used ACETest
to detect 108 previously unknown bugs on TensorFlow and PyTorch, with 87 of
them confirmed by the developers. Lastly, five of the bugs were assigned CVE
IDs due to their security impact.
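To make the final step concrete, below is a minimal, hypothetical sketch of how
input validation constraints, once extracted from an operator's code, could be
handed to an SMT solver to concretize a valid test input. The operator, the
constraint set, and all variable names are invented for illustration and are not
ACETest's actual output; the snippet only assumes the z3-solver and numpy
packages.

```python
# Hypothetical constraints for a fictional conv-like operator; not ACETest output.
from z3 import Int, Solver, And, sat
import numpy as np

batch, height, width, in_ch = Int("batch"), Int("height"), Int("width"), Int("in_ch")
k_h, k_w, out_ch, stride = Int("k_h"), Int("k_w"), Int("out_ch"), Int("stride")

solver = Solver()
solver.add(And(batch > 0, height > 0, width > 0, in_ch > 0))   # shapes must be positive
solver.add(And(k_h > 0, k_w > 0, out_ch > 0, stride > 0))
solver.add(k_h <= height, k_w <= width)                        # kernel must fit in the input
solver.add(batch <= 8, height <= 64, width <= 64,              # keep generated tensors small
           in_ch <= 8, out_ch <= 8, k_h <= 7, k_w <= 7, stride <= 4)

if solver.check() == sat:
    m = solver.model()
    input_shape = tuple(m[v].as_long() for v in (batch, height, width, in_ch))
    kernel_shape = tuple(m[v].as_long() for v in (k_h, k_w, in_ch, out_ch))
    # Concretize a tensor satisfying the constraints; such an input should pass the
    # operator's validity checks and reach its core computation.
    x = np.random.rand(*input_shape).astype(np.float32)
    print("input shape:", input_shape, "kernel shape:", kernel_shape,
          "stride:", m[stride].as_long())
```

In ACETest's setting, inputs concretized this way are meant to pass the
operator's validity checks and exercise its core function logic, where crashes
or inconsistent results can then reveal bugs.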
Related papers
- AutoRestTest: A Tool for Automated REST API Testing Using LLMs and MARL [46.65963514391019]
AutoRestTest is a novel tool for testing REST APIs.
It integrates the Semantic Operation Dependency Graph (SODG) with Multi-Agent Reinforcement Learning (MARL) and large language models (LLMs).
It provides continuous telemetry on successful operation count, unique server errors detected, and time elapsed.
arXiv Detail & Related papers (2025-01-15T05:54:33Z) - LlamaRestTest: Effective REST API Testing with Small Language Models [50.058600784556816]
We present LlamaRestTest, a novel approach that employs two custom LLMs to generate realistic test inputs.
LlamaRestTest surpasses state-of-the-art tools in code coverage and error detection, even with RESTGPT-enhanced specifications.
arXiv Detail & Related papers (2025-01-15T05:51:20Z) - Your Fix Is My Exploit: Enabling Comprehensive DL Library API Fuzzing with Large Language Models [49.214291813478695]
Deep learning (DL) libraries, widely used in AI applications, often contain vulnerabilities like overflows and use buffer-free errors.
Traditional fuzzing struggles with the complexity and API diversity of DL libraries.
We propose DFUZZ, an LLM-driven fuzzing approach for DL libraries.
arXiv Detail & Related papers (2025-01-08T07:07:22Z) - Subgraph-Oriented Testing for Deep Learning Libraries [9.78188667672054]
We propose SORT (Subgraph-Oriented Realistic Testing) to test Deep Learning (DL) libraries on different hardware platforms.
SORT takes popular API interaction patterns, represented as frequent subgraphs of model graphs, as test subjects.
SORT achieves a 100% valid input generation rate, detects more precision bugs than existing methods, and reveals interaction-related bugs missed by single-API testing.
arXiv Detail & Related papers (2024-12-09T12:10:48Z) - A Tale of Two DL Cities: When Library Tests Meet Compiler [12.751626834965231]
We propose OPERA to extract domain knowledge from the test inputs for DL libraries.
OPERA constructs diverse tests from the various test inputs for DL libraries.
It incorporates a diversity-based test prioritization strategy to migrate and execute those test inputs.
arXiv Detail & Related papers (2024-07-23T16:35:45Z) - DLLens: Testing Deep Learning Libraries via LLM-aided Synthesis [8.779035160734523]
Testing is a major approach to ensuring the quality of deep learning (DL) libraries.
Existing testing techniques commonly adopt differential testing to relieve the need for test oracle construction.
This paper introduces DLLens, a novel differential testing technique for DL library testing.
arXiv Detail & Related papers (2024-06-12T07:06:38Z) - MoCo: Fuzzing Deep Learning Libraries via Assembling Code [13.937180393991616]
Deep learning techniques have been applied in software systems with various application scenarios.
DL libraries serve as the underlying foundation for DL systems, and bugs in them can have unpredictable impacts.
We propose MoCo, a novel fuzz testing method for DL libraries based on assembling code.
arXiv Detail & Related papers (2024-05-13T13:40:55Z) - GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection? [50.53312866647302]
HateCheck is a suite for testing fine-grained model functionalities on synthesized data.
We propose GPT-HateCheck, a framework to generate more diverse and realistic functional tests from scratch.
Crowd-sourced annotation demonstrates that the generated test cases are of high quality.
arXiv Detail & Related papers (2024-02-23T10:02:01Z) - CodeT: Code Generation with Generated Tests [49.622590050797236]
We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the code solutions using the generated test cases, and then chooses the best solution.
We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks.
arXiv Detail & Related papers (2022-07-21T10:18:37Z) - Auditing AI models for Verified Deployment under Semantic Specifications [65.12401653917838]
AuditAI bridges the gap between interpretable formal verification and scalability.
We show how AuditAI allows us to obtain controlled variations for verification and certified training while addressing the limitations of verifying using only pixel-space perturbations.
arXiv Detail & Related papers (2021-09-25T22:53:24Z) - DocTer: Documentation Guided Fuzzing for Testing Deep Learning API
Functions [16.62942039883249]
We use DocTer to analyze API documentation to extract input constraints for API functions of deep learning (DL) libraries.
DocTer features a novel algorithm that automatically constructs rules to extract API parameter constraints from syntactic patterns in the form of dependency parse trees of API descriptions.
Our evaluation on three popular DL libraries shows that the precision of DocTer in extracting input constraints is 85.4%.
arXiv Detail & Related papers (2021-09-02T14:57:36Z)