ACETest: Automated Constraint Extraction for Testing Deep Learning
Operators
- URL: http://arxiv.org/abs/2305.17914v2
- Date: Sun, 4 Jun 2023 04:01:26 GMT
- Title: ACETest: Automated Constraint Extraction for Testing Deep Learning
Operators
- Authors: Jingyi Shi, Yang Xiao, Yuekang Li, Yeting Li, Dongsong Yu, Chendong
Yu, Hui Su, Yufeng Chen, Wei Huo
- Abstract summary: It is essential that the test cases pass the input validity check and are able to reach the core function logic of the operators.
Existing techniques rely on either human effort or documentation of DL library APIs to extract the constraints.
We propose ACETest, a technique to automatically extract input validation constraints from the code to build valid yet diverse test cases.
- Score: 23.129431525952263
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning (DL) applications are prevalent nowadays as they can help with
multiple tasks. DL libraries are essential for building DL applications.
Furthermore, DL operators are important building blocks of DL libraries, as
they compute the multi-dimensional data (tensors). Therefore, bugs
in DL operators can have a great impact. Testing is a practical approach for
detecting bugs in DL operators. In order to test DL operators effectively, it
is essential that the test cases pass the input validity check and are able to
reach the core function logic of the operators. Hence, extracting the input
validation constraints is required for generating high-quality test cases.
Existing techniques rely on either human effort or documentation of DL library
APIs to extract the constraints. They cannot extract complex constraints and
the extracted constraints may differ from the actual code implementation.
To address the challenge, we propose ACETest, a technique to automatically
extract input validation constraints from the code to build valid yet diverse
test cases which can effectively unveil bugs in the core function logic of DL
operators. For this purpose, ACETest can automatically identify the input
validation code in DL operators, extract the related constraints and generate
test cases according to the constraints. The experimental results on popular DL
libraries, TensorFlow and PyTorch, demonstrate that ACETest can extract
constraints with higher quality than state-of-the-art (SOTA) techniques.
Moreover, ACETest is capable of extracting 96.4% more constraints and detecting
1.95 to 55 times more bugs than SOTA techniques. In total, we have used ACETest
to detect 108 previously unknown bugs on TensorFlow and PyTorch, with 87 of
them confirmed by the developers. Lastly, five of the bugs were assigned
CVE IDs due to their security impact.
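To make the constraint-extraction-to-test-generation workflow concrete, the following is a minimal sketch, assuming the validity constraints recovered from an operator's checking code can be encoded for an off-the-shelf SMT solver (z3 here). The Conv2D-style constraints and every name below are hypothetical illustrations, not ACETest's actual implementation.

```python
# Minimal sketch (hypothetical, not ACETest's implementation): encode
# input-validity constraints extracted from a DL operator's checking code
# and solve them to obtain concrete inputs that pass validation and thus
# reach the operator's core logic. Requires the z3-solver package.
import random
from z3 import Ints, Solver, sat

def solve_conv2d_like_constraints():
    # Hypothetical constraints mirroring typical Conv2D-style validation:
    # positive sizes, filter no larger than the input, matching channels.
    h, w, kh, kw, c_in, c_f = Ints("h w kh kw c_in c_f")
    s = Solver()
    s.add(h > 0, w > 0, kh > 0, kw > 0, c_in > 0)
    s.add(kh <= h, kw <= w)   # filter must fit inside the input
    s.add(c_f == c_in)        # filter channels must match input channels
    # Pin one free variable at random so repeated calls yield diverse cases.
    s.add(h == random.randint(1, 16))
    assert s.check() == sat
    model = s.model()
    return {str(v): model[v].as_long() for v in (h, w, kh, kw, c_in, c_f)}

if __name__ == "__main__":
    print("valid test-case shapes:", solve_conv2d_like_constraints())
```

Concrete tensors built from such solutions would then be fed to the operator under test; inputs for negative tests can be derived by negating individual constraints.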
Related papers
- A Tale of Two DL Cities: When Library Tests Meet Compiler [12.751626834965231]
We propose OPERA, which extracts domain knowledge from existing test inputs for DL libraries and constructs diverse tests from them.
It incorporates a diversity-based test prioritization strategy to migrate and execute those test inputs.
arXiv Detail & Related papers (2024-07-23T16:35:45Z) - CITADEL: Context Similarity Based Deep Learning Framework Bug Finding [36.34154201748415]
Existing deep learning (DL) framework testing tools have limited coverage on bug types.
We propose Citadel, a method that finds bugs more efficiently and effectively.
arXiv Detail & Related papers (2024-06-18T01:51:16Z) - DLLens: Testing Deep Learning Libraries via LLM-aided Synthesis [8.779035160734523]
Testing is a major approach to ensuring the quality of deep learning (DL) libraries.
Existing testing techniques commonly adopt differential testing to relieve the need for test oracle construction.
This paper introduces DLLens, a novel differential testing technique for DL library testing.
arXiv Detail & Related papers (2024-06-12T07:06:38Z) - MoCo: Fuzzing Deep Learning Libraries via Assembling Code [13.937180393991616]
Deep learning techniques have been applied in software systems with various application scenarios.
DL libraries serve as the underlying foundation for DL systems, and bugs in them can have unpredictable impacts.
We propose MoCo, a novel fuzz testing method for DL libraries via assembling code.
arXiv Detail & Related papers (2024-05-13T13:40:55Z) - Active Test-Time Adaptation: Theoretical Analyses and An Algorithm [51.84691955495693]
Test-time adaptation (TTA) addresses distribution shifts for streaming test data in unsupervised settings.
We propose the novel problem setting of active test-time adaptation (ATTA) that integrates active learning within the fully TTA setting.
arXiv Detail & Related papers (2024-04-07T22:31:34Z) - GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection? [50.53312866647302]
HateCheck is a suite for testing fine-grained model functionalities on synthesized data.
We propose GPT-HateCheck, a framework to generate more diverse and realistic functional tests from scratch.
Crowd-sourced annotation demonstrates that the generated test cases are of high quality.
arXiv Detail & Related papers (2024-02-23T10:02:01Z) - Test-Time Self-Adaptive Small Language Models for Question Answering [63.91013329169796]
We show and investigate the capabilities of smaller self-adaptive LMs using only unlabeled test data.
Our proposed self-adaptation strategy demonstrates significant performance improvements on benchmark QA datasets.
arXiv Detail & Related papers (2023-10-20T06:49:32Z) - CodeT: Code Generation with Generated Tests [49.622590050797236]
We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the code solutions using the generated test cases, and then chooses the best solution.
We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks.
arXiv Detail & Related papers (2022-07-21T10:18:37Z) - Auditing AI models for Verified Deployment under Semantic Specifications [65.12401653917838]
AuditAI bridges the gap between interpretable formal verification and scalability.
We show how AuditAI allows us to obtain controlled variations for verification and certified training while addressing the limitations of verifying using only pixel-space perturbations.
arXiv Detail & Related papers (2021-09-25T22:53:24Z) - DocTer: Documentation Guided Fuzzing for Testing Deep Learning API
Functions [16.62942039883249]
We use DocTer to analyze API documentation to extract input constraints for API functions of deep learning (DL) libraries.
DocTer features a novel algorithm that automatically constructs rules to extract API parameter constraints from syntactic patterns in the form of dependency parse trees of API descriptions (a rough sketch of this idea appears after this list).
Our evaluation on three popular DL libraries shows that the precision of DocTer in extracting input constraints is 85.4%.
arXiv Detail & Related papers (2021-09-02T14:57:36Z) - Beyond Accuracy: Behavioral Testing of NLP models with CheckList [66.42971817954806]
CheckList is a task-agnostic methodology for testing NLP models.
CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation.
In a user study, NLP practitioners with CheckList created twice as many tests, and found almost three times as many bugs as users without it.
arXiv Detail & Related papers (2020-05-08T15:48:31Z)
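As a point of comparison with documentation-based extraction (DocTer above), here is a rough sketch, assuming a dependency parse of an API description is available via spaCy; the matching rule, the constraint format, and the example sentence are hypothetical and are not DocTer's actual algorithm.

```python
# Rough sketch (hypothetical, not DocTer's algorithm): derive a simple
# parameter constraint from the dependency parse of an API description.
# Requires spaCy and its small English model
# (python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_rank_constraints(param: str, description: str) -> list[str]:
    """Turn phrases like 'rank 4' in an API description into constraint strings."""
    doc = nlp(description)
    constraints = []
    for token in doc:
        if token.lemma_ in ("rank", "dimension"):
            # Look inside the noun's syntactic subtree for a numeric modifier.
            for node in token.subtree:
                if node.like_num:
                    constraints.append(f"{param}.rank == {node.text}")
    return constraints

print(extract_rank_constraints(
    "input", "input must be a tensor with rank 4 and dtype int32."))
# Should print something like ['input.rank == 4'] (exact output depends on the parser).
```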
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.