ACETest: Automated Constraint Extraction for Testing Deep Learning
Operators
- URL: http://arxiv.org/abs/2305.17914v2
- Date: Sun, 4 Jun 2023 04:01:26 GMT
- Title: ACETest: Automated Constraint Extraction for Testing Deep Learning
Operators
- Authors: Jingyi Shi, Yang Xiao, Yuekang Li, Yeting Li, Dongsong Yu, Chendong
Yu, Hui Su, Yufeng Chen, Wei Huo
- Abstract summary: It is essential that the test cases pass the input validity check and are able to reach the core function logic of the operators.
Existing techniques rely on either human effort or documentation of DL library APIs to extract the constraints.
We propose ACETest, a technique to automatically extract input validation constraints from the code to build valid yet diverse test cases.
- Score: 23.129431525952263
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning (DL) applications are prevalent nowadays as they can help with
multiple tasks. DL libraries are essential for building DL applications.
Furthermore, DL operators are important building blocks of DL libraries, as
they compute the multi-dimensional data (tensors). Therefore, bugs
in DL operators can have a great impact. Testing is a practical approach for
detecting bugs in DL operators. In order to test DL operators effectively, it
is essential that the test cases pass the input validity check and are able to
reach the core function logic of the operators. Hence, extracting the input
validation constraints is required for generating high-quality test cases.
Existing techniques rely on either human effort or documentation of DL library
APIs to extract the constraints. They cannot extract complex constraints and
the extracted constraints may differ from the actual code implementation.
To address the challenge, we propose ACETest, a technique to automatically
extract input validation constraints from the code to build valid yet diverse
test cases which can effectively unveil bugs in the core function logic of DL
operators. For this purpose, ACETest can automatically identify the input
validation code in DL operators, extract the related constraints and generate
test cases according to the constraints. The experimental results on popular DL
libraries, TensorFlow and PyTorch, demonstrate that ACETest can extract
constraints with higher quality than state-of-the-art (SOTA) techniques.
Moreover, ACETest is capable of extracting 96.4% more constraints and detecting
1.95 to 55 times more bugs than SOTA techniques. In total, we have used ACETest
to detect 108 previously unknown bugs on TensorFlow and PyTorch, with 87 of
them confirmed by the developers. Lastly, five of the bugs were assigned
CVE IDs due to their security impact.
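To make the constraint-extraction-to-test-generation workflow concrete, the following is a minimal sketch, assuming the validity constraints recovered from an operator's checking code can be encoded for an off-the-shelf SMT solver (z3 here). The Conv2D-style constraints and every name below are hypothetical illustrations, not ACETest's actual implementation.

```python
# Minimal sketch (hypothetical, not ACETest's implementation): encode
# input-validity constraints extracted from a DL operator's checking code
# and solve them to obtain concrete inputs that pass validation and thus
# reach the operator's core logic. Requires the z3-solver package.
import random
from z3 import Ints, Solver, sat

def solve_conv2d_like_constraints():
    # Hypothetical constraints mirroring typical Conv2D-style validation:
    # positive sizes, filter no larger than the input, matching channels.
    h, w, kh, kw, c_in, c_f = Ints("h w kh kw c_in c_f")
    s = Solver()
    s.add(h > 0, w > 0, kh > 0, kw > 0, c_in > 0)
    s.add(kh <= h, kw <= w)   # filter must fit inside the input
    s.add(c_f == c_in)        # filter channels must match input channels
    # Pin one free variable at random so repeated calls yield diverse cases.
    s.add(h == random.randint(1, 16))
    assert s.check() == sat
    model = s.model()
    return {str(v): model[v].as_long() for v in (h, w, kh, kw, c_in, c_f)}

if __name__ == "__main__":
    print("valid test-case shapes:", solve_conv2d_like_constraints())
```

Concrete tensors built from such solutions would then be fed to the operator under test; inputs for negative tests can be derived by negating individual constraints.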
Related papers
- A Tale of Two DL Cities: When Library Tests Meet Compiler [12.751626834965231]
We propose OPERA, which extracts domain knowledge from existing test inputs for DL libraries and constructs diverse tests from them.
It incorporates a diversity-based test prioritization strategy to migrate and execute those test inputs.
arXiv Detail & Related papers (2024-07-23T16:35:45Z) - CITADEL: Context Similarity Based Deep Learning Framework Bug Finding [36.34154201748415]
Existing deep learning (DL) framework testing tools have limited coverage on bug types.
We propose Citadel, a method that finds bugs more efficiently and effectively.
arXiv Detail & Related papers (2024-06-18T01:51:16Z) - DLLens: Testing Deep Learning Libraries via LLM-aided Synthesis [8.779035160734523]
Testing is a major approach to ensuring the quality of deep learning (DL) libraries.
Existing testing techniques commonly adopt differential testing to relieve the need for test oracle construction.
This paper introduces DLLens, a novel differential testing technique for DL library testing.
arXiv Detail & Related papers (2024-06-12T07:06:38Z) - MoCo: Fuzzing Deep Learning Libraries via Assembling Code [13.937180393991616]
Deep learning techniques have been applied in software systems with various application scenarios.
DL libraries serve as the underlying foundation for DL systems, and bugs in them can have unpredictable impacts.
We propose MoCo, a novel fuzz testing method for DL libraries via assembling code.
arXiv Detail & Related papers (2024-05-13T13:40:55Z) - Active Test-Time Adaptation: Theoretical Analyses and An Algorithm [51.84691955495693]
Test-time adaptation (TTA) addresses distribution shifts for streaming test data in unsupervised settings.
We propose the novel problem setting of active test-time adaptation (ATTA) that integrates active learning within the fully TTA setting.
arXiv Detail & Related papers (2024-04-07T22:31:34Z) - GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection? [50.53312866647302]
HateCheck is a suite for testing fine-grained model functionalities on synthesized data.
We propose GPT-HateCheck, a framework to generate more diverse and realistic functional tests from scratch.
Crowd-sourced annotation demonstrates that the generated test cases are of high quality.
arXiv Detail & Related papers (2024-02-23T10:02:01Z) - Test-Time Self-Adaptive Small Language Models for Question Answering [63.91013329169796]
We show and investigate the capabilities of smaller self-adaptive LMs using only unlabeled test data.
Our proposed self-adaptation strategy demonstrates significant performance improvements on benchmark QA datasets.
arXiv Detail & Related papers (2023-10-20T06:49:32Z) - CodeT: Code Generation with Generated Tests [49.622590050797236]
We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the code solutions using the generated test cases, and then chooses the best solution.
We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks.
arXiv Detail & Related papers (2022-07-21T10:18:37Z) - Auditing AI models for Verified Deployment under Semantic Specifications [65.12401653917838]
AuditAI bridges the gap between interpretable formal verification and scalability.
We show how AuditAI allows us to obtain controlled variations for verification and certified training while addressing the limitations of verifying using only pixel-space perturbations.
arXiv Detail & Related papers (2021-09-25T22:53:24Z) - DocTer: Documentation Guided Fuzzing for Testing Deep Learning API
Functions [16.62942039883249]
We use DocTer to analyze API documentation to extract input constraints for API functions of deep learning (DL) libraries.
DocTer features a novel algorithm that automatically constructs rules to extract API parameter constraints from syntactic patterns in the form of dependency parse trees of API descriptions (a rough sketch of this idea appears after this list).
Our evaluation on three popular DL libraries shows that the precision of DocTer in extracting input constraints is 85.4%.
arXiv Detail & Related papers (2021-09-02T14:57:36Z) - Beyond Accuracy: Behavioral Testing of NLP models with CheckList [66.42971817954806]
CheckList is a task-agnostic methodology for testing NLP models.
CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation.
In a user study, NLP practitioners with CheckList created twice as many tests, and found almost three times as many bugs as users without it.
arXiv Detail & Related papers (2020-05-08T15:48:31Z)
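As a point of comparison with documentation-based extraction (DocTer above), here is a rough sketch, assuming a dependency parse of an API description is available via spaCy; the matching rule, the constraint format, and the example sentence are hypothetical and are not DocTer's actual algorithm.

```python
# Rough sketch (hypothetical, not DocTer's algorithm): derive a simple
# parameter constraint from the dependency parse of an API description.
# Requires spaCy and its small English model
# (python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_rank_constraints(param: str, description: str) -> list[str]:
    """Turn phrases like 'rank 4' in an API description into constraint strings."""
    doc = nlp(description)
    constraints = []
    for token in doc:
        if token.lemma_ in ("rank", "dimension"):
            # Look inside the noun's syntactic subtree for a numeric modifier.
            for node in token.subtree:
                if node.like_num:
                    constraints.append(f"{param}.rank == {node.text}")
    return constraints

print(extract_rank_constraints(
    "input", "input must be a tensor with rank 4 and dtype int32."))
# Should print something like ['input.rank == 4'] (exact output depends on the parser).
```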
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.