MEMO: Coverage-guided Model Generation For Deep Learning Library Testing
- URL: http://arxiv.org/abs/2208.01508v1
- Date: Tue, 2 Aug 2022 14:53:02 GMT
- Title: MEMO: Coverage-guided Model Generation For Deep Learning Library Testing
- Authors: Meiziniu Li, Jialun Cao, Yongqiang Tian, Tsz On Li, Ming Wen,
Shing-Chi Cheung
- Abstract summary: A few techniques have been proposed to test deep learning (DL) libraries by generating DL models as test inputs.
But the test effectiveness of these techniques is constrained by the diversity of generated DL models.
We propose MEMO to efficiently generate diverse DL models by exploring layer types, layer pairs, and layer parameters.
- Score: 11.263121366956726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent deep learning (DL) applications are mostly built on top of DL
libraries. The quality assurance of these libraries is critical to the
dependable deployment of DL applications. A few techniques have thereby been
proposed to test DL libraries by generating DL models as test inputs. Then
these techniques feed those DL models to DL libraries for making inferences, in
order to exercise DL library modules related to a DL model's execution.
However, the test effectiveness of these techniques is constrained by the
diversity of generated DL models. Our investigation finds that these techniques
can cover at most 11.7% of layer pairs (i.e., call sequence between two layer
APIs) and 55.8% of layer parameters (e.g., "padding" in Conv2D). As a result,
we find that many bugs arising from specific layer pairs and parameters can be
missed by existing techniques.
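To make the two coverage targets concrete, the sketch below (written for this summary, not code taken from MEMO or the cited techniques) builds a tiny Keras model in which the call sequence Conv2D -> MaxPooling2D is one layer pair and padding="same" is one concrete value of Conv2D's "padding" parameter; apart from Conv2D and "padding", which the abstract names, the layers and values are chosen only for illustration.

```python
# Illustrative sketch (not from MEMO): a tiny Keras model whose construction
# and execution exercise one layer pair (Conv2D -> MaxPooling2D) and one
# concrete value of a layer parameter ("padding" in Conv2D).
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    # Layer parameter coverage: padding="same" is one value of Conv2D's
    # "padding" parameter; padding="valid" would be a different one.
    keras.layers.Conv2D(filters=8, kernel_size=3, padding="same"),
    # Layer pair coverage: the call sequence Conv2D -> MaxPooling2D is one
    # layer pair; Conv2D -> BatchNormalization would be another.
    keras.layers.MaxPooling2D(pool_size=2),
])

# Feeding the model an input exercises the library modules that execute it.
_ = model(tf.zeros((1, 32, 32, 3)))
```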
In view of the limitations of existing DL library testing techniques, we
propose MEMO to efficiently generate diverse DL models by exploring layer
types, layer pairs, and layer parameters. MEMO: (1) designs an initial model
reduction technique to boost test efficiency without compromising model
diversity; and (2) designs a set of mutation operators for a customized Markov
Chain Monte Carlo (MCMC) algorithm to explore new layer types, layer pairs, and
layer parameters. We evaluate MEMO on seven popular DL libraries, including
four for model execution (TensorFlow, PyTorch, MXNet, and ONNX) and three
for model conversions (Keras-MXNet, TF2ONNX, ONNX2PyTorch). The evaluation
result shows that MEMO outperforms recent works by covering 10.3% more layer
pairs, 15.3% more layer parameters, and 2.3% more library branches. Moreover, MEMO
detects 29 new bugs in the latest version of DL libraries, with 17 of them
confirmed by DL library developers, and 5 of those confirmed bugs have been
fixed.
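The abstract describes MEMO's search only at a high level (mutation operators driven by a customized MCMC algorithm), so the following is a heavily simplified, hypothetical sketch of what an MCMC-style mutation loop over model configurations could look like. The mutation operators, the coverage proxy, and the Metropolis-style acceptance rule are placeholders and do not reproduce MEMO's actual design.

```python
# Hypothetical sketch of an MCMC-style mutation loop over model configurations.
# The operators, coverage proxy, and acceptance rule are illustrative only.
import math
import random

LAYER_TYPES = ["Conv2D", "DepthwiseConv2D", "MaxPooling2D", "BatchNormalization", "ReLU"]

def mutate(config):
    """Apply one randomly chosen mutation operator to a (layer_type, params) sequence."""
    new = [(t, dict(p)) for t, p in config]
    op = random.choice(["insert_layer", "replace_layer", "tweak_param"])
    if op == "insert_layer":
        new.insert(random.randrange(len(new) + 1), (random.choice(LAYER_TYPES), {}))
    elif op == "replace_layer":
        i = random.randrange(len(new))
        new[i] = (random.choice(LAYER_TYPES), new[i][1])
    else:  # tweak_param: flip a hypothetical "padding" parameter on one layer
        i = random.randrange(len(new))
        new[i][1]["padding"] = random.choice(["same", "valid"])
    return new

def coverage(config):
    """Placeholder coverage proxy: distinct layer types + layer pairs + parameter values."""
    types = {t for t, _ in config}
    pairs = {(config[i][0], config[i + 1][0]) for i in range(len(config) - 1)}
    params = {(t, k, v) for t, p in config for k, v in p.items()}
    return len(types) + len(pairs) + len(params)

def mcmc_search(steps=1000, temperature=1.0):
    """Metropolis-style search: always accept coverage gains, sometimes accept losses."""
    current = [("Conv2D", {"padding": "valid"})]
    best, best_cov = current, coverage(current)
    for _ in range(steps):
        candidate = mutate(current)
        delta = coverage(candidate) - coverage(current)
        if delta >= 0 or random.random() < math.exp(delta / temperature):
            current = candidate
        if coverage(current) > best_cov:
            best, best_cov = current, coverage(current)
    return best, best_cov

if __name__ == "__main__":
    best_config, cov = mcmc_search()
    print("best coverage proxy:", cov)
    print("layer sequence:", [t for t, _ in best_config])
```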
Related papers
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - A Tale of Two DL Cities: When Library Tests Meet Compiler [12.751626834965231]
We propose OPERA to extract domain knowledge from the test inputs for DL libraries.
OPERA constructs diverse tests from the various test inputs for DL libraries.
It incorporates a diversity-based test prioritization strategy to migrate and execute those test inputs.
arXiv Detail & Related papers (2024-07-23T16:35:45Z) - DLLens: Testing Deep Learning Libraries via LLM-aided Synthesis [8.779035160734523]
Testing is a major approach to ensuring the quality of deep learning (DL) libraries.
Existing testing techniques commonly adopt differential testing to relieve the need for test oracle construction (a minimal cross-library sketch of this idea appears after the list below).
This paper introduces DLLens, a novel differential testing technique for DL library testing.
arXiv Detail & Related papers (2024-06-12T07:06:38Z) - MoCo: Fuzzing Deep Learning Libraries via Assembling Code [13.937180393991616]
Deep learning techniques have been applied in software systems with various application scenarios.
DL libraries serve as the underlying foundation for DL systems, and bugs in them can have unpredictable impacts.
We propose MoCo, a novel fuzz testing method for DL libraries via assembling code.
arXiv Detail & Related papers (2024-05-13T13:40:55Z) - A Survey of Deep Learning Library Testing Methods [33.62859142913532]
Deep learning (DL) libraries undertake the underlying optimization and computation.
DL libraries are not immune to bugs, which can pose serious threats to users' personal property and safety.
This paper provides an overview of the testing research related to various DL libraries.
arXiv Detail & Related papers (2024-04-27T11:42:13Z) - Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control [66.78146440275093]
Learned sparse retrieval (LSR) is a family of neural methods that encode queries and documents into sparse lexical vectors.
We explore the application of LSR to the multi-modal domain, with a focus on text-image retrieval.
Current approaches like LexLIP and STAIR require complex multi-step training on massive datasets.
Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors.
arXiv Detail & Related papers (2024-02-27T14:21:56Z) - XGen-7B Technical Report [138.71625147048377]
XGen is a series of 7B-parameter models trained with up to 8K sequence length on up to 1.5T tokens.
We open-source our models for both research advancements and commercial applications.
arXiv Detail & Related papers (2023-09-07T02:20:03Z) - Many or Few Samples? Comparing Transfer, Contrastive and Meta-Learning
in Encrypted Traffic Classification [68.19713459228369]
We compare transfer learning, meta-learning and contrastive learning against reference Machine Learning (ML) tree-based and monolithic DL models.
We show that (i) using large datasets yields more general representations and (ii) contrastive learning is the best-performing methodology.
While tree-based ML models cannot handle large tasks but fit small tasks well, DL methods, by reusing learned representations, reach tree-based performance on small tasks as well.
arXiv Detail & Related papers (2023-05-21T11:20:49Z) - LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z) - Analysis of Failures and Risks in Deep Learning Model Converters: A Case Study in the ONNX Ecosystem [3.0307714495180895]
This paper analyzes failures in deep learning (DL) model converters.
We survey software engineers about DL interoperability tools, use cases, and pain points.
We find that the node conversion stage of a model converter accounts for 75% of the defects and that 33% of reported failures are related to semantically incorrect models.
arXiv Detail & Related papers (2023-03-30T21:00:38Z) - MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down
Distillation [153.56211546576978]
In this work, we propose that better soft targets with higher compatibility can be generated by using a label generator.
We can employ the meta-learning technique to optimize this label generator.
The experiments are conducted on two standard classification benchmarks, namely CIFAR-100 and ILSVRC2012.
arXiv Detail & Related papers (2020-08-27T13:04:27Z)
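Several of the listed papers (e.g., DLLens above) rely on differential testing, where one DL library's output serves as the oracle for another's. The snippet below is a minimal, hypothetical illustration of that idea, comparing TensorFlow and PyTorch on the same average-pooling computation; it is not taken from any of the papers above.

```python
# Hypothetical sketch of differential testing across DL libraries: run the
# same operation in TensorFlow and PyTorch and compare the numerical results.
import numpy as np
import tensorflow as tf
import torch

x = np.random.rand(1, 8, 8, 3).astype(np.float32)

# TensorFlow: 2x2 average pooling over an NHWC tensor.
tf_out = tf.nn.avg_pool2d(tf.constant(x), ksize=2, strides=2, padding="VALID").numpy()

# PyTorch: the same pooling over an NCHW tensor, then back to NHWC for comparison.
torch_in = torch.from_numpy(x).permute(0, 3, 1, 2)
torch_out = torch.nn.functional.avg_pool2d(torch_in, kernel_size=2, stride=2)
torch_out = torch_out.permute(0, 2, 3, 1).numpy()

# A large discrepancy would flag a potential bug in at least one library
# (the other library's output acts as the test oracle).
assert np.allclose(tf_out, torch_out, atol=1e-5), "possible inconsistency bug"
print("TensorFlow and PyTorch agree on this input")
```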
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.