Finding Deep-Learning Compilation Bugs with NNSmith
- URL: http://arxiv.org/abs/2207.13066v1
- Date: Tue, 26 Jul 2022 17:39:51 GMT
- Title: Finding Deep-Learning Compilation Bugs with NNSmith
- Authors: Jiawei Liu, Jinkun Lin, Fabian Ruffy, Cheng Tan, Jinyang Li, Aurojit
Panda, Lingming Zhang
- Abstract summary: We propose a new fuzz testing approach for finding bugs in deep-learning compilers.
Our core approach uses (i) light-weight operator specifications to generate diverse yet valid models, (ii) a gradient-based search process, and (iii) differential testing to identify bugs.
We implemented this approach in NNSmith, which has found 65 new bugs over the last seven months in TVM, TensorRT, ONNXRuntime, and PyTorch. Of these, 52 have been confirmed and 44 have been fixed by the maintainers.
- Score: 20.082492391396933
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep-learning (DL) compilers such as TVM and TensorRT are increasingly used
to optimize deep neural network (DNN) models to meet performance, resource
utilization and other requirements. Bugs in these compilers can produce
optimized models whose semantics differ from those of the original models,
yielding incorrect results that undermine the correctness of downstream
applications.
However, finding bugs in these compilers is challenging due to their
complexity. In this work, we propose a new fuzz testing approach for finding
bugs in deep-learning compilers. Our core approach uses (i) light-weight
operator specifications to generate diverse yet valid DNN models allowing us to
exercise a large part of the compiler's transformation logic; (ii) a
gradient-based search process for finding model inputs that avoid any
floating-point exceptional values during model execution, reducing the chance
of missed bugs or false alarms; and (iii) differential testing to identify
bugs. We implemented this approach in NNSmith, which has found 65 new bugs
over the last seven months in TVM, TensorRT, ONNXRuntime, and PyTorch. Of
these, 52 have been confirmed and 44 have been fixed by project maintainers.
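To make components (ii) and (iii) concrete, here is a minimal sketch in PyTorch, not NNSmith's implementation: a toy model with a domain-restricted operator, a gradient search that nudges the input away from NaN-producing regions, and a differential check of eager execution against `torch.compile` as a stand-in compiler backend. The model, margin, and tolerances are illustrative assumptions.

```python
import torch

# Toy generated model with one domain-restricted operator (sqrt is NaN
# for negative inputs). NNSmith generates such models from operator specs.
class Toy(torch.nn.Module):
    def forward(self, x):
        return torch.sqrt(x) * 2.0

model = Toy()

# (ii) Gradient-based input search: penalize elements that would fall
# outside sqrt's domain, and descend until the input is "safe".
margin = 1e-3  # illustrative safety margin
x = torch.randn(8, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.1)
for _ in range(500):
    loss = torch.relu(margin - x).sum()  # > 0 iff some element < margin
    if loss.item() == 0.0:
        break
    opt.zero_grad()
    loss.backward()
    opt.step()

# (iii) Differential testing: the compiled model should agree with eager
# execution on the found input, up to floating-point tolerance.
safe_x = x.detach()
ref = model(safe_x)
out = torch.compile(model)(safe_x)  # requires PyTorch >= 2.0
if not torch.allclose(ref, out, rtol=1e-4, atol=1e-5):
    print("potential miscompilation: eager and compiled outputs diverge")
```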
Related papers
- SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists [59.08999823652293]
We propose SYNTHEVAL to generate a wide range of test types for a comprehensive evaluation of NLP models.
In the last stage, human experts investigate the challenging examples, manually design templates, and identify the types of failures the task-specific models consistently exhibit.
We apply SYNTHEVAL to two classification tasks, sentiment analysis and toxic language detection, and show that our framework is effective in identifying weaknesses of strong models on these tasks.
arXiv Detail & Related papers (2024-08-30T17:41:30Z)
- SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z)
- DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for evaluating the debugging capability of Large Language Models (LLMs).
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z)
- Do Language Models Learn Semantics of Code? A Case Study in Vulnerability Detection [7.725755567907359]
We analyze the models using three distinct methods: interpretability tools, attention analysis, and interaction matrix analysis.
We develop two annotation methods which highlight the bug semantics inside the model's inputs.
Our findings indicate that providing the model with information about bug semantics is helpful, that the model can attend to it, and they motivate future work on learning more complex path-based bug semantics.
arXiv Detail & Related papers (2023-11-07T16:31:56Z)
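The attention analysis mentioned in the entry above can be illustrated with a short sketch (not the paper's code): run a code snippet through an encoder and rank tokens by the attention they receive, averaged over layers and heads. It assumes the `transformers` library is installed and uses `microsoft/codebert-base` as a stand-in for the models studied.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Stand-in encoder; any model that returns attentions works the same way.
tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base", output_attentions=True)

code = "if (len > buf_size) memcpy(buf, src, len);"
enc = tok(code, return_tensors="pt")
with torch.no_grad():
    out = model(**enc)

# out.attentions: one (batch, heads, seq, seq) tensor per layer.
# Average over layers and heads, then over queries, to score how much
# attention each token receives.
att = torch.stack(out.attentions).mean(dim=(0, 2))[0]  # (seq, seq)
received = att.mean(dim=0)                             # (seq,)
tokens = tok.convert_ids_to_tokens(enc["input_ids"][0])
for t, score in sorted(zip(tokens, received.tolist()), key=lambda p: -p[1])[:5]:
    print(f"{t:>12s}  {score:.3f}")
```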
- NeuRI: Diversifying DNN Generation via Inductive Rule Inference [16.463237407360594]
NeuRI is a fully automated approach for generating valid and diverse Deep Learning models.
NeuRI improves branch coverage of TensorFlow and PyTorch by 24% and 15%, respectively, over state-of-the-art model-level fuzzers.
arXiv Detail & Related papers (2023-02-04T23:42:07Z)
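As a rough illustration of rule-guided model generation in the spirit of NeuRI (not its actual inference algorithm), each operator below carries a validity rule over input shapes plus a shape-transfer function, so sampling only rule-satisfying operators yields valid graphs by construction; the operator set and rules are made up for the sketch.

```python
import random

def prod(xs):
    out = 1
    for x in xs:
        out *= x
    return out

# Each op: (validity rule over the input shape, shape-transfer function).
# These are illustrative stand-ins for inferred operator rules.
OPS = {
    "relu":    (lambda s: True,        lambda s: s),
    "flatten": (lambda s: len(s) > 1,  lambda s: (s[0], prod(s[1:]))),
    "fc8":     (lambda s: len(s) == 2, lambda s: (s[0], 8)),
}

def sample_graph(shape, depth=5):
    """Chain `depth` random ops, only ever picking ops whose rule holds."""
    graph = []
    for _ in range(depth):
        name = random.choice([n for n, (rule, _) in OPS.items() if rule(shape)])
        graph.append((name, shape))
        shape = OPS[name][1](shape)
    return graph, shape

graph, out_shape = sample_graph((4, 3, 32, 32))
print(graph, "->", out_shape)
```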
- Coverage-Guided Tensor Compiler Fuzzing with Joint IR-Pass Mutation [20.519361342905775]
We propose Tzer, a practical fuzzing technique for the widely used TVM tensor compiler.
Our results show that Tzer substantially outperforms existing fuzzing techniques on tensor compiler testing.
To date, Tzer has detected 49 previously unknown bugs for TVM, with 37 bugs confirmed and 25 bugs fixed.
arXiv Detail & Related papers (2022-02-21T01:48:11Z)
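A schematic coverage-guided loop in the spirit of Tzer's joint IR-pass mutation follows; the mutators, coverage measurement, and crash handling below are toy stand-ins, not Tzer's implementation.

```python
import random

class CompilerCrash(Exception):
    pass

# Toy stand-ins: a real harness would mutate TVM IR and pass sequences
# and measure compiler branch/edge coverage.
def mutate_ir(ir):
    return ir + [random.randint(0, 9)]

def mutate_passes(passes):
    return random.sample(passes, k=len(passes))  # reorder the pipeline

def compile_and_measure(ir, passes):
    if random.random() < 0.001:
        raise CompilerCrash("simulated compiler bug")
    return {hash((tuple(ir), tuple(passes))) % 64}  # fake coverage edges

def fuzz(seed_ir, seed_passes, budget=1000):
    corpus, covered, bugs = [(seed_ir, seed_passes)], set(), []
    for _ in range(budget):
        ir, passes = random.choice(corpus)       # pick a seed
        ir2, passes2 = mutate_ir(ir), mutate_passes(passes)
        try:
            edges = compile_and_measure(ir2, passes2)
        except CompilerCrash:
            bugs.append((ir2, passes2))          # save the reproducer
            continue
        if not edges <= covered:                 # new coverage: keep it
            covered |= edges
            corpus.append((ir2, passes2))
    return corpus, bugs

corpus, bugs = fuzz([1, 2, 3], ["fold", "fuse", "vectorize"])
```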
- DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem.
The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network.
To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z)
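A minimal bidirectional-RNN-with-attention encoder of the general kind described above, reduced to a scorer over tokenized stack traces; dimensions, vocabulary, and the scoring head are illustrative, not DapStep's architecture.

```python
import torch
import torch.nn as nn

class TraceEncoder(nn.Module):
    """Bidirectional GRU + attention pooling, scoring one candidate."""
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, bidirectional=True, batch_first=True)
        self.att = nn.Linear(2 * dim, 1)
        self.score = nn.Linear(2 * dim, 1)  # ranking score per candidate

    def forward(self, frames):               # frames: (batch, seq) token ids
        h, _ = self.rnn(self.emb(frames))    # (batch, seq, 2*dim)
        w = torch.softmax(self.att(h), dim=1)
        pooled = (w * h).sum(dim=1)          # attention-weighted pooling
        return self.score(pooled).squeeze(-1)

enc = TraceEncoder()
scores = enc(torch.randint(0, 1000, (3, 20)))  # rank 3 candidate assignees
```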
- When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
To achieve better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z)
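The per-query-memory idea behind QAMem can be sketched as cross-attention in which every query indexes its own bank of values instead of one shared value tensor; shapes and slot counts below are arbitrary illustrations, not the paper's design.

```python
import torch
import torch.nn as nn

class PerQueryMemory(nn.Module):
    def __init__(self, num_queries=8, dim=32, mem_slots=16):
        super().__init__()
        # One memory bank of values per query: (Q, slots, dim)
        self.values = nn.Parameter(torch.randn(num_queries, mem_slots, dim))
        self.keys = nn.Parameter(torch.randn(mem_slots, dim))

    def forward(self, queries):                             # (batch, Q, dim)
        att = torch.softmax(queries @ self.keys.T, dim=-1)  # (batch, Q, slots)
        # einsum: query q mixes its own value bank self.values[q]
        return torch.einsum("bqs,qsd->bqd", att, self.values)

m = PerQueryMemory()
out = m(torch.randn(2, 8, 32))  # (2, 8, 32)
```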
- Automatic Fault Detection for Deep Learning Programs Using Graph Transformations [13.572917264310119]
We propose NeuraLint, a model-based fault detection approach for Deep Learning programs.
NeuraLint effectively detects faults and design issues in both synthesized and real-world examples, with a recall of 70.5% and a precision of 100%.
Although the proposed meta-model is designed for feedforward neural networks, it can be extended to support other neural network architectures.
arXiv Detail & Related papers (2021-05-17T18:06:11Z)
- Recoding latent sentence representations -- Dynamic gradient-based activation modification in RNNs [0.0]
In RNNs, encoding information in a suboptimal way can impact the quality of representations based on later elements in the sequence.
I propose an augmentation to standard RNNs in form of a gradient-based correction mechanism.
I conduct different experiments in the context of language modeling, where the impact of using such a mechanism is examined in detail.
arXiv Detail & Related papers (2021-01-03T17:54:17Z)
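One illustrative realization of such a gradient-based correction mechanism, assuming prediction entropy as the recoding signal (an assumption of this sketch, not necessarily the thesis's exact formulation): after each step, move the hidden state a small step down the entropy gradient before the next token is processed.

```python
import torch
import torch.nn as nn

rnn_cell, decoder = nn.GRUCell(16, 32), nn.Linear(32, 100)
step_size = 0.1  # illustrative correction step

def recoded_step(x, h):
    h = rnn_cell(x, h)
    h = h.detach().requires_grad_(True)
    probs = torch.softmax(decoder(h), dim=-1)
    # Entropy of the next-token distribution as the recoding signal.
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1).mean()
    (g,) = torch.autograd.grad(entropy, h)
    return (h - step_size * g).detach()  # corrected hidden state

h = torch.zeros(4, 32)
for t in range(5):
    h = recoded_step(torch.randn(4, 16), h)
```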
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
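The transductive update described above can be sketched as confidence-weighted prototype refinement; here a fixed softmax temperature stands in for the meta-learned confidence function, which is an assumption of the sketch.

```python
import torch

def update_prototypes(protos, queries, temp=10.0):
    # protos: (C, d) class prototypes from the support set
    # queries: (Q, d) unlabeled query embeddings
    dists = torch.cdist(queries, protos)        # (Q, C)
    conf = torch.softmax(-temp * dists, dim=1)  # soft assignment = confidence
    # Confidence-weighted mean of queries per class, mixed with the
    # original prototypes.
    weighted = conf.T @ queries                 # (C, d)
    norm = conf.sum(dim=0, keepdim=True).T      # (C, 1)
    return 0.5 * protos + 0.5 * weighted / (norm + 1e-8)

protos = update_prototypes(torch.randn(5, 64), torch.randn(30, 64))
```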
This list is automatically generated from the titles and abstracts of the papers on this site.