Finding Deep-Learning Compilation Bugs with NNSmith
- URL: http://arxiv.org/abs/2207.13066v1
- Date: Tue, 26 Jul 2022 17:39:51 GMT
- Title: Finding Deep-Learning Compilation Bugs with NNSmith
- Authors: Jiawei Liu, Jinkun Lin, Fabian Ruffy, Cheng Tan, Jinyang Li, Aurojit
Panda, Lingming Zhang
- Abstract summary: We propose a new fuzz testing approach for finding bugs in deep-learning compilers.
Our core approach uses (i) light-weight operator specifications to generate diverse yet valid models, (ii) a gradient-based search process, and (iii) differential testing to identify bugs.
We implemented this approach in NNSmith, which has found 65 new bugs over the last seven months in TVM, TensorRT, ONNXRuntime, and PyTorch. Of these, 52 have been confirmed and 44 have been fixed by the maintainers.
- Score: 20.082492391396933
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep-learning (DL) compilers such as TVM and TensorRT are increasingly used
to optimize deep neural network (DNN) models to meet performance, resource
utilization and other requirements. Bugs in these compilers can produce
optimized models whose semantics differ from those of the original models,
yielding incorrect results that undermine the correctness of downstream
applications.
However, finding bugs in these compilers is challenging due to their
complexity. In this work, we propose a new fuzz testing approach for finding
bugs in deep-learning compilers. Our core approach uses (i) light-weight
operator specifications to generate diverse yet valid DNN models allowing us to
exercise a large part of the compiler's transformation logic; (ii) a
gradient-based search process for finding model inputs that avoid any
floating-point exceptional values during model execution, reducing the chance
of missed bugs or false alarms; and (iii) differential testing to identify
bugs. We implemented this approach in NNSmith, which has found 65 new bugs
over the last seven months in TVM, TensorRT, ONNXRuntime, and PyTorch. Of
these, 52 have been confirmed and 44 have been fixed by project maintainers.
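To make components (ii) and (iii) concrete, here is a minimal sketch in PyTorch, not NNSmith's implementation: a toy model with a domain-restricted operator, a gradient search that nudges the input away from NaN-producing regions, and a differential check of eager execution against `torch.compile` as a stand-in compiler backend. The model, margin, and tolerances are illustrative assumptions.

```python
import torch

# Toy generated model with one domain-restricted operator (sqrt is NaN
# for negative inputs). NNSmith generates such models from operator specs.
class Toy(torch.nn.Module):
    def forward(self, x):
        return torch.sqrt(x) * 2.0

model = Toy()

# (ii) Gradient-based input search: penalize elements that would fall
# outside sqrt's domain, and descend until the input is "safe".
margin = 1e-3  # illustrative safety margin
x = torch.randn(8, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.1)
for _ in range(500):
    loss = torch.relu(margin - x).sum()  # > 0 iff some element < margin
    if loss.item() == 0.0:
        break
    opt.zero_grad()
    loss.backward()
    opt.step()

# (iii) Differential testing: the compiled model should agree with eager
# execution on the found input, up to floating-point tolerance.
safe_x = x.detach()
ref = model(safe_x)
out = torch.compile(model)(safe_x)  # requires PyTorch >= 2.0
if not torch.allclose(ref, out, rtol=1e-4, atol=1e-5):
    print("potential miscompilation: eager and compiled outputs diverge")
```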
Related papers
- SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists [59.08999823652293]
We propose SYNTHEVAL to generate a wide range of test types for a comprehensive evaluation of NLP models.
In the last stage, human experts investigate the challenging examples, manually design templates, and identify the types of failures the task-specific models consistently exhibit.
We apply SYNTHEVAL to two classification tasks, sentiment analysis and toxic language detection, and show that our framework is effective in identifying weaknesses of strong models on these tasks.
arXiv Detail & Related papers (2024-08-30T17:41:30Z)
- SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z)
- DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for evaluating the debugging capability of Large Language Models (LLMs).
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z)
- Do Language Models Learn Semantics of Code? A Case Study in Vulnerability Detection [7.725755567907359]
We analyze the models using three distinct methods: interpretability tools, attention analysis, and interaction matrix analysis.
We develop two annotation methods which highlight the bug semantics inside the model's inputs.
Our findings indicate that providing the model with information about bug semantics is helpful, that the model can attend to it, and they motivate future work on learning more complex path-based bug semantics.
arXiv Detail & Related papers (2023-11-07T16:31:56Z)
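The attention analysis mentioned in the entry above can be illustrated with a short sketch (not the paper's code): run a code snippet through an encoder and rank tokens by the attention they receive, averaged over layers and heads. It assumes the `transformers` library is installed and uses `microsoft/codebert-base` as a stand-in for the models studied.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Stand-in encoder; any model that returns attentions works the same way.
tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base", output_attentions=True)

code = "if (len > buf_size) memcpy(buf, src, len);"
enc = tok(code, return_tensors="pt")
with torch.no_grad():
    out = model(**enc)

# out.attentions: one (batch, heads, seq, seq) tensor per layer.
# Average over layers and heads, then over queries, to score how much
# attention each token receives.
att = torch.stack(out.attentions).mean(dim=(0, 2))[0]  # (seq, seq)
received = att.mean(dim=0)                             # (seq,)
tokens = tok.convert_ids_to_tokens(enc["input_ids"][0])
for t, score in sorted(zip(tokens, received.tolist()), key=lambda p: -p[1])[:5]:
    print(f"{t:>12s}  {score:.3f}")
```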
- NeuRI: Diversifying DNN Generation via Inductive Rule Inference [16.463237407360594]
NeuRI is a fully automated approach for generating valid and diverse Deep Learning models.
NeuRI improves branch coverage of TensorFlow and PyTorch by 24% and 15%, respectively, over state-of-the-art model-level fuzzers.
arXiv Detail & Related papers (2023-02-04T23:42:07Z)
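As a rough illustration of rule-guided model generation in the spirit of NeuRI (not its actual inference algorithm), each operator below carries a validity rule over input shapes plus a shape-transfer function, so sampling only rule-satisfying operators yields valid graphs by construction; the operator set and rules are made up for the sketch.

```python
import random

def prod(xs):
    out = 1
    for x in xs:
        out *= x
    return out

# Each op: (validity rule over the input shape, shape-transfer function).
# These are illustrative stand-ins for inferred operator rules.
OPS = {
    "relu":    (lambda s: True,        lambda s: s),
    "flatten": (lambda s: len(s) > 1,  lambda s: (s[0], prod(s[1:]))),
    "fc8":     (lambda s: len(s) == 2, lambda s: (s[0], 8)),
}

def sample_graph(shape, depth=5):
    """Chain `depth` random ops, only ever picking ops whose rule holds."""
    graph = []
    for _ in range(depth):
        name = random.choice([n for n, (rule, _) in OPS.items() if rule(shape)])
        graph.append((name, shape))
        shape = OPS[name][1](shape)
    return graph, shape

graph, out_shape = sample_graph((4, 3, 32, 32))
print(graph, "->", out_shape)
```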
- Coverage-Guided Tensor Compiler Fuzzing with Joint IR-Pass Mutation [20.519361342905775]
We propose Tzer, a practical fuzzing technique for the widely used TVM tensor compiler.
Our results show that Tzer substantially outperforms existing fuzzing techniques on tensor compiler testing.
To date, Tzer has detected 49 previously unknown bugs for TVM, with 37 bugs confirmed and 25 bugs fixed.
arXiv Detail & Related papers (2022-02-21T01:48:11Z)
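A schematic coverage-guided loop in the spirit of Tzer's joint IR-pass mutation follows; the mutators, coverage measurement, and crash handling below are toy stand-ins, not Tzer's implementation.

```python
import random

class CompilerCrash(Exception):
    pass

# Toy stand-ins: a real harness would mutate TVM IR and pass sequences
# and measure compiler branch/edge coverage.
def mutate_ir(ir):
    return ir + [random.randint(0, 9)]

def mutate_passes(passes):
    return random.sample(passes, k=len(passes))  # reorder the pipeline

def compile_and_measure(ir, passes):
    if random.random() < 0.001:
        raise CompilerCrash("simulated compiler bug")
    return {hash((tuple(ir), tuple(passes))) % 64}  # fake coverage edges

def fuzz(seed_ir, seed_passes, budget=1000):
    corpus, covered, bugs = [(seed_ir, seed_passes)], set(), []
    for _ in range(budget):
        ir, passes = random.choice(corpus)       # pick a seed
        ir2, passes2 = mutate_ir(ir), mutate_passes(passes)
        try:
            edges = compile_and_measure(ir2, passes2)
        except CompilerCrash:
            bugs.append((ir2, passes2))          # save the reproducer
            continue
        if not edges <= covered:                 # new coverage: keep it
            covered |= edges
            corpus.append((ir2, passes2))
    return corpus, bugs

corpus, bugs = fuzz([1, 2, 3], ["fold", "fuse", "vectorize"])
```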
- DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem.
The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network.
To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z)
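A minimal bidirectional-RNN-with-attention encoder of the general kind described above, reduced to a scorer over tokenized stack traces; dimensions, vocabulary, and the scoring head are illustrative, not DapStep's architecture.

```python
import torch
import torch.nn as nn

class TraceEncoder(nn.Module):
    """Bidirectional GRU + attention pooling, scoring one candidate."""
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, bidirectional=True, batch_first=True)
        self.att = nn.Linear(2 * dim, 1)
        self.score = nn.Linear(2 * dim, 1)  # ranking score per candidate

    def forward(self, frames):               # frames: (batch, seq) token ids
        h, _ = self.rnn(self.emb(frames))    # (batch, seq, 2*dim)
        w = torch.softmax(self.att(h), dim=1)
        pooled = (w * h).sum(dim=1)          # attention-weighted pooling
        return self.score(pooled).squeeze(-1)

enc = TraceEncoder()
scores = enc(torch.randint(0, 1000, (3, 20)))  # rank 3 candidate assignees
```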
- When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
To achieve better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z)
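The per-query-memory idea behind QAMem can be sketched as cross-attention in which every query indexes its own bank of values instead of one shared value tensor; shapes and slot counts below are arbitrary illustrations, not the paper's design.

```python
import torch
import torch.nn as nn

class PerQueryMemory(nn.Module):
    def __init__(self, num_queries=8, dim=32, mem_slots=16):
        super().__init__()
        # One memory bank of values per query: (Q, slots, dim)
        self.values = nn.Parameter(torch.randn(num_queries, mem_slots, dim))
        self.keys = nn.Parameter(torch.randn(mem_slots, dim))

    def forward(self, queries):                             # (batch, Q, dim)
        att = torch.softmax(queries @ self.keys.T, dim=-1)  # (batch, Q, slots)
        # einsum: query q mixes its own value bank self.values[q]
        return torch.einsum("bqs,qsd->bqd", att, self.values)

m = PerQueryMemory()
out = m(torch.randn(2, 8, 32))  # (2, 8, 32)
```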
- Automatic Fault Detection for Deep Learning Programs Using Graph Transformations [13.572917264310119]
We propose NeuraLint, a model-based fault detection approach for Deep Learning programs.
NeuraLint effectively detects faults and design issues in both synthesized and real-world examples, with a recall of 70.5% and a precision of 100%.
Although the proposed meta-model is designed for feedforward neural networks, it can be extended to support other neural network architectures.
arXiv Detail & Related papers (2021-05-17T18:06:11Z)
- Recoding latent sentence representations -- Dynamic gradient-based activation modification in RNNs [0.0]
In RNNs, encoding information in a suboptimal way can impact the quality of representations based on later elements in the sequence.
I propose an augmentation to standard RNNs in form of a gradient-based correction mechanism.
I conduct different experiments in the context of language modeling, where the impact of using such a mechanism is examined in detail.
arXiv Detail & Related papers (2021-01-03T17:54:17Z)
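One illustrative realization of such a gradient-based correction mechanism, assuming prediction entropy as the recoding signal (an assumption of this sketch, not necessarily the thesis's exact formulation): after each step, move the hidden state a small step down the entropy gradient before the next token is processed.

```python
import torch
import torch.nn as nn

rnn_cell, decoder = nn.GRUCell(16, 32), nn.Linear(32, 100)
step_size = 0.1  # illustrative correction step

def recoded_step(x, h):
    h = rnn_cell(x, h)
    h = h.detach().requires_grad_(True)
    probs = torch.softmax(decoder(h), dim=-1)
    # Entropy of the next-token distribution as the recoding signal.
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1).mean()
    (g,) = torch.autograd.grad(entropy, h)
    return (h - step_size * g).detach()  # corrected hidden state

h = torch.zeros(4, 32)
for t in range(5):
    h = recoded_step(torch.randn(4, 16), h)
```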
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
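The transductive update described above can be sketched as confidence-weighted prototype refinement; here a fixed softmax temperature stands in for the meta-learned confidence function, which is an assumption of the sketch.

```python
import torch

def update_prototypes(protos, queries, temp=10.0):
    # protos: (C, d) class prototypes from the support set
    # queries: (Q, d) unlabeled query embeddings
    dists = torch.cdist(queries, protos)        # (Q, C)
    conf = torch.softmax(-temp * dists, dim=1)  # soft assignment = confidence
    # Confidence-weighted mean of queries per class, mixed with the
    # original prototypes.
    weighted = conf.T @ queries                 # (C, d)
    norm = conf.sum(dim=0, keepdim=True).T      # (C, 1)
    return 0.5 * protos + 0.5 * weighted / (norm + 1e-8)

protos = update_prototypes(torch.randn(5, 64), torch.randn(30, 64))
```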
This list is automatically generated from the titles and abstracts of the papers on this site.