NeuRI: Diversifying DNN Generation via Inductive Rule Inference
- URL: http://arxiv.org/abs/2302.02261v3
- Date: Mon, 4 Sep 2023 16:55:51 GMT
- Title: NeuRI: Diversifying DNN Generation via Inductive Rule Inference
- Authors: Jiawei Liu, Jinjun Peng, Yuyao Wang, Lingming Zhang
- Abstract summary: NeuRI is a fully automated approach for generating valid and diverse Deep Learning models.
NeuRI improves branch coverage of TensorFlow and PyTorch by 24% and 15%, respectively, over the state-of-the-art model-level fuzzers.
- Score: 16.463237407360594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Learning (DL) is prevalently used in various industries to improve
decision-making and automate processes, driven by the ever-evolving DL
libraries and compilers. The correctness of DL systems is crucial for trust in
DL applications. As such, the recent wave of research has been studying the
automated synthesis of test-cases (i.e., DNN models and their inputs) for
fuzzing DL systems. However, existing model generators only subsume a limited
number of operators, lacking the ability to pervasively model operator
constraints. To address this challenge, we propose NeuRI, a fully automated
approach for generating valid and diverse DL models composed of hundreds of
types of operators. NeuRI adopts a three-step process: (i) collecting valid and
invalid API traces from various sources; (ii) applying inductive program
synthesis over the traces to infer the constraints for constructing valid
models; and (iii) using hybrid model generation which incorporates both
symbolic and concrete operators. Our evaluation shows that NeuRI improves
branch coverage of TensorFlow and PyTorch by 24% and 15% over the
state-of-the-art model-level fuzzers. NeuRI finds 100 new bugs for PyTorch and
TensorFlow in four months, with 81 already fixed or confirmed. Of these, 9 bugs
are labelled as high-priority or as security vulnerabilities, constituting 10% of
all high-priority bugs of the period. Open-source developers regard
error-inducing tests reported by us as "high-quality" and "common in practice".
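Step (ii), inductive rule inference, can be pictured with a toy sketch: given valid and invalid invocation traces for an operator, keep the candidate predicates that every valid trace satisfies and whose conjunction rejects every invalid trace. The operator attributes, predicates, and tiny hypothesis space below are illustrative assumptions, not NeuRI's actual implementation.

```python
# Toy inductive synthesis over API traces: each trace records an operator's
# attributes plus whether the invocation was valid.
traces = [
    ({"kernel": 3, "in_w": 8}, True),
    ({"kernel": 3, "in_w": 2}, False),
    ({"kernel": 5, "in_w": 4}, False),
    ({"kernel": 1, "in_w": 1}, True),
]

# A small hypothesis space of candidate constraints (hypothetical names).
candidates = {
    "in_w >= kernel":  lambda t: t["in_w"] >= t["kernel"],
    "in_w > kernel":   lambda t: t["in_w"] > t["kernel"],
    "kernel % 2 == 1": lambda t: t["kernel"] % 2 == 1,
}

def infer(traces, candidates):
    # Keep predicates that hold on every valid trace...
    kept = {name: p for name, p in candidates.items()
            if all(p(t) for t, ok in traces if ok)}
    # ...and check that their conjunction rejects every invalid trace.
    assert all(not all(p(t) for p in kept.values())
               for t, ok in traces if not ok), "hypothesis space too weak"
    return sorted(kept)

print(infer(traces, candidates))
```

The inferred constraints can then guide a generator to instantiate only operator configurations that satisfy them, which is what makes the generated models valid by construction.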
Related papers
- Learning Better Representations From Less Data For Propositional Satisfiability [7.449724123186386]
We present NeuRes, a neuro-symbolic approach to address both challenges for propositional satisfiability.
Our model learns better representations than models trained for classification only, with a much higher data efficiency.
We show that our model achieves far better performance than NeuroSAT in terms of both correctly classified and proven instances.
arXiv Detail & Related papers (2024-02-13T10:50:54Z)
- Language Models for Novelty Detection in System Call Traces [0.27309692684728604]
This paper introduces a novelty detection methodology that relies on a probability distribution over sequences of system calls.
The proposed methodology requires minimal expert hand-crafting and achieves an F-score and AuROC greater than 95% on most novelties.
The source code and trained models are publicly available on GitHub while the datasets are available on Zenodo.
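The idea of scoring system-call sequences by likelihood can be sketched with a simple bigram model standing in for the paper's language models; the syscall names, smoothing, and scoring below are illustrative assumptions.

```python
# Score system-call traces by average per-transition surprise; higher scores
# indicate sequences the model finds unlikely, i.e. candidate novelties.
import math
from collections import Counter

def train_bigram(traces, alpha=1.0):
    uni, bi = Counter(), Counter()
    vocab = {s for t in traces for s in t}
    for t in traces:
        uni.update(t[:-1])
        bi.update(zip(t, t[1:]))
    V = len(vocab)
    # Laplace-smoothed conditional probability P(next | prev).
    return lambda prev, nxt: (bi[(prev, nxt)] + alpha) / (uni[prev] + alpha * V)

def novelty_score(model, trace):
    pairs = list(zip(trace, trace[1:]))
    return sum(-math.log(model(a, b)) for a, b in pairs) / len(pairs)

normal = [["open", "read", "close"]] * 50 + [["open", "write", "close"]] * 50
model = train_bigram(normal)
print(novelty_score(model, ["open", "mmap", "exec"]))  # high: unseen calls
print(novelty_score(model, ["open", "read", "close"]))  # low: familiar trace
```

A threshold on this score then separates normal traces from novelties; the paper's F-score/AuROC figures suggest such a separation is sharp in practice.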
arXiv Detail & Related papers (2023-09-05T13:11:40Z)
- Neural Abstractions [72.42530499990028]
We present a novel method for the safety verification of nonlinear dynamical models that uses neural networks to represent abstractions of their dynamics.
We demonstrate that our approach performs comparably to the mature tool Flow* on existing benchmark nonlinear models.
arXiv Detail & Related papers (2023-01-27T12:38:09Z)
- Finding Deep-Learning Compilation Bugs with NNSmith [20.082492391396933]
We propose a new fuzz testing approach for finding bugs in deep-learning compilers.
Our core approach uses (i) light-weight operator specifications to generate diverse yet valid models, (ii) a gradient-based search process, and (iii) differential testing to identify bugs.
We implemented this approach in NNSmith, which has found 65 new bugs in the last seven months for TVM, TensorRT, ONNXRuntime, and PyTorch. Of these, 52 have been confirmed and 44 have been fixed by maintainers.
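The differential-testing step (iii) can be sketched generically: run the same model and input through two backends and flag numeric disagreement. The two `backend_*` functions below are toy stand-ins for real runtimes (e.g. a compiled vs. an eager execution path), with a deliberately seeded divergence for illustration.

```python
# Differential testing: any input on which two supposedly equivalent
# implementations disagree beyond tolerance is a bug candidate.
import math

def backend_a(x):  # reference implementation
    return [math.tanh(v) for v in x]

def backend_b(x):  # "optimized" implementation with a seeded bug
    return [math.tanh(v) if v < 100 else 1.1 for v in x]

def differential_test(model_a, model_b, inputs, atol=1e-6):
    mismatches = []
    for x in inputs:
        ya, yb = model_a(x), model_b(x)
        if any(abs(a - b) > atol for a, b in zip(ya, yb)):
            mismatches.append(x)
    return mismatches

print(differential_test(backend_a, backend_b, [[0.5, -1.0], [200.0]]))
```

Only the input that reaches the divergent code path is reported, which is why generating *diverse* yet valid models matters: each new model shape exercises different backend paths.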
arXiv Detail & Related papers (2022-07-26T17:39:51Z)
- Coverage-Guided Tensor Compiler Fuzzing with Joint IR-Pass Mutation [20.519361342905775]
We propose Tzer, a practical fuzzing technique for the widely used TVM tensor compiler.
Our results show that Tzer substantially outperforms existing fuzzing techniques on tensor compiler testing.
To date, Tzer has detected 49 previously unknown bugs for TVM, with 37 bugs confirmed and 25 bugs fixed.
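Coverage-guided fuzzing of the kind Tzer applies follows a standard loop: mutate a seed from the corpus, run it, and keep it only if it exercises coverage not seen before. The target program and coverage signal below are toy stand-ins, not TVM internals.

```python
# Minimal coverage-guided fuzzing loop (hypothetical target).
import random

def target(seed):
    # Toy system under test: returns the set of "branches" a seed covers.
    covered = {len(seed) % 4}
    if "x" in seed:
        covered.add(10)
    return covered

def mutate(seed, rng):
    return seed + rng.choice("abcx")

def fuzz(iterations=200, seed=0):
    rng = random.Random(seed)
    corpus, global_cov = ["a"], set()
    for _ in range(iterations):
        cand = mutate(rng.choice(corpus), rng)
        cov = target(cand)
        if not cov <= global_cov:  # new branch hit: keep this seed
            global_cov |= cov
            corpus.append(cand)
    return global_cov

print(sorted(fuzz()))
```

Tzer's contribution on top of this skeleton is *joint* mutation of both the tensor-program IR and the compiler pass sequence, so the feedback loop explores pass interactions as well as program shapes.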
arXiv Detail & Related papers (2022-02-21T01:48:11Z)
- DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem.
The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network.
To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z)
- Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and a significant reduction in memory consumption.
However, they can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z)
- Automatic Fault Detection for Deep Learning Programs Using Graph Transformations [13.572917264310119]
We propose NeuraLint, a model-based fault detection approach for Deep Learning programs.
NeuraLint effectively detects faults and design issues in both synthesized and real-world examples, with a recall of 70.5% and a precision of 100%.
Although the proposed meta-model is designed for feedforward neural networks, it can be extended to support other neural network architectures.
arXiv Detail & Related papers (2021-05-17T18:06:11Z)
- TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task [80.38130122127882]
TACRED is one of the largest, most widely used crowdsourced datasets in Relation Extraction (RE).
In this paper, we investigate the questions: Have we reached a performance ceiling or is there still room for improvement?
We find that label errors account for 8% absolute F1 test error, and that more than 50% of the examples need to be relabeled.
arXiv Detail & Related papers (2020-04-30T15:07:37Z)
- Learning to Encode Position for Transformer with Continuous Dynamical Model [88.69870971415591]
We introduce a new way of learning to encode position information for non-recurrent models, such as Transformer models.
We model the evolution of the encoded representation along the position index with a continuous dynamical system.
arXiv Detail & Related papers (2020-03-13T00:41:41Z)
- AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
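The AvgOut measure above can be sketched as follows: average the per-token output distributions across a batch, then score a response higher when it avoids the tokens the model overuses on average. The token names and the exact scoring formula below are illustrative assumptions, not the paper's precise definitions.

```python
# AvgOut sketch: mean output probability per token across a batch identifies
# "dull" tokens; a diversity score penalizes responses built from them.
vocab = ["i", "dont", "know", "sushi", "weather"]

batch_dists = [  # per-step output distributions from a (mock) decoder
    [0.4, 0.3, 0.2, 0.05, 0.05],
    [0.5, 0.2, 0.2, 0.05, 0.05],
]

# AvgOut: mean probability per token across the batch.
avg = [sum(d[i] for d in batch_dists) / len(batch_dists)
       for i in range(len(vocab))]

def diversity_score(response_ids, avg):
    # Higher when the response avoids tokens the model overuses on average.
    return 1.0 - sum(avg[i] for i in response_ids) / len(response_ids)

dull = [vocab.index(w) for w in ["i", "dont", "know"]]
fresh = [vocab.index(w) for w in ["sushi", "weather"]]
print(diversity_score(dull, avg), diversity_score(fresh, avg))
```

The three models in the summary then differ only in how this score is used: maximized directly (MinAvgOut), prepended as a control label (LFT), or treated as an RL reward.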
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.