Analysis of Failures and Risks in Deep Learning Model Converters: A Case Study in the ONNX Ecosystem
- URL: http://arxiv.org/abs/2303.17708v4
- Date: Mon, 2 Sep 2024 15:23:52 GMT
- Title: Analysis of Failures and Risks in Deep Learning Model Converters: A Case Study in the ONNX Ecosystem
- Authors: Purvish Jajal, Wenxin Jiang, Arav Tewari, Erik Kocinare, Joseph Woo, Anusha Sarraf, Yung-Hsiang Lu, George K. Thiruvathukal, James C. Davis
- Abstract summary: This paper analyzes failures in deep learning (DL) model converters.
We survey software engineers about DL interoperability tools, use cases, and pain points.
We find that the node conversion stage of a model converter accounts for ~75% of the defects, and that 33% of reported failures are related to semantically incorrect models.
- Score: 3.0307714495180895
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Software engineers develop, fine-tune, and deploy deep learning (DL) models using a variety of development frameworks and runtime environments. DL model converters move models between frameworks and to runtime environments. Conversion errors compromise model quality and disrupt deployment. However, the failure characteristics of DL model converters are unknown, adding risk when using DL interoperability technologies. This paper analyzes failures in DL model converters. We survey software engineers about DL interoperability tools, use cases, and pain points (N=92). Then, we characterize failures in model converters associated with the main interoperability tool, ONNX (N=200 issues in PyTorch and TensorFlow). Finally, we formulate and test two hypotheses about structural causes for the failures we studied. We find that the node conversion stage of a model converter accounts for ~75% of the defects and 33% of reported failures are related to semantically incorrect models. The cause of semantically incorrect models is elusive, but models with behaviour inconsistencies share operator sequences. Our results motivate future research on making DL interoperability software simpler to maintain, extend, and validate. Research into behavioural tolerances and architectural coverage metrics could be fruitful.
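For concreteness, here is a minimal differential-testing sketch in the spirit of the conversion workflow the paper studies: export a PyTorch model to ONNX, validate the graph, and compare outputs against the source framework. The toy model, file name, and tolerances are illustrative assumptions, not artifacts from the paper.

```python
# Hedged sketch: PyTorch -> ONNX conversion with a differential check.
# Assumes torch, onnx, and onnxruntime are installed; the model is a toy.
import numpy as np
import torch
import onnx
import onnxruntime as ort

# A small stand-in for a real network.
model = torch.nn.Sequential(
    torch.nn.Linear(8, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 4),
).eval()

example = torch.randn(1, 8)

# Conversion: the "node conversion" stage maps each framework op to an
# ONNX operator; the paper finds this stage accounts for ~75% of defects.
torch.onnx.export(model, example, "model.onnx",
                  input_names=["x"], output_names=["y"])

# Structural validation catches malformed graphs...
onnx.checker.check_model(onnx.load("model.onnx"))

# ...but not semantically incorrect ones, so compare outputs as well.
with torch.no_grad():
    expected = model(example).numpy()
session = ort.InferenceSession("model.onnx",
                               providers=["CPUExecutionProvider"])
actual = session.run(None, {"x": example.numpy()})[0]
np.testing.assert_allclose(expected, actual, rtol=1e-4, atol=1e-5)
```

The structural check alone cannot catch the semantically incorrect models the paper reports; that requires the output comparison at the end, and choosing an appropriate tolerance is exactly the "behavioural tolerances" question the abstract raises.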
Related papers
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- Identifying and Mitigating Model Failures through Few-shot CLIP-aided Diffusion Generation [65.268245109828]
We propose an end-to-end framework to generate text descriptions of failure modes associated with spurious correlations.
These descriptions can be used to generate synthetic data using generative models, such as diffusion models.
Our experiments have shown remarkable improvements in accuracy (~21%) on hard sub-populations.
arXiv Detail & Related papers (2023-12-09T04:43:49Z)
- MGit: A Model Versioning and Management System [7.2678752235785735]
MGit is a model versioning and management system that makes it easier to store, test, update, and collaborate on model derivatives.
MGit is able to reduce the lineage graph's storage footprint by up to 7x and automatically update downstream models in response to updates to upstream models.
arXiv Detail & Related papers (2023-07-14T17:56:48Z)
- An Empirical Study of Deep Learning Models for Vulnerability Detection [4.243592852049963]
We surveyed and reproduced 9 state-of-the-art deep learning models on 2 widely used vulnerability detection datasets.
We investigated model capabilities, training data, and model interpretation.
Our findings can help better understand model results, provide guidance on preparing training data, and improve the robustness of the models.
arXiv Detail & Related papers (2022-12-15T19:49:34Z)
- An Empirical Study of Challenges in Converting Deep Learning Models [15.521925194920893]
We conduct the first empirical study to assess ONNX and CoreML for converting trained Deep Learning models.
Our results reveal that the prediction accuracy of converted models is at the same level as the originals.
Converted models are generally assessed as being as robust as the originals.
arXiv Detail & Related papers (2022-06-28T23:18:37Z)
- Switchable Representation Learning Framework with Self-compatibility [50.48336074436792]
We propose a Switchable representation learning Framework with Self-Compatibility (SFSC).
SFSC generates a series of compatible sub-models with different capacities through one training process.
SFSC achieves state-of-the-art performance on the evaluated datasets.
arXiv Detail & Related papers (2022-06-16T16:46:32Z)
- Characterizing and Understanding the Behavior of Quantized Models for Reliable Deployment [32.01355605506855]
Quantization-aware training can produce more stable models than standard, adversarial, and Mixup training.
Disagreements often have closer top-1 and top-2 output probabilities, and $Margin$ is a better indicator than other uncertainty metrics for distinguishing disagreements (a sketch of this metric follows the list below).
We open-source our code and models as a new benchmark for further study of quantized models.
arXiv Detail & Related papers (2022-04-08T11:19:16Z)
- Integrated Training for Sequence-to-Sequence Models Using Non-Autoregressive Transformer [49.897891031932545]
We propose a cascaded model based on the non-autoregressive Transformer that enables end-to-end training without the need for an explicit intermediate representation.
We conduct an evaluation on two pivot-based machine translation tasks, namely French-German and German-Czech.
arXiv Detail & Related papers (2021-09-27T11:04:09Z)
- DirectDebug: Automated Testing and Debugging of Feature Models [55.41644538483948]
Variability models (e.g., feature models) are a common way for the representation of variabilities and commonalities of software artifacts.
Complex and often large-scale feature models can become faulty, i.e., fail to represent the expected variability properties of the underlying software artifact.
arXiv Detail & Related papers (2021-02-11T11:22:20Z)
- An Empirical Analysis of Backward Compatibility in Machine Learning Systems [47.04803977692586]
We consider how updates, intended to improve ML models, can introduce new errors that can significantly affect downstream systems and users.
For example, updates in models used in cloud-based classification services, such as image recognition, can cause unexpected erroneous behavior.
arXiv Detail & Related papers (2020-08-11T08:10:58Z)
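As a footnote to the quantized-models entry above, here is a minimal sketch of the $Margin$ metric it mentions: the gap between the top-1 and top-2 output probabilities, where small gaps flag inputs likely to trigger disagreement between a quantized model and its full-precision counterpart. The function and variable names are assumptions for illustration, not from the paper.

```python
# Hedged sketch of the Margin uncertainty metric (top-1 minus top-2
# probability per sample). Names here are illustrative assumptions.
import numpy as np

def margin(probs: np.ndarray) -> np.ndarray:
    """Return top-1 minus top-2 probability for each row of `probs`."""
    top2 = np.sort(probs, axis=-1)[..., -2:]  # two largest values, ascending
    return top2[..., 1] - top2[..., 0]

probs = np.array([[0.50, 0.45, 0.05],   # small margin: likely disagreement
                  [0.90, 0.05, 0.05]])  # large margin: confident prediction
print(margin(probs))  # [0.05 0.85]
```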