Towards Training Reproducible Deep Learning Models
- URL: http://arxiv.org/abs/2202.02326v1
- Date: Fri, 4 Feb 2022 18:14:39 GMT
- Title: Towards Training Reproducible Deep Learning Models
- Authors: Boyuan Chen, Mingzhi Wen, Yong Shi, Dayi Lin, Gopi Krishnan Rajbahadur, Zhen Ming (Jack) Jiang
- Abstract summary: Deep Learning (DL) models are challenging to reproduce due to issues like randomness in the software and non-determinism in the hardware.
This paper proposes a systematic approach to training reproducible DL models.
Case study results show our approach can successfully reproduce six open-source DL models and one commercial DL model.
- Score: 26.547756923322126
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Reproducibility is an increasing concern in Artificial Intelligence (AI),
particularly in the area of Deep Learning (DL). Being able to reproduce DL
models is crucial for AI-based systems, as it is closely tied to various tasks
like training, testing, debugging, and auditing. However, DL models are
challenging to reproduce due to issues like randomness in the software
(e.g., DL algorithms) and non-determinism in the hardware (e.g., GPU). There
are various practices to mitigate some of the aforementioned issues. However,
many of them are either too intrusive or can only work for a specific usage
context. In this paper, we propose a systematic approach to training
reproducible DL models. Our approach includes three main parts: (1) a set of
general criteria to thoroughly evaluate the reproducibility of DL models for
two different domains, (2) a unified framework which leverages a
record-and-replay technique to mitigate software-related randomness and a
profile-and-patch technique to control hardware-related non-determinism, and
(3) a reproducibility guideline which explains the rationales and the
mitigation strategies on conducting a reproducible training process for DL
models. Case study results show that our approach can successfully reproduce
six open-source DL models and one commercial DL model.
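As a concrete illustration of the kind of controls involved, the sketch below pins the usual sources of software randomness and hardware non-determinism in a PyTorch training script. This is a minimal baseline only, assuming a single-GPU PyTorch setup; it is not the paper's record-and-replay or profile-and-patch tooling, and the seed value is arbitrary.

```python
import os
import random

import numpy as np
import torch

def make_deterministic(seed: int = 42) -> None:
    # Software-related randomness: seed every RNG the training code touches.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # also seeds all CUDA devices

    # Hardware-related non-determinism: force deterministic GPU kernels.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # needed by cuBLAS
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False  # disable cuDNN autotuning
    torch.use_deterministic_algorithms(True)  # error out on nondeterministic ops

make_deterministic(42)
```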
Related papers
- Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction [75.25114727856861]
Large language models (LLMs) tend to suffer from deterioration at the latter stage of the supervised fine-tuning (SFT) process.
We introduce a simple disperse-then-merge framework to address the issue.
Our framework outperforms various sophisticated methods such as data curation and training regularization on a series of standard knowledge and reasoning benchmarks.
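The abstract does not spell out the merging rule; one common instantiation, shown in this hypothetical sketch, is uniform parameter averaging of sub-models fine-tuned on the dispersed data portions.

```python
from collections import OrderedDict

import torch

def merge_state_dicts(state_dicts: list) -> OrderedDict:
    """Average the parameters of several models with identical architecture."""
    merged = OrderedDict()
    for name in state_dicts[0]:
        stacked = torch.stack([sd[name].float() for sd in state_dicts])
        merged[name] = stacked.mean(dim=0)
    return merged

# Usage: each sub-model was fine-tuned on one dispersed data portion.
# merged = merge_state_dicts([m.state_dict() for m in sub_models])
# model.load_state_dict(merged)
```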
arXiv Detail & Related papers (2024-05-22T08:18:19Z)
- Having Second Thoughts? Let's hear it [0.36832029288386137]
Deep learning models loosely mimic bottom-up signal pathways from low-order sensory areas to high-order cognitive areas.
After training, DL models can outperform humans on some domain-specific tasks, but their decision-making processes are known to be easily disrupted.
We propose a certification process mimicking selective attention and test if it could make DL models more robust.
arXiv Detail & Related papers (2023-11-26T17:17:28Z)
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages [116.74407069443895]
We unify encoder-based and decoder-based models into a single prefix-LM.
For learning methods, we explore the claim of a "free lunch" hypothesis.
For data distributions, we explore the effect of mixture distributions and multi-epoch training of programming and natural language data on model performance.
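For readers unfamiliar with the prefix-LM formulation, this illustrative sketch (ours, not CodeGen2 code) builds the attention mask that defines it: bidirectional attention within the prefix, causal attention everywhere else.

```python
import torch

def prefix_lm_mask(seq_len: int, prefix_len: int) -> torch.Tensor:
    """Boolean attention mask for a prefix-LM: True = may attend."""
    # Causal (decoder-style) mask as the starting point.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Prefix tokens attend bidirectionally among themselves (encoder-style).
    mask[:prefix_len, :prefix_len] = True
    return mask

print(prefix_lm_mask(5, 2).int())
```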
arXiv Detail & Related papers (2023-05-03T17:55:25Z)
- An Empirical Study of Deep Learning Models for Vulnerability Detection [4.243592852049963]
We surveyed and reproduced 9 state-of-the-art deep learning models on 2 widely used vulnerability detection datasets.
We investigated model capabilities, training data, and model interpretation.
Our findings can help better understand model results, provide guidance on preparing training data, and improve the robustness of the models.
arXiv Detail & Related papers (2022-12-15T19:49:34Z)
- REST: REtrieve & Self-Train for generative action recognition [54.90704746573636]
We propose to adapt a pre-trained generative Vision & Language (V&L) Foundation Model for video/action recognition.
We show that direct fine-tuning of a generative model to produce action classes suffers from severe overfitting.
We introduce REST, a training framework consisting of two key components.
arXiv Detail & Related papers (2022-09-29T17:57:01Z)
- Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models [67.78935378952146]
GenRL is a framework for solving sequential decision-making problems.
It exploits the combination of reinforcement learning and latent variable generative models.
We experimentally determine the characteristics of generative models that have most influence on the performance of the final policy training.
arXiv Detail & Related papers (2022-04-18T22:02:32Z)
- Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning [65.268245109828]
In data-rich domains such as vision, language, and speech, deep learning prevails to deliver high-performance task-specific models.
Deep learning in resource-limited domains still faces multiple challenges including (i) limited data, (ii) constrained model development cost, and (iii) lack of adequate pre-trained models for effective finetuning.
Model reprogramming enables resource-efficient cross-domain machine learning by repurposing a well-developed pre-trained model from a source domain to solve tasks in a target domain without model finetuning.
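As a sketch of the idea (our illustration; the class names and the logit-aggregation rule are assumptions, not taken from the paper), reprogramming wraps a frozen source-domain model with a trainable input perturbation and maps source classes onto target classes:

```python
import torch
import torch.nn as nn

class Reprogrammer(nn.Module):
    def __init__(self, frozen_model: nn.Module, input_shape, label_map: torch.Tensor):
        super().__init__()
        self.model = frozen_model
        for p in self.model.parameters():
            p.requires_grad = False  # the source model stays untouched
        self.delta = nn.Parameter(torch.zeros(*input_shape))  # trainable input perturbation
        self.register_buffer("label_map", label_map)  # source class -> target class

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.model(x + self.delta)  # reprogrammed input
        num_target = int(self.label_map.max()) + 1
        # Aggregate the logits of the source classes assigned to each target class.
        return torch.stack(
            [logits[:, self.label_map == t].mean(dim=1) for t in range(num_target)],
            dim=1,
        )
```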
arXiv Detail & Related papers (2022-02-22T02:33:54Z)
- Data-Driven and SE-assisted AI Model Signal-Awareness Enhancement and Introspection [61.571331422347875]
We propose a data-driven approach to enhance models' signal-awareness.
We combine the SE concept of code complexity with the AI technique of curriculum learning.
We achieve up to 4.8x improvement in model signal awareness.
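A minimal sketch of the curriculum component follows; token count stands in for the SE complexity metrics, which the abstract does not detail, and the field names are hypothetical.

```python
def complexity(sample: dict) -> int:
    # Stand-in proxy; the actual work uses SE code-complexity metrics.
    return len(sample["code"].split())

def curriculum_batches(dataset: list, batch_size: int):
    """Yield batches ordered from simple to complex samples."""
    ordered = sorted(dataset, key=complexity)  # easy -> hard
    for i in range(0, len(ordered), batch_size):
        yield ordered[i:i + batch_size]
```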
arXiv Detail & Related papers (2021-11-10T17:58:18Z)
- On the Replicability and Reproducibility of Deep Learning in Software Engineering [16.828220584270507]
Deep learning (DL) techniques have gained significant popularity among software engineering (SE) researchers in recent years.
Many DL studies have reported substantial advantages over other state-of-the-art models in terms of effectiveness.
They often ignore two factors: (1) replicability - whether the reported experimental results can be approximately reproduced with high probability using the same DL model and the same data; and (2) reproducibility - whether the reported experimental findings can be reproduced by new experiments with the same experimental protocol and DL model, but different sampled real-world data.
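An illustrative replicability check in this sense is sketched below; `train_and_evaluate` is a placeholder for any training pipeline that returns a scalar metric.

```python
import math

def is_replicable(train_and_evaluate, seed: int = 0, tolerance: float = 1e-3) -> bool:
    """Run the same training twice under identical settings and compare metrics."""
    score_a = train_and_evaluate(seed=seed)
    score_b = train_and_evaluate(seed=seed)
    return math.isclose(score_a, score_b, abs_tol=tolerance)
```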
arXiv Detail & Related papers (2020-06-25T08:20:10Z)
- Improving Deep Learning Models via Constraint-Based Domain Knowledge: a Brief Survey [11.034875974800487]
This paper presents a first survey of the approaches devised to integrate domain knowledge, expressed in the form of constraints, in Deep Learning (DL) models.
We identify five categories that encompass the main approaches to injecting domain knowledge: 1) acting on the feature space, 2) modifying the hypothesis space, 3) data augmentation, 4) regularization schemes, and 5) constrained learning.
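As a small illustration of category 4) (our example; the surveyed papers use richer constraints), a domain constraint can be compiled into a differentiable penalty that is added to the task loss:

```python
import torch
import torch.nn.functional as F

def constrained_loss(logits: torch.Tensor, targets: torch.Tensor, weight: float = 0.1):
    task_loss = F.cross_entropy(logits, targets)
    probs = torch.softmax(logits, dim=-1)
    # Made-up constraint: classes 0 and 1 are mutually exclusive, so
    # penalize predictions that assign high probability to both.
    violation = (probs[:, 0] * probs[:, 1]).mean()
    return task_loss + weight * violation
```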
arXiv Detail & Related papers (2020-05-19T15:34:09Z)
- Deep Learning for Source Code Modeling and Generation: Models, Applications and Challenges [5.4052819252055055]
Deep Learning (DL) techniques for Natural Language Processing have been evolving remarkably fast.
Recent DL advances in language modeling, machine translation, and paragraph understanding are so prominent that the potential of DL in Software Engineering cannot be overlooked.
arXiv Detail & Related papers (2020-02-13T11:02:51Z)