Towards Training Reproducible Deep Learning Models
- URL: http://arxiv.org/abs/2202.02326v1
- Date: Fri, 4 Feb 2022 18:14:39 GMT
- Title: Towards Training Reproducible Deep Learning Models
- Authors: Boyuan Chen, Mingzhi Wen, Yong Shi, Dayi Lin, Gopi Krishnan Rajbahadur, Zhen Ming (Jack) Jiang
- Abstract summary: Deep Learning (DL) models are challenging to reproduce due to issues like randomness in the software and non-determinism in the hardware.
This paper proposes a systematic approach to training reproducible DL models.
Case study results show our approach can successfully reproduce six open-source DL models and one commercial DL model.
- Score: 26.547756923322126
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Reproducibility is an increasing concern in Artificial Intelligence (AI),
particularly in the area of Deep Learning (DL). Being able to reproduce DL
models is crucial for AI-based systems, as it is closely tied to various tasks
like training, testing, debugging, and auditing. However, DL models are
challenging to reproduce due to issues like randomness in the software
(e.g., DL algorithms) and non-determinism in the hardware (e.g., GPU). There
are various practices to mitigate some of the aforementioned issues. However,
many of them are either too intrusive or can only work for a specific usage
context. In this paper, we propose a systematic approach to training
reproducible DL models. Our approach includes three main parts: (1) a set of
general criteria to thoroughly evaluate the reproducibility of DL models for
two different domains, (2) a unified framework which leverages a
record-and-replay technique to mitigate software-related randomness and a
profile-and-patch technique to control hardware-related non-determinism, and
(3) a reproducibility guideline which explains the rationales and the
mitigation strategies on conducting a reproducible training process for DL
models. Case study results show that our approach can successfully reproduce
six open-source DL models and one commercial DL model.
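As a concrete illustration of the kind of controls involved, the sketch below pins the usual sources of software randomness and hardware non-determinism in a PyTorch training script. This is a minimal baseline only, assuming a single-GPU PyTorch setup; it is not the paper's record-and-replay or profile-and-patch tooling, and the seed value is arbitrary.

```python
import os
import random

import numpy as np
import torch

def make_deterministic(seed: int = 42) -> None:
    # Software-related randomness: seed every RNG the training code touches.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # also seeds all CUDA devices

    # Hardware-related non-determinism: force deterministic GPU kernels.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # needed by cuBLAS
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False  # disable cuDNN autotuning
    torch.use_deterministic_algorithms(True)  # error out on nondeterministic ops

make_deterministic(42)
```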
Related papers
- Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction [75.25114727856861]
Large language models (LLMs) tend to suffer from deterioration at the latter stage of the supervised fine-tuning (SFT) process.
We introduce a simple disperse-then-merge framework to address the issue.
Our framework outperforms various sophisticated methods such as data curation and training regularization on a series of standard knowledge and reasoning benchmarks.
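The abstract does not spell out the merging rule; one common instantiation, shown in this hypothetical sketch, is uniform parameter averaging of sub-models fine-tuned on the dispersed data portions.

```python
from collections import OrderedDict

import torch

def merge_state_dicts(state_dicts: list) -> OrderedDict:
    """Average the parameters of several models with identical architecture."""
    merged = OrderedDict()
    for name in state_dicts[0]:
        stacked = torch.stack([sd[name].float() for sd in state_dicts])
        merged[name] = stacked.mean(dim=0)
    return merged

# Usage: each sub-model was fine-tuned on one dispersed data portion.
# merged = merge_state_dicts([m.state_dict() for m in sub_models])
# model.load_state_dict(merged)
```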
arXiv Detail & Related papers (2024-05-22T08:18:19Z)
- Having Second Thoughts? Let's hear it [0.36832029288386137]
Deep learning models loosely mimic bottom-up signal pathways from low-order sensory areas to high-order cognitive areas.
After training, DL models can outperform humans on some domain-specific tasks, but their decision-making processes are known to be easily disrupted.
We propose a certification process mimicking selective attention and test if it could make DL models more robust.
arXiv Detail & Related papers (2023-11-26T17:17:28Z)
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages [116.74407069443895]
We unify encoder-based and decoder-based models into a single prefix-LM.
For learning methods, we explore the claim of a "free lunch" hypothesis.
For data distributions, we explore the effect of mixture distributions and multi-epoch training of programming and natural language data on model performance.
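For readers unfamiliar with the prefix-LM formulation, this illustrative sketch (ours, not CodeGen2 code) builds the attention mask that defines it: bidirectional attention within the prefix, causal attention everywhere else.

```python
import torch

def prefix_lm_mask(seq_len: int, prefix_len: int) -> torch.Tensor:
    """Boolean attention mask for a prefix-LM: True = may attend."""
    # Causal (decoder-style) mask as the starting point.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Prefix tokens attend bidirectionally among themselves (encoder-style).
    mask[:prefix_len, :prefix_len] = True
    return mask

print(prefix_lm_mask(5, 2).int())
```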
arXiv Detail & Related papers (2023-05-03T17:55:25Z)
- An Empirical Study of Deep Learning Models for Vulnerability Detection [4.243592852049963]
We surveyed and reproduced 9 state-of-the-art deep learning models on 2 widely used vulnerability detection datasets.
We investigated model capabilities, training data, and model interpretation.
Our findings can help better understand model results, provide guidance on preparing training data, and improve the robustness of the models.
arXiv Detail & Related papers (2022-12-15T19:49:34Z)
- REST: REtrieve & Self-Train for generative action recognition [54.90704746573636]
We propose to adapt a pre-trained generative Vision & Language (V&L) Foundation Model for video/action recognition.
We show that direct fine-tuning of a generative model to produce action classes suffers from severe overfitting.
We introduce REST, a training framework consisting of two key components.
arXiv Detail & Related papers (2022-09-29T17:57:01Z)
- Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models [67.78935378952146]
GenRL is a framework for solving sequential decision-making problems.
It exploits the combination of reinforcement learning and latent variable generative models.
We experimentally determine the characteristics of generative models that have most influence on the performance of the final policy training.
arXiv Detail & Related papers (2022-04-18T22:02:32Z)
- Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning [65.268245109828]
In data-rich domains such as vision, language, and speech, deep learning prevails to deliver high-performance task-specific models.
Deep learning in resource-limited domains still faces multiple challenges including (i) limited data, (ii) constrained model development cost, and (iii) lack of adequate pre-trained models for effective finetuning.
Model reprogramming enables resource-efficient cross-domain machine learning by repurposing a well-developed pre-trained model from a source domain to solve tasks in a target domain without model finetuning.
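As a sketch of the idea (our illustration; the class names and the logit-aggregation rule are assumptions, not taken from the paper), reprogramming wraps a frozen source-domain model with a trainable input perturbation and maps source classes onto target classes:

```python
import torch
import torch.nn as nn

class Reprogrammer(nn.Module):
    def __init__(self, frozen_model: nn.Module, input_shape, label_map: torch.Tensor):
        super().__init__()
        self.model = frozen_model
        for p in self.model.parameters():
            p.requires_grad = False  # the source model stays untouched
        self.delta = nn.Parameter(torch.zeros(*input_shape))  # trainable input perturbation
        self.register_buffer("label_map", label_map)  # source class -> target class

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.model(x + self.delta)  # reprogrammed input
        num_target = int(self.label_map.max()) + 1
        # Aggregate the logits of the source classes assigned to each target class.
        return torch.stack(
            [logits[:, self.label_map == t].mean(dim=1) for t in range(num_target)],
            dim=1,
        )
```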
arXiv Detail & Related papers (2022-02-22T02:33:54Z)
- Data-Driven and SE-assisted AI Model Signal-Awareness Enhancement and Introspection [61.571331422347875]
We propose a data-driven approach to enhance models' signal-awareness.
We combine the SE concept of code complexity with the AI technique of curriculum learning.
We achieve up to 4.8x improvement in model signal awareness.
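A minimal sketch of the curriculum component follows; token count stands in for the SE complexity metrics, which the abstract does not detail, and the field names are hypothetical.

```python
def complexity(sample: dict) -> int:
    # Stand-in proxy; the actual work uses SE code-complexity metrics.
    return len(sample["code"].split())

def curriculum_batches(dataset: list, batch_size: int):
    """Yield batches ordered from simple to complex samples."""
    ordered = sorted(dataset, key=complexity)  # easy -> hard
    for i in range(0, len(ordered), batch_size):
        yield ordered[i:i + batch_size]
```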
arXiv Detail & Related papers (2021-11-10T17:58:18Z)
- On the Replicability and Reproducibility of Deep Learning in Software Engineering [16.828220584270507]
Deep learning (DL) techniques have gained significant popularity among software engineering (SE) researchers in recent years.
Many DL studies have reported substantial advantages over other state-of-the-art models in terms of effectiveness.
They often ignore two factors: (1) replicability - whether the reported experimental results can be approximately reproduced with high probability using the same DL model and the same data; and (2) reproducibility - whether the reported experimental findings can be reproduced by new experiments with the same experimental protocol and DL model, but different sampled real-world data.
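An illustrative replicability check in this sense is sketched below; `train_and_evaluate` is a placeholder for any training pipeline that returns a scalar metric.

```python
import math

def is_replicable(train_and_evaluate, seed: int = 0, tolerance: float = 1e-3) -> bool:
    """Run the same training twice under identical settings and compare metrics."""
    score_a = train_and_evaluate(seed=seed)
    score_b = train_and_evaluate(seed=seed)
    return math.isclose(score_a, score_b, abs_tol=tolerance)
```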
arXiv Detail & Related papers (2020-06-25T08:20:10Z)
- Improving Deep Learning Models via Constraint-Based Domain Knowledge: a Brief Survey [11.034875974800487]
This paper presents a first survey of the approaches devised to integrate domain knowledge, expressed in the form of constraints, in Deep Learning (DL) models.
We identify five categories that encompass the main approaches to injecting domain knowledge: 1) acting on the feature space, 2) modifying the hypothesis space, 3) data augmentation, 4) regularization schemes, and 5) constrained learning.
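As a small illustration of category 4) (our example; the surveyed papers use richer constraints), a domain constraint can be compiled into a differentiable penalty that is added to the task loss:

```python
import torch
import torch.nn.functional as F

def constrained_loss(logits: torch.Tensor, targets: torch.Tensor, weight: float = 0.1):
    task_loss = F.cross_entropy(logits, targets)
    probs = torch.softmax(logits, dim=-1)
    # Made-up constraint: classes 0 and 1 are mutually exclusive, so
    # penalize predictions that assign high probability to both.
    violation = (probs[:, 0] * probs[:, 1]).mean()
    return task_loss + weight * violation
```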
arXiv Detail & Related papers (2020-05-19T15:34:09Z)
- Deep Learning for Source Code Modeling and Generation: Models, Applications and Challenges [5.4052819252055055]
Deep Learning (DL) techniques for Natural Language Processing have been evolving remarkably fast.
Recent DL advances in language modeling, machine translation, and paragraph understanding are so prominent that the potential of DL in Software Engineering cannot be overlooked.
arXiv Detail & Related papers (2020-02-13T11:02:51Z)