Synthesizing Irreproducibility in Deep Networks
- URL: http://arxiv.org/abs/2102.10696v1
- Date: Sun, 21 Feb 2021 21:51:28 GMT
- Title: Synthesizing Irreproducibility in Deep Networks
- Authors: Robert R. Snapp and Gil I. Shamir
- Abstract summary: Modern day deep networks suffer from irreproducibility (also referred to as nondeterminism or underspecification).
We show that even with a single nonlinearity and for very simple data and models, irreproducibility occurs.
Model complexity and the choice of nonlinearity also play significant roles in making deep models irreproducible.
- Score: 2.28438857884398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success and superior performance of deep networks is spreading their
popularity and use to an increasing number of applications. Very recent works,
however, demonstrate that modern day deep networks suffer from
irreproducibility (also referred to as nondeterminism or underspecification).
Two or more models that are identical in architecture, structure, training
hyper-parameters, and parameters, and that are trained on exactly the same
training data, yield different predictions on individual previously unseen
examples. Thus, a model that performs well on controlled test data may perform
in unexpected ways when deployed in the real world, whose data is expected to
be similar to the test data. We study simple synthetic models and data to
understand the origins of these problems. We show that even with a single
nonlinearity and for very simple data and models, irreproducibility occurs. Our
study demonstrates the effects of randomness in initialization, training data
shuffling window size, and activation functions on prediction
irreproducibility, even under very controlled synthetic data. While, as one
would expect, randomness in initialization and in shuffling the training
examples exacerbates the phenomenon, we show that model complexity and the
choice of nonlinearity also play significant roles in making deep models
irreproducible.
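The effect described in the abstract can be illustrated with a small sketch. The code below is not the authors' experimental setup: the task (`make_data`), the one-hidden-layer ReLU network, and every name and hyper-parameter are hypothetical choices made only for illustration. Seeds stand in for the sources of randomness the abstract names (initialization and training-data shuffling), so two runs with the same seeds agree exactly, while changing either seed alone leaves the architecture, data, and hyper-parameters identical and still changes the predictions on individual unseen points.

```python
import math
import random


def make_data(n, seed):
    """Hypothetical synthetic task: 1-D inputs, label 1 when |x| > 0.5."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [1.0 if abs(x) > 0.5 else 0.0 for x in xs]
    return xs, ys


def sigmoid(z):
    # Clamp the logit so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, init_seed, shuffle_seed, hidden=4, epochs=20, lr=0.1):
    """Train a one-hidden-layer ReLU network with plain SGD on log loss.

    init_seed controls the random initialization; shuffle_seed controls
    the order in which training examples are visited in each epoch.
    Returns a function mapping an input x to a predicted probability.
    """
    init = random.Random(init_seed)
    order = random.Random(shuffle_seed)
    w1 = [init.gauss(0.0, 1.0) for _ in range(hidden)]
    b1 = [init.gauss(0.0, 1.0) for _ in range(hidden)]
    w2 = [init.gauss(0.0, 1.0) for _ in range(hidden)]
    b2 = 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        order.shuffle(idx)
        for i in idx:
            x, y = xs[i], ys[i]
            pre = [w1[j] * x + b1[j] for j in range(hidden)]
            h = [max(0.0, p) for p in pre]
            prob = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + b2)
            g = prob - y  # gradient of log loss with respect to the logit
            for j in range(hidden):
                # Backpropagate through the ReLU before updating w2[j].
                gh = g * w2[j] if pre[j] > 0.0 else 0.0
                w2[j] -= lr * g * h[j]
                w1[j] -= lr * gh * x
                b1[j] -= lr * gh
            b2 -= lr * g

    def predict(x):
        h = [max(0.0, w1[j] * x + b1[j]) for j in range(hidden)]
        return sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + b2)

    return predict


def prediction_difference(f, g, xs):
    """Mean absolute difference between two models' predictions on xs."""
    return sum(abs(f(x) - g(x)) for x in xs) / len(xs)


if __name__ == "__main__":
    xs, ys = make_data(200, seed=0)
    unseen = [i / 50.0 - 1.0 for i in range(101)]
    base = train(xs, ys, init_seed=1, shuffle_seed=1)
    other_init = train(xs, ys, init_seed=2, shuffle_seed=1)
    other_order = train(xs, ys, init_seed=1, shuffle_seed=2)
    print("different initialization:", prediction_difference(base, other_init, unseen))
    print("different shuffle order: ", prediction_difference(base, other_order, unseen))
```

Mean absolute prediction difference is only one possible disagreement measure and is used here for simplicity; the sketch makes no claim about how large the difference is, only that it is nonzero once either source of randomness changes.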
Related papers
- Strong Model Collapse [16.071600606637908]
We consider a supervised regression setting and establish the existence of a strong form of the model collapse phenomenon.
Our results show that even the smallest fraction of synthetic data can lead to model collapse.
We investigate whether increasing model size, an approach aligned with current trends in training large language models, exacerbates or mitigates model collapse.
arXiv Detail & Related papers (2024-10-07T08:54:23Z)
- Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences [20.629333587044012]
We study the impact of data curation on iterated retraining of generative models.
We prove that, if the data is curated according to a reward model, the expected reward of the iterative retraining procedure is maximized.
arXiv Detail & Related papers (2024-06-12T21:28:28Z)
- How Bad is Training on Synthetic Data? A Statistical Analysis of Language Model Collapse [9.59833542807268]
Model collapse occurs when new models are trained on synthetic data generated from previously trained models.
We show that model collapse cannot be avoided when training solely on synthetic data.
We estimate a maximal amount of synthetic data below which model collapse can eventually be avoided.
arXiv Detail & Related papers (2024-04-07T22:15:13Z)
- Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data [49.73114504515852]
We show that replacing the original real data by each generation's synthetic data does indeed tend towards model collapse.
We demonstrate that accumulating the successive generations of synthetic data alongside the original real data avoids model collapse.
arXiv Detail & Related papers (2024-04-01T18:31:24Z)
- Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- On the Stability of Iterative Retraining of Generative Models on their own Data [56.153542044045224]
We study the impact of training generative models on mixed datasets.
We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough.
We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-09-30T16:41:04Z)
- Reconstructing Training Data from Model Gradient, Provably [68.21082086264555]
We reconstruct the training samples from a single gradient query at a randomly chosen parameter value.
As a provable attack that reveals sensitive training data, our findings suggest potential severe threats to privacy.
arXiv Detail & Related papers (2022-12-07T15:32:22Z)
- On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study [65.17429512679695]
In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions.
Despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produces more robust models.
arXiv Detail & Related papers (2021-06-02T00:48:33Z)
- Transfer learning suppresses simulation bias in predictive models built from sparse, multi-modal data [15.587831925516957]
Many problems in science, engineering, and business require making predictions based on very few observations.
To build a robust predictive model, these sparse data may need to be augmented with simulated data, especially when the design space is multidimensional.
We combine recent developments in deep learning to build more robust predictive models from multimodal data.
arXiv Detail & Related papers (2021-04-19T23:28:32Z)
- Forecasting Industrial Aging Processes with Machine Learning Methods [0.0]
We evaluate a wider range of data-driven models, comparing some traditional stateless models to more complex recurrent neural networks.
Our results show that recurrent models produce near perfect predictions when trained on larger datasets.
arXiv Detail & Related papers (2020-02-05T13:06:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.