CrossedWires: A Dataset of Syntactically Equivalent but Semantically
Disparate Deep Learning Models
- URL: http://arxiv.org/abs/2108.12768v1
- Date: Sun, 29 Aug 2021 07:27:16 GMT
- Title: CrossedWires: A Dataset of Syntactically Equivalent but Semantically
Disparate Deep Learning Models
- Authors: Max Zvyagin, Thomas Brettin, Arvind Ramanathan, Sumit Kumar Jha
- Abstract summary: We present a living dataset of models and hyperparameters, called CrossedWires, that exposes semantic differences between two popular deep learning frameworks: PyTorch and TensorFlow.
The CrossedWires dataset provides an opportunity to study semantic differences between syntactically equivalent models across popular deep learning frameworks.
- Score: 4.487528701954606
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The training of neural networks using different deep learning frameworks may
lead to drastically differing accuracy levels despite the use of the same
neural network architecture and identical training hyperparameters such as
learning rate and choice of optimization algorithms. Currently, our ability to
build standardized deep learning models is limited by the availability of a
suite of neural network and corresponding training hyperparameter benchmarks
that expose differences between existing deep learning frameworks. In this
paper, we present a living dataset of models and hyperparameters, called
CrossedWires, that exposes semantic differences between two popular deep
learning frameworks: PyTorch and TensorFlow. The CrossedWires dataset currently
consists of models trained on CIFAR10 images using three different computer
vision architectures: VGG16, ResNet50 and DenseNet121 across a large
hyperparameter space. Using hyperparameter optimization, each of the three
models was trained on 400 sets of hyperparameters suggested by the HyperSpace
search algorithm. The CrossedWires dataset includes PyTorch and TensorFlow
models whose test accuracies differ by as much as 0.681 despite syntactically
equivalent models and identical hyperparameter choices. The 340 GB dataset and benchmarks
presented here include the performance statistics, training curves, and model
weights for all 1200 hyperparameter choices, resulting in 2400 total models.
The CrossedWires dataset provides an opportunity to study semantic differences
between syntactically equivalent models across popular deep learning
frameworks. Further, the insights obtained from this study can enable the
development of algorithms and tools that improve reliability and
reproducibility of deep learning frameworks. The dataset is freely available at
https://github.com/maxzvyagin/crossedwires through a Python API and direct
download link.
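The semantic divergence the dataset exposes can be quantified as a simple accuracy gap between framework pairs trained on the same hyperparameters. The sketch below is illustrative only: the accuracy values and the helper `max_accuracy_gap` are hypothetical and not part of the CrossedWires API, though the headline 0.681 gap mirrors the figure reported in the abstract.

```python
# Hypothetical sketch: given test accuracies for the same hyperparameter
# configurations trained in two frameworks, locate the most divergent pair.
# The accuracy values below are illustrative, not taken from CrossedWires.

def max_accuracy_gap(pytorch_acc, tensorflow_acc):
    """Return (config_index, absolute accuracy gap) for the most divergent pair."""
    gaps = [abs(p - t) for p, t in zip(pytorch_acc, tensorflow_acc)]
    idx = max(range(len(gaps)), key=gaps.__getitem__)
    return idx, gaps[idx]

# Illustrative test accuracies for three hyperparameter configurations
pytorch_acc = [0.912, 0.874, 0.101]
tensorflow_acc = [0.908, 0.869, 0.782]

idx, gap = max_accuracy_gap(pytorch_acc, tensorflow_acc)
print(idx, round(gap, 3))  # config 2 diverges by 0.681
```

Scanning all 400 configurations per architecture this way would surface which regions of the hyperparameter space drive the largest cross-framework disagreement.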
Related papers
- Private Training & Data Generation by Clustering Embeddings [74.00687214400021]
Differential privacy (DP) provides a robust framework for protecting individual data.
We introduce a novel principled method for DP synthetic image embedding generation.
Empirically, a simple two-layer neural network trained on synthetically generated embeddings achieves state-of-the-art (SOTA) classification accuracy.
arXiv Detail & Related papers (2025-06-20T00:17:14Z)
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network)
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
- Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How [62.467716468917224]
We propose a methodology that jointly searches for the optimal pretrained model and the hyperparameters for finetuning it.
Our method transfers knowledge about the performance of many pretrained models on a series of datasets.
We empirically demonstrate that our resulting approach can quickly select an accurate pretrained model for a new dataset.
arXiv Detail & Related papers (2023-06-06T16:15:26Z)
- VertiBayes: Learning Bayesian network parameters from vertically partitioned data with missing values [2.9707233220536313]
Federated learning makes it possible to train a machine learning model on decentralized data.
We propose a novel method called VertiBayes to train Bayesian networks on vertically partitioned data.
We experimentally show our approach produces models comparable to those learnt using traditional algorithms.
arXiv Detail & Related papers (2022-10-31T11:13:35Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Dynamically-Scaled Deep Canonical Correlation Analysis [77.34726150561087]
Canonical Correlation Analysis (CCA) is a method for feature extraction of two views by finding maximally correlated linear projections of them.
We introduce a novel dynamic scaling method for training an input-dependent canonical correlation model.
arXiv Detail & Related papers (2022-03-23T12:52:49Z)
- Investigating the Relationship Between Dropout Regularization and Model Complexity in Neural Networks [0.0]
Dropout Regularization serves to reduce variance in Deep Learning models.
We explore the relationship between the dropout rate and model complexity by training 2,000 neural networks.
We build neural networks that predict the optimal dropout rate given the number of hidden units in each dense layer.
arXiv Detail & Related papers (2021-08-14T23:49:33Z)
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
- AgEBO-Tabular: Joint Neural Architecture and Hyperparameter Search with Autotuned Data-Parallel Training for Tabular Data [11.552769149674544]
Development of high-performing predictive models for large data sets is a challenging task.
Recent automated machine learning (AutoML) is emerging as a promising approach to automate predictive model development.
We have developed AgEBO-Tabular, an approach to combine aging evolution (AgE) and a parallel NAS method that searches over neural architecture space.
arXiv Detail & Related papers (2020-10-30T16:28:48Z)
- It's the Best Only When It Fits You Most: Finding Related Models for Serving Based on Dynamic Locality Sensitive Hashing [1.581913948762905]
Preparation of training data is often a bottleneck in the lifecycle of deploying a deep learning model for production or research.
This paper proposes an end-to-end process of searching related models for serving based on the similarity of the target dataset and the training datasets of the available models.
arXiv Detail & Related papers (2020-10-13T22:52:13Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.