From Pointwise to Powerhouse: Initialising Neural Networks with
Generative Models
- URL: http://arxiv.org/abs/2310.16695v1
- Date: Wed, 25 Oct 2023 15:06:32 GMT
- Title: From Pointwise to Powerhouse: Initialising Neural Networks with
Generative Models
- Authors: Christian Harder, Moritz Fuchs, Yuri Tolkach, Anirban Mukhopadhyay
- Abstract summary: In this paper, we introduce two groups of new initialisation methods.
First, we locally initialise weight groups by employing variational autoencoders.
Secondly, we globally initialise full weight sets by employing graph hypernetworks.
- Score: 1.1807848705528714
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Traditional initialisation methods, e.g. He and Xavier, have been effective
in avoiding the problem of vanishing or exploding gradients in neural networks.
However, they only use simple pointwise distributions, which model
one-dimensional variables. Moreover, they ignore most information about the
architecture and disregard past training experiences. These limitations can be
overcome by employing generative models for initialisation. In this paper, we
introduce two groups of new initialisation methods. First, we locally
initialise weight groups by employing variational autoencoders. Secondly, we
globally initialise full weight sets by employing graph hypernetworks. We
thoroughly evaluate the impact of the employed generative models on
state-of-the-art neural networks in terms of accuracy, convergence speed and
ensembling. Our results show that global initialisations result in higher
accuracy and faster initial convergence speed. However, the implementation
through graph hypernetworks leads to diminished ensemble performance on
out-of-distribution data. To counteract this, we propose a modification called
the noise graph hypernetwork, which encourages diversity among the produced
ensemble members.
Furthermore, our approach might be able to transfer learned knowledge to
different image distributions. Our work provides insights into the potential,
the trade-offs and possible modifications of these new initialisation methods.
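As an illustrative sketch only (not the authors' released code), the local variant can be pictured as follows: a small VAE is trained on groups of weights, e.g. 3x3 convolution kernels harvested from previously trained networks, and a new network is then initialised by decoding samples from the VAE prior rather than drawing from a pointwise distribution. The decoder architecture and kernel grouping below are assumptions made for illustration.
```python
import torch
import torch.nn as nn

# Illustrative sketch of VAE-based local initialisation (not the authors' code):
# a decoder is assumed to have been trained on 3x3 convolution kernels taken
# from previously trained networks; a new model's conv layers are initialised by
# decoding samples from the VAE prior instead of a pointwise (He/Xavier) draw.

class KernelDecoder(nn.Module):
    """Toy decoder mapping a latent code to one 3x3 kernel (architecture assumed)."""
    def __init__(self, latent_dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, 9))
        self.latent_dim = latent_dim

    def forward(self, z):
        return self.net(z).view(-1, 3, 3)

def init_conv_from_vae(conv: nn.Conv2d, decoder: KernelDecoder):
    n_kernels = conv.out_channels * conv.in_channels
    z = torch.randn(n_kernels, decoder.latent_dim)  # sample from the prior
    with torch.no_grad():
        kernels = decoder(z).view(conv.out_channels, conv.in_channels, 3, 3)
        conv.weight.copy_(kernels)

# Usage: initialise every 3x3 conv of a fresh network from the decoder.
decoder = KernelDecoder()  # in practice: load trained VAE decoder weights here
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
for m in model.modules():
    if isinstance(m, nn.Conv2d) and m.kernel_size == (3, 3):
        init_conv_from_vae(m, decoder)
```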
Related papers
- Reducing Oversmoothing through Informed Weight Initialization in Graph Neural Networks [16.745718346575202]
We propose a new weight-initialization scheme (G-Init) that reduces oversmoothing in deep GNNs, facilitating their effective use and leading to very good results in node and graph classification tasks.
arXiv Detail & Related papers (2024-10-31T11:21:20Z)
- Adapt & Align: Continual Learning with Generative Models Latent Space Alignment [15.729732755625474]
We introduce Adapt & Align, a method for continual learning of neural networks by aligning latent representations in generative models.
Neural networks suffer from an abrupt loss of performance when retrained with additional data.
We propose a new method that mitigates those problems by employing generative models and splitting the process of their update into two parts.
arXiv Detail & Related papers (2023-12-21T10:02:17Z)
- On the Initialization of Graph Neural Networks [10.153841274798829]
We analyze the variance of forward and backward propagation across Graph Neural Network layers.
We propose a new method for Variance Instability Reduction within GNN Optimization (Virgo).
We conduct comprehensive experiments on 15 datasets to show that Virgo can lead to superior model performance.
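As a toy illustration of the kind of forward-variance analysis described above (not the paper's code or the Virgo method itself), the sketch below tracks how the activation variance evolves across the layers of a simple mean-aggregation GNN with Xavier-initialised weights on a random graph; the graph construction and layer definition are assumptions.
```python
import torch

# Toy illustration (not the paper's code): track how the variance of forward
# activations changes across the layers of a simple mean-aggregation GNN with
# Xavier-initialised weights on a random graph.
torch.manual_seed(0)
n_nodes, n_feats, n_layers = 200, 64, 8
x = torch.randn(n_nodes, n_feats)

# Random symmetric adjacency with self-loops, row-normalised => mean aggregation.
adj = (torch.rand(n_nodes, n_nodes) < 0.05).float()
adj = ((adj + adj.t() + torch.eye(n_nodes)) > 0).float()
adj = adj / adj.sum(dim=1, keepdim=True)

h = x
for layer in range(n_layers):
    w = torch.empty(n_feats, n_feats)
    torch.nn.init.xavier_uniform_(w)   # pointwise baseline initialisation
    h = torch.relu(adj @ h @ w)        # aggregate neighbours, then transform
    print(f"layer {layer + 1}: activation variance = {h.var().item():.4f}")
```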
arXiv Detail & Related papers (2023-12-05T09:55:49Z)
- An Initialization Schema for Neuronal Networks on Tabular Data [0.9155684383461983]
We show that a binomial neural network can be used effectively on tabular data.
The proposed schema is a simple but effective approach for initializing the first hidden layer in neural networks.
We evaluate our approach on multiple public datasets and showcase the improved performance compared to other neural network-based approaches.
arXiv Detail & Related papers (2023-11-07T13:52:35Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that these neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
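For reference, the standard neural tangent kernel of a network f(x; θ) is defined below; the bias-generalized variant studied in the paper modifies this limiting kernel, and its exact form is not given in the summary.
```latex
% Standard neural tangent kernel for a network f(x;\theta):
\Theta(x, x') \;=\; \nabla_{\theta} f(x;\theta)^{\top} \, \nabla_{\theta} f(x';\theta)
```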
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- On the effectiveness of partial variance reduction in federated learning with heterogeneous data [27.527995694042506]
We show that the diversity of the final classification layers across clients impedes the performance of the FedAvg algorithm.
Motivated by this, we propose to correct the model by applying variance reduction only to the final layers.
We demonstrate that this significantly outperforms existing benchmarks at a similar or lower communication cost.
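A minimal sketch of what such a partial correction can look like, assuming a SCAFFOLD-style control-variate update applied only to the final classification layer (the paper's exact algorithm, parameter naming, and control-variate bookkeeping may differ):
```python
import torch

# Illustrative sketch (not the paper's exact algorithm): a control-variate
# correction applied only to the final classification layer during a local
# client update, leaving the feature extractor uncorrected.

def local_update(model, data_loader, loss_fn, c_global, c_local, lr=0.01, head="classifier"):
    """c_global / c_local: dicts mapping final-layer parameter names to control variates."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for x, y in data_loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        # Variance reduction on the head only: shift its gradient by (c_global - c_local).
        for name, p in model.named_parameters():
            if name.startswith(head) and p.grad is not None:
                p.grad += c_global[name] - c_local[name]
        opt.step()
    return model
```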
arXiv Detail & Related papers (2022-12-05T11:56:35Z)
- Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata via Polyak-Lojasiewicz, smoothness, and standard assumptions.
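For context, the Polyak-Lojasiewicz (PL) condition referenced above is the following inequality; combined with smoothness it yields linear convergence of gradient descent.
```latex
% Polyak-Lojasiewicz condition for a loss L with minimum value L^*:
\tfrac{1}{2}\,\lVert \nabla L(\theta) \rVert^{2} \;\ge\; \mu \,\bigl( L(\theta) - L^{*} \bigr)
\quad \text{for some } \mu > 0.
% With \beta-smoothness, gradient descent with step size 1/\beta satisfies
% L(\theta_{t}) - L^{*} \le (1 - \mu/\beta)^{t} \, \bigl( L(\theta_{0}) - L^{*} \bigr).
```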
arXiv Detail & Related papers (2021-11-02T20:24:01Z)
- GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
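A simplified sketch of that heuristic (GradInit itself learns per-layer scale factors by gradient-based optimisation under a gradient-norm constraint; the grid search below is only meant to make the objective concrete):
```python
import copy
import torch
import torch.nn as nn

# Simplified sketch of the heuristic summarised above, not the full GradInit
# algorithm: rescale one layer at a time, take a single SGD step on a batch,
# and keep the scale that gives the lowest post-step loss.

def loss_after_one_step(model, x, y, loss_fn, lr):
    m = copy.deepcopy(model)
    opt = torch.optim.SGD(m.parameters(), lr=lr)
    opt.zero_grad()
    loss_fn(m(x), y).backward()
    opt.step()
    with torch.no_grad():
        return loss_fn(m(x), y).item()

def rescale_layers(model, x, y, loss_fn, lr=0.1, scales=(0.25, 0.5, 1.0, 2.0, 4.0)):
    for layer in [m for m in model.modules() if isinstance(m, nn.Linear)]:
        original = layer.weight.data.clone()
        best_scale, best_loss = 1.0, float("inf")
        for s in scales:
            layer.weight.data.copy_(original * s)
            loss = loss_after_one_step(model, x, y, loss_fn, lr)
            if loss < best_loss:
                best_scale, best_loss = s, loss
        layer.weight.data.copy_(original * best_scale)
    return model
```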
arXiv Detail & Related papers (2021-02-16T11:45:35Z)
- On the Reproducibility of Neural Network Predictions [52.47827424679645]
We study the problem of churn, identify factors that cause it, and propose two simple means of mitigating it.
We first demonstrate that churn is indeed an issue, even for standard image classification tasks.
We propose using minimum entropy regularizers to increase prediction confidences.
We present empirical results showing the effectiveness of both techniques in reducing churn while improving the accuracy of the underlying model.
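A minimal sketch of a minimum-entropy regularizer, assuming it is added to the standard cross-entropy objective with a free coefficient lambda_ent (the paper's exact formulation may differ):
```python
import torch
import torch.nn.functional as F

# Minimal sketch of a minimum-entropy regularizer: penalise the entropy of the
# predictive distribution so the model is pushed toward more confident outputs.

def loss_with_entropy_penalty(logits, targets, lambda_ent=0.1):
    ce = F.cross_entropy(logits, targets)
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
    return ce + lambda_ent * entropy
```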
arXiv Detail & Related papers (2021-02-05T18:51:01Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
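An illustrative way to approximate such a Hessian norm (not the authors' code) is power iteration on Hessian-vector products, which needs only first-order autograd:
```python
import torch

# Illustrative sketch: estimate the spectral norm of the loss Hessian w.r.t. one
# layer's weights by power iteration on Hessian-vector products.
# `loss` must be computed from `weight` in the current graph, e.g.
# loss = loss_fn(model(x), y).

def hessian_spectral_norm(loss, weight, n_iters=20):
    grad, = torch.autograd.grad(loss, weight, create_graph=True)
    v = torch.randn_like(weight)
    v /= v.norm()
    for _ in range(n_iters):
        hv, = torch.autograd.grad(grad, weight, grad_outputs=v, retain_graph=True)
        v = hv / (hv.norm() + 1e-12)
    hv, = torch.autograd.grad(grad, weight, grad_outputs=v, retain_graph=True)
    return (v * hv).sum().abs().item()  # Rayleigh quotient ~ largest |eigenvalue|
```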
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient descent combined with nonconvexity renders learning susceptible to initialization-related problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
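As a purely linear illustration of what fusing neighboring layers means (the paper's MSE-optimal fusion is more general and addresses randomly initialised, nonlinear networks), two stacked linear layers with no activation in between can be collapsed exactly:
```python
import torch
import torch.nn as nn

# Purely linear illustration of layer fusion: two stacked Linear layers with no
# nonlinearity in between collapse exactly into a single Linear layer.

def fuse_linear(l1: nn.Linear, l2: nn.Linear) -> nn.Linear:
    fused = nn.Linear(l1.in_features, l2.out_features)
    with torch.no_grad():
        fused.weight.copy_(l2.weight @ l1.weight)        # W = W2 W1
        fused.bias.copy_(l2.weight @ l1.bias + l2.bias)  # b = W2 b1 + b2
    return fused

# Sanity check: the fused layer reproduces the composition.
l1, l2 = nn.Linear(16, 32), nn.Linear(32, 8)
x = torch.randn(4, 16)
assert torch.allclose(l2(l1(x)), fuse_linear(l1, l2)(x), atol=1e-5)
```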
arXiv Detail & Related papers (2020-01-28T18:25:15Z)