Principled Weight Initialisation for Input-Convex Neural Networks
- URL: http://arxiv.org/abs/2312.12474v1
- Date: Tue, 19 Dec 2023 10:36:12 GMT
- Title: Principled Weight Initialisation for Input-Convex Neural Networks
- Authors: Pieter-Jan Hoedt and Günter Klambauer
- Abstract summary: Input-Convex Neural Networks (ICNNs) guarantee convexity in their input-output mapping.
Previous initialisation strategies, which implicitly assume centred weights, are not effective for ICNNs.
We show that our principled initialisation effectively accelerates learning in ICNNs and leads to better generalisation.
- Score: 1.949679629562811
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Input-Convex Neural Networks (ICNNs) are networks that guarantee convexity in
their input-output mapping. These networks have been successfully applied for
energy-based modelling, optimal transport problems and learning invariances.
The convexity of ICNNs is achieved by using non-decreasing convex activation
functions and non-negative weights. Because of these peculiarities, previous
initialisation strategies, which implicitly assume centred weights, are not
effective for ICNNs. By studying signal propagation through layers with
non-negative weights, we are able to derive a principled weight initialisation
for ICNNs. Concretely, we generalise signal propagation theory by removing the
assumption that weights are sampled from a centred distribution. In a set of
experiments, we demonstrate that our principled initialisation effectively
accelerates learning in ICNNs and leads to better generalisation. Moreover, we
find that, in contrast to common belief, ICNNs can be trained without
skip-connections when initialised correctly. Finally, we apply ICNNs to a
real-world drug discovery task and show that they allow for more effective
molecular latent space exploration.
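To make the construction concrete, below is a minimal sketch of one ICNN hidden layer in PyTorch (an assumed framework, not specified by the paper). It follows the standard ICNN formulation: non-negative weights on the previous layer's activations, an unconstrained skip connection from the raw input, and a convex, non-decreasing activation. The non-negative weights here are drawn from a placeholder non-centred (uniform) distribution; the paper's contribution is the principled choice of this distribution, which is not reproduced in this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ICNNLayer(nn.Module):
    """One hidden ICNN layer: z_next = g(W_z z + W_x x + b).

    Convexity in x is preserved as long as W_z is element-wise non-negative
    and g is convex and non-decreasing (ReLU is used here).
    """

    def __init__(self, z_dim: int, x_dim: int, hidden_dim: int):
        super().__init__()
        self.W_z = nn.Parameter(torch.empty(hidden_dim, z_dim))  # must stay >= 0
        self.W_x = nn.Parameter(torch.empty(hidden_dim, x_dim))  # unconstrained skip
        self.b = nn.Parameter(torch.zeros(hidden_dim))
        self.reset_parameters()

    def reset_parameters(self) -> None:
        with torch.no_grad():
            # Placeholder non-centred initialisation: non-negative weights drawn
            # uniformly with mean 1 / fan_in. NOTE: the paper derives the exact
            # distribution from signal-propagation arguments; this is not it.
            self.W_z.uniform_(0.0, 2.0 / self.W_z.shape[1])
            nn.init.kaiming_uniform_(self.W_x, nonlinearity="relu")

    def forward(self, z: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # Clamp to the non-negative orthant so convexity holds during training.
        W_z_pos = F.relu(self.W_z)
        return F.relu(z @ W_z_pos.T + x @ self.W_x.T + self.b)
```

Stacking such layers (with the first layer acting on x alone) yields an output that is convex in x; the point of the principled initialisation is to choose the distribution of the non-negative weights so that the signal variance is preserved through depth.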
Related papers
- Deep activity propagation via weight initialization in spiking neural networks [10.69085409825724]
Spiking Neural Networks (SNNs) offer bio-inspired advantages such as sparsity and ultra-low power consumption.
Deep SNNs process and transmit information by quantizing the real-valued membrane potentials into binary spikes.
We show theoretically that, unlike standard approaches, this method enables the propagation of activity in deep SNNs without loss of spikes.
arXiv Detail & Related papers (2024-10-01T11:02:34Z) - Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Conventional wisdom suggests that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv Detail & Related papers (2023-10-02T03:25:32Z) - Artificial to Spiking Neural Networks Conversion for Scientific Machine
Learning [24.799635365988905]
We introduce a method to convert Physics-Informed Neural Networks (PINNs) to Spiking Neural Networks (SNNs).
SNNs are expected to have higher energy efficiency compared to traditional Artificial Neural Networks (ANNs).
arXiv Detail & Related papers (2023-08-31T00:21:27Z) - A Unified Weight Initialization Paradigm for Tensorial Convolutional
Neural Networks [17.71332705005499]
Tensorial Convolutional Neural Networks (TCNNs) have attracted much research attention for their power in reducing model parameters or enhancing generalization ability.
However, the exploration of TCNNs has been hindered even by the lack of suitable weight initialization methods.
We propose a universal weight initialization paradigm, which generalizes the Xavier and Kaiming methods (sketched for reference after this list) and can be widely applied to arbitrary TCNNs.
Our paradigm can stabilize the training of TCNNs, leading to faster convergence and better results.
arXiv Detail & Related papers (2022-05-28T13:31:24Z) - Comparative Analysis of Interval Reachability for Robust Implicit and
Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z) - Extended critical regimes of deep neural networks [0.0]
We show that heavy-tailed weights enable the emergence of an extended critical regime without fine-tuning parameters.
In this extended critical regime, DNNs exhibit rich and complex propagation dynamics across layers.
We provide a theoretical guide for the design of efficient neural architectures.
arXiv Detail & Related papers (2022-03-24T10:15:50Z) - Reinforcement Learning with External Knowledge by using Logical Neural
Networks [67.46162586940905]
A recent neuro-symbolic framework called Logical Neural Networks (LNNs) can simultaneously provide key properties of both neural networks and symbolic logic.
We propose an integrated method that enables model-free reinforcement learning from external knowledge sources.
arXiv Detail & Related papers (2021-03-03T12:34:59Z) - Continual Learning in Recurrent Neural Networks [67.05499844830231]
We evaluate the effectiveness of continual learning methods for processing sequential data with recurrent neural networks (RNNs).
We shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs.
We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements.
arXiv Detail & Related papers (2020-06-22T10:05:12Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - Robust Pruning at Initialization [61.30574156442608]
There is a growing need for smaller, energy-efficient neural networks that enable machine learning applications on devices with limited computational resources.
For deep NNs, existing pruning procedures remain unsatisfactory, as the resulting pruned networks can be difficult to train and, for instance, they do not prevent one layer from being fully pruned.
arXiv Detail & Related papers (2020-02-19T17:09:50Z)
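For reference, the Xavier and Kaiming baselines that the tensorial initialization paradigm above generalizes reduce to simple fan-based variance rules. A minimal sketch of those standard rules (the tensorial extension itself is not reproduced here):

```python
import math
import torch


def xavier_normal(fan_in: int, fan_out: int) -> torch.Tensor:
    # Xavier/Glorot: Var[w] = 2 / (fan_in + fan_out), suited to tanh-like units.
    return torch.randn(fan_out, fan_in) * math.sqrt(2.0 / (fan_in + fan_out))


def kaiming_normal(fan_in: int, fan_out: int) -> torch.Tensor:
    # Kaiming/He: Var[w] = 2 / fan_in, suited to ReLU units.
    return torch.randn(fan_out, fan_in) * math.sqrt(2.0 / fan_in)
```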
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.