Gaussian Gated Linear Networks
- URL: http://arxiv.org/abs/2006.05964v2
- Date: Wed, 21 Oct 2020 16:39:03 GMT
- Title: Gaussian Gated Linear Networks
- Authors: David Budden, Adam Marblestone, Eren Sezener, Tor Lattimore, Greg
Wayne, Joel Veness
- Abstract summary: We propose the Gaussian Gated Linear Network (G-GLN), an extension to the recently proposed GLN family of deep neural networks.
Instead of using backpropagation to learn features, GLNs have a distributed and local credit assignment mechanism based on optimizing a convex objective.
- Score: 32.27304928359326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose the Gaussian Gated Linear Network (G-GLN), an extension to the
recently proposed GLN family of deep neural networks. Instead of using
backpropagation to learn features, GLNs have a distributed and local credit
assignment mechanism based on optimizing a convex objective. This gives rise to
many desirable properties including universality, data-efficient online
learning, trivial interpretability and robustness to catastrophic forgetting.
We extend the GLN framework from classification to multiple regression and
density modelling by generalizing geometric mixing to a product of Gaussian
densities. The G-GLN achieves competitive or state-of-the-art performance on
several univariate and multivariate regression benchmarks, and we demonstrate
its applicability to practical tasks including online contextual bandits and
density estimation via denoising.
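The mixing step that makes this extension work is the fact that a weighted product of Gaussian densities is itself Gaussian, with precision equal to the weighted sum of the input precisions. A minimal NumPy sketch of that step (the weights here are illustrative; in an actual G-GLN they are selected per example by the gating mechanism and adapted online by convex optimization):

```python
import numpy as np

def gaussian_product(mus, sigmas2, weights):
    """Weighted product of Gaussian densities.

    prod_i N(x; mu_i, sigma_i^2)^{w_i} is proportional to a Gaussian
    whose precision is the weighted sum of the input precisions.
    """
    precisions = weights / sigmas2          # w_i / sigma_i^2
    out_sigma2 = 1.0 / precisions.sum()
    out_mu = out_sigma2 * (precisions * mus).sum()
    return out_mu, out_sigma2

# Three input "experts" and illustrative (hypothetical) gating weights.
mus = np.array([0.0, 1.0, 2.0])
sigmas2 = np.array([1.0, 0.5, 2.0])
weights = np.array([0.3, 0.5, 0.2])
mu, sigma2 = gaussian_product(mus, sigmas2, weights)
print(f"mixed Gaussian: mu={mu:.3f}, sigma2={sigma2:.3f}")
```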
Related papers
- On the Convergence of (Stochastic) Gradient Descent for Kolmogorov--Arnold Networks [56.78271181959529]
Kolmogorov--Arnold Networks (KANs) have gained significant attention in the deep learning community.
Empirical investigations demonstrate that KANs optimized via stochastic gradient descent (SGD) are capable of achieving near-zero training loss.
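As a rough intuition for the near-zero training loss (a hedged sketch, not the paper's setup): once each learnable univariate function is expanded in a fixed basis, fitting it by gradient descent is effectively linear least squares. The Gaussian-bump basis below stands in for the B-splines usually used in KANs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D regression target.
X = rng.uniform(-1, 1, size=256)
y = np.sin(np.pi * X)

# One KAN-style edge function: a learnable univariate function written as
# a linear combination of fixed Gaussian bumps (a stand-in for the
# B-spline bases used in the KAN literature).
centers = np.linspace(-1, 1, 12)
width = 0.25
Phi = np.exp(-((X[:, None] - centers) ** 2) / (2 * width ** 2))  # (256, 12)
coef = rng.normal(scale=0.1, size=12)   # the only learnable parameters

lr = 0.2
for _ in range(3000):
    grad = Phi.T @ (Phi @ coef - y) / len(y)   # gradient of 0.5 * MSE
    coef -= lr * grad

print("final MSE:", np.mean((Phi @ coef - y) ** 2))
```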
arXiv Detail & Related papers (2024-10-10T15:34:10Z)
- A Non-negative VAE: the Generalized Gamma Belief Network [49.970917207211556]
The gamma belief network (GBN) has demonstrated its potential for uncovering multi-layer interpretable latent representations in text data.
We introduce the generalized gamma belief network (Generalized GBN) in this paper, which extends the original linear generative model to a more expressive non-linear generative model.
We also propose an upward-downward Weibull inference network to approximate the posterior distribution of the latent variables.
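A likely reason for the Weibull choice (standard in Weibull-based inference networks, though stated here as an assumption about this paper's motivation) is that it admits a simple inverse-CDF reparameterization, so samples are differentiable in the parameters $(k, \lambda)$:

$$x = \lambda\,(-\ln u)^{1/k}, \qquad u \sim \mathrm{Uniform}(0,1) \;\Longrightarrow\; x \sim \mathrm{Weibull}(k, \lambda).$$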
arXiv Detail & Related papers (2024-08-06T18:18:37Z)
- Exact Gauss-Newton Optimization for Training Deep Neural Networks [0.0]
We present EGN, a second-order optimization algorithm that combines the generalized Gauss-Newton (GN) Hessian approximation with low-rank linear algebra to compute the descent direction.
We show how improvements such as line search, adaptive regularization, and momentum can be seamlessly added to EGN to further accelerate the algorithm.
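A sketch of the kind of low-rank linear algebra the abstract alludes to (not necessarily the authors' exact formulation): with $m$ residuals and $n \gg m$ parameters, the push-through identity $(J^\top J + \lambda I_n)^{-1} J^\top = J^\top (J J^\top + \lambda I_m)^{-1}$ reduces the damped Gauss-Newton solve to an $m \times m$ system.

```python
import numpy as np

def egn_direction(J, r, lam=1e-3):
    """Damped Gauss-Newton direction via the small m x m dual system.

    Instead of solving (J^T J + lam I_n) d = J^T r in parameter space
    (n x n), use the identity
        (J^T J + lam I_n)^{-1} J^T = J^T (J J^T + lam I_m)^{-1}
    so only an m x m system is solved, with m the batch size.
    """
    m = J.shape[0]
    alpha = np.linalg.solve(J @ J.T + lam * np.eye(m), r)
    return J.T @ alpha

rng = np.random.default_rng(0)
J = rng.normal(size=(32, 10_000))   # small batch, many parameters
r = rng.normal(size=32)             # residuals f(w) - y
d = egn_direction(J, r)             # update would be w -= step * d
```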
arXiv Detail & Related papers (2024-05-23T10:21:05Z)
- Regularized Gauss-Newton for Optimizing Overparameterized Neural Networks [2.0072624123275533]
The generalized Gauss-Newton (GGN) optimization method incorporates curvature estimates into its solution steps.
This work studies a GGN method for optimizing a two-layer neural network with explicit regularization.
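For reference, the GGN surrogate replaces the Hessian with a positive semi-definite matrix built from the network Jacobian $J_f$ and the loss Hessian $H_\ell$ with respect to the outputs; a damped update reads

$$G(w) = J_f(w)^{\top} H_\ell\, J_f(w), \qquad w_{t+1} = w_t - \big(G(w_t) + \lambda I\big)^{-1} \nabla_w \mathcal{L}(w_t),$$

where $\lambda$ stands in for whatever explicit regularization the paper uses.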
arXiv Detail & Related papers (2024-04-23T10:02:22Z)
- Kernel-based Joint Multiple Graph Learning and Clustering of Graph Signals [2.4305626489408465]
We introduce KMGL, a Kernel-based framework for joint Multiple Graph Learning and clustering of graph signals.
Experiments demonstrate that KMGL significantly enhances the robustness of GL clustering, particularly in scenarios with high noise levels.
These findings underscore the potential of KMGL for improving the performance of Graph Signal Processing methods in diverse real-world applications.
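The quantity at the heart of most graph-learning objectives of this kind is the Laplacian quadratic form, which scores how smoothly signals vary over a candidate graph; a minimal sketch follows (the actual KMGL objective adds kernel and clustering terms on top of this):

```python
import numpy as np

def laplacian(W):
    """Combinatorial graph Laplacian L = D - W from a symmetric weight matrix."""
    return np.diag(W.sum(axis=1)) - W

def smoothness(W, X):
    """Total Laplacian quadratic form, trace(X^T L X) = sum_k x_k^T L x_k.

    Small values mean the signals X (n_nodes, n_signals) change little
    across strongly weighted edges.
    """
    return np.trace(X.T @ laplacian(W) @ X)

W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # a 3-node path graph
X = np.array([[1.0], [1.1], [0.9]])      # one nearly-smooth signal
print(smoothness(W, X))                  # small value -> smooth on this graph
```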
arXiv Detail & Related papers (2023-10-29T13:41:12Z)
- Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, the resulting kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
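For context, a gated ReLU network decouples the activation pattern from the trainable weights: with fixed gates $g_j$ (the mask weights) and trainable $w_j$,

$$f(x) = \sum_{j=1}^{m} \mathbb{1}\{x^\top g_j \ge 0\}\; x^\top w_j,$$

so the induced kernel depends only on the inputs and gates, never on the targets.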
arXiv Detail & Related papers (2023-09-26T17:42:52Z)
- Tackling Data Heterogeneity: A New Unified Framework for Decentralized SGD with Sample-induced Topology [6.6682038218782065]
We develop a general framework unifying several gradient-based optimization methods for empirical risk minimization problems.
We provide a unified perspective for variance-reduction (VR) and gradient-tracking (GT) methods such as SAGA, Local-SVRG and GT-SAGA.
The rate results reveal that VR and GT methods can effectively eliminate data heterogeneity within and across devices, respectively, enabling the exact convergence of the algorithm to the optimal solution.
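As a representative gradient-tracking iteration (a standard formulation, not necessarily this paper's exact scheme), each device mixes iterates through a gossip matrix $W$ and maintains a tracker $y_k$ of the network-wide average gradient:

$$x_{k+1} = W x_k - \alpha\, y_k, \qquad y_{k+1} = W y_k + \nabla f(x_{k+1}) - \nabla f(x_k),$$

which is what removes the bias that heterogeneous local gradients would otherwise leave in the limit.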
arXiv Detail & Related papers (2022-07-08T07:50:08Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- Learning Graph Neural Networks for Multivariate Time Series Anomaly Detection [8.688578727646409]
We propose GLUE (Graph Deviation Network with Local Uncertainty Estimation).
GLUE learns complex dependencies between variables and uses them to better identify anomalous behavior.
We also show that GLUE learns meaningful sensor embeddings that cluster similar sensors together.
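A generic deviation-style score in the spirit of such detectors (a hedged sketch; GLUE's own scoring additionally uses its local uncertainty estimates) normalizes each sensor's forecast error by robust statistics of its historical errors and takes the worst sensor:

```python
import numpy as np

def deviation_score(x_true, x_pred, err_median, err_iqr, eps=1e-6):
    """Max normalized forecast deviation across sensors at one time step.

    err_median / err_iqr are per-sensor robust statistics of historical
    forecast errors; a large score means at least one sensor departs
    from the behavior implied by the learned dependencies.
    """
    err = np.abs(x_true - x_pred)
    return np.max((err - err_median) / (err_iqr + eps))

# Hypothetical values for 4 sensors at a single time step.
score = deviation_score(
    x_true=np.array([0.9, 1.2, 5.0, 0.3]),
    x_pred=np.array([1.0, 1.1, 1.0, 0.4]),
    err_median=np.array([0.1, 0.1, 0.1, 0.1]),
    err_iqr=np.array([0.05, 0.05, 0.05, 0.05]),
)
print(score)   # dominated by the third sensor's large deviation
```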
arXiv Detail & Related papers (2021-11-15T21:05:58Z)
- Fast Learning of Graph Neural Networks with Guaranteed Generalizability: One-hidden-layer Case [93.37576644429578]
Graph neural networks (GNNs) have made great progress recently on learning from graph-structured data in practice.
We provide a theoretically-grounded generalizability analysis of GNNs with one hidden layer for both regression and binary classification problems.
arXiv Detail & Related papers (2020-06-25T00:45:52Z)
- Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks [60.22494363676747]
It is known that current graph neural networks (GNNs) are difficult to make deep due to the problem known as over-smoothing.
Multi-scale GNNs are a promising approach for mitigating the over-smoothing problem.
We derive the optimization and generalization guarantees of transductive learning algorithms that include multi-scale GNNs.
arXiv Detail & Related papers (2020-06-15T17:06:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.