Related papers: ReLU Characteristic Activation Analysis

URL: http://arxiv.org/abs/2305.15912v4
Date: Tue, 21 May 2024 21:08:06 GMT
Title: ReLU Characteristic Activation Analysis
Authors: Wenlin Chen, Hong Ge
Abstract summary: We introduce a novel approach for analyzing the training dynamics of ReLU networks by examining the characteristic activation boundaries of individual neurons. Our proposed analysis reveals a critical instability in common neural network parameterizations and normalizations during stochastic optimization, which impedes fast convergence and hurts generalization performance.
Score: 2.2713084727838115
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce a novel approach for analyzing the training dynamics of ReLU networks by examining the characteristic activation boundaries of individual ReLU neurons. Our proposed analysis reveals a critical instability in common neural network parameterizations and normalizations during stochastic optimization, which impedes fast convergence and hurts generalization performance. Addressing this, we propose Geometric Parameterization (GmP), a novel neural network parameterization technique that effectively separates the radial and angular components of weights in the hyperspherical coordinate system. We show theoretically that GmP resolves the aforementioned instability issue. We report empirical results on various models and benchmarks to verify GmP's theoretical advantages of optimization stability, convergence speed and generalization performance.
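
The abstract describes GmP only as separating the radial and angular components of weights in the hyperspherical coordinate system. As a rough illustration of that idea, and not the authors' implementation, the sketch below rebuilds a weight vector from a radius and a list of angles using the standard hyperspherical-to-Cartesian map. The function names (unit_from_angles, weight_from_polar, norm) and the example values are hypothetical and chosen only for this illustration.

```python
import math


def unit_from_angles(angles):
    """Map n-1 hyperspherical angles to a unit vector in R^n."""
    unit = []
    sin_prod = 1.0  # running product sin(a_1) * ... * sin(a_k)
    for a in angles:
        unit.append(sin_prod * math.cos(a))
        sin_prod *= math.sin(a)
    unit.append(sin_prod)  # last coordinate is the full sine product
    return unit


def weight_from_polar(radius, angles):
    """Rebuild a weight vector from a radial scale and an angular direction."""
    return [radius * c for c in unit_from_angles(angles)]


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(c * c for c in v))


# Hypothetical example: the same direction at two different radii.
angles = [math.pi / 3, math.pi / 4]
w_small = weight_from_polar(0.5, angles)
w_large = weight_from_polar(4.0, angles)

# Changing the radius rescales the vector but leaves its direction alone.
dir_small = [c / norm(w_small) for c in w_small]
dir_large = [c / norm(w_large) for c in w_large]
```

In this form the length of the weight vector is controlled only by the radius and its direction only by the angles, so an update to one does not move the other. How the paper treats the bias term and the training procedure is not stated in this listing and is not modeled here.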

Related papers

Understanding Inverse Reinforcement Learning under Overparameterization: Non-Asymptotic Analysis and Global Optimality [52.906438147288256]
We show that our algorithm can identify the globally optimal reward and policy under certain neural network structures. This is the first IRL algorithm with a non-asymptotic convergence guarantee that provably achieves global optimality.
arXiv Detail & Related papers (2025-03-22T21:16:08Z)
On the Convergence Analysis of Over-Parameterized Variational Autoencoders: A Neural Tangent Kernel Perspective [7.580900499231056]
Variational Auto-Encoders (VAEs) have emerged as powerful probabilistic models for generative tasks. This paper provides a mathematical proof of VAE convergence under mild assumptions. We also establish a novel connection between the optimization problem faced by over-parameterized SNNs and the Kernel Ridge Regression (KRR) problem.
arXiv Detail & Related papers (2024-09-09T06:10:31Z)
Advancing Spatio-Temporal Processing in Spiking Neural Networks through Adaptation [6.233189707488025]
Spiking neural networks on neuromorphic hardware promise orders of magnitude less power consumption than their non-spiking counterparts. The standard neuron model for spike-based computation on such systems has long been the leaky integrate-and-fire (LIF) neuron. The root of these so-called adaptive LIF neurons is not well understood.
arXiv Detail & Related papers (2024-08-14T12:49:58Z)
The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof [50.49582712378289]
We investigate the impact of neural parameter symmetries by introducing new neural network architectures. We develop two methods, with some provable guarantees, of modifying standard neural networks to reduce parameter space symmetries. Our experiments reveal several interesting observations on the empirical impact of parameter symmetries.
arXiv Detail & Related papers (2024-05-30T16:32:31Z)
Neural Parameter Regression for Explicit Representations of PDE Solution Operators [22.355460388065964]
We introduce Neural Parameter Regression (NPR), a novel framework specifically developed for learning solution operators in Partial Differential Equations (PDEs). NPR employs Physics-Informed Neural Network (PINN, Raissi et al., 2021) techniques to regress Neural Network (NN) parameters. The framework shows remarkable adaptability to new initial and boundary conditions, allowing for rapid fine-tuning and inference.
arXiv Detail & Related papers (2024-03-19T14:30:56Z)
Hallmarks of Optimization Trajectories in Neural Networks: Directional Exploration and Redundancy [75.15685966213832]
We analyze the rich directional structure of optimization trajectories represented by their pointwise parameters. We show that training only scalar batchnorm parameters some while into training matches the performance of training the entire network.
arXiv Detail & Related papers (2024-03-12T07:32:47Z)
Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability. We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z)
Orthogonal Stochastic Configuration Networks with Adaptive Construction Parameter for Data Analytics [6.940097162264939]
Randomness makes SCNs more likely to generate approximate linear correlative nodes that are redundant and low quality. In light of a fundamental principle in machine learning, namely that a model with fewer parameters generalizes better, this paper proposes orthogonal SCN, termed OSCN, to filter out the low-quality hidden nodes for network structure reduction.
arXiv Detail & Related papers (2022-05-26T07:07:26Z)
Improving Parametric Neural Networks for High-Energy Physics (and Beyond) [0.0]
We aim at deepening the understanding of Parametric Neural Networks (pNNs) in light of real-world usage. We propose an alternative parametrization scheme, resulting in a new parametrized neural network architecture: the AffinePNN. We extensively evaluate our models on the HEPMASS dataset, along with its imbalanced version (called HEPMASS-IMB).
arXiv Detail & Related papers (2022-02-01T14:18:43Z)
Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the 'complexity' of the fractal structure that underlies its generalization measure. We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs). We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent. For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy [119.12515258771302]
We show that a variant of PPO equipped with over-parametrized neural networks converges to the globally optimal policy. The key to our analysis is the global convergence of infinite-dimensional mirror descent under a notion of one-point monotonicity, where the gradient and iterate are instantiated by neural networks.
arXiv Detail & Related papers (2019-06-25T03:20:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.