Related papers: Your Network May Need to Be Rewritten: Network Adversarial Based on High-Dimensional Function Graph Decomposition

URL: http://arxiv.org/abs/2405.03712v1
Date: Sat, 4 May 2024 11:22:30 GMT
Title: Your Network May Need to Be Rewritten: Network Adversarial Based on High-Dimensional Function Graph Decomposition
Authors: Xiaoyan Su, Yinghao Zhu, Run Li
Abstract summary: We propose a network adversarial method to address the aforementioned challenges. This is the first method to use different activation functions in a network. We have achieved a substantial improvement over standard activation functions regarding both training efficiency and predictive accuracy.
Score: 0.994853090657971
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the past, research on a single low dimensional activation function in networks has led to internal covariate shift and gradient deviation problems. A relatively small research area is how to use function combinations to provide property completion for a single activation function application. We propose a network adversarial method to address the aforementioned challenges. This is the first method to use different activation functions in a network. Based on the existing activation functions in the current network, an adversarial function with opposite derivative image properties is constructed, and the two are alternately used as activation functions for different network layers. For complex situations, we propose a method of high-dimensional function graph decomposition (HD-FGD), which divides it into different parts and then passes through a linear layer. After integrating the inverse of the partial derivatives of each decomposed term, we obtain its adversarial function by referring to the computational rules of the decomposition process. The use of network adversarial methods or the use of HD-FGD alone can effectively replace the traditional MLP + activation function mode. Through the above methods, we have achieved a substantial improvement over standard activation functions regarding both training efficiency and predictive accuracy. The article addresses the adversarial issues associated with several prevalent activation functions, presenting alternatives that can be seamlessly integrated into existing models without any adverse effects. We will release the code as open source after the conference review process is completed.

Related papers

Parametric Leaky Tanh: A New Hybrid Activation Function for Deep Learning [0.0]
Activation functions (AFs) are crucial components of deep neural networks (DNNs). We propose a novel hybrid activation function designed to combine the strengths of both the Tanh and Leaky ReLU activation functions. PLanh is differentiable at all points and addresses the 'dying ReLU' problem by ensuring a non-zero gradient for negative inputs.
arXiv Detail & Related papers (2023-08-11T08:59:27Z)
Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations. We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
Unification of popular artificial neural network activation functions [0.0]
We present a unified representation of the most popular neural network activation functions. Adopting Mittag-Leffler functions of fractional calculus, we propose a flexible and compact functional form.
arXiv Detail & Related papers (2023-02-21T21:20:59Z)
Data-aware customization of activation functions reduces neural network error [0.35172332086962865]
We show that data-aware customization of activation functions can result in striking reductions in neural network error. A simple substitution with the "seagull" activation function in an already-refined neural network can lead to an order-of-magnitude reduction in error.
arXiv Detail & Related papers (2023-01-16T23:38:37Z)
Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation. We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
Graph-adaptive Rectified Linear Unit for Graph Neural Networks [64.92221119723048]
Graph Neural Networks (GNNs) have achieved remarkable success by extending traditional convolution to learning on non-Euclidean data. We propose Graph-adaptive Rectified Linear Unit (GReLU) which is a new parametric activation function incorporating the neighborhood information in a novel and efficient way. We conduct comprehensive experiments to show that our plug-and-play GReLU method is efficient and effective given different GNN backbones and various downstream tasks.
arXiv Detail & Related papers (2022-02-13T10:54:59Z)
Compressing Deep ODE-Nets using Basis Function Expansions [105.05435207079759]
We consider formulations of the weights as continuous-depth functions using linear combinations of basis functions. This perspective allows us to compress the weights through a change of basis, without retraining, while maintaining near state-of-the-art performance. In turn, both inference time and the memory footprint are reduced, enabling quick and rigorous adaptation between computational environments.
arXiv Detail & Related papers (2021-06-21T03:04:51Z)
Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks. In our novel formulation, we couple the parameters of these functions, so that they learn in their task specific domains while staying close to each other. This facilitates cross-fertilization in which data collected across different domains help improving the learning performance at each other task.
arXiv Detail & Related papers (2020-10-24T21:35:57Z)
Activation functions are not needed: the ratio net [3.9636371287541086]
This paper focuses on designing a new function approximator. Instead of designing new activation functions or kernel functions, the newly proposed network uses the fractional form. It shows that, in most cases, the ratio net converges faster and outperforms both the classification and the RBF.
arXiv Detail & Related papers (2020-05-14T01:07:56Z)
Evolving Normalization-Activation Layers [100.82879448303805]
We develop efficient rejection protocols to quickly filter out candidate layers that do not work well. Our method leads to the discovery of EvoNorms, a set of new normalization-activation layers with novel, and sometimes surprising structures. Our experiments show that EvoNorms work well on image classification models including ResNets, MobileNets and EfficientNets.
arXiv Detail & Related papers (2020-04-06T19:52:48Z)
Investigating the interaction between gradient-only line searches and different activation functions [0.0]
Gradient-only line searches (GOLS) adaptively determine step sizes along search directions for discontinuous loss functions in neural network training. We find that GOLS are robust for a range of activation functions, but sensitive to the Rectified Linear Unit (ReLU) activation function in standard feedforward architectures.
arXiv Detail & Related papers (2020-02-23T12:28:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.