Scalable Partial Explainability in Neural Networks via Flexible
Activation Functions
- URL: http://arxiv.org/abs/2006.06057v1
- Date: Wed, 10 Jun 2020 20:30:15 GMT
- Title: Scalable Partial Explainability in Neural Networks via Flexible
Activation Functions
- Authors: Schyler C. Sun, Chen Li, Zhuangkun Wei, Antonios Tsourdos, Weisi Guo
- Abstract summary: The high-dimensional features and decisions given by deep neural networks (NNs) require new algorithms and methods to expose their mechanisms.
Current state-of-the-art NN interpretation methods focus more on the direct relationship between NN outputs and inputs than on the NN's structure and operations themselves.
In this paper, we achieve a partially explainable learning model by symbolically explaining the role of activation functions (AFs) under a scalable topology.
- Score: 13.71739091287644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Achieving transparency in black-box deep learning algorithms is still an open
challenge. The high-dimensional features and decisions given by deep neural
networks (NNs) require new algorithms and methods to expose their mechanisms.
Current state-of-the-art NN interpretation methods (e.g. saliency maps,
DeepLIFT, LIME, etc.) focus more on the direct relationship between NN outputs
and inputs than on the NN's structure and operations themselves. In current deep
NN operations, there is uncertainty over the exact role played by neurons with
fixed activation functions. In this paper, we achieve a partially explainable
learning model by symbolically explaining the role of activation functions (AFs)
under a scalable topology. This is carried out by modeling the AFs as adaptive
Gaussian Processes (GP), which sit within a novel scalable NN topology, based
on the Kolmogorov-Arnold Superposition Theorem (KST). In this scalable NN
architecture, the AFs are generated by GP interpolation between control points
and can thus be tuned during the back-propagation procedure via gradient
descent. The control points act as the core enabler to both local and global
adjustability of AF, where the GP interpolation constrains the intrinsic
autocorrelation to avoid over-fitting. We show that there exists a trade-off
between the NN's expressive power and interpretation complexity, under linear
KST topology scaling. To demonstrate this, we perform a case study on a binary
classification dataset of banknote authentication. By quantitatively and
qualitatively investigating the mapping relationship between inputs and output,
our explainable model can provide interpretation over each of the
one-dimensional attributes. These early results suggest that our model has the
potential to act as the final interpretation layer for deep neural networks.
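The mechanism described above (Kolmogorov-Arnold superposition represents a multivariate function as sums of univariate functions, f(x_1,...,x_n) = sum_q Phi_q(sum_p phi_{q,p}(x_p)), and each univariate AF is a GP interpolant whose control-point values are trained by gradient descent) can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the kernel length-scale, number of control points, tanh initialisation, sine target, and learning rate are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(a, b, ell):
    # Squared-exponential kernel matrix between 1-D point sets a and b.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

class GPActivation:
    """One-dimensional activation function given by GP interpolation
    between fixed control locations; only the control values y_c are
    trainable, and the kernel's autocorrelation keeps the curve smooth."""

    def __init__(self, n_control=7, x_range=(-3.0, 3.0), ell=0.8, jitter=1e-6):
        self.ell = ell
        self.x_c = np.linspace(x_range[0], x_range[1], n_control)  # fixed locations
        self.y_c = np.tanh(self.x_c)                               # start near tanh
        K = rbf_kernel(self.x_c, self.x_c, ell) + jitter * np.eye(n_control)
        self.K_inv = np.linalg.inv(K)

    def __call__(self, x):
        # GP interpolant: f(x) = k(x, X_c) K^{-1} y_c -- linear in y_c,
        # so gradients w.r.t. the control values are exact and cheap.
        return rbf_kernel(x, self.x_c, self.ell) @ self.K_inv @ self.y_c

    def grad_y(self, x):
        # Jacobian d f(x) / d y_c, used in the gradient-descent update.
        return rbf_kernel(x, self.x_c, self.ell) @ self.K_inv

# Tune the activation toward an arbitrary target shape by gradient
# descent on the control values alone (back-propagation analogue).
act = GPActivation()
x = np.linspace(-3.0, 3.0, 50)
target = np.sin(x)
mse_init = np.mean((act(x) - target) ** 2)
for _ in range(500):
    err = act(x) - target
    act.y_c -= 0.01 * act.grad_y(x).T @ err / len(x)
mse_final = np.mean((act(x) - target) ** 2)
```

Because the interpolant is linear in the control values, each AF remains a closed-form symbolic expression of its control points after training, which is what makes the per-attribute interpretation in the paper possible.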
Related papers
- Deep Neural Networks via Complex Network Theory: a Perspective [3.1023851130450684]
Deep Neural Networks (DNNs) can be represented as graphs whose links and vertices iteratively process data and solve tasks sub-optimally. Complex Network Theory (CNT), merging statistical physics with graph theory, provides a method for interpreting neural networks by analysing their weights and neuron structures.
In this work, we extend the existing CNT metrics with measures that sample from the DNNs' training distribution, shifting from a purely topological analysis to one that connects with the interpretability of deep learning.
arXiv Detail & Related papers (2024-04-17T08:42:42Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - The Influence of Learning Rule on Representation Dynamics in Wide Neural
Networks [18.27510863075184]
We analyze infinite-width deep gradient networks trained with feedback alignment (FA), direct feedback alignment (DFA), and error modulated Hebbian learning (Hebb).
We show that, for each of these learning rules, the evolution of the output function at infinite width is governed by a time-varying effective neural tangent kernel (eNTK).
In the lazy training limit, this eNTK is static and does not evolve, while in the rich mean-field regime this kernel's evolution can be determined self-consistently with dynamical mean field theory (DMFT).
arXiv Detail & Related papers (2022-10-05T11:33:40Z) - Deep Architecture Connectivity Matters for Its Convergence: A
Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z) - On Feature Learning in Neural Networks with Global Convergence
Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z) - Tensor-based framework for training flexible neural networks [9.176056742068813]
We propose a new learning algorithm which solves a constrained coupled matrix-tensor factorization (CMTF) problem.
The proposed algorithm can handle different basis decompositions.
The goal of this method is to compress large pretrained NN models by replacing tensor networks, i.e., one or multiple layers of the original network, with a new flexible layer.
arXiv Detail & Related papers (2021-06-25T10:26:48Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.