Scalable Partial Explainability in Neural Networks via Flexible
Activation Functions
- URL: http://arxiv.org/abs/2006.06057v1
- Date: Wed, 10 Jun 2020 20:30:15 GMT
- Title: Scalable Partial Explainability in Neural Networks via Flexible
Activation Functions
- Authors: Schyler C. Sun, Chen Li, Zhuangkun Wei, Antonios Tsourdos, Weisi Guo
- Abstract summary: The high-dimensional features and decisions given by deep neural networks (NNs) require new algorithms and methods to expose their mechanisms.
Current state-of-the-art NN interpretation methods focus more on the direct relationship between NN outputs and inputs than on the NN's structure and operations themselves.
In this paper, we achieve a partially explainable learning model by symbolically explaining the role of the activation functions (AFs) under a scalable topology.
- Score: 13.71739091287644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Achieving transparency in black-box deep learning algorithms is still an
open challenge. The high-dimensional features and decisions given by deep
neural networks (NNs) require new algorithms and methods to expose their
mechanisms. Current state-of-the-art NN interpretation methods (e.g. saliency
maps, DeepLIFT, LIME) focus more on the direct relationship between NN outputs
and inputs than on the NN's structure and operations themselves. In current
deep NN operations, there is uncertainty over the exact role played by neurons
with fixed activation functions. In this paper, we achieve a partially
explainable learning model by symbolically explaining the role of the
activation functions (AFs) under a scalable topology. This is carried out by
modeling the AFs as adaptive Gaussian processes (GPs), which sit within a
novel scalable NN topology based on the Kolmogorov-Arnold Superposition
Theorem (KST). In this scalable NN architecture, the AFs are generated by GP
interpolation between control points and can thus be tuned during the
back-propagation procedure via gradient descent. The control points act as the
core enabler of both local and global adjustability of the AFs, while the GP
interpolation constrains the intrinsic autocorrelation to avoid over-fitting.
We show that there exists a trade-off between the NN's expressive power and
its interpretation complexity under linear KST topology scaling. To
demonstrate this, we perform a case study on a binary classification dataset
of banknote authentication. By quantitatively and qualitatively investigating
the mapping relationship between inputs and output, our explainable model can
provide an interpretation of each of the one-dimensional attributes. These
early results suggest that our model has the potential to act as the final
interpretation layer for deep neural networks.
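
For reference, the Kolmogorov-Arnold Superposition Theorem underlying the paper's topology states that any continuous function on the unit cube decomposes into sums and compositions of one-dimensional functions, which is why per-attribute interpretation becomes possible:

```latex
f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

To make the activation-function mechanism concrete, here is a minimal sketch, not the authors' code, of a learnable activation built by GP (RBF-kernel) posterior-mean interpolation between trainable control points, written in PyTorch. The control-point count, grid range, length scale, and tanh initialization are illustrative assumptions:

```python
import torch
import torch.nn as nn

class GPActivation(nn.Module):
    """Activation whose shape is the GP (RBF-kernel) posterior-mean
    interpolant through a small set of trainable control points.
    Hyperparameters here are illustrative, not the paper's settings."""

    def __init__(self, num_points=10, x_min=-3.0, x_max=3.0,
                 length_scale=0.5, jitter=1e-6):
        super().__init__()
        # Fixed, evenly spaced control-point inputs; trainable outputs.
        self.register_buffer("xc", torch.linspace(x_min, x_max, num_points))
        self.yc = nn.Parameter(torch.tanh(self.xc.clone()))  # init near tanh
        self.length_scale = length_scale
        self.jitter = jitter

    def _rbf(self, a, b):
        # Squared-exponential kernel k(a, b) = exp(-(a - b)^2 / (2 l^2)).
        d2 = (a.unsqueeze(-1) - b.unsqueeze(0)) ** 2
        return torch.exp(-0.5 * d2 / self.length_scale ** 2)

    def forward(self, x):
        flat = x.reshape(-1)
        K = self._rbf(self.xc, self.xc) \
            + self.jitter * torch.eye(len(self.xc), device=self.xc.device)
        k_star = self._rbf(flat, self.xc)        # (N, num_points)
        alpha = torch.linalg.solve(K, self.yc)   # K^{-1} y_c
        # GP posterior mean k(x, X_c) K^{-1} y_c: a smooth, differentiable
        # curve through the control points, so back-prop can reshape it.
        return (k_star @ alpha).reshape(x.shape)


# Usage: the activation trains like any other module parameter.
act = GPActivation()
x = torch.randn(8, 4)
act(x).sum().backward()   # gradients flow to the control points act.yc
```

Because the output is the kernel-weighted combination of the control-point values, gradient descent reshapes the activation only through those few points, and the kernel's smoothness limits how sharply the curve can bend, mirroring the over-fitting argument in the abstract.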
Related papers
- Learning local discrete features in explainable-by-design convolutional neural networks [0.0]
We introduce an explainable-by-design convolutional neural network (CNN) based on the lateral inhibition mechanism.
The model consists of a predictor, which is a high-accuracy CNN with residual or dense skip connections.
By collecting observations and directly calculating probabilities, we can explain causal relationships between motifs of adjacent levels.
arXiv Detail & Related papers (2024-10-31T18:39:41Z)
- Joint Diffusion Processes as an Inductive Bias in Sheaf Neural Networks [14.224234978509026]
Sheaf Neural Networks (SNNs) naturally extend Graph Neural Networks (GNNs).
We propose two novel sheaf learning approaches that provide a more intuitive understanding of the involved structure maps.
In our evaluation, we show the limitations of the real-world benchmarks used so far for SNNs.
arXiv Detail & Related papers (2024-07-30T07:17:46Z)
- Deep Neural Networks via Complex Network Theory: a Perspective [3.1023851130450684]
Deep Neural Networks (DNNs) can be represented as graphs whose links and vertices iteratively process data and solve tasks sub-optimally. Complex Network Theory (CNT), merging statistical physics with graph theory, provides a method for interpreting neural networks by analysing their weights and neuron structures.
In this work, we extend the existing CNT metrics with measures that sample from the DNNs' training distribution, shifting from a purely topological analysis to one that connects with the interpretability of deep learning.
arXiv Detail & Related papers (2024-04-17T08:42:42Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these NNs using gradient descent.
For the first time, we provide a tractable estimation procedure for SEMs based on NNs, with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)