Scaling Symbolic Methods using Gradients for Neural Model Explanation
- URL: http://arxiv.org/abs/2006.16322v4
- Date: Wed, 5 May 2021 14:13:39 GMT
- Title: Scaling Symbolic Methods using Gradients for Neural Model Explanation
- Authors: Subham Sekhar Sahoo, Subhashini Venugopalan, Li Li, Rishabh Singh,
Patrick Riley
- Abstract summary: We propose a technique for combining gradient-based methods with symbolic techniques to scale such analyses.
In particular, we apply this technique to identify minimal regions in an input that are most relevant for a neural network's prediction.
We evaluate our technique on three datasets - MNIST, ImageNet, and Beer Reviews.
- Score: 22.568591780291776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Symbolic techniques based on Satisfiability Modulo Theory (SMT) solvers have
been proposed for analyzing and verifying neural network properties, but their
usage has been fairly limited owing to their poor scalability with larger
networks. In this work, we propose a technique for combining gradient-based
methods with symbolic techniques to scale such analyses and demonstrate its
application for model explanation. In particular, we apply this technique to
identify minimal regions in an input that are most relevant for a neural
network's prediction. Our approach uses gradient information (based on
Integrated Gradients) to focus on a subset of neurons in the first layer, which
allows our technique to scale to large networks. The corresponding SMT
constraints encode the minimal input mask discovery problem such that after
masking the input, the activations of the selected neurons are still above a
threshold. After solving for the minimal masks, our approach scores the mask
regions to generate a relative ordering of the features within the mask. This
produces a saliency map which explains "where a model is looking" when making a
prediction. We evaluate our technique on three datasets - MNIST, ImageNet, and
Beer Reviews, and demonstrate both quantitatively and qualitatively that the
regions generated by our approach are sparser and achieve higher saliency
scores compared to the gradient-based methods alone. Code and examples are at -
https://github.com/google-research/google-research/tree/master/smug_saliency
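A minimal sketch of the mask-discovery step described in the abstract, assuming a single fully connected first layer and the Z3 SMT solver's Python API. The attribution step (Integrated Gradients) is taken as given, the selected neuron indices, threshold, and zero baseline are illustrative choices, and the encoding simplifies the paper's actual constraints.

```python
# Minimal sketch (not the authors' code): find a small binary input mask such
# that, after masking, a few gradient-selected first-layer neurons stay active.
import numpy as np
from z3 import Bool, If, Optimize, RealVal, Sum, is_true, sat

def minimal_mask(x, W, b, selected, threshold=0.0):
    """x: input vector; W, b: first-layer weights/biases;
    selected: neuron indices chosen e.g. via Integrated Gradients."""
    n = len(x)
    opt = Optimize()
    keep = [Bool(f"keep_{i}") for i in range(n)]
    # Masked features are replaced by a zero baseline (an illustrative choice).
    masked = [If(keep[i], RealVal(float(x[i])), RealVal(0.0)) for i in range(n)]
    for j in selected:
        pre_act = Sum([RealVal(float(W[j, i])) * masked[i] for i in range(n)]) + float(b[j])
        opt.add(pre_act > threshold)          # selected neuron stays above threshold
    opt.minimize(Sum([If(k, 1, 0) for k in keep]))   # keep as few input features as possible
    if opt.check() == sat:
        m = opt.model()
        return np.array([is_true(m[k]) for k in keep])
    return None

# Toy usage with random data (illustration only).
rng = np.random.default_rng(0)
x, W, b = rng.random(8), rng.normal(size=(4, 8)), rng.normal(size=4)
print(minimal_mask(x, W, b, selected=[0, 2]))
```

In the paper, the discovered mask regions are then scored to produce the final saliency ranking; that step, and the handling of convolutional first layers, is omitted from this sketch.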
Related papers
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- NeuralGF: Unsupervised Point Normal Estimation by Learning Neural Gradient Function [55.86697795177619]
Normal estimation for 3D point clouds is a fundamental task in 3D geometry processing.
We introduce a new paradigm for learning neural gradient functions, which encourages the neural network to fit the input point clouds.
Our excellent results on widely used benchmarks demonstrate that our method can learn more accurate normals for both unoriented and oriented normal estimation tasks.
arXiv Detail & Related papers (2023-11-01T09:25:29Z)
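A minimal illustration of the gradient-as-normal idea summarized in the NeuralGF entry above, assuming a small PyTorch network that maps a 3D point to a scalar field value; NeuralGF's actual losses and fitting procedure are not reproduced here.

```python
# Illustration only (not the NeuralGF implementation): once a scalar field has
# been fit to a point cloud, a normal can be read off as the normalized
# gradient of the field at a query point.
import torch

field = torch.nn.Sequential(                 # scalar field f: R^3 -> R
    torch.nn.Linear(3, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)

def estimate_normal(point):
    """Return the unit gradient of the field at `point` (a tensor of shape [3])."""
    p = point.clone().requires_grad_(True)
    (grad,) = torch.autograd.grad(field(p).sum(), p)
    return grad / (grad.norm() + 1e-12)

print(estimate_normal(torch.tensor([0.1, -0.2, 0.3])))
```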
- Generalizable Neural Fields as Partially Observed Neural Processes [16.202109517569145]
We propose a new paradigm that views the large-scale training of neural representations as a part of a partially-observed neural process framework.
We demonstrate that this approach outperforms both state-of-the-art gradient-based meta-learning approaches and hypernetwork approaches.
arXiv Detail & Related papers (2023-09-13T01:22:16Z)
- [Experiments & Analysis] Evaluating the Feasibility of Sampling-Based Techniques for Training Multilayer Perceptrons [10.145355763143218]
Several sampling-based techniques have been proposed for speeding up the training time of deep neural networks.
These techniques fall under two categories: (i) sampling a subset of nodes in every hidden layer as active at every iteration and (ii) sampling a subset of nodes from the previous layer to approximate the current layer's activations.
In this paper, we evaluate the feasibility of these approaches on CPU machines with limited computational resources.
arXiv Detail & Related papers (2023-06-15T17:19:48Z)
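A rough sketch of the first category described in the entry above (sampling a subset of hidden nodes as active at each iteration), using a plain NumPy two-layer perceptron. The specific selection strategies evaluated in the paper are not reproduced; this sketch simply picks the active set uniformly at random.

```python
# Sketch of node sub-sampling during training: only a random subset of hidden
# units is active (and would be updated) at each iteration. Illustration only.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(256, 784)) * 0.01      # hidden-layer weights
W2 = rng.normal(size=(10, 256)) * 0.01       # output-layer weights

def forward_sampled(x, active_frac=0.25):
    active = rng.choice(256, size=int(256 * active_frac), replace=False)
    h = np.maximum(W1[active] @ x, 0.0)       # ReLU activations of active units only
    logits = W2[:, active] @ h                # inactive units contribute nothing
    return logits, active                     # gradients would touch only these rows/columns

logits, active = forward_sampled(rng.random(784))
```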
- Neural-prior stochastic block model [0.0]
We propose to model the communities as being determined by the node attributes rather than the opposite.
We propose an algorithm, stemming from statistical physics, based on a combination of belief propagation and approximate message passing.
The proposed model and algorithm can be used as a benchmark for both theory and algorithms.
arXiv Detail & Related papers (2023-03-17T14:14:54Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Visual Explanations from Deep Networks via Riemann-Stieltjes Integrated Gradient-based Localization [0.24596929878045565]
We introduce a new technique to produce visual explanations for the predictions of a CNN.
Our method can be applied to any layer of the network, and like Integrated Gradients it is not affected by the problem of vanishing gradients.
Compared to Grad-CAM, heatmaps produced by our algorithm are better focused in the areas of interest, and their numerical computation is more stable.
arXiv Detail & Related papers (2022-05-22T18:30:38Z)
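For reference, a minimal sketch of standard Integrated Gradients, the attribution method the entry above builds on (and the same one used by the main paper), assuming a differentiable PyTorch classifier that expects a batch dimension; the paper's Riemann-Stieltjes formulation at intermediate layers is not shown.

```python
# Standard Integrated Gradients (baseline sketch, not the paper's
# Riemann-Stieltjes variant): average gradients along a straight-line path
# from a baseline input to the actual input, then scale by the difference.
import torch

def integrated_gradients(model, x, target, baseline=None, steps=50):
    baseline = torch.zeros_like(x) if baseline is None else baseline
    total = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (x - baseline)).requires_grad_(True)
        score = model(point.unsqueeze(0))[0, target]   # class score at this path point
        (grad,) = torch.autograd.grad(score, point)
        total += grad
    return (x - baseline) * total / steps              # attribution per input feature
```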
- Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction [62.691996239590125]
We propose an adaptive convolutional dictionary network (ACDNet) for metal artifact reduction.
Our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image.
Our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods.
arXiv Detail & Related papers (2022-05-16T06:49:36Z)
- A Local Geometric Interpretation of Feature Extraction in Deep Feedforward Neural Networks [13.159994710917022]
In this paper, we present a local geometric analysis to interpret how deep feedforward neural networks extract low-dimensional features from high-dimensional data.
Our study shows that, in a local geometric region, the optimal weight in one layer of the neural network and the optimal feature generated by the previous layer comprise a low-rank approximation of a matrix that is determined by the Bayes action of this layer.
arXiv Detail & Related papers (2022-02-09T18:50:00Z)
- Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of stochastic gradient descent combined with the nonconvexity of the underlying optimization problem renders learning susceptible to initialization.
We propose fusing neighboring layers of deeper networks that are trained with random initializations; a minimal illustration of fusing two adjacent linear layers is sketched below.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
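As hinted in the entry above, a minimal illustration of fusing two adjacent linear layers into a single one. This is an exact identity only for purely linear layers and is meant to convey the flavor of layer fusion, not the paper's MSE-optimal procedure for nonlinear networks.

```python
# Illustration only: two consecutive linear layers with no nonlinearity in
# between collapse exactly into one layer.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(64, 128)), rng.normal(size=64)
W2, b2 = rng.normal(size=(32, 64)), rng.normal(size=32)

W_fused, b_fused = W2 @ W1, W2 @ b1 + b2      # fused layer reproduces the composition

x = rng.normal(size=128)
assert np.allclose(W2 @ (W1 @ x + b1) + b2, W_fused @ x + b_fused)
```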
This list is automatically generated from the titles and abstracts of the papers on this site.