Identifying Critical Neurons in ANN Architectures using Mixed Integer
Programming
- URL: http://arxiv.org/abs/2002.07259v4
- Date: Mon, 7 Sep 2020 16:39:50 GMT
- Title: Identifying Critical Neurons in ANN Architectures using Mixed Integer
Programming
- Authors: Mostafa ElAraby, Guy Wolf, Margarida Carvalho
- Abstract summary: We introduce a mixed integer program (MIP) for assigning importance scores to each neuron in deep neural network architectures.
We drive the solver to minimize the number of critical neurons (i.e., with high importance score) that need to be kept for maintaining the overall accuracy of the trained neural network.
- Score: 11.712073757744452
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a mixed integer program (MIP) for assigning importance scores to
each neuron in deep neural network architectures which is guided by the impact
of their simultaneous pruning on the main learning task of the network. By
carefully devising the objective function of the MIP, we drive the solver to
minimize the number of critical neurons (i.e., with high importance score) that
need to be kept for maintaining the overall accuracy of the trained neural
network. Further, the proposed formulation generalizes the recently considered
lottery ticket optimization by identifying multiple "lucky" sub-networks
resulting in optimized architecture that not only performs well on a single
dataset, but also generalizes across multiple ones upon retraining of network
weights. Finally, we present a scalable implementation of our method by
decoupling the importance scores across layers using auxiliary networks. We
demonstrate the ability of our formulation to prune neural networks with
marginal loss in accuracy and generalizability on popular datasets and
architectures.
Related papers
- Time Elastic Neural Networks [2.1756081703276]
We introduce and detail an atypical neural network architecture, called time elastic neural network (teNN)
The novelty compared to classical neural network architecture is that it explicitly incorporates time warping ability.
We demonstrate that, during the training process, the teNN succeeds in reducing the number of neurons required within each cell.
arXiv Detail & Related papers (2024-05-27T09:01:30Z) - Predicting Infant Brain Connectivity with Federated Multi-Trajectory
GNNs using Scarce Data [54.55126643084341]
Existing deep learning solutions suffer from three major limitations.
We introduce FedGmTE-Net++, a federated graph-based multi-trajectory evolution network.
Using the power of federation, we aggregate local learnings among diverse hospitals with limited datasets.
arXiv Detail & Related papers (2024-01-01T10:20:01Z) - Graph Metanetworks for Processing Diverse Neural Architectures [33.686728709734105]
Graph Metanetworks (GMNs) generalizes to neural architectures where competing methods struggle.
We prove that GMNs are expressive and equivariant to parameter permutation symmetries that leave the input neural network functions.
arXiv Detail & Related papers (2023-12-07T18:21:52Z) - Quasi-orthogonality and intrinsic dimensions as measures of learning and
generalisation [55.80128181112308]
We show that dimensionality and quasi-orthogonality of neural networks' feature space may jointly serve as network's performance discriminants.
Our findings suggest important relationships between the networks' final performance and properties of their randomly initialised feature spaces.
arXiv Detail & Related papers (2022-03-30T21:47:32Z) - Critical Initialization of Wide and Deep Neural Networks through Partial
Jacobians: General Theory and Applications [6.579523168465526]
We introduce emphpartial Jacobians of a network, defined as derivatives of preactivations in layer $l$ with respect to preactivations in layer $l_0leq l$.
We derive recurrence relations for the norms of partial Jacobians and utilize these relations to analyze criticality of deep fully connected neural networks with LayerNorm and/or residual connections.
arXiv Detail & Related papers (2021-11-23T20:31:42Z) - AutoDEUQ: Automated Deep Ensemble with Uncertainty Quantification [0.9449650062296824]
We propose AutoDEUQ, an automated approach for generating an ensemble of deep neural networks.
We show that AutoDEUQ outperforms probabilistic backpropagation, Monte Carlo dropout, deep ensemble, distribution-free ensembles, and hyper ensemble methods on a number of regression benchmarks.
arXiv Detail & Related papers (2021-10-26T09:12:23Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - Max and Coincidence Neurons in Neural Networks [0.07614628596146598]
We optimize networks containing models of the max and coincidence neurons using neural architecture search.
We analyze the structure, operations, and neurons of optimized networks to develop a signal-processing ResNet.
The developed network achieves an average of 2% improvement in accuracy and a 25% improvement in network size across a variety of datasets.
arXiv Detail & Related papers (2021-10-04T07:13:50Z) - Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
arXiv Detail & Related papers (2021-05-27T12:27:24Z) - Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z) - Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver, generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.