Magnificent Minified Models
- URL: http://arxiv.org/abs/2306.10177v1
- Date: Fri, 16 Jun 2023 21:00:44 GMT
- Title: Magnificent Minified Models
- Authors: Rich Harang and Hillary Sanders
- Abstract summary: This paper concerns itself with the task of taking a large trained neural network and 'compressing' it to be smaller by deleting parameters or entire neurons.
We compare various methods of parameter and neuron selection: dropout-based neuron damage estimation, neuron merging, absolute-value based selection, random selection, and OBD (Optimal Brain Damage).
For neuron-level pruning, retraining from scratch did much better in our experiments.
- Score: 0.360953887026184
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper concerns itself with the task of taking a large trained neural
network and 'compressing' it to be smaller by deleting parameters or entire
neurons, with minimal decreases in the resulting model accuracy. We compare
various methods of parameter and neuron selection: dropout-based neuron damage
estimation, neuron merging, absolute-value based selection, random selection,
OBD (Optimal Brain Damage). We also compare a variation on the classic OBD
method that slightly outperformed all other parameter and neuron selection
methods in our tests with substantial pruning, which we call OBD-SD. We compare
these methods against quantization of parameters. We also compare these
techniques (all applied to a trained neural network), with neural networks
trained from scratch (random weight initialization) on various pruned
architectures. Our results are only barely consistent with the Lottery Ticket
Hypothesis, in that fine-tuning a parameter-pruned model does slightly better
than retraining a similarly pruned model from scratch with randomly initialized
weights. For neuron-level pruning, retraining from scratch did much better in
our experiments.
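As a rough illustration of two of the selection criteria compared above, the sketch below prunes a small PyTorch model by absolute-value selection and ranks parameters with an OBD-style saliency of the form 0.5 * H_ii * w_i^2. The toy model, the data, and the squared-gradient stand-in for the Hessian diagonal are assumptions made for this sketch; it is not the authors' implementation and does not reproduce their OBD-SD variant.

```python
import torch
import torch.nn as nn

# Toy stand-in for the paper's "large trained neural network" (illustrative only).
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
x, y = torch.randn(256, 20), torch.randint(0, 2, (256,))
loss_fn = nn.CrossEntropyLoss()


def magnitude_prune_(weight: torch.Tensor, frac: float) -> None:
    """Absolute-value based selection: zero the smallest-|w| fraction of weights."""
    k = max(1, int(frac * weight.numel()))
    threshold = weight.abs().flatten().kthvalue(k).values
    with torch.no_grad():
        weight.mul_((weight.abs() > threshold).float())


def obd_saliency(weight: nn.Parameter) -> torch.Tensor:
    """OBD-style saliency s_i = 0.5 * H_ii * w_i^2. The Hessian diagonal is
    approximated with squared gradients here (an assumption for this sketch,
    not the paper's exact estimator or its OBD-SD variant)."""
    loss = loss_fn(model(x), y)
    (grad,) = torch.autograd.grad(loss, weight)
    return 0.5 * grad.pow(2) * weight.detach().pow(2)


# Rank layer-2 parameters by OBD saliency: the lowest-saliency ones go first.
saliency = obd_saliency(model[2].weight)
prune_first = saliency.flatten().argsort()[: saliency.numel() // 2]
print(f"layer-2 parameters OBD would delete first: {prune_first.numel()}")

# Prune 50% of layer-1 weights by absolute value and see how the loss moves.
before = loss_fn(model(x), y).item()
magnitude_prune_(model[0].weight, frac=0.5)
after = loss_fn(model(x), y).item()
print(f"loss before: {before:.4f}  after 50% magnitude pruning: {after:.4f}")
```

The detail that matters most here is how the Hessian diagonal is estimated: OBD computes it with a dedicated second-derivative pass, whereas the squared-gradient proxy above is only a cheap stand-in.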
Related papers
- Let's Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model [43.107778640669544]
Large Language Models (LLMs) are composed of neurons that exhibit various behaviors and roles.
Recent studies have revealed that not all neurons are active across different datasets.
We introduce Neuron-Level Fine-Tuning (NeFT), a novel approach that refines the granularity of parameter training down to the individual neuron.
arXiv Detail & Related papers (2024-03-18T09:55:01Z)
- Cross-Model Comparative Loss for Enhancing Neuronal Utility in Language Understanding [82.46024259137823]
We propose a cross-model comparative loss for a broad range of tasks.
We demonstrate the universal effectiveness of comparative loss through extensive experiments on 14 datasets from 3 distinct NLU tasks.
arXiv Detail & Related papers (2023-01-10T03:04:27Z)
- Supervised Parameter Estimation of Neuron Populations from Multiple Firing Events [3.2826301276626273]
We study an automatic approach to learning the parameters of neuron populations from a training set consisting of pairs of spiking series and parameter labels via supervised learning.
We simulate many neuronal populations at different parameter settings using a neuron model.
We then compare their performance against classical approaches including a genetic search, Bayesian sequential estimation, and a random walk approximate model.
arXiv Detail & Related papers (2022-10-02T03:17:05Z)
- Neuron-based Pruning of Deep Neural Networks with Better Generalization using Kronecker Factored Curvature Approximation [18.224344440110862]
The proposed algorithm directs the parameters of the compressed model toward a flatter solution by exploring the spectral radius of the Hessian.
Our result shows that it improves the state-of-the-art results on neuron compression.
The method is able to achieve very small networks with only a small drop in accuracy across different neural network models.
arXiv Detail & Related papers (2021-11-16T15:55:59Z)
- Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z)
- Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters can be suitable resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
arXiv Detail & Related papers (2021-09-20T15:12:16Z)
- Meta-Solver for Neural Ordinary Differential Equations [77.8918415523446]
We investigate how the variability in solvers' space can improve the performance of neural ODEs.
We show that the right choice of solver parameterization can significantly affect neural ODE models in terms of robustness to adversarial attacks.
arXiv Detail & Related papers (2021-03-15T17:26:34Z)
- The Neural Coding Framework for Learning Generative Models [91.0357317238509]
We propose a novel neural generative model inspired by the theory of predictive processing in the brain.
In a similar way, artificial neurons in our generative model predict what neighboring neurons will do, and adjust their parameters based on how well the predictions matched reality.
arXiv Detail & Related papers (2020-12-07T01:20:38Z)
- Non-linear Neurons with Human-like Apical Dendrite Activations [81.18416067005538]
We show that a standard neuron followed by our novel apical dendrite activation (ADA) can learn the XOR logical function with 100% accuracy.
We conduct experiments on six benchmark data sets from computer vision, signal processing and natural language processing.
arXiv Detail & Related papers (2020-02-02T21:09:39Z)
- Investigation and Analysis of Hyper and Hypo neuron pruning to selectively update neurons during Unsupervised Adaptation [8.845660219190298]
Pruning approaches look for low-salient neurons that contribute less to a model's decision.
This work investigates whether pruning approaches are successful in detecting neurons that are either high-salient (mostly active, or hyper) or low-salient (barely active, or hypo).
It shows that it may be possible to selectively adapt certain neurons (the hyper and the hypo neurons) first, followed by a full-network fine-tuning (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2020-01-06T19:46:57Z)
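Referenced from the hyper/hypo-neuron entry above: the sketch below scores hidden neurons by their mean activation on adaptation data, flags the extremes as hyper or hypo, and uses gradient masking to restrict updates to just those neurons before any later full fine-tune. The layer sizes, quantile thresholds, and masking trick are illustrative assumptions, not the cited paper's saliency measure or adaptation procedure.

```python
import torch
import torch.nn as nn

# Illustrative hidden layer and unlabeled adaptation data (assumptions of this sketch).
torch.manual_seed(0)
hidden = nn.Linear(20, 32)
x = torch.randn(512, 20)

# Salience proxy: mean post-ReLU activation of each hidden neuron on the data.
with torch.no_grad():
    mean_act = torch.relu(hidden(x)).mean(dim=0)  # one score per neuron

# Flag "hyper" (mostly active) and "hypo" (barely active) neurons via quantiles.
hyper = (mean_act >= mean_act.quantile(0.9)).nonzero().flatten()
hypo = (mean_act <= mean_act.quantile(0.1)).nonzero().flatten()
print("hyper neurons:", hyper.tolist())
print("hypo neurons:", hypo.tolist())

# Selective adaptation: mask gradients so only the rows (output neurons) of the
# hidden layer that were flagged receive updates during backpropagation.
selected = torch.cat([hyper, hypo])
mask = torch.zeros_like(hidden.weight)
mask[selected] = 1.0
hidden.weight.register_hook(lambda g: g * mask)
```

With the hook registered, any subsequent backward pass only updates the flagged rows of the hidden layer; removing the hook restores ordinary full-network fine-tuning.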