Cross-Model Comparative Loss for Enhancing Neuronal Utility in Language
Understanding
- URL: http://arxiv.org/abs/2301.03765v2
- Date: Sat, 9 Mar 2024 07:12:52 GMT
- Title: Cross-Model Comparative Loss for Enhancing Neuronal Utility in Language
Understanding
- Authors: Yunchang Zhu, Liang Pang, Kangxi Wu, Yanyan Lan, Huawei Shen, Xueqi
Cheng
- Abstract summary: We propose a cross-model comparative loss for a broad range of tasks.
We demonstrate the universal effectiveness of comparative loss through extensive experiments on 14 datasets from 3 distinct NLU tasks.
- Score: 82.46024259137823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current natural language understanding (NLU) models have been continuously
scaling up, both in terms of model size and input context, introducing more
hidden and input neurons. While this generally improves performance on average,
the extra neurons do not yield a consistent improvement for all instances. This
is because some hidden neurons are redundant, and the noise mixed in input
neurons tends to distract the model. Previous work mainly focuses on
extrinsically reducing low-utility neurons by additional post- or
pre-processing, such as network pruning and context selection, to avoid this
problem. Beyond that, can we make the model reduce redundant parameters and
suppress input noise by intrinsically enhancing the utility of each neuron? If
a model can efficiently utilize neurons, no matter which neurons are ablated
(disabled), the ablated submodel should perform no better than the original
full model. Based on such a comparison principle between models, we propose a
cross-model comparative loss for a broad range of tasks. Comparative loss is
essentially a ranking loss on top of the task-specific losses of the full and
ablated models, with the expectation that the task-specific loss of the full
model is minimal. We demonstrate the universal effectiveness of comparative
loss through extensive experiments on 14 datasets from 3 distinct NLU tasks
based on 5 widely used pretrained language models and find it particularly
superior for models with few parameters or long input.
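The abstract states the comparison principle only informally. Below is a minimal sketch of one way such a comparative loss could look, assuming a toy classifier, random hidden-neuron masks as the ablation, and a hinge-style ranking penalty on top of the task losses; the paper's actual construction (how submodels are ablated and compared) may differ.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyClassifier(nn.Module):
    """Stand-in model: one hidden layer whose neurons can be ablated via a mask."""
    def __init__(self, d_in=32, d_hidden=64, n_classes=4):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, n_classes)

    def forward(self, x, neuron_mask=None):
        h = torch.relu(self.fc1(x))
        if neuron_mask is not None:      # ablate (disable) selected hidden neurons
            h = h * neuron_mask
        return self.fc2(h)


def comparative_loss(model, x, y, num_ablated=2, ablate_p=0.1):
    """Task loss of the full model plus a hinge-style ranking penalty that fires
    whenever a randomly ablated submodel attains a lower task loss than the full model."""
    full_loss = F.cross_entropy(model(x), y)
    loss = full_loss
    d_hidden = model.fc1.out_features
    for _ in range(num_ablated):
        mask = (torch.rand(d_hidden) > ablate_p).float()        # drop ~10% of hidden neurons
        ablated_loss = F.cross_entropy(model(x, neuron_mask=mask), y)
        loss = loss + F.relu(full_loss - ablated_loss)          # expect full_loss <= ablated_loss
    return loss


# One optimisation step on random data.
model = TinyClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(8, 32), torch.randint(0, 4, (8,))
opt.zero_grad()
comparative_loss(model, x, y).backward()
opt.step()
```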
Related papers
- Magnificent Minified Models [0.360953887026184]
This paper concerns itself with the task of taking a large trained neural network and 'compressing' it to be smaller by deleting parameters or entire neurons.
We compare various methods of parameter and neuron selection: dropout-based neuron damage estimation, neuron merging, absolute-value based selection, random selection.
For neuron-level pruning, retraining from scratch did much better in our experiments.
arXiv Detail & Related papers (2023-06-16T21:00:44Z)
- Inferring Population Dynamics in Macaque Cortex [0.0]
We show that simple, general-purpose architectures based on recurrent neural networks (RNNs) outperform more "bespoke" models.
We argue that the autoregressive bias imposed by RNNs is critical for achieving the highest levels of performance.
arXiv Detail & Related papers (2023-04-05T14:24:27Z)
- Neural Additive Models for Location Scale and Shape: A Framework for Interpretable Neural Regression Beyond the Mean [1.0923877073891446]
Deep neural networks (DNNs) have proven to be highly effective in a variety of tasks.
Despite this success, the inner workings of DNNs are often not transparent.
This lack of interpretability has led to increased research on inherently interpretable neural networks.
arXiv Detail & Related papers (2023-01-27T17:06:13Z)
- Supervised Parameter Estimation of Neuron Populations from Multiple Firing Events [3.2826301276626273]
We study an automatic approach to learning the parameters of neuron populations from a training set consisting of pairs of spiking series and parameter labels via supervised learning.
We simulate many neuronal populations at different parameter settings using a neuron model.
We then compare their performance against classical approaches including a genetic search, Bayesian sequential estimation, and a random walk approximate model.
arXiv Detail & Related papers (2022-10-02T03:17:05Z)
- Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps [75.84770193489639]
We examine the impact of test loss, training corpus and model architecture on the prediction of functional Magnetic Resonance Imaging timecourses of participants listening to an audiobook.
We find that untrained versions of each model already explain a significant amount of signal in the brain by capturing similarity in brain responses across identical words.
We suggest good practices for future studies aiming at explaining the human language system using neural language models.
arXiv Detail & Related papers (2022-07-07T15:37:17Z)
- MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models achieve superior performance on most NLP tasks thanks to their large parameter capacity, but this also incurs a huge computation cost.
We explore accelerating large-model inference through conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z)
- Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z)
- The Neural Coding Framework for Learning Generative Models [91.0357317238509]
We propose a novel neural generative model inspired by the theory of predictive processing in the brain.
In a similar way, artificial neurons in our generative model predict what neighboring neurons will do, and adjust their parameters based on how well the predictions matched reality.
arXiv Detail & Related papers (2020-12-07T01:20:38Z)
- Neural Additive Models: Interpretable Machine Learning with Neural Nets [77.66871378302774]
Deep neural networks (DNNs) are powerful black-box predictors that have achieved impressive performance on a wide variety of tasks.
We propose Neural Additive Models (NAMs) which combine some of the expressivity of DNNs with the inherent intelligibility of generalized additive models.
NAMs learn a linear combination of neural networks that each attend to a single input feature.
arXiv Detail & Related papers (2020-04-29T01:28:32Z)
- Investigation and Analysis of Hyper and Hypo neuron pruning to selectively update neurons during Unsupervised Adaptation [8.845660219190298]
Pruning approaches look for low-salient neurons that are less contributive to a model's decision.
This work investigates whether pruning approaches are successful in detecting neurons that are either high-salient (mostly active, or hyper) or low-salient (barely active, or hypo).
It shows that it may be possible to selectively adapt certain neurons (the hyper and hypo neurons) first, followed by a full-network fine-tuning.
arXiv Detail & Related papers (2020-01-06T19:46:57Z)
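The last entry above only states that hyper and hypo neurons can be adapted selectively before full fine-tuning. As a rough, hypothetical illustration (not that paper's actual method), one could pick such neurons by mean-activation quantiles and mask out gradients for all other neurons of a layer; the quantile threshold and the gradient-hook mechanism below are assumptions.
```python
import torch

def hyper_hypo_indices(activations, q=0.1):
    """activations: (num_samples, num_neurons) hidden activations collected on
    adaptation data. Returns indices of barely active (hypo) and mostly active
    (hyper) neurons, taken as the bottom/top q-quantiles of mean activation."""
    mean_act = activations.abs().mean(dim=0)
    lo, hi = torch.quantile(mean_act, q), torch.quantile(mean_act, 1.0 - q)
    hypo = (mean_act <= lo).nonzero(as_tuple=True)[0]
    hyper = (mean_act >= hi).nonzero(as_tuple=True)[0]
    return hypo, hyper

def restrict_updates_to(linear, neuron_idx):
    """Zero gradients for every output neuron of a linear layer except the
    selected ones, so only the hyper/hypo neurons are adapted in the first stage."""
    keep = torch.zeros(linear.out_features, dtype=torch.bool)
    keep[neuron_idx] = True
    linear.weight.register_hook(lambda g: g * keep.view(-1, 1).to(g.dtype))
    linear.bias.register_hook(lambda g: g * keep.to(g.dtype))

# Usage on a toy layer: select neurons, then adapt only those before full fine-tuning.
layer = torch.nn.Linear(16, 32)
acts = torch.relu(layer(torch.randn(100, 16))).detach()
hypo, hyper = hyper_hypo_indices(acts)
restrict_updates_to(layer, torch.cat([hypo, hyper]))
```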