On the Interpretability of Regularisation for Neural Networks Through Model Gradient Similarity
- URL: http://arxiv.org/abs/2205.12642v1
- Date: Wed, 25 May 2022 10:38:33 GMT
- Title: On the Interpretability of Regularisation for Neural Networks Through Model Gradient Similarity
- Authors: Vincent Szolnoky, Viktor Andersson, Balazs Kulcsar, Rebecka Jörnsten
- Abstract summary: Model Gradient Similarity (MGS) serves as a metric of regularisation.
MGS provides the basis for a new regularisation scheme which exhibits excellent performance.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most complex machine learning and modelling techniques are prone to
over-fitting and may subsequently generalise poorly to future data. Artificial
neural networks are no different in this regard and, despite having a level of
implicit regularisation when trained with gradient descent, often require the
aid of explicit regularisers. We introduce a new framework, Model Gradient
Similarity (MGS), that (1) serves as a metric of regularisation, which can be
used to monitor neural network training, (2) adds insight into how explicit
regularisers, while derived from widely different principles, operate via the
same mechanism underneath by increasing MGS, and (3) provides the basis for a
new regularisation scheme which exhibits excellent performance, especially in
challenging settings such as high levels of label noise or limited sample
sizes.
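The central quantity behind the framework is the similarity of per-example gradients of the loss with respect to the model parameters. Below is a minimal PyTorch sketch of one such metric, the mean pairwise cosine similarity of per-example gradients; the paper's exact statistic may differ, so treat this as an illustration of the idea rather than the authors' definition.

```python
import torch
import torch.nn.functional as F

def model_gradient_similarity(model, loss_fn, xs, ys):
    # Mean pairwise cosine similarity of per-example loss gradients.
    # Illustrative stand-in for MGS; the paper's statistic may differ.
    grads = []
    for x, y in zip(xs, ys):
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()
                       if p.grad is not None])
        grads.append(g.clone())
    G = torch.stack(grads)             # (n, num_params)
    G = F.normalize(G, dim=1)          # unit-norm per-example gradients
    S = G @ G.T                        # cosine-similarity matrix
    n = S.shape[0]
    return (S.sum() - n) / (n * (n - 1))   # mean off-diagonal similarity
```

Tracking a statistic like this over training corresponds to use (1) above, and since the abstract reports that explicit regularisers operate by increasing MGS, a regularisation scheme in this spirit would reward keeping it high.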
Related papers
- Enhancing Fine-Grained Visual Recognition in the Low-Data Regime Through Feature Magnitude Regularization [23.78498670529746]
We introduce a regularization technique to ensure that the magnitudes of the extracted features are evenly distributed.
Despite its apparent simplicity, our approach has demonstrated significant performance improvements across various fine-grained visual recognition datasets.
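The stated idea translates into a simple penalty term. A hedged sketch follows (a hypothetical formulation, not necessarily the paper's exact loss) that penalises unevenness of per-dimension feature magnitudes:

```python
import torch

def feature_magnitude_penalty(features):
    # features: (batch, dim) tensor from the feature extractor.
    # Penalise the variance of per-dimension mean magnitudes, pushing
    # feature magnitudes toward an even distribution. Hypothetical
    # formulation; the paper's exact loss may differ.
    mags = features.abs().mean(dim=0)
    return mags.var()

# usage sketch: loss = task_loss + lam * feature_magnitude_penalty(feats)
```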
arXiv Detail & Related papers (2024-09-03T07:32:46Z)
- Rule Based Learning with Dynamic (Graph) Neural Networks [0.8158530638728501]
We present rule based graph neural networks (RuleGNNs) that overcome some limitations of ordinary graph neural networks.
Our experiments show that the predictive performance of RuleGNNs is comparable to state-of-the-art graph classifiers.
We introduce new synthetic benchmark graph datasets to show how to integrate expert knowledge into RuleGNNs.
arXiv Detail & Related papers (2024-06-14T12:01:18Z) - How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Generalization and Estimation Error Bounds for Model-based Neural Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow one to construct model-based networks with guaranteed high generalization.
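Model-based networks of this kind are typically built by unrolling an iterative sparse-recovery algorithm into layers. A minimal sketch of learned ISTA (LISTA), a standard construction of this type, is shown below; the specific architectures analysed in the paper may differ.

```python
import torch
import torch.nn as nn

class LISTA(nn.Module):
    # Unrolled ISTA: each layer mimics one iteration of the iterative
    # soft-thresholding algorithm for sparse recovery. A standard
    # construction; the paper's exact architectures may differ.
    def __init__(self, m, n, num_layers=5):
        super().__init__()
        self.W = nn.Linear(m, n, bias=False)     # input injection
        self.S = nn.Linear(n, n, bias=False)     # recurrent map
        self.theta = nn.Parameter(torch.full((num_layers,), 0.1))
        self.num_layers = num_layers

    @staticmethod
    def soft_threshold(x, theta):
        return torch.sign(x) * torch.relu(x.abs() - theta)

    def forward(self, y):                        # y: (batch, m) measurements
        x = self.soft_threshold(self.W(y), self.theta[0])
        for k in range(1, self.num_layers):
            x = self.soft_threshold(self.W(y) + self.S(x), self.theta[k])
        return x                                 # estimated sparse code
```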
arXiv Detail & Related papers (2023-04-19T16:39:44Z)
- Monotonic Neural Additive Models: Pursuing Regulated Machine Learning Models for Credit Scoring [1.90365714903665]
We introduce a novel class of monotonic neural additive models, which meet regulatory requirements by simplifying neural network architecture and enforcing monotonicity.
Our new model is as accurate as black-box fully-connected neural networks, providing a highly accurate and regulated machine learning method.
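A common way to obtain monotonicity by construction is to constrain weights to be non-negative inside an additive architecture. The sketch below illustrates that general recipe; it is not necessarily the paper's exact construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotoneFeatureNet(nn.Module):
    # One-feature subnetwork that is non-decreasing in its input:
    # all weights are made non-negative via softplus, and the
    # activations are monotone, so the composition stays monotone.
    def __init__(self, hidden=16):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(hidden, 1))
        self.b1 = nn.Parameter(torch.zeros(hidden))
        self.w2 = nn.Parameter(torch.randn(1, hidden))
        self.b2 = nn.Parameter(torch.zeros(1))

    def forward(self, x):                        # x: (batch, 1)
        h = torch.tanh(x @ F.softplus(self.w1).T + self.b1)
        return h @ F.softplus(self.w2).T + self.b2

class MonotonicAdditiveModel(nn.Module):
    # Additive model: the prediction is a sum of independent
    # per-feature subnetworks, each monotone in its feature.
    def __init__(self, num_features):
        super().__init__()
        self.nets = nn.ModuleList(MonotoneFeatureNet()
                                  for _ in range(num_features))

    def forward(self, x):                        # x: (batch, num_features)
        return sum(net(x[:, i:i+1]) for i, net in enumerate(self.nets))
```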
arXiv Detail & Related papers (2022-09-21T02:14:09Z)
- On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications [13.823089111538128]
We present new and tighter information-theoretic upper bounds for the generalization error of machine learning models, such as neural networks, trained with SGD.
An experimental study based on these bounds provides some insights into the SGD training of neural networks.
arXiv Detail & Related papers (2021-10-07T00:53:33Z)
- Squared $\ell_2$ Norm as Consistency Loss for Leveraging Augmented Data to Learn Robust and Invariant Representations [76.85274970052762]
Regularizing distance between embeddings/representations of original samples and augmented counterparts is a popular technique for improving robustness of neural networks.
In this paper, we explore these various regularization choices, seeking to provide a general understanding of how we should regularize the embeddings.
We show that the generic approach we identified (squared $\ell_2$ regularized augmentation) outperforms several recent methods, which are each specially designed for one task.
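The identified regulariser is easy to state in code: penalise the squared $\ell_2$ distance between embeddings of a sample and its augmented counterpart. A minimal sketch, where the encoder `f` and the `augment` function are placeholders for whatever the training pipeline provides:

```python
import torch

def consistency_loss(f, x, augment, lam=1.0):
    # Squared l2 consistency regulariser: pull the embedding of an
    # augmented sample toward the embedding of the original.
    # `f` and `augment` are placeholders supplied by the pipeline.
    z = f(x)                 # (batch, d) embeddings of originals
    z_aug = f(augment(x))    # (batch, d) embeddings of augmentations
    return lam * (z - z_aug).pow(2).sum(dim=1).mean()
```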
arXiv Detail & Related papers (2020-11-25T22:40:09Z)
- Certified Monotonic Neural Networks [15.537695725617576]
We propose to certify the monotonicity of general piecewise-linear neural networks by solving a mixed integer linear programming problem.
Our approach does not require human-designed constraints on the weight space and also yields a more accurate approximation.
arXiv Detail & Related papers (2020-11-20T04:58:13Z)
- Neural Complexity Measures [96.06344259626127]
We propose Neural Complexity (NC), a meta-learning framework for predicting generalization.
Our model learns a scalar complexity measure through interactions with many heterogeneous tasks in a data-driven way.
arXiv Detail & Related papers (2020-08-07T02:12:10Z)
- Interpretable Learning-to-Rank with Generalized Additive Models [78.42800966500374]
Interpretability of learning-to-rank models is a crucial yet relatively under-examined research area.
Recent progress on interpretable ranking models largely focuses on generating post-hoc explanations for existing black-box ranking models.
We lay the groundwork for intrinsically interpretable learning-to-rank by introducing generalized additive models (GAMs) into ranking tasks.
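Structurally, a ranking GAM scores each item as a sum of per-feature shape functions, so each feature's contribution stays inspectable; such an additive scorer looks much like the monotonic additive sketch above, minus the monotonicity constraint. What is ranking-specific is the training objective, for example a pairwise logistic loss (an illustrative choice, not necessarily the paper's):

```python
import torch.nn.functional as F

def pairwise_ranking_loss(score_pos, score_neg):
    # Pairwise logistic loss: the higher-relevance item should outscore
    # the lower-relevance one. Scores would come from an additive (GAM)
    # scorer so that per-feature contributions remain interpretable.
    return F.softplus(score_neg - score_pos).mean()
```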
arXiv Detail & Related papers (2020-05-06T01:51:30Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.