Minimizing Chebyshev Prototype Risk Magically Mitigates the Perils of Overfitting
- URL: http://arxiv.org/abs/2404.07083v2
- Date: Thu, 11 Apr 2024 14:21:32 GMT
- Title: Minimizing Chebyshev Prototype Risk Magically Mitigates the Perils of Overfitting
- Authors: Nathaniel Dean, Dilip Sarkar
- Abstract summary: We develop multicomponent loss functions that reduce intra-class feature correlation and maximize inter-class feature distance.
We implement the terms of the Chebyshev Prototype Risk (CPR) bound into our Explicit CPR loss function.
Our training algorithm reduces overfitting and improves upon previous approaches in many settings.
- Score: 1.6574413179773757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Overparameterized deep neural networks (DNNs), if not sufficiently regularized, are susceptible to overfitting their training examples and not generalizing well to test data. To discourage overfitting, researchers have developed multicomponent loss functions that reduce intra-class feature correlation and maximize inter-class feature distance in one or more layers of the network. By analyzing the penultimate feature layer activations output by a DNN's feature extraction section prior to the linear classifier, we find that modified forms of the intra-class feature covariance and inter-class prototype separation are key components of a fundamental Chebyshev upper bound on the probability of misclassification, which we designate the Chebyshev Prototype Risk (CPR). While previous approaches' covariance loss terms scale quadratically with the number of network features, our CPR bound indicates that an approximate covariance loss in log-linear time is sufficient to reduce the bound and is scalable to large architectures. We implement the terms of the CPR bound into our Explicit CPR (exCPR) loss function and observe from empirical results on multiple datasets and network architectures that our training algorithm reduces overfitting and improves upon previous approaches in many settings. Our code is available at https://github.com/Deano1718/Regularization_exCPR .
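To make the abstract's ingredients concrete, here is a minimal sketch of a multicomponent loss that pairs cross-entropy with an intra-class feature-spread penalty around class prototypes and an inter-class prototype-separation term. This is an illustration only, not the authors' exCPR loss (which additionally uses an approximate covariance term computable in log-linear time); see the linked repository for the actual implementation.

```python
import torch
import torch.nn.functional as F

def prototype_risk_loss(features, logits, targets, intra_weight=1.0, inter_weight=1.0):
    """Illustrative multicomponent loss: cross-entropy plus (i) intra-class spread of
    features around their class prototype and (ii) a term pushing class prototypes apart.
    Simplified sketch, not the authors' exCPR loss."""
    ce = F.cross_entropy(logits, targets)

    classes = targets.unique()  # sorted class labels present in the batch
    # Batch prototypes: mean feature vector per class present in the batch.
    protos = torch.stack([features[targets == c].mean(dim=0) for c in classes])

    # Intra-class term: squared distance of each feature to its class prototype.
    class_index = torch.searchsorted(classes, targets)   # map labels to rows of `protos`
    intra = (features - protos[class_index]).pow(2).sum(dim=1).mean()

    # Inter-class term: negative mean pairwise distance between prototypes.
    if len(classes) > 1:
        d = torch.cdist(protos, protos)
        off_diag = d[~torch.eye(len(classes), dtype=torch.bool, device=d.device)]
        inter = -off_diag.mean()
    else:
        inter = features.new_zeros(())

    return ce + intra_weight * intra + inter_weight * inter
```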
Related papers
- Not Only the Last-Layer Features for Spurious Correlations: All Layer Deep Feature Reweighting [9.141594510823799]
A powerful approach to combat spurious correlations is to re-train the last layer on a balanced validation dataset.
Neural networks can sometimes discard key attributes by the time features reach the last layer.
In this work, we consider retraining a classifier on a set of features derived from all layers.
arXiv Detail & Related papers (2024-09-23T00:31:39Z)
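A minimal sketch of the all-layer reweighting idea (illustrative names only; assumes a PyTorch backbone whose chosen layers emit 4-D activations): pool features from every layer, concatenate them, and retrain only a linear classifier on a balanced validation set.

```python
import torch
import torch.nn as nn

def collect_all_layer_features(backbone, layers, x):
    """Pool and concatenate activations from every chosen layer (hypothetical helper)."""
    feats = []
    hooks = [layer.register_forward_hook(
        lambda m, inp, out: feats.append(out.flatten(2).mean(-1)))  # (B, C) per layer
        for layer in layers]
    with torch.no_grad():
        backbone(x)
    for h in hooks:
        h.remove()
    return torch.cat(feats, dim=1)  # (B, sum of channel dims)

def retrain_head(backbone, layers, balanced_loader, num_classes, feat_dim, epochs=10):
    """Retrain only a linear head on a balanced validation set; `feat_dim` must equal
    the summed channel dimension of the pooled layers."""
    head = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.SGD(head.parameters(), lr=1e-2, momentum=0.9)
    for _ in range(epochs):
        for x, y in balanced_loader:
            z = collect_all_layer_features(backbone, layers, x)
            loss = nn.functional.cross_entropy(head(z), y)
            opt.zero_grad(); loss.backward(); opt.step()
    return head
```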
- On Sequential Loss Approximation for Continual Learning [0.0]
For continual learning, we introduce Autodiff Quadratic Consolidation (AQC) and Neural Consolidation (NC).
AQC approximates the previous loss function with a quadratic function, and NC approximates the previous loss function with a neural network.
We empirically study these methods in class-incremental learning, for which regularization-based methods produce unsatisfactory results.
arXiv Detail & Related papers (2024-05-26T09:20:47Z)
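A rough sketch of the quadratic-approximation idea (illustrative only; the paper's AQC builds its quadratic via automatic differentiation, and NC uses a neural approximator instead): store the previous task's gradient and a diagonal curvature estimate at the old parameters, and penalize new-task training with the resulting second-order Taylor surrogate.

```python
import torch

def quadratic_consolidation_penalty(model, old_params, old_grads, old_curv_diag):
    """Second-order Taylor surrogate of the previous task's loss around old_params:
    L_prev(theta) ~ const + g.(theta - theta_old) + 0.5 (theta - theta_old).diag(h).(theta - theta_old).
    old_grads / old_curv_diag are precomputed on the previous task (e.g. with autodiff);
    an illustrative stand-in, not the paper's exact construction."""
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for p, p_old, g, h in zip(model.parameters(), old_params, old_grads, old_curv_diag):
        d = p - p_old
        penalty = penalty + (g * d).sum() + 0.5 * (h * d * d).sum()
    return penalty

# New-task training step (hypothetical names):
# loss = task_loss + quadratic_consolidation_penalty(model, old_params, old_grads, old_curv_diag)
```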
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
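A simplified sketch of a mixture-of-Gaussians normalization layer in this spirit (illustrative; the paper's compound batch normalization differs in how components are estimated and updated): keep several sets of per-channel statistics, softly assign each sample to the components, and normalize with the sample's mixed statistics.

```python
import torch
import torch.nn as nn

class CompoundBatchNorm2d(nn.Module):
    """Illustrative mixture-of-Gaussians normalization, not the paper's exact method."""
    def __init__(self, num_channels, num_components=3, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.means = nn.Parameter(torch.zeros(num_components, num_channels))
        self.log_vars = nn.Parameter(torch.zeros(num_components, num_channels))
        self.weight = nn.Parameter(torch.ones(num_channels))
        self.bias = nn.Parameter(torch.zeros(num_channels))

    def forward(self, x):                        # x: (B, C, H, W)
        sample_mean = x.mean(dim=(2, 3))         # per-sample channel means, (B, C)
        # Soft assignment of each sample to components by distance to component means.
        d2 = ((sample_mean.unsqueeze(1) - self.means.unsqueeze(0)) ** 2).sum(-1)  # (B, K)
        resp = torch.softmax(-d2, dim=1)                                          # (B, K)
        mu = resp @ self.means                   # mixed mean per sample, (B, C)
        var = resp @ self.log_vars.exp()         # mixed variance per sample, (B, C)
        x_hat = (x - mu[:, :, None, None]) / torch.sqrt(var[:, :, None, None] + self.eps)
        return x_hat * self.weight[None, :, None, None] + self.bias[None, :, None, None]
```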
- Towards Practical Control of Singular Values of Convolutional Layers [65.25070864775793]
Convolutional neural networks (CNNs) are easy to train, but their essential properties, such as generalization error and adversarial robustness, are hard to control.
Recent research demonstrated that singular values of convolutional layers significantly affect such elusive properties.
We offer a principled approach to alleviating constraints of the prior art at the expense of an insignificant reduction in layer expressivity.
arXiv Detail & Related papers (2022-11-24T19:09:44Z)
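A crude illustration of constraining the singular values of a convolutional layer: reshape the kernel to a matrix, clip its singular values, and write the result back. The paper constrains the singular values of the full convolution operator, so treat this only as a conceptual sketch.

```python
import torch

@torch.no_grad()
def clip_conv_singular_values(conv: torch.nn.Conv2d, max_sv: float = 1.0):
    """Clip singular values of the kernel reshaped to (out_channels, in_channels*kH*kW).
    Conceptual sketch only; not the paper's operator-level method."""
    w = conv.weight                                      # (out, in, kH, kW)
    u, s, vh = torch.linalg.svd(w.flatten(1), full_matrices=False)
    s = s.clamp(max=max_sv)                              # cap the largest singular values
    conv.weight.copy_(((u * s) @ vh).view_as(w))

# Typical usage: call periodically during training, e.g. after each optimizer step:
# for m in model.modules():
#     if isinstance(m, torch.nn.Conv2d):
#         clip_conv_singular_values(m, max_sv=1.0)
```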
- Adaptive Self-supervision Algorithms for Physics-informed Neural Networks [59.822151945132525]
Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function.
We study the impact of the location of the collocation points on the trainability of these models.
We propose a novel adaptive collocation scheme which progressively allocates more collocation points to areas where the model is making higher errors.
arXiv Detail & Related papers (2022-07-08T18:17:06Z)
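An illustrative take on residual-driven adaptive collocation (not the paper's exact self-supervision scheme): score a pool of candidate points by the magnitude of the PDE residual and keep the worst offenders for further training.

```python
import torch

def adaptive_collocation(pde_residual, candidate_points, num_keep):
    """Select the candidate collocation points with the highest PDE residual magnitude.
    `pde_residual` maps points (N, d) to per-point residuals (N,); illustrative only."""
    with torch.no_grad():
        errs = pde_residual(candidate_points).abs()
    idx = errs.topk(num_keep).indices
    return candidate_points[idx]

# Example usage inside a training loop (hypothetical names):
# candidates = torch.rand(10_000, 2)                         # fresh candidate pool
# colloc = adaptive_collocation(residual_fn, candidates, num_keep=1_000)
# loss = residual_fn(colloc).pow(2).mean() + boundary_loss
```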
- Voxelmorph++ Going beyond the cranial vault with keypoint supervision and multi-channel instance optimisation [8.88841928746097]
The recent Learn2Reg benchmark shows that single-scale U-Net architectures fall short of state-of-the-art performance for abdominal or intra-patient lung registration.
Here, we propose two straightforward steps that greatly reduce this gap in accuracy.
First, we employ keypoint self-supervision with a novel network head that predicts a discretised heatmap.
Second, we replace multiple learned fine-tuning steps with a single instance optimisation using hand-crafted features and the Adam optimiser.
arXiv Detail & Related papers (2022-02-28T19:23:29Z)
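A compact sketch of instance optimisation for registration (2D for brevity; Voxelmorph++ itself works on 3D volumes with keypoint and hand-crafted features): directly optimise a dense displacement field with Adam against a similarity-plus-smoothness objective.

```python
import torch
import torch.nn.functional as F

def instance_optimise(fixed_feat, moving_feat, steps=100, lr=0.1, smooth_weight=1.0):
    """Optimise a dense 2D displacement field to align feature maps (B, C, H, W).
    Illustrative sketch, not the Voxelmorph++ pipeline."""
    B, C, H, W = fixed_feat.shape
    disp = torch.zeros(B, 2, H, W, requires_grad=True)           # displacement field
    base = F.affine_grid(torch.eye(2, 3).unsqueeze(0).repeat(B, 1, 1),
                         fixed_feat.shape, align_corners=False)  # identity sampling grid
    opt = torch.optim.Adam([disp], lr=lr)
    for _ in range(steps):
        grid = base + disp.permute(0, 2, 3, 1)                   # (B, H, W, 2)
        warped = F.grid_sample(moving_feat, grid, align_corners=False)
        sim = F.mse_loss(warped, fixed_feat)
        smooth = (disp[..., 1:, :] - disp[..., :-1, :]).pow(2).mean() + \
                 (disp[..., :, 1:] - disp[..., :, :-1]).pow(2).mean()
        loss = sim + smooth_weight * smooth
        opt.zero_grad(); loss.backward(); opt.step()
    return disp.detach()
```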
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
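A toy illustration of the randomized-DP idea (not the paper's estimator or its guarantees): a forward pass over a linear-chain model that, at each step, propagates scores only through a randomly sampled subset of latent states. Being built from differentiable tensor ops, it also works with automatic differentiation, as the summary above notes.

```python
import torch

def randomized_forward(log_potentials, num_sampled_states):
    """Toy randomized forward pass for a linear-chain model. `log_potentials` has shape
    (T, S, S) over S latent states; each step keeps only a random subset of states.
    Illustrative only."""
    T, S, _ = log_potentials.shape
    alpha = torch.zeros(S)                                  # initial log scores
    active = torch.arange(S)
    for t in range(T):
        new_active = torch.randperm(S)[:num_sampled_states]
        # Combine scores from currently active states into the newly sampled ones only.
        scores = alpha[active].unsqueeze(1) + log_potentials[t][active][:, new_active]
        alpha_new = torch.full((S,), float("-inf"))
        alpha_new[new_active] = torch.logsumexp(scores, dim=0)
        alpha, active = alpha_new, new_active
    return torch.logsumexp(alpha[active], dim=0)            # approximate log-partition
```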
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- $\sigma^2$R Loss: a Weighted Loss by Multiplicative Factors using Sigmoidal Functions [0.9569316316728905]
We introduce a new loss function called squared reduction loss ($\sigma^2$R loss), which is regulated by a sigmoid function to inflate/deflate the error per instance.
Our loss has a clear intuition and geometric interpretation, and we demonstrate its effectiveness through experiments.
arXiv Detail & Related papers (2020-09-18T12:34:40Z)
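A minimal sketch of a sigmoid-modulated per-instance loss in this spirit (illustrative; the exact $\sigma^2$R formulation in the paper may differ):

```python
import torch

def sigma2r_style_loss(errors, alpha=1.0, beta=0.0):
    """Squared error weighted by a sigmoidal multiplicative factor (illustrative sketch).
    `errors` are per-instance residuals; alpha/beta shape the sigmoid gate that inflates
    or deflates each instance's contribution."""
    gate = torch.sigmoid(alpha * (errors.abs() - beta))   # per-instance multiplicative factor
    return (gate * errors.pow(2)).mean()
```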
- A Partial Regularization Method for Network Compression [0.0]
We propose partial regularization, penalizing only a subset of parameters rather than all of them (full regularization), to perform model compression at higher speed.
Experimental results show that, as expected, computational cost is reduced, with shorter running times observed in almost all situations.
Surprisingly, it also improves important metrics such as regression fit and classification accuracy in both the training and test phases on multiple datasets.
arXiv Detail & Related papers (2020-09-03T00:38:27Z)
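A minimal sketch of partial regularization (illustrative; how the penalized subset is chosen would differ in practice): apply an L2 penalty to only part of the parameter list instead of all parameters.

```python
import torch

def partial_l2_penalty(model, fraction=0.5, weight_decay=1e-4):
    """Apply an L2 penalty to only a subset of parameter tensors (here simply the first
    `fraction` of the parameter list). Illustrative sketch of partial regularization."""
    params = [p for p in model.parameters() if p.requires_grad]
    subset = params[: max(1, int(len(params) * fraction))]
    return weight_decay * sum(p.pow(2).sum() for p in subset)

# Usage: loss = task_loss + partial_l2_penalty(model)
```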
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires a much smaller number of communication rounds in practice while retaining a comparable theoretical bound on the number of rounds.
Experiments on several datasets demonstrate the effectiveness of our method and corroborate our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
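As a loose illustration of what AUC maximization looks like as a training objective, here is a simple pairwise squared-hinge surrogate (the paper instead optimizes a distributed min-max reformulation, which this sketch does not reproduce):

```python
import torch

def pairwise_auc_surrogate(scores, labels, margin=1.0):
    """Pairwise squared-hinge surrogate for AUC (illustrative only). Assumes `labels`
    contains both classes; encourages every positive example to be scored at least
    `margin` above every negative example."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos.unsqueeze(1) - neg.unsqueeze(0)        # all positive-negative score gaps
    return torch.clamp(margin - diff, min=0).pow(2).mean()
```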