Related papers: Intraclass clustering: an implicit learning ability that regularizes DNNs

Intraclass clustering: an implicit learning ability that regularizes DNNs

URL: http://arxiv.org/abs/2103.06733v1
Date: Thu, 11 Mar 2021 15:26:27 GMT
Title: Intraclass clustering: an implicit learning ability that regularizes DNNs
Authors: Carbonnelle Simon and Christophe De Vleeschouwer
Abstract summary: We show that deep neural networks are regularized through their ability to extract meaningful clusters among a class. Measures of intraclass clustering are designed based on the neuron- and layer-level representations of the training data.
Score: 22.732204569029648
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Several works have shown that the regularization mechanisms underlying deep neural networks' generalization performances are still poorly understood. In this paper, we hypothesize that deep neural networks are regularized through their ability to extract meaningful clusters among the samples of a class. This constitutes an implicit form of regularization, as no explicit training mechanisms or supervision target such behaviour. To support our hypothesis, we design four different measures of intraclass clustering, based on the neuron- and layer-level representations of the training data. We then show that these measures constitute accurate predictors of generalization performance across variations of a large set of hyperparameters (learning rate, batch size, optimizer, weight decay, dropout rate, data augmentation, network depth and width).

Related papers

Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks [13.983863226803336]
We argue that "Feature Averaging" is one of the principal factors contributing to non-robustness of deep neural networks. We provide a detailed theoretical analysis of the training dynamics of gradient descent in a two-layer ReLU network for a binary classification task. We prove that, with the provision of more granular supervised information, a two-layer multi-class neural network is capable of learning individual features.
arXiv Detail & Related papers (2024-10-14T09:28:32Z)
Dynamic Post-Hoc Neural Ensemblers [55.15643209328513]
In this study, we explore employing neural networks as ensemble methods. Motivated by the risk of learning low-diversity ensembles, we propose regularizing the model by randomly dropping base model predictions. We demonstrate this approach lower bounds the diversity within the ensemble, reducing overfitting and improving generalization capabilities.
arXiv Detail & Related papers (2024-10-06T15:25:39Z)
Learning Neural Eigenfunctions for Unsupervised Semantic Segmentation [12.91586050451152]
Spectral clustering is a theoretically grounded solution to it where the spectral embeddings for pixels are computed to construct distinct clusters. Current approaches still suffer from inefficiencies in spectral decomposition and inflexibility in applying them to the test data. This work addresses these issues by casting spectral clustering as a parametric approach that employs neural network-based eigenfunctions to produce spectral embeddings. In practice, the neural eigenfunctions are lightweight and take the features from pre-trained models as inputs, improving training efficiency and unleashing the potential of pre-trained models for dense prediction.
arXiv Detail & Related papers (2023-04-06T03:14:15Z)
Initial Study into Application of Feature Density and Linguistically-backed Embedding to Improve Machine Learning-based Cyberbullying Detection [54.83707803301847]
The research was conducted on a Formspring dataset provided in a Kaggle competition on automatic cyberbullying detection. The study confirmed the effectiveness of Neural Networks in cyberbullying detection and the correlation between classifier performance and Feature Density.
arXiv Detail & Related papers (2022-06-04T03:17:15Z)
Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs. By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning. Early layers generally learn representations relevant to performance on both training data and testing data. Deeper layers only minimize training risks and fail to generalize well with testing or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z)
Efficient and Robust Classification for Sparse Attacks [34.48667992227529]
We consider perturbations bounded by the $ell$--norm, which have been shown as effective attacks in the domains of image-recognition, natural language processing, and malware-detection. We propose a novel defense method that consists of "truncation" and "adrial training" Motivated by the insights we obtain, we extend these components to neural network classifiers.
arXiv Detail & Related papers (2022-01-23T21:18:17Z)
Predicting Deep Neural Network Generalization with Perturbation Response Curves [58.8755389068888]
We propose a new framework for evaluating the generalization capabilities of trained networks. Specifically, we introduce two new measures for accurately predicting generalization gaps. We attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition.
arXiv Detail & Related papers (2021-06-09T01:37:36Z)
Analyzing Overfitting under Class Imbalance in Neural Networks for Image Segmentation [19.259574003403998]
In image segmentation neural networks may overfit to the foreground samples from small structures. In this study, we provide new insights on the problem of overfitting under class imbalance by inspecting the network behavior.
arXiv Detail & Related papers (2021-02-20T14:57:58Z)
Regularizing Class-wise Predictions via Self-knowledge Distillation [80.76254453115766]
We propose a new regularization method that penalizes the predictive distribution between similar samples. This results in regularizing the dark knowledge (i.e., the knowledge on wrong predictions) of a single network. Our experimental results on various image classification tasks demonstrate that the simple yet powerful method can significantly improve the generalization ability.
arXiv Detail & Related papers (2020-03-31T06:03:51Z)
Topologically Densified Distributions [25.140319008330167]
We study regularization in the context of small sample-size learning with over- parameterized neural networks. We impose a topological constraint on samples drawn from the probability measure induced in that space. This provably leads to mass concentration effects around the representations of training instances.
arXiv Detail & Related papers (2020-02-12T05:25:15Z)
Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective. We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.