Is network fragmentation a useful complexity measure?
- URL: http://arxiv.org/abs/2411.04695v1
- Date: Thu, 07 Nov 2024 13:27:37 GMT
- Title: Is network fragmentation a useful complexity measure?
- Authors: Coenraad Mouton, Randle Rabe, Daniël G. Haasbroek, Marthinus W. Theunissen, Hermanus L. Potgieter, Marelie H. Davel
- Abstract summary: Deep neural network classifiers can exhibit `fragmentation', where the model function rapidly changes class as the input space is traversed.
We study this phenomenon in the context of image classification and ask whether fragmentation could be predictive of generalization performance.
We report on new observations related to fragmentation, namely (i) fragmentation is not limited to the input space but occurs in the hidden representations as well, (ii) fragmentation follows the trends in the validation error throughout training, and (iii) fragmentation is not a direct result of increased weight norms.
- Score: 0.8480931990442769
- License:
- Abstract: It has been observed that the input space of deep neural network classifiers can exhibit `fragmentation', where the model function rapidly changes class as the input space is traversed. The severity of this fragmentation tends to follow the double descent curve, achieving a maximum at the interpolation regime. We study this phenomenon in the context of image classification and ask whether fragmentation could be predictive of generalization performance. Using a fragmentation-based complexity measure, we show this to be possible by achieving good performance on the PGDL (Predicting Generalization in Deep Learning) benchmark. In addition, we report on new observations related to fragmentation, namely (i) fragmentation is not limited to the input space but occurs in the hidden representations as well, (ii) fragmentation follows the trends in the validation error throughout training, and (iii) fragmentation is not a direct result of increased weight norms. Together, this indicates that fragmentation is a phenomenon worth investigating further when studying the generalization ability of deep neural networks.
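The fragmentation described in the abstract, where the predicted class changes repeatedly as the input space is traversed, can be made concrete by counting label transitions along a straight-line path between two inputs. The sketch below is only an illustration of that idea, not the paper's actual complexity measure: `count_fragments` and the toy classifier are hypothetical stand-ins for a trained network.

```python
import numpy as np

def count_fragments(classify, x_start, x_end, num_steps=100):
    """Count contiguous same-class segments along the straight line
    from x_start to x_end; 1 means the class never changes."""
    ts = np.linspace(0.0, 1.0, num_steps)
    labels = [classify((1 - t) * x_start + t * x_end) for t in ts]
    # A new segment begins wherever the predicted label changes.
    transitions = sum(a != b for a, b in zip(labels, labels[1:]))
    return transitions + 1

# Toy stand-in for a trained classifier: the decision boundary of
# sign(sin(8x)) is crossed twice on [0, 1], giving three segments.
def toy_classifier(x):
    return int(np.sin(8.0 * x[0]) >= 0)

x0 = np.array([0.0])
x1 = np.array([1.0])
print(count_fragments(toy_classifier, x0, x1))  # 3
```

In this toy setting a higher count means a more fragmented path; the paper's measure aggregates such behavior into a predictor of generalization, and also applies the idea to hidden representations rather than only the input space.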
Related papers
- Simplicity bias and optimization threshold in two-layer ReLU networks [24.43739371803548]
We show that despite overparametrization, networks converge toward simpler solutions rather than interpolating the training data.
Our analysis relies on the so called early alignment phase, during which neurons align towards specific directions.
arXiv Detail & Related papers (2024-10-03T09:58:57Z)
- SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- AttenScribble: Attentive Similarity Learning for Scribble-Supervised Medical Image Segmentation [5.8447004333496855]
In this paper, we present a straightforward yet effective scribble supervised learning framework.
We create a pluggable spatial self-attention module that can be attached on top of any internal feature layer of an arbitrary fully convolutional network (FCN) backbone.
This attentive similarity leads to a novel regularization loss that imposes consistency between segmentation prediction and visual affinity.
arXiv Detail & Related papers (2023-12-11T18:42:18Z)
- From Tempered to Benign Overfitting in ReLU Neural Networks [41.271773069796126]
Overparameterized neural networks (NNs) are observed to generalize well even when trained to perfectly fit noisy data.
Recently, it was conjectured and empirically observed that the behavior of NNs is often better described as "tempered overfitting".
arXiv Detail & Related papers (2023-05-24T13:36:06Z)
- Theoretical Characterization of How Neural Network Pruning Affects its Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z)
- Flip Learning: Erase to Segment [65.84901344260277]
Weakly-supervised segmentation (WSS) can help reduce time-consuming and cumbersome manual annotation.
We propose a novel and general WSS framework called Flip Learning, which only needs the box annotation.
Our proposed approach achieves competitive performance and shows great potential to narrow the gap between fully-supervised and weakly-supervised learning.
arXiv Detail & Related papers (2021-08-02T09:56:10Z)
- Mitigating Performance Saturation in Neural Marked Point Processes: Architectures and Loss Functions [50.674773358075015]
We propose a simple graph-based network structure called GCHP, which utilizes only graph convolutional layers.
We show that GCHP can significantly reduce training time, and that the likelihood ratio loss with interarrival-time probability assumptions can greatly improve model performance.
arXiv Detail & Related papers (2021-07-07T16:59:14Z)
- Extreme Memorization via Scale of Initialization [72.78162454173803]
We construct an experimental setup in which changing the scale of initialization strongly impacts the implicit regularization induced by SGD.
We find that the extent and manner in which generalization ability is affected depends on the activation and loss function used.
In the case of the homogeneous ReLU activation, we show that this behavior can be attributed to the loss function.
arXiv Detail & Related papers (2020-08-31T04:53:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.