Prevention is Better than Cure: Handling Basis Collapse and Transparency
in Dense Networks
- URL: http://arxiv.org/abs/2008.09878v1
- Date: Sat, 22 Aug 2020 17:09:54 GMT
- Title: Prevention is Better than Cure: Handling Basis Collapse and Transparency
in Dense Networks
- Authors: Gurpreet Singh, Soumyajit Gupta, Clint N. Dawson
- Abstract summary: We identify a basis collapse issue as a primary cause and propose a modified loss function that circumvents this problem.
We demonstrate through carefully chosen numerical experiments that the basis collapse issue leads to the design of massively redundant networks.
- Our approach results in substantially concise nets, having $100\times$ fewer parameters, while achieving a much lower $(10\times)$ MSE loss at scale than reported in prior works.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dense nets are an integral part of any classification and regression problem.
Recently, these networks have found a new application as solvers for known
representations in various domains. However, one crucial issue with dense nets
is their lack of feature interpretability and of reproducibility over multiple
training runs. In this work, we identify a basis collapse issue as a primary
cause and propose a modified loss function that circumvents this problem. We
also provide a few general guidelines relating the choice of activations to
loss surface roughness and appropriate scaling for designing low-weight dense
nets. We demonstrate through carefully chosen numerical experiments that the
basis collapse issue leads to the design of massively redundant networks. Our
approach results in substantially concise nets, having $100 \times$ fewer
parameters, while achieving a much lower $(10\times)$ MSE loss at scale than
reported in prior works. Further, we show that the width of a dense net is
acutely dependent on the feature complexity. This is in contrast to the
dimension dependent width choice reported in prior theoretical works. To the
best of our knowledge, this is the first time these issues and contradictions
have been reported and experimentally verified. With our design guidelines we
render transparency in terms of a low-weight network design. We share our code
for full reproducibility at
https://github.com/smjtgupta/Dense_Net_Regress.
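The abstract attributes redundant, irreproducible dense nets to a "basis collapse" issue and proposes a modified loss function to circumvent it. As a minimal sketch of the general idea (not the authors' actual formulation, which is given in the paper and repository), one hypothetical way to discourage hidden units from collapsing onto near-duplicate basis functions is to augment the MSE loss with a redundancy penalty on pairwise correlations between hidden-unit activations:

```python
# Hypothetical sketch only: a redundancy-penalized MSE loss that punishes
# high cosine similarity between hidden units, so that units are pushed
# toward learning distinct basis functions. The function names, the penalty
# form, and the weight `lam` are illustrative assumptions, not the paper's.
import numpy as np

def mse_loss(y_pred, y_true):
    """Standard mean-squared-error regression loss."""
    return np.mean((y_pred - y_true) ** 2)

def redundancy_penalty(h):
    """h: (n_samples, n_hidden) matrix of hidden activations.

    Returns the mean squared off-diagonal cosine similarity between
    hidden units; 0 for mutually uncorrelated units, 1 for fully
    collapsed (identical) units.
    """
    h = h - h.mean(axis=0, keepdims=True)           # center each unit
    norms = np.linalg.norm(h, axis=0, keepdims=True) + 1e-8
    g = (h / norms).T @ (h / norms)                 # cosine-similarity Gram matrix
    off_diag = g - np.diag(np.diag(g))              # zero out self-similarities
    n = g.shape[0]
    return np.sum(off_diag ** 2) / (n * (n - 1))

def total_loss(y_pred, y_true, h, lam=0.1):
    """MSE plus a redundancy penalty weighted by `lam`."""
    return mse_loss(y_pred, y_true) + lam * redundancy_penalty(h)
```

Under this sketch, a network whose hidden units have all collapsed onto one basis function pays the maximum penalty, while a diverse set of units pays close to none.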
Related papers
- Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks [15.385743143648574]
We propose a continuous, almost-everywhere differentiable training objective and prove that, under our formulation, global minimizers correspond exactly to the sparsest networks.
arXiv Detail & Related papers (2025-05-27T21:46:27Z) - Network reconstruction via the minimum description length principle [0.0]
We propose an alternative nonparametric regularization scheme based on hierarchical Bayesian inference and weight quantization.
Our approach follows the minimum description length (MDL) principle, and uncovers the weight distribution that allows for the most compression of the data.
We demonstrate that our scheme yields systematically increased accuracy in the reconstruction of both artificial and empirical networks.
arXiv Detail & Related papers (2024-05-02T05:35:09Z) - From Channel Bias to Feature Redundancy: Uncovering the "Less is More" Principle in Few-Shot Learning [138.06600932634896]
Deep neural networks often fail to adapt representations to novel tasks under distribution shifts.
This paper identifies a core obstacle behind this failure: channel bias.
We show that for few-shot tasks, classification accuracy is significantly improved by using as few as 1-5% of the most discriminative feature dimensions.
arXiv Detail & Related papers (2023-10-05T19:00:49Z) - Feature-Learning Networks Are Consistent Across Widths At Realistic
Scales [72.27228085606147]
We study the effect of width on the dynamics of feature-learning neural networks across a variety of architectures and datasets.
Early in training, wide neural networks trained on online data have not only identical loss curves but also agree in their point-wise test predictions throughout training.
We observe, however, that ensembles of narrower networks perform worse than a single wide network.
arXiv Detail & Related papers (2023-05-28T17:09:32Z) - The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich
Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets on the order of $P^* \sim \sqrt{N}$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z) - Generic Perceptual Loss for Modeling Structured Output Dependencies [78.59700528239141]
We show that what matters is the network structure rather than the trained weights.
We demonstrate that a randomly-weighted deep CNN can be used to model the structured dependencies of outputs.
arXiv Detail & Related papers (2021-03-18T23:56:07Z) - A simple geometric proof for the benefit of depth in ReLU networks [57.815699322370826]
We present a simple proof for the benefit of depth in multi-layer feedforward networks with rectified activation ("depth separation").
We present a concrete neural network with linear depth (in $m$) and small constant width ($\leq 4$) that classifies the problem with zero error.
arXiv Detail & Related papers (2021-01-18T15:40:27Z) - Mixed-Privacy Forgetting in Deep Networks [114.3840147070712]
We show that the influence of a subset of the training samples can be removed from the weights of a network trained on large-scale image classification tasks.
Inspired by real-world applications of forgetting techniques, we introduce a novel notion of forgetting in mixed-privacy setting.
We show that our method allows forgetting without having to trade off the model accuracy.
arXiv Detail & Related papers (2020-12-24T19:34:56Z) - Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains.
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
arXiv Detail & Related papers (2020-12-16T20:16:28Z) - Greedy Optimization Provably Wins the Lottery: Logarithmic Number of
Winning Tickets is Enough [19.19644194006565]
We show how much we can prune a neural network given a specified tolerance of accuracy drop.
The proposed method has the guarantee that the discrepancy between the pruned network and the original network decays at an exponentially fast rate.
Empirically, our method improves prior arts on pruning various network architectures including ResNet, MobilenetV2/V3 on ImageNet.
arXiv Detail & Related papers (2020-10-29T22:06:31Z) - Grow-Push-Prune: aligning deep discriminants for effective structural
network compression [5.532477732693]
This paper attempts to derive task-dependent compact models from a deep discriminant analysis perspective.
We propose an iterative and proactive approach for classification tasks which alternates between a pushing step and a pruning step.
Experiments on the MNIST, CIFAR10, and ImageNet datasets demonstrate our approach's efficacy.
arXiv Detail & Related papers (2020-09-29T01:29:23Z) - On the Predictability of Pruning Across Scales [29.94870276983399]
We show that the error of magnitude-pruned networks empirically follows a scaling law with interpretable coefficients that depend on the architecture and task.
As neural networks become ever larger and costlier to train, our findings suggest a framework for reasoning conceptually and analytically about a standard method for unstructured pruning.
arXiv Detail & Related papers (2020-06-18T15:41:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.