LNPT: Label-free Network Pruning and Training
- URL: http://arxiv.org/abs/2403.12690v2
- Date: Wed, 20 Mar 2024 11:06:34 GMT
- Title: LNPT: Label-free Network Pruning and Training
- Authors: Jinying Xiao, Ping Li, Zhe Tang, Jie Nie
- Abstract summary: Pruning before training enables the deployment of neural networks on smart devices.
We propose a novel learning framework, LNPT, which enables mature networks on the cloud to provide online guidance for network pruning and learning on smart devices with unlabeled data.
- Score: 18.535687216213624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pruning before training enables the deployment of neural networks on smart devices. By retaining weights conducive to generalization, pruned networks can be accommodated on resource-constrained smart devices. It is commonly held that the distance in weight norm between the initialized and the fully-trained network correlates with generalization performance. However, as we have uncovered, this metric is inconsistent with generalization during training, which poses an obstacle to determining the pruned structures on smart devices in advance. In this paper, we introduce the concept of the learning gap, emphasizing its accurate correlation with generalization. Experiments show that the learning gap, in the form of feature maps from the penultimate layer of networks, aligns with variations in generalization performance. We propose a novel learning framework, LNPT, which enables mature networks on the cloud to provide online guidance for network pruning and learning on smart devices with unlabeled data. Our results demonstrate the superiority of this approach over supervised training.
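The abstract gives no implementation details, but its central mechanism — a mature cloud network guiding a pruned on-device network through penultimate-layer feature maps on unlabeled data — can be sketched as follows. This is a minimal reading of the abstract, assuming a simple MSE feature-matching loss; the model names, toy MLPs, and training loop are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch: a cloud "teacher" guides an on-device "student" using
# only unlabeled inputs, by matching penultimate-layer feature maps.
# The MSE matching loss and the toy MLPs are assumptions based on the abstract.
import torch
import torch.nn as nn

def penultimate_features(model: nn.Sequential, x: torch.Tensor) -> torch.Tensor:
    """Run all layers except the final classifier head."""
    for layer in list(model)[:-1]:
        x = layer(x)
    return x

teacher = nn.Sequential(nn.Flatten(), nn.Linear(784, 512), nn.ReLU(),
                        nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 10))
student = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                        nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10))
teacher.eval()  # the mature cloud network stays frozen

opt = torch.optim.SGD(student.parameters(), lr=1e-2)

def label_free_step(x_unlabeled: torch.Tensor) -> float:
    """One guidance step: shrink the feature-map gap on unlabeled data."""
    with torch.no_grad():
        f_teacher = penultimate_features(teacher, x_unlabeled)
    f_student = penultimate_features(student, x_unlabeled)
    loss = nn.functional.mse_loss(f_student, f_teacher)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

loss = label_free_step(torch.randn(32, 1, 28, 28))  # a batch of unlabeled images
```

In the paper's setting the student would first be pruned on the device; here both networks stay dense so the sketch remains self-contained and runnable.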
Related papers
- A Spectral Condition for Feature Learning [20.440553685976194]
The key challenge is to scale training so that a network's internal representations evolve nontrivially at all widths.
We show that feature learning is achieved by scaling the spectral norms of the weight matrices and their updates.
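As an illustration of what "scaling the spectral norms of the weights and their updates" can look like in code, the sketch below measures an update's largest singular value and rescales it to a target; the sqrt(fan_out / fan_in) target is an assumed placeholder, not necessarily the condition the paper derives.

```python
# Illustrative sketch: rescale a weight update so its spectral norm (largest
# singular value) matches a chosen target. The sqrt(fan_out / fan_in) target
# is an assumption for illustration only.
import torch

def rescale_to_spectral_target(delta_w: torch.Tensor) -> torch.Tensor:
    fan_out, fan_in = delta_w.shape
    target = (fan_out / fan_in) ** 0.5
    sigma_max = torch.linalg.matrix_norm(delta_w, ord=2)  # largest singular value
    return delta_w * (target / (sigma_max + 1e-12))

delta_w = torch.randn(512, 256) * 1e-3
delta_w = rescale_to_spectral_target(delta_w)
print(torch.linalg.matrix_norm(delta_w, ord=2))  # ~ sqrt(512 / 256)
```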
arXiv Detail & Related papers (2023-10-26T23:17:39Z)
- Network Degeneracy as an Indicator of Training Performance: Comparing Finite and Infinite Width Angle Predictions [3.04585143845864]
We show that as networks get deeper, they become more susceptible to degeneracy.
We use a simple algorithm that can accurately predict the level of degeneracy for any given fully connected ReLU network architecture.
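A quick way to see this depth-induced degeneracy is the standard infinite-width correlation recursion for ReLU layers (the degree-1 arc-cosine kernel), under which the correlation between any two inputs drifts toward 1 with depth. This generic recursion is used here only for illustration; the paper's own prediction algorithm may differ.

```python
# Illustrative sketch: iterate the infinite-width ReLU correlation map.
# Correlations between distinct inputs drift toward 1 with depth, which is
# the degeneracy referred to above.
import math

def relu_correlation_step(rho: float) -> float:
    theta = math.acos(max(min(rho, 1.0), -1.0))
    return (math.sin(theta) + (math.pi - theta) * math.cos(theta)) / math.pi

rho = 0.0  # start from orthogonal inputs
for depth in range(1, 31):
    rho = relu_correlation_step(rho)
    if depth % 10 == 0:
        print(f"depth {depth}: correlation = {rho:.4f}")
# Correlations approach 1.0: deep ReLU stacks map distinct inputs to nearly
# identical directions, i.e. the network degenerates.
```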
arXiv Detail & Related papers (2023-06-02T13:02:52Z)
- Network Anomaly Detection Using Federated Learning [0.483420384410068]
We introduce a robust and scalable framework that enables efficient network anomaly detection.
We leverage federated learning, in which multiple participants train a global model jointly.
The proposed method performs better than baseline machine learning techniques on the UNSW-NB15 data set.
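A minimal federated-averaging loop illustrates the "multiple participants train a global model jointly" part; the tiny MLP, the synthetic batches, and the unweighted average are assumptions for the sketch, and the paper's anomaly detector for UNSW-NB15 is not reproduced.

```python
# Minimal federated-averaging sketch: each participant trains locally on its
# own data, and only the model weights are averaged into the global model.
import copy
import torch
import torch.nn as nn

def local_update(model: nn.Module, x: torch.Tensor, y: torch.Tensor) -> dict:
    model = copy.deepcopy(model)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(5):  # a few local epochs
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    return model.state_dict()

def fed_avg(states: list[dict]) -> dict:
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = torch.stack([s[key] for s in states]).mean(dim=0)
    return avg

global_model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
for _round in range(3):
    states = [local_update(global_model, torch.randn(128, 20),
                           torch.randint(0, 2, (128,)))
              for _client in range(4)]
    global_model.load_state_dict(fed_avg(states))
```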
arXiv Detail & Related papers (2023-03-13T20:16:30Z)
- QuickNets: Saving Training and Preventing Overconfidence in Early-Exit Neural Architectures [2.28438857884398]
We introduce QuickNets: a novel cascaded training algorithm for faster training of neural networks.
We demonstrate that QuickNets can dynamically distribute learning and reduce both training and inference cost compared to standard backpropagation.
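QuickNets builds on early-exit architectures. The sketch below shows only the generic early-exit inference pattern — return the first exit whose confidence clears a threshold — since the abstract does not detail the cascaded training procedure; the backbone, exits, and threshold are illustrative assumptions.

```python
# Generic early-exit inference sketch (single sample): return the first exit
# whose softmax confidence clears a threshold. The backbone, exit heads, and
# threshold are illustrative; QuickNets' cascaded training is not shown.
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    def __init__(self, width: int = 64, num_classes: int = 10):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(width, width), nn.ReLU()) for _ in range(3)])
        self.exits = nn.ModuleList(
            [nn.Linear(width, num_classes) for _ in range(3)])

    def forward(self, x: torch.Tensor, threshold: float = 0.9) -> torch.Tensor:
        for block, exit_head in zip(self.blocks, self.exits):
            x = block(x)
            logits = exit_head(x)
            confidence = logits.softmax(dim=-1).max(dim=-1).values
            if confidence.item() >= threshold:  # easy input: exit early
                return logits
        return logits  # hard input: fall through to the final exit

net = EarlyExitNet()
prediction = net(torch.randn(1, 64)).argmax(dim=-1)
```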
arXiv Detail & Related papers (2022-12-25T07:06:32Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics, and only exploit higher-order statistics later in training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- Unsupervised Deep Learning by Injecting Low-Rank and Sparse Priors [5.5586788751870175]
We focus on employing sparsity-inducing priors in deep learning to encourage the network to concisely capture the nature of high-dimensional data.
We demonstrate unsupervised learning of U-Net for background subtraction using low-rank and sparse priors.
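The low-rank and sparse priors referred to are the ones behind robust-PCA-style decompositions, X ≈ L + S with a low-rank background L and a sparse foreground S. A classical alternating-shrinkage sketch is shown below; how the paper injects these priors into unsupervised U-Net training is not reproduced here.

```python
# Classical low-rank + sparse decomposition sketch (X ~ L + S): singular-value
# shrinkage enforces a low-rank background L, soft-thresholding a sparse
# foreground S. Thresholds and iteration count are illustrative only.
import torch

def soft_threshold(x: torch.Tensor, tau: float) -> torch.Tensor:
    return torch.sign(x) * torch.clamp(x.abs() - tau, min=0.0)

def low_rank_sparse(x: torch.Tensor, tau_l: float = 1.0, tau_s: float = 0.1,
                    iters: int = 50):
    low_rank = torch.zeros_like(x)
    sparse = torch.zeros_like(x)
    for _ in range(iters):
        u, s, vh = torch.linalg.svd(x - sparse, full_matrices=False)
        low_rank = u @ torch.diag(soft_threshold(s, tau_l)) @ vh
        sparse = soft_threshold(x - low_rank, tau_s)
    return low_rank, sparse

frames = torch.randn(100, 256)  # e.g. 100 vectorized video frames
background, foreground = low_rank_sparse(frames)
```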
arXiv Detail & Related papers (2021-06-21T08:41:02Z)
- Simultaneous Training of Partially Masked Neural Networks [67.19481956584465]
We show that it is possible to train neural networks in such a way that a predefined 'core' subnetwork can be split off from the trained full network with remarkably good performance.
We show that training a Transformer with a low-rank core gives a low-rank model with superior performance compared to training the low-rank model alone.
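One plausible reading of "simultaneous training" is to optimize, at every step, both the full network and a predefined core slice of it, so the core can later be split off. The sketch below assumes a fixed half-width core and a simple summed loss; the paper's actual masking and training schedule may differ.

```python
# Hypothetical sketch: each step optimizes both the full network and a
# predefined 'core' slice of its hidden layer. The half-width core and the
# summed loss are assumptions, not the paper's exact procedure.
import torch
import torch.nn as nn

class MaskableNet(nn.Module):
    def __init__(self, d_in=32, hidden=128, d_out=10, core_frac=0.5):
        super().__init__()
        self.fc1, self.fc2 = nn.Linear(d_in, hidden), nn.Linear(hidden, d_out)
        self.core_width = int(hidden * core_frac)

    def forward(self, x: torch.Tensor, core_only: bool = False) -> torch.Tensor:
        h = torch.relu(self.fc1(x))
        if core_only:  # silence all non-core hidden units
            mask = torch.zeros_like(h)
            mask[:, : self.core_width] = 1.0
            h = h * mask
        return self.fc2(h)

net = MaskableNet()
opt = torch.optim.SGD(net.parameters(), lr=1e-2)
x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
loss = (nn.functional.cross_entropy(net(x), y)
        + nn.functional.cross_entropy(net(x, core_only=True), y))
opt.zero_grad(); loss.backward(); opt.step()
```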
arXiv Detail & Related papers (2021-06-16T15:57:51Z)
- Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks [78.47459801017959]
Sparsity can reduce the memory footprint of regular networks to fit mobile devices.
We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice.
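As a representative of the removal mechanisms the survey describes, the sketch below applies global magnitude pruning: zero every weight below a global percentile threshold. The 90% sparsity level and the toy model are arbitrary illustrative choices, not a recommendation from the survey.

```python
# Representative sparsification sketch: global magnitude pruning. Every weight
# below a global percentile threshold is zeroed. Sparsity level and toy model
# are illustrative choices only.
import torch
import torch.nn as nn

def global_magnitude_prune(model: nn.Module, sparsity: float = 0.9) -> None:
    weights = torch.cat([p.detach().abs().flatten()
                         for p in model.parameters() if p.dim() > 1])
    threshold = torch.quantile(weights, sparsity)
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:
                p.mul_((p.abs() > threshold).float())

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
global_magnitude_prune(model, sparsity=0.9)
nonzero = sum((p != 0).sum().item() for p in model.parameters() if p.dim() > 1)
print(f"remaining nonzero weights: {nonzero}")
```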
arXiv Detail & Related papers (2021-01-31T22:48:50Z)
- Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC.
To address the open problems in URLLC, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)
- Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
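The constraint described — keeping the fine-tuned hypothesis inside a small sphere centred on the pre-trained weights — can be approximated with a projection step after each optimizer update, as in the sketch below; the per-parameter projection and the radius value are illustrative assumptions rather than the paper's algorithm.

```python
# Hedged sketch: after each optimizer step, project the weights back onto an
# L2 ball of radius `radius` centred on the pre-trained weights. Per-parameter
# projection and the radius are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 5))
pretrained = {name: p.detach().clone() for name, p in model.named_parameters()}
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

def project_to_ball(radius: float = 1.0) -> None:
    with torch.no_grad():
        for name, p in model.named_parameters():
            diff = p - pretrained[name]
            norm = diff.norm()
            if norm > radius:
                p.copy_(pretrained[name] + diff * (radius / norm))

x, y = torch.randn(32, 128), torch.randint(0, 5, (32,))
loss = nn.functional.cross_entropy(model(x), y)
opt.zero_grad(); loss.backward(); opt.step()
project_to_ball(radius=1.0)  # keep the hypothesis near the pre-trained weights
```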
arXiv Detail & Related papers (2020-02-19T16:00:47Z)
- Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks [95.51368472949308]
Adaptation can be useful in cases when training data is scarce, or when one wishes to encode priors in the network.
In this paper, we propose a straightforward alternative: side-tuning.
arXiv Detail & Related papers (2019-12-31T18:52:32Z)
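Side-tuning's additive adaptation fits in a few lines: a frozen pre-trained base plus a small trainable side network whose outputs are blended. The fixed blending weight alpha and the single-layer side network below are illustrative simplifications of the paper's setup.

```python
# Minimal side-tuning sketch: keep the pre-trained base frozen and add a small
# trainable side network, blending their outputs. The fixed alpha blend and
# the single-layer side network are illustrative assumptions.
import torch
import torch.nn as nn

class SideTuned(nn.Module):
    def __init__(self, base: nn.Module, d_in: int, d_out: int, alpha: float = 0.5):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # base stays frozen
        self.side = nn.Linear(d_in, d_out)   # cheap trainable side network
        self.alpha = alpha

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.alpha * self.base(x) + (1 - self.alpha) * self.side(x)

base = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
model = SideTuned(base, d_in=128, d_out=10)
out = model(torch.randn(4, 128))             # only the side network will train
```

Because only the side network receives gradients, adaptation stays cheap and the base network's behaviour is preserved.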