Manifold Regularization for Memory-Efficient Training of Deep Neural
Networks
- URL: http://arxiv.org/abs/2305.17119v1
- Date: Fri, 26 May 2023 17:40:15 GMT
- Title: Manifold Regularization for Memory-Efficient Training of Deep Neural
Networks
- Authors: Shadi Sartipi and Edgar A. Bernal
- Abstract summary: We propose a framework for achieving improved memory efficiency in the process of learning traditional neural networks.
Use of the framework results in improved absolute performance and empirical generalization error relative to traditional learning techniques.
- Score: 18.554311679277212
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the prevailing trends in the machine- and deep-learning community is
to gravitate towards the use of increasingly larger models in order to keep
pushing the state-of-the-art performance envelope. This tendency makes access
to the associated technologies more difficult for the average practitioner and
runs contrary to the desire to democratize knowledge production in the field.
In this paper, we propose a framework for achieving improved memory efficiency
in the process of learning traditional neural networks by leveraging
inductive-bias-driven network design principles and layer-wise
manifold-oriented regularization objectives. Use of the framework results in
improved absolute performance and empirical generalization error relative to
traditional learning techniques. We provide empirical validation of the
framework, including qualitative and quantitative evidence of its effectiveness
on two standard image datasets, namely CIFAR-10 and CIFAR-100. The proposed
framework can be seamlessly combined with existing network compression methods
for further memory savings.
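As an illustration of the layer-wise manifold regularization idea described in the abstract, the following is a minimal sketch assuming a classic graph-Laplacian penalty applied to intermediate activations within a mini-batch. It is not the authors' exact objective; the function names (rbf_affinity, manifold_penalty, training_step) and the coefficient lambda_m are illustrative assumptions, not identifiers from the paper.

```python
# Minimal sketch (not the authors' exact objective): a graph-Laplacian
# manifold regularizer applied layer-wise to mini-batch activations and
# added to the usual task loss.
import torch
import torch.nn.functional as F


def rbf_affinity(z: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Pairwise RBF affinities W_ij = exp(-||z_i - z_j||^2 / (2 * sigma^2))."""
    d2 = torch.cdist(z, z).pow(2)
    return torch.exp(-d2 / (2.0 * sigma ** 2))


def manifold_penalty(z: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Graph-Laplacian penalty tr(Z^T L Z) = 0.5 * sum_ij W_ij ||z_i - z_j||^2."""
    z = z.flatten(start_dim=1)              # (batch, features)
    w = rbf_affinity(z, sigma)
    lap = torch.diag(w.sum(dim=1)) - w      # unnormalized graph Laplacian
    return torch.trace(z.t() @ lap @ z) / z.shape[0]


def training_step(model: torch.nn.Module, x, y, lambda_m: float = 1e-3):
    """Cross-entropy task loss plus a manifold penalty on each layer's output."""
    feats = []
    handles = [m.register_forward_hook(lambda mod, inp, out: feats.append(out))
               for m in model.children()]
    logits = model(x)
    for h in handles:
        h.remove()
    loss = F.cross_entropy(logits, y)
    return loss + lambda_m * sum(manifold_penalty(f) for f in feats)
```

In practice one would likely restrict the hooks to selected layers and tune sigma and lambda_m per layer; the paper should be consulted for the actual regularization objective and the inductive-bias-driven design principles it is paired with.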
Related papers
- Review of Digital Asset Development with Graph Neural Network Unlearning [0.0]
This paper investigates the critical role of Graph Neural Networks (GNNs) in the management of digital assets.
We introduce innovative unlearning techniques specifically tailored to GNN architectures.
We highlight their applicability in various use cases, including fraud detection, risk assessment, token relationship prediction, and decentralized governance.
arXiv Detail & Related papers (2024-09-27T05:31:04Z) - A Unified and General Framework for Continual Learning [58.72671755989431]
Continual Learning (CL) focuses on learning from dynamic and changing data distributions while retaining previously acquired knowledge.
Various methods have been developed to address the challenge of catastrophic forgetting, including regularization-based, Bayesian-based, and memory-replay-based techniques.
This research aims to bridge the gap between these approaches by introducing a comprehensive and overarching framework that encompasses and reconciles the existing methodologies.
arXiv Detail & Related papers (2024-03-20T02:21:44Z) - Towards Improving Robustness Against Common Corruptions using Mixture of
Class Specific Experts [10.27974860479791]
This paper introduces a novel paradigm known as the Mixture of Class-Specific Experts architecture.
The proposed architecture aims to mitigate vulnerabilities associated with common neural network structures.
arXiv Detail & Related papers (2023-11-16T20:09:47Z) - Distilling Knowledge from Resource Management Algorithms to Neural
Networks: A Unified Training Assistance Approach [18.841969905928337]
A knowledge distillation (KD)-based algorithm distillation (AD) method is proposed in this paper to improve the performance and convergence speed of the NN-based method.
This research paves the way for the integration of traditional optimization insights and emerging NN techniques in wireless communication system optimization.
arXiv Detail & Related papers (2023-08-15T00:30:58Z) - Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z) - Multiplicative update rules for accelerating deep learning training and
increasing robustness [69.90473612073767]
We propose an optimization framework that fits to a wide range of machine learning algorithms and enables one to apply alternative update rules.
We claim that the proposed framework accelerates training while leading to more robust models than the traditionally used additive update rule.
arXiv Detail & Related papers (2023-07-14T06:44:43Z) - Personalizing Federated Learning with Over-the-Air Computations [84.8089761800994]
Federated edge learning is a promising technology to deploy intelligence at the edge of wireless networks in a privacy-preserving manner.
Under such a setting, multiple clients collaboratively train a global generic model under the coordination of an edge server.
This paper presents a distributed training paradigm that employs analog over-the-air computation to address the communication bottleneck.
arXiv Detail & Related papers (2023-02-24T08:41:19Z) - A New Clustering-Based Technique for the Acceleration of Deep
Convolutional Networks [2.7393821783237184]
Model Compression and Acceleration (MCA) techniques are used to transform large pre-trained networks into smaller models.
We propose a clustering-based approach that is able to increase the number of employed centroids/representatives.
This is achieved by imposing a special structure on the employed representatives, which is enabled by the particularities of the problem at hand.
arXiv Detail & Related papers (2021-07-19T18:22:07Z) - Embracing the Dark Knowledge: Domain Generalization Using Regularized
Knowledge Distillation [65.79387438988554]
Lack of generalization capability in the absence of sufficient and representative data is one of the challenges that hinder the practical application of deep models.
We propose a simple, effective, and plug-and-play training strategy named Knowledge Distillation for Domain Generalization (KDDG).
We find that both the richer "dark knowledge" from the teacher network and the gradient filter we propose can reduce the difficulty of learning the mapping.
arXiv Detail & Related papers (2021-07-06T14:08:54Z) - CosSGD: Nonlinear Quantization for Communication-efficient Federated
Learning [62.65937719264881]
Federated learning facilitates learning across clients without transferring local data on these clients to a central server.
We propose a nonlinear quantization for compressed gradient descent, which can be easily utilized in federated learning.
Our system significantly reduces the communication cost by up to three orders of magnitude, while maintaining convergence and accuracy of the training process.
arXiv Detail & Related papers (2020-12-15T12:20:28Z)
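As an illustration of the nonlinear gradient quantization mentioned in the last entry above, the following is a hedged sketch of a generic log-scale quantizer; it is not the paper's actual scheme, and the names (nonlinear_quantize, dequantize, bits) are illustrative assumptions.

```python
# Hedged sketch of nonlinear gradient quantization for communication-efficient
# training: magnitudes are mapped through a logarithmic curve before uniform
# rounding, so small gradient entries keep more relative precision.
import torch


def nonlinear_quantize(grad: torch.Tensor, bits: int = 4):
    """Return integer codes, signs, and a scale that a client would transmit."""
    levels = 2 ** bits - 1
    g = grad.flatten()
    sign = torch.sign(g).to(torch.int8)
    mag = g.abs()
    scale = mag.max().clamp_min(1e-12)
    log_max = torch.log1p(torch.tensor(float(levels)))
    norm = torch.log1p(mag / scale * levels) / log_max    # in [0, 1]
    codes = torch.round(norm * levels).to(torch.uint8)
    return codes, sign, scale, grad.shape


def dequantize(codes, sign, scale, shape, bits: int = 4) -> torch.Tensor:
    """Server-side reconstruction of the (approximate) gradient."""
    levels = 2 ** bits - 1
    log_max = torch.log1p(torch.tensor(float(levels)))
    mag = torch.expm1(codes.float() / levels * log_max) / levels * scale
    return (sign.float() * mag).reshape(shape)
```

With bits = 4, each gradient entry is sent as a 4-bit magnitude code plus a sign instead of a 32-bit float, which illustrates how quantization reduces the per-entry communication cost.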
This list is automatically generated from the titles and abstracts of the papers on this site.