Manifold Regularization for Memory-Efficient Training of Deep Neural
Networks
- URL: http://arxiv.org/abs/2305.17119v1
- Date: Fri, 26 May 2023 17:40:15 GMT
- Title: Manifold Regularization for Memory-Efficient Training of Deep Neural
Networks
- Authors: Shadi Sartipi and Edgar A. Bernal
- Abstract summary: We propose a framework for achieving improved memory efficiency in the process of learning traditional neural networks.
Use of the framework results in improved absolute performance and empirical generalization error relative to traditional learning techniques.
- Score: 18.554311679277212
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the prevailing trends in the machine- and deep-learning community is
to gravitate towards the use of increasingly larger models in order to keep
pushing the state-of-the-art performance envelope. This tendency makes access
to the associated technologies more difficult for the average practitioner and
runs contrary to the desire to democratize knowledge production in the field.
In this paper, we propose a framework for achieving improved memory efficiency
in the process of learning traditional neural networks by leveraging
inductive-bias-driven network design principles and layer-wise
manifold-oriented regularization objectives. Use of the framework results in
improved absolute performance and empirical generalization error relative to
traditional learning techniques. We provide empirical validation of the
framework, including qualitative and quantitative evidence of its effectiveness
on two standard image datasets, namely CIFAR-10 and CIFAR-100. The proposed
framework can be seamlessly combined with existing network compression methods
for further memory savings.
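As an illustration of the layer-wise manifold regularization idea described in the abstract, the following is a minimal sketch assuming a classic graph-Laplacian penalty applied to intermediate activations within a mini-batch. It is not the authors' exact objective; the function names (rbf_affinity, manifold_penalty, training_step) and the coefficient lambda_m are illustrative assumptions, not identifiers from the paper.

```python
# Minimal sketch (not the authors' exact objective): a graph-Laplacian
# manifold regularizer applied layer-wise to mini-batch activations and
# added to the usual task loss.
import torch
import torch.nn.functional as F


def rbf_affinity(z: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Pairwise RBF affinities W_ij = exp(-||z_i - z_j||^2 / (2 * sigma^2))."""
    d2 = torch.cdist(z, z).pow(2)
    return torch.exp(-d2 / (2.0 * sigma ** 2))


def manifold_penalty(z: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Graph-Laplacian penalty tr(Z^T L Z) = 0.5 * sum_ij W_ij ||z_i - z_j||^2."""
    z = z.flatten(start_dim=1)              # (batch, features)
    w = rbf_affinity(z, sigma)
    lap = torch.diag(w.sum(dim=1)) - w      # unnormalized graph Laplacian
    return torch.trace(z.t() @ lap @ z) / z.shape[0]


def training_step(model: torch.nn.Module, x, y, lambda_m: float = 1e-3):
    """Cross-entropy task loss plus a manifold penalty on each layer's output."""
    feats = []
    handles = [m.register_forward_hook(lambda mod, inp, out: feats.append(out))
               for m in model.children()]
    logits = model(x)
    for h in handles:
        h.remove()
    loss = F.cross_entropy(logits, y)
    return loss + lambda_m * sum(manifold_penalty(f) for f in feats)
```

In practice one would likely restrict the hooks to selected layers and tune sigma and lambda_m per layer; the paper should be consulted for the actual regularization objective and the inductive-bias-driven design principles it is paired with.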
Related papers
- Review of Digital Asset Development with Graph Neural Network Unlearning [0.0]
This paper investigates the critical role of Graph Neural Networks (GNNs) in the management of digital assets.
We introduce innovative unlearning techniques specifically tailored to GNN architectures.
We highlight their applicability in various use cases, including fraud detection, risk assessment, token relationship prediction, and decentralized governance.
arXiv Detail & Related papers (2024-09-27T05:31:04Z) - A Unified and General Framework for Continual Learning [58.72671755989431]
Continual Learning (CL) focuses on learning from dynamic and changing data distributions while retaining previously acquired knowledge.
Various methods have been developed to address the challenge of catastrophic forgetting, including regularization-based, Bayesian-based, and memory-replay-based techniques.
This research aims to bridge the gap between these approaches by introducing a comprehensive and overarching framework that encompasses and reconciles the existing methodologies.
arXiv Detail & Related papers (2024-03-20T02:21:44Z) - Towards Improving Robustness Against Common Corruptions using Mixture of
Class Specific Experts [10.27974860479791]
This paper introduces a novel paradigm known as the Mixture of Class-Specific Experts architecture.
The proposed architecture aims to mitigate vulnerabilities associated with common neural network structures.
arXiv Detail & Related papers (2023-11-16T20:09:47Z) - Distilling Knowledge from Resource Management Algorithms to Neural
Networks: A Unified Training Assistance Approach [18.841969905928337]
A knowledge distillation (KD)-based algorithm distillation (AD) method is proposed in this paper to improve the performance and convergence speed of the NN-based method.
This research paves the way for the integration of traditional optimization insights and emerging NN techniques in wireless communication system optimization.
arXiv Detail & Related papers (2023-08-15T00:30:58Z) - Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z) - Multiplicative update rules for accelerating deep learning training and
increasing robustness [69.90473612073767]
We propose an optimization framework that fits to a wide range of machine learning algorithms and enables one to apply alternative update rules.
We claim that the proposed framework accelerates training while leading to more robust models than the traditionally used additive update rule.
arXiv Detail & Related papers (2023-07-14T06:44:43Z) - Personalizing Federated Learning with Over-the-Air Computations [84.8089761800994]
Federated edge learning is a promising technology to deploy intelligence at the edge of wireless networks in a privacy-preserving manner.
Under such a setting, multiple clients collaboratively train a global generic model under the coordination of an edge server.
This paper presents a distributed training paradigm that employs analog over-the-air computation to address the communication bottleneck.
arXiv Detail & Related papers (2023-02-24T08:41:19Z) - A New Clustering-Based Technique for the Acceleration of Deep
Convolutional Networks [2.7393821783237184]
Model Compression and Acceleration (MCA) techniques are used to transform large pre-trained networks into smaller models.
We propose a clustering-based approach that is able to increase the number of employed centroids/representatives.
This is achieved by imposing a special structure on the employed representatives, which is enabled by the particularities of the problem at hand.
arXiv Detail & Related papers (2021-07-19T18:22:07Z) - Embracing the Dark Knowledge: Domain Generalization Using Regularized
Knowledge Distillation [65.79387438988554]
Lack of generalization capability in the absence of sufficient and representative data is one of the challenges that hinder the practical application of deep models.
We propose a simple, effective, and plug-and-play training strategy named Knowledge Distillation for Domain Generalization (KDDG).
We find that both the richer "dark knowledge" from the teacher network and the gradient filter we propose can reduce the difficulty of learning the mapping.
arXiv Detail & Related papers (2021-07-06T14:08:54Z) - CosSGD: Nonlinear Quantization for Communication-efficient Federated
Learning [62.65937719264881]
Federated learning facilitates learning across clients without transferring local data on these clients to a central server.
We propose a nonlinear quantization for compressed gradient descent, which can be easily utilized in federated learning.
Our system significantly reduces the communication cost by up to three orders of magnitude, while maintaining convergence and accuracy of the training process.
arXiv Detail & Related papers (2020-12-15T12:20:28Z)
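As an illustration of the nonlinear gradient quantization mentioned in the last entry above, the following is a hedged sketch of a generic log-scale quantizer; it is not the paper's actual scheme, and the names (nonlinear_quantize, dequantize, bits) are illustrative assumptions.

```python
# Hedged sketch of nonlinear gradient quantization for communication-efficient
# training: magnitudes are mapped through a logarithmic curve before uniform
# rounding, so small gradient entries keep more relative precision.
import torch


def nonlinear_quantize(grad: torch.Tensor, bits: int = 4):
    """Return integer codes, signs, and a scale that a client would transmit."""
    levels = 2 ** bits - 1
    g = grad.flatten()
    sign = torch.sign(g).to(torch.int8)
    mag = g.abs()
    scale = mag.max().clamp_min(1e-12)
    log_max = torch.log1p(torch.tensor(float(levels)))
    norm = torch.log1p(mag / scale * levels) / log_max    # in [0, 1]
    codes = torch.round(norm * levels).to(torch.uint8)
    return codes, sign, scale, grad.shape


def dequantize(codes, sign, scale, shape, bits: int = 4) -> torch.Tensor:
    """Server-side reconstruction of the (approximate) gradient."""
    levels = 2 ** bits - 1
    log_max = torch.log1p(torch.tensor(float(levels)))
    mag = torch.expm1(codes.float() / levels * log_max) / levels * scale
    return (sign.float() * mag).reshape(shape)
```

With bits = 4, each gradient entry is sent as a 4-bit magnitude code plus a sign instead of a 32-bit float, which illustrates how quantization reduces the per-entry communication cost.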
This list is automatically generated from the titles and abstracts of the papers on this site.