Exploiting Explainable Metrics for Augmented SGD
- URL: http://arxiv.org/abs/2203.16723v1
- Date: Thu, 31 Mar 2022 00:16:44 GMT
- Title: Exploiting Explainable Metrics for Augmented SGD
- Authors: Mahdi S. Hosseini and Mathieu Tuli and Konstantinos N. Plataniotis
- Abstract summary: There are several unanswered questions about how learning under optimization really works and why certain strategies are better than others.
We propose new explainability metrics that measure the redundant information in a network's layers.
We then exploit these metrics to augment the Stochastic Gradient Descent (SGD) optimizer by adaptively adjusting the learning rate in each layer to improve generalization performance.
- Score: 43.00691899858408
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explaining the generalization characteristics of deep learning is an emerging
topic in advanced machine learning. There are several unanswered questions
about how learning under stochastic optimization really works and why certain
strategies are better than others. In this paper, we address the following
question: \textit{can we probe intermediate layers of a deep neural network to
identify and quantify the learning quality of each layer?} With this question
in mind, we propose new explainability metrics that measure the redundant
information in a network's layers using a low-rank factorization framework and
quantify a complexity measure that is highly correlated with the generalization
performance of a given optimizer, network, and dataset. We subsequently exploit
these metrics to augment the Stochastic Gradient Descent (SGD) optimizer by
adaptively adjusting the learning rate in each layer to improve
generalization performance. Our augmented SGD -- dubbed RMSGD -- introduces
minimal computational overhead compared to SOTA methods and outperforms them by
exhibiting strong generalization characteristics across application,
architecture, and dataset.
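The core mechanism can be sketched as follows. This is a minimal illustration, not the authors' exact RMSGD rules: the metric `low_rank_ratio` (a spectral-energy ratio from an SVD of each layer's weights) and the linear scaling of a base learning rate are simplifying assumptions standing in for the paper's low-rank factorization metrics and update rules.

```python
# Hedged sketch: probe each weight layer with a low-rank factorization (SVD),
# summarize how much of its spectrum carries the signal, and use that score to
# set a per-layer learning rate for plain SGD. The metric and scaling rule are
# illustrative assumptions, not the published RMSGD formulas.
import torch

def low_rank_ratio(weight: torch.Tensor, energy: float = 0.95) -> float:
    """Fraction of singular values needed to keep `energy` of the spectral mass.

    A small ratio suggests the layer's mapping is highly redundant (low rank);
    a ratio near 1 suggests the layer uses most of its capacity.
    """
    w = weight.detach().flatten(start_dim=1)   # conv weights become (out, in*k*k)
    s = torch.linalg.svdvals(w)                # singular values, descending
    cumulative = torch.cumsum(s, dim=0) / s.sum()
    k = int((cumulative < energy).sum().item()) + 1
    return k / s.numel()

def per_layer_param_groups(model: torch.nn.Module, base_lr: float = 0.1):
    """One SGD parameter group per weight layer, lr scaled by its redundancy score."""
    groups = []
    for module in model.modules():
        if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
            ratio = low_rank_ratio(module.weight)
            groups.append({"params": module.parameters(), "lr": base_lr * ratio})
    return groups

# Usage: recomputing each group's learning rate every epoch (calling
# low_rank_ratio again and overwriting group["lr"]) gives a layer-wise,
# adaptive schedule in the spirit of the paper.
model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10))
optimizer = torch.optim.SGD(per_layer_param_groups(model, base_lr=0.1), momentum=0.9)
```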
Related papers
- CaAdam: Improving Adam optimizer using connection aware methods [0.0]
We introduce a new method inspired by Adam that enhances convergence speed and achieves better loss function minima.
Traditional optimizers, including Adam, apply uniform or globally adjusted learning rates across neural networks without considering their architectural specifics.
Our algorithm, CaAdam, explores this overlooked area by introducing connection-aware optimization through carefully designed proxies of architectural information.
arXiv Detail & Related papers (2024-10-31T17:59:46Z) - Towards Robust Out-of-Distribution Generalization: Data Augmentation and Neural Architecture Search Approaches [4.577842191730992]
We study ways toward robust OoD generalization for deep learning.
We first propose a novel and effective approach to disentangle the spurious correlation between features that are not essential for recognition.
We then study the problem of strengthening neural architecture search in OoD scenarios.
arXiv Detail & Related papers (2024-10-25T20:50:32Z) - Informed deep hierarchical classification: a non-standard analysis inspired approach [0.0]
The proposed approach consists of a multi-output deep neural network equipped with specific projection operators placed before each output layer.
The design of such an architecture, called lexicographic hybrid deep neural network (LH-DNN), has been made possible by combining tools from different and quite distant research fields.
To assess the efficacy of the approach, the resulting network is compared against the B-CNN, a convolutional neural network tailored for hierarchical classification tasks.
arXiv Detail & Related papers (2024-09-25T14:12:50Z) - Reparameterization through Spatial Gradient Scaling [69.27487006953852]
Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training.
We present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional networks.
arXiv Detail & Related papers (2023-03-05T17:57:33Z) - Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, based on minimizing the population loss, that are better suited to active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z) - LaplaceNet: A Hybrid Energy-Neural Model for Deep Semi-Supervised Classification [0.0]
Recent developments in deep semi-supervised classification have reached unprecedented performance.
We propose a new framework, LaplaceNet, for deep semi-supervised classification that has a greatly reduced model complexity.
Our model outperforms state-of-the-art methods for deep semi-supervised classification, over several benchmark datasets.
arXiv Detail & Related papers (2021-06-08T17:09:28Z) - Phase Retrieval using Expectation Consistent Signal Recovery Algorithm based on Hypernetwork [73.94896986868146]
Phase retrieval (PR) is an important component in modern computational imaging systems.
Recent advances in deep learning have opened up a new possibility for robust and fast PR.
We develop a novel framework for deep unfolding to overcome the existing limitations.
arXiv Detail & Related papers (2021-01-12T08:36:23Z) - Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a \emph{covariance operator}.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a \emph{hierarchical latent tree model} (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z) - Regularizing Deep Networks with Semantic Data Augmentation [44.53483945155832]
We propose a novel semantic data augmentation algorithm to complement traditional approaches.
The proposed method is inspired by the intriguing property that deep networks are effective in learning linearized features.
We show that the proposed implicit semantic data augmentation (ISDA) algorithm amounts to minimizing a novel robust CE loss.
arXiv Detail & Related papers (2020-07-21T00:32:44Z) - AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of \textit{"knowledge gain"} and \textit{"mapping condition"} and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)
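AdaS is the direct precursor of the RMSGD optimizer summarized above. As a hedged sketch of what such a per-layer adaptive schedule can look like (the function name, the hyperparameters beta and zeta, and the clipped momentum-style recursion are illustrative assumptions, not the published AdaS equations), each epoch one would measure a per-layer knowledge-gain score from the layer's weights and nudge that layer's learning rate according to how much the score improved:

```python
# Hedged sketch of an AdaS-style layer-wise schedule; the paper's exact update
# rule differs. `prev_gains` / `new_gains` hold one knowledge-gain score per
# parameter group, e.g., computed with a low-rank metric such as the
# `low_rank_ratio` sketch earlier on this page.
def adas_style_step(optimizer, prev_gains, new_gains, beta=0.8, zeta=1.0):
    """Raise a layer's lr while its knowledge gain keeps growing; damp it otherwise."""
    for group, g_old, g_new in zip(optimizer.param_groups, prev_gains, new_gains):
        group["lr"] = beta * group["lr"] + zeta * max(g_new - g_old, 0.0)
    return new_gains  # carry forward as prev_gains for the next epoch
```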