GuideBP: Guiding Backpropagation Through Weaker Pathways of Parallel
Logits
- URL: http://arxiv.org/abs/2104.11620v1
- Date: Fri, 23 Apr 2021 14:14:00 GMT
- Title: GuideBP: Guiding Backpropagation Through Weaker Pathways of Parallel
Logits
- Authors: Bodhisatwa Mandal, Swarnendu Ghosh, Teresa Gonçalves, Paulo
Quaresma, Mita Nasipuri, Nibaran Das
- Abstract summary: The proposed approach guides the gradients of backpropagation along the weakest concept representations.
A weakness score defines the class-specific performance of each individual pathway and is then used to create a guiding logit.
The proposed approach has been shown to perform better than traditional column merging techniques.
- Score: 6.764324841419295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural networks often generate multiple logits and use simple
techniques like addition or averaging for loss computation, which distributes
gradients equally among all paths. The proposed approach instead guides the
gradients of backpropagation along the weakest concept representations. A
weakness score defines the class-specific performance of each individual
pathway and is used to construct a logit that directs gradients along the
weakest pathways. The proposed approach has been shown to perform better than
traditional column merging techniques and can be used in several application
scenarios. Not only can the proposed model be used as an efficient technique
for training multiple instances of a model in parallel, but CNNs with multiple
output branches have also been shown to perform better with the proposed
upgrade. Various experiments establish, both empirically and statistically,
the flexibility of this simple yet effective learning technique in a range of
multi-objective scenarios.
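As a rough illustration of how such a guiding logit could be assembled, the sketch below assumes the weakness score is a moving-average per-class error rate of each pathway and that the guiding logit is a weakness-weighted sum of the pathway logits; the paper's exact formulation may differ, and the function names (`guided_logit`, `update_weakness`) are hypothetical.

```python
import torch

def guided_logit(pathway_logits, weakness):
    """Combine parallel pathway logits, emphasising weaker pathways.

    pathway_logits: list of (batch, num_classes) tensors, one per pathway.
    weakness:       (num_pathways, num_classes) tensor; higher = weaker.
    Weak pathways receive larger weights, so backpropagation through the
    combined logit sends larger gradients along them.
    """
    stacked = torch.stack(pathway_logits, dim=0)            # (P, B, C)
    weights = torch.softmax(weakness, dim=0).unsqueeze(1)   # (P, 1, C)
    return (weights * stacked).sum(dim=0)                   # (B, C)

def update_weakness(weakness, pathway_logits, targets, momentum=0.9):
    """Track a moving-average per-class error rate for each pathway."""
    with torch.no_grad():
        for p, logits in enumerate(pathway_logits):
            preds = logits.argmax(dim=1)
            for c in targets.unique():
                mask = targets == c
                err = (preds[mask] != c).float().mean()
                weakness[p, c] = momentum * weakness[p, c] + (1 - momentum) * err
    return weakness
```

During training, the loss would then be computed on `guided_logit(...)` instead of a plain average (e.g. with `torch.nn.functional.cross_entropy`), and `update_weakness` would be called after each batch so the weighting keeps tracking which pathway is currently weakest for each class.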
Related papers
- Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z) - Condensed Gradient Boosting [0.0]
We propose the use of multi-output regressors as base models to handle the multi-class problem as a single task.
An extensive comparison with other multi-output based gradient boosting methods is carried out in terms of generalization and computational efficiency.
arXiv Detail & Related papers (2022-11-26T15:53:19Z) - Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning has made significant progress in pre-training large models but struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z) - Efficiently Disentangle Causal Representations [37.1087310583588]
We approximate the difference with models' generalization abilities so that it fits in the standard machine learning framework.
In contrast to the state-of-the-art approach, which relies on the learner's adaptation speed to a new distribution, the proposed approach only requires evaluating the model's generalization ability.
arXiv Detail & Related papers (2022-01-06T07:12:36Z) - Exploring the Equivalence of Siamese Self-Supervised Learning via A
Unified Gradient Framework [43.76337849044254]
Self-supervised learning has shown its great potential to extract powerful visual representations without human annotations.
Various works are proposed to deal with self-supervised learning from different perspectives.
We propose UniGrad, a simple but effective gradient form for self-supervised learning.
arXiv Detail & Related papers (2021-12-09T18:59:57Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - An Empirical Comparison of Instance Attribution Methods for NLP [62.63504976810927]
We evaluate the degree to which different instance attribution methods agree with respect to the importance of training samples.
We find that simple retrieval methods yield training instances that differ from those identified via gradient-based methods.
arXiv Detail & Related papers (2021-04-09T01:03:17Z) - Accumulated Decoupled Learning: Mitigating Gradient Staleness in
Inter-Layer Model Parallelization [16.02377434191239]
We propose accumulated decoupled learning (ADL), which incorporates the gradient accumulation technique to mitigate the stale gradient effect (see the sketch after this list).
We prove that the proposed method can converge to critical points, i.e., the gradients converge to 0, in spite of its asynchronous nature.
ADL is shown to outperform several state-of-the-art methods on classification tasks and is the fastest among the compared methods.
arXiv Detail & Related papers (2020-12-03T11:52:55Z) - A Multilevel Approach to Training [0.0]
We propose a novel training method based on nonlinear multilevel techniques, commonly used for solving discretized large scale partial differential equations.
Our multilevel training method constructs a multilevel hierarchy by reducing the number of samples.
The training of the original model is then enhanced by internally training surrogate models constructed with fewer samples.
arXiv Detail & Related papers (2020-06-28T13:34:48Z) - Learning Diverse Representations for Fast Adaptation to Distribution
Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z) - Dynamic Hierarchical Mimicking Towards Consistent Optimization
Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z)
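Following up on the Accumulated Decoupled Learning entry above, the sketch below illustrates plain gradient accumulation, the generic technique ADL builds on; it is not the paper's inter-layer decoupled implementation, and the toy model, data, and hyperparameters are assumptions for illustration.

```python
import torch
from torch import nn

# Toy setup: a linear classifier on random data, purely for illustration.
model = nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
accum_steps = 4  # number of micro-batches whose gradients are accumulated

optimizer.zero_grad()
for step in range(100):
    x = torch.randn(8, 16)
    y = torch.randint(0, 4, (8,))
    # Scale the loss so the accumulated gradient matches one large batch.
    loss = nn.functional.cross_entropy(model(x), y) / accum_steps
    loss.backward()  # gradients are summed into .grad across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()       # one parameter update per accumulated window
        optimizer.zero_grad()  # reset before the next accumulation window
```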