Rethinking "Batch" in BatchNorm
- URL: http://arxiv.org/abs/2105.07576v1
- Date: Mon, 17 May 2021 01:58:15 GMT
- Title: Rethinking "Batch" in BatchNorm
- Authors: Yuxin Wu, Justin Johnson
- Abstract summary: BatchNorm is a critical building block in modern convolutional neural networks.
This paper thoroughly reviews such problems in visual recognition tasks, and shows that a key to address them is to rethink different choices in the concept of "batch" in BatchNorm.
- Score: 25.69755850518617
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: BatchNorm is a critical building block in modern convolutional neural
networks. Its unique property of operating on "batches" instead of individual
samples introduces significantly different behaviors from most other operations
in deep learning. As a result, it leads to many hidden caveats that can
negatively impact a model's performance in subtle ways. This paper thoroughly
reviews such problems in visual recognition tasks, and shows that a key to
address them is to rethink different choices in the concept of "batch" in
BatchNorm. By presenting these caveats and their mitigations, we hope this
review can help researchers use BatchNorm more effectively.
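One concrete caveat behind this claim: BatchNorm normalizes with mini-batch statistics during training but with accumulated population statistics at inference, so the same input is transformed differently in the two modes. A minimal PyTorch sketch of this train/test discrepancy (the toy module and tensors are illustrative, not from the paper):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm1d(8)
x = torch.randn(4, 8) * 3.0 + 1.0  # a small batch with nonzero mean/variance

# Training mode: normalize with the *mini-batch* statistics and update the
# running averages that will be used at test time.
bn.train()
y_train = bn(x)

# Eval mode: normalize with the accumulated *population* statistics instead.
bn.eval()
y_eval = bn(x)

# The same input is normalized differently in the two modes, one of the
# "batch"-induced caveats the paper reviews (train/test inconsistency).
print((y_train - y_eval).abs().max())
```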
Related papers
- Deconstructing In-Context Learning: Understanding Prompts via Corruption [13.37109575313212]
We decompose the entire prompt into four components: task description, demonstration inputs, labels, and inline instructions.
We study models ranging from 1.5B to 70B in size, using ten datasets covering classification and generation tasks.
We find that repeating text within the prompt boosts model performance, and bigger models are more sensitive to the semantics of the prompt.
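As a rough illustration of this decomposition (the component strings and assembly order below are assumptions for the sketch, not the paper's exact templates):

```python
# Hypothetical helper assembling a prompt from the four components named
# above: task description, demonstration inputs, labels, inline instructions.
def build_prompt(task_description: str,
                 demonstrations: list[tuple[str, str]],
                 inline_instruction: str,
                 query: str) -> str:
    parts = [task_description]
    for demo_input, label in demonstrations:
        # demonstration input + label, each prefixed by the inline instruction
        parts.append(f"{inline_instruction}\n{demo_input}\nLabel: {label}")
    parts.append(f"{inline_instruction}\n{query}\nLabel:")
    return "\n\n".join(parts)

print(build_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Great movie!", "positive"), ("Terrible plot.", "negative")],
    "Review:",
    "I loved the soundtrack.",
))
```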
arXiv Detail & Related papers (2024-04-02T15:50:55Z)
- Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions [138.49522643425334]
Bongard-HOI is a new visual reasoning benchmark that focuses on compositional learning of human-object interactions from natural images.
It is inspired by two desirable characteristics from the classical Bongard problems (BPs): 1) few-shot concept learning, and 2) context-dependent reasoning.
Bongard-HOI presents a substantial challenge to today's visual recognition models.
arXiv Detail & Related papers (2022-05-27T07:36:29Z)
- BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning [93.38239238988719]
We propose to equip deep neural networks with the ability to learn sample relationships within each mini-batch.
BatchFormer is applied along the batch dimension of each mini-batch to implicitly explore sample relationships during training.
We perform extensive experiments on over ten datasets and the proposed method achieves significant improvements on different data scarcity applications.
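A minimal sketch of the idea, assuming a standard transformer encoder applied across the batch dimension of pooled backbone features (the hyperparameters are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

class BatchFormerSketch(nn.Module):
    """A hedged sketch: a transformer encoder applied across the *batch*
    dimension so that samples in a mini-batch can attend to each other."""

    def __init__(self, dim: int = 128, nhead: int = 4):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(d_model=dim, nhead=nhead)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, dim). Treat the batch as a length-B "sequence" of a
        # single pseudo-sample so attention mixes information across samples.
        return self.encoder(feats.unsqueeze(1)).squeeze(1)

feats = torch.randn(16, 128)         # backbone features for a mini-batch
mixed = BatchFormerSketch()(feats)   # (16, 128) batch-aware features
```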
arXiv Detail & Related papers (2022-03-03T05:31:33Z)
- Training BatchNorm Only in Neural Architecture Search and Beyond [17.21663067385715]
Little effort has been made to understand why training only BatchNorm can find well-performing architectures with reduced supernet-training time.
We show that a train-BN-only supernet favors convolutions over other operators, causing unfair competition between architectures.
We propose a novel composite performance indicator to evaluate networks from three perspectives.
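A minimal sketch of the train-BN-only setup, freezing everything except the BatchNorm affine parameters (torchvision's resnet18 stands in for a supernet here; the NAS machinery itself is not reproduced):

```python
import torch.nn as nn
from torchvision.models import resnet18

# Freeze every parameter, then re-enable gradients only for the
# BatchNorm affine weights and biases.
model = resnet18()
for p in model.parameters():
    p.requires_grad = False
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.weight.requires_grad = True
        m.bias.requires_grad = True

trainable = [p for p in model.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable), "trainable BN parameters")
```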
arXiv Detail & Related papers (2021-12-01T04:09:09Z)
- Can contrastive learning avoid shortcut solutions? [88.249082564465]
Implicit feature modification (IFM) is a method for altering positive and negative samples to guide contrastive models toward capturing a wider variety of predictive features.
IFM reduces feature suppression, and as a result improves performance on vision and medical imaging tasks.
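A hedged sketch in the spirit of IFM, using a one-step gradient perturbation of the positive and negative embeddings as a simplification (the eps budget and the sign-gradient update are assumptions for illustration, not the paper's exact modification):

```python
import torch
import torch.nn.functional as F

def infonce(anchor, pos, negs, t=0.1):
    """Standard InfoNCE with one positive per anchor and shared negatives."""
    l_pos = (anchor * pos).sum(-1, keepdim=True) / t
    l_neg = anchor @ negs.t() / t
    logits = torch.cat([l_pos, l_neg], dim=1)
    labels = torch.zeros(len(anchor), dtype=torch.long)
    return F.cross_entropy(logits, labels)

def ifm_like_loss(anchor, pos, negs, eps=0.1):
    # Nudge positive/negative *embeddings* in the loss-increasing direction
    # (harder positives, harder negatives) before computing the final loss.
    pos = pos.clone().requires_grad_(True)
    negs = negs.clone().requires_grad_(True)
    g_pos, g_neg = torch.autograd.grad(infonce(anchor, pos, negs), [pos, negs])
    harder_pos = F.normalize(pos + eps * g_pos.sign(), dim=-1)
    harder_neg = F.normalize(negs + eps * g_neg.sign(), dim=-1)
    return infonce(anchor, harder_pos.detach(), harder_neg.detach())

a = F.normalize(torch.randn(8, 32), dim=-1)
p = F.normalize(torch.randn(8, 32), dim=-1)
n = F.normalize(torch.randn(64, 32), dim=-1)
print(ifm_like_loss(a, p, n))
```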
arXiv Detail & Related papers (2021-06-21T16:22:43Z)
- Investigating the Role of Negatives in Contrastive Representation Learning [59.30700308648194]
Noise contrastive learning is a popular technique for unsupervised representation learning.
We focus on disambiguating the role of one design parameter: the number of negative examples.
We find that the results broadly agree with our theory, while our vision experiments are murkier, with performance sometimes even insensitive to the number of negatives.
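A toy sketch of the parameter under study: the InfoNCE loss as the number of negatives K varies (the Gaussian "embeddings" below are purely illustrative):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
anchor = F.normalize(torch.randn(128, 32), dim=-1)
pos = F.normalize(anchor + 0.1 * torch.randn(128, 32), dim=-1)

for k in (1, 16, 256):
    # K random negatives shared across the batch; the loss scales
    # roughly like log K for uninformative negatives.
    negs = F.normalize(torch.randn(k, 32), dim=-1)
    logits = torch.cat([(anchor * pos).sum(-1, keepdim=True),
                        anchor @ negs.t()], dim=1) / 0.1
    loss = F.cross_entropy(logits, torch.zeros(128, dtype=torch.long))
    print(f"K={k:4d}  InfoNCE={loss.item():.3f}")
```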
arXiv Detail & Related papers (2021-06-18T06:44:16Z)
- Batched Neural Bandits [107.5072688105936]
BatchNeuralUCB combines neural networks with optimism to address the exploration-exploitation tradeoff.
We prove that BatchNeuralUCB achieves the same regret as the fully sequential version while reducing the number of policy updates considerably.
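A minimal sketch of the batched update pattern, assuming a placeholder reward model in place of the paper's neural-UCB machinery: the policy is held fixed within each batch and refit only at batch boundaries.

```python
import random

def batched_bandit(rounds=1000, batch_size=50, n_arms=5):
    history, updates = [], 0
    scores = [1.0] * n_arms  # optimistic initial UCB-style scores
    for t in range(rounds):
        arm = max(range(n_arms), key=lambda a: scores[a])
        reward = random.gauss(0.1 * arm, 1.0)  # toy environment
        history.append((arm, reward))
        if (t + 1) % batch_size == 0:
            # Refit only at batch boundaries (placeholder model: running
            # means plus a shrinking exploration bonus).
            updates += 1
            for a in range(n_arms):
                rs = [r for (arm_i, r) in history if arm_i == a]
                bonus = 1.0 / (len(rs) + 1) ** 0.5
                scores[a] = (sum(rs) / len(rs) if rs else 0.0) + bonus
    return updates

print(batched_bandit(), "policy updates instead of 1000")
```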
arXiv Detail & Related papers (2021-02-25T17:36:44Z)
- PowerEvaluationBALD: Efficient Evaluation-Oriented Deep (Bayesian) Active Learning with Stochastic Acquisition Functions [2.0305676256390934]
We develop BatchEvaluationBALD, a new acquisition function for deep active learning.
We also develop a variant for the non-Bayesian setting, which we call Evaluation Information Gain.
To reduce computational requirements and allow these methods to scale to larger batch sizes, we introduce acquisition functions that use importance-sampling of tempered acquisition scores.
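A minimal sketch of that importance-sampling step, assuming power-tempered scores: sample a batch without replacement with probability proportional to score**beta rather than taking the top-k (beta and the stand-in scores are illustrative; real acquisition scores would come from the method above):

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random(1000)  # stand-in acquisition scores for a pool
beta, k = 8.0, 32          # temperature and acquisition batch size

# Temper the scores and sample the batch without replacement.
p = scores ** beta
p /= p.sum()
batch_idx = rng.choice(len(scores), size=k, replace=False, p=p)
print(sorted(batch_idx)[:10])
```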
arXiv Detail & Related papers (2021-01-10T13:46:45Z)
- Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback [62.997667081978825]
We present a novel approach for considering user feedback and evaluate it using three distinct strategies.
Despite the limited amount of feedback returned by users (as low as 20% of the total), our approach obtains results similar to those of state-of-the-art approaches.
arXiv Detail & Related papers (2020-09-16T07:32:51Z)
- Towards an Adversarially Robust Normalization Approach [8.744644782067368]
Batch Normalization (BatchNorm) is effective for improving the performance and accelerating the training of deep neural networks.
It has also been shown to be a cause of adversarial vulnerability, i.e., networks without it are more robust to adversarial attacks.
We propose Robust Normalization (RobustNorm); an adversarially robust version of BatchNorm.
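A hedged sketch of the general idea, assuming per-channel min-max rescaling in place of mean/variance standardization; whether this matches RobustNorm's exact formulation is an assumption, not a claim about the paper:

```python
import torch
import torch.nn as nn

class MinMaxNormSketch(nn.Module):
    """Illustrative only: a BatchNorm-style layer that rescales activations
    with per-channel batch min/max instead of mean/variance statistics."""

    def __init__(self, num_features: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W); reduce over every dim except the channel dim.
        lo = x.amin(dim=(0, 2, 3), keepdim=True)
        hi = x.amax(dim=(0, 2, 3), keepdim=True)
        x = (x - lo) / (hi - lo + self.eps)
        return x * self.weight.view(1, -1, 1, 1) + self.bias.view(1, -1, 1, 1)

y = MinMaxNormSketch(8)(torch.randn(4, 8, 16, 16))
```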
arXiv Detail & Related papers (2020-06-19T08:12:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.