BatchFormer: Learning to Explore Sample Relationships for Robust
Representation Learning
- URL: http://arxiv.org/abs/2203.01522v1
- Date: Thu, 3 Mar 2022 05:31:33 GMT
- Title: BatchFormer: Learning to Explore Sample Relationships for Robust
Representation Learning
- Authors: Zhi Hou, Baosheng Yu, Dacheng Tao
- Abstract summary: We propose to endow deep neural networks with the ability to learn sample relationships from each mini-batch.
BatchFormer is applied to the batch dimension of each mini-batch to implicitly explore sample relationships during training.
We perform extensive experiments on over ten datasets and the proposed method achieves significant improvements on different data scarcity applications.
- Score: 93.38239238988719
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the success of deep neural networks, there are still many challenges
in deep representation learning due to data scarcity issues such as data
imbalance, unseen distributions, and domain shift. To address the
above-mentioned issues, a variety of methods have been devised to explore
sample relationships in a vanilla way (i.e., from the perspective of either
the input or the loss function), failing to exploit the internal structure of
deep neural networks for learning with sample relationships. Inspired by this,
we propose to endow deep neural networks themselves with the ability to learn
sample relationships from each mini-batch. Specifically, we introduce a
batch transformer module, or BatchFormer, which is applied to the batch
dimension of each mini-batch to implicitly explore sample relationships during
training. By doing this, the proposed method enables the collaboration of
different samples, e.g., the head-class samples can also contribute to the
learning of the tail classes for long-tailed recognition. Furthermore, to
mitigate the gap between training and testing, we share the classifier between
the streams with and without BatchFormer during training, so that BatchFormer
can be removed during testing. We perform extensive experiments on over ten datasets, and the
proposed method achieves significant improvements on different data scarcity
applications without any bells and whistles, including the tasks of long-tailed
recognition, compositional zero-shot learning, domain generalization, and
contrastive learning. Code will be made publicly available at
https://github.com/zhihou7/BatchFormer.
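The recipe in the abstract can be made concrete with a short sketch. The PyTorch-style code below is a minimal illustration rather than the reference implementation: the linear placeholder backbone, feature dimension, head count, and dropout value are assumptions chosen for brevity. It shows the two ingredients described above: a transformer encoder layer whose self-attention runs along the batch dimension, and a classifier shared between the features computed with and without BatchFormer, so the module can be dropped at test time.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BatchFormerClassifier(nn.Module):
    """Minimal sketch: backbone -> transformer over the batch axis -> shared classifier."""

    def __init__(self, in_dim=32, feat_dim=512, num_classes=100, num_heads=4):
        super().__init__()
        self.backbone = nn.Linear(in_dim, feat_dim)  # placeholder backbone
        # With inputs shaped [N, 1, C], self-attention runs over the N samples
        # of the mini-batch rather than over a spatial or token sequence.
        self.batchformer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=num_heads, dim_feedforward=feat_dim, dropout=0.5)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x, y=None):
        feats = self.backbone(x)  # [N, C]
        if not self.training:
            # BatchFormer is removed at test time; only the shared classifier remains.
            return self.classifier(feats)
        mixed = self.batchformer(feats.unsqueeze(1)).squeeze(1)  # attention across the batch
        # Shared classifier: score both the original and the batch-mixed features,
        # duplicating the labels so one loss covers both streams.
        logits = self.classifier(torch.cat([feats, mixed], dim=0))
        targets = torch.cat([y, y], dim=0)
        return logits, targets

# Toy usage: a mini-batch of 16 samples with 32 input features and 100 classes.
model = BatchFormerClassifier()
x, y = torch.randn(16, 32), torch.randint(0, 100, (16,))
logits, targets = model(x, y)
loss = F.cross_entropy(logits, targets)
```

Because the shared classifier sees both streams during training, removing the BatchFormer branch at inference leaves the behaviour of a standard backbone-plus-classifier model.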
Related papers
- Class incremental learning with probability dampening and cascaded gated classifier [4.285597067389559]
We propose a novel incremental regularisation approach called Margin Dampening and Cascaded Scaling.
The first combines a soft constraint and a knowledge distillation approach to preserve past knowledge while still allowing new patterns to be learned.
We empirically show that our approach performs well on multiple benchmarks against well-established baselines.
arXiv Detail & Related papers (2024-02-02T09:33:07Z) - Enhancing Consistency and Mitigating Bias: A Data Replay Approach for
Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods propose to replay the data of experienced tasks when learning new tasks.
However, this is often impractical due to memory constraints or data privacy issues.
As a replacement, data-free data replay methods have been proposed, which invert samples from the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z) - Negotiated Representations for Machine Learning Application [0.0]
Overfitting occurs when a machine learning model is trained for too long and focuses too much on exactly fitting the training samples to the provided training labels.
We present an approach that increases the classification accuracy of machine learning models by allowing the model to negotiate output representations of the samples with previously determined class labels.
arXiv Detail & Related papers (2023-11-19T19:53:49Z) - Provable Advantage of Curriculum Learning on Parity Targets with Mixed
Inputs [21.528321119061694]
We show a separation result in the number of training steps with standard (bounded) learning rates on a common sample distribution.
We also provide experimental results supporting the qualitative separation beyond the specific regime of the theoretical results.
arXiv Detail & Related papers (2023-06-29T13:14:42Z) - Transfer Learning via Test-Time Neural Networks Aggregation [11.42582922543676]
It has been demonstrated that deep neural networks outperform traditional machine learning methods.
However, deep networks can lack generalisability, that is, they will not perform as well on a new (test) set drawn from a different distribution as they do on the training distribution.
arXiv Detail & Related papers (2022-06-27T15:46:05Z) - Learning to Imagine: Diversify Memory for Incremental Learning using
Unlabeled Data [69.30452751012568]
We develop a learnable feature generator to diversify exemplars by adaptively generating diverse counterparts of exemplars.
We introduce semantic contrastive learning to enforce the generated samples to be semantic consistent with exemplars.
Our method does not bring any extra inference cost and outperforms state-of-the-art methods on two benchmarks.
arXiv Detail & Related papers (2022-04-19T15:15:18Z) - On Generalizing Beyond Domains in Cross-Domain Continual Learning [91.56748415975683]
Deep neural networks often suffer from catastrophic forgetting of previously learned knowledge after learning a new task.
Our proposed approach learns new tasks under domain shift with accuracy boosts up to 10% on challenging datasets such as DomainNet and OfficeHome.
arXiv Detail & Related papers (2022-03-08T09:57:48Z) - MetaKernel: Learning Variational Random Features with Limited Labels [120.90737681252594]
Few-shot learning deals with the fundamental and challenging problem of learning from a few annotated samples, while being able to generalize well on new tasks.
We propose meta-learning kernels with random Fourier features for few-shot learning, which we call MetaKernel.
arXiv Detail & Related papers (2021-05-08T21:24:09Z) - Learning Intra-Batch Connections for Deep Metric Learning [3.5665681694253903]
Metric learning aims to learn a function that maps samples to a lower-dimensional space where similar samples lie closer than dissimilar ones.
Most approaches rely on losses that only take the relations between pairs or triplets of samples into account.
We propose an approach based on message passing networks that takes into account all the relations in a mini-batch.
arXiv Detail & Related papers (2021-02-15T18:50:00Z) - Learning What Makes a Difference from Counterfactual Examples and
Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)