Stochastic Vision Transformers with Wasserstein Distance-Aware Attention
- URL: http://arxiv.org/abs/2311.18645v1
- Date: Thu, 30 Nov 2023 15:53:37 GMT
- Title: Stochastic Vision Transformers with Wasserstein Distance-Aware Attention
- Authors: Franciskus Xaverius Erick, Mina Rezaei, Johanna Paula Müller, Bernhard Kainz
- Abstract summary: Self-supervised learning is one of the most promising approaches to acquiring knowledge from limited labeled data.
We introduce a new vision transformer that integrates uncertainty and distance awareness into self-supervised learning pipelines.
Our proposed method achieves superior accuracy and calibration, surpassing the self-supervised baseline in a wide range of experiments on a variety of datasets.
- Score: 8.407731308079025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning is one of the most promising approaches to acquiring
knowledge from limited labeled data. Despite the substantial advancements made
in recent years, self-supervised models have posed a challenge to
practitioners, as they do not readily provide insight into the model's
confidence and uncertainty. Tackling this issue is no simple feat, primarily
due to the complexity involved in implementing techniques that can make use of
the latent representations learned during pre-training without relying on
explicit labels. Motivated by this, we introduce a new stochastic vision
transformer that integrates uncertainty and distance awareness into
self-supervised learning (SSL) pipelines. Instead of the conventional
deterministic vector embedding, our novel stochastic vision transformer encodes
image patches into elliptical Gaussian distributional embeddings. Notably, the
attention matrices of these stochastic representational embeddings are computed
using Wasserstein distance-based attention, effectively capitalizing on the
distributional nature of these embeddings. Additionally, we propose a
regularization term based on Wasserstein distance for both pre-training and
fine-tuning processes, thereby incorporating distance awareness into latent
representations. We perform extensive experiments across different tasks such
as in-distribution generalization, out-of-distribution detection, dataset
corruption, semi-supervised settings, and transfer learning to other datasets
and tasks. Our proposed method achieves superior accuracy and calibration,
surpassing the self-supervised baseline in a wide range of experiments on a
variety of datasets.
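The central computation described in the abstract, attention scores derived from the closed-form 2-Wasserstein distance between Gaussian patch embeddings, can be illustrated with a minimal PyTorch sketch. This is a hedged reading of the abstract rather than the authors' implementation: the diagonal-covariance simplification, every module and function name, and the particular view-alignment form of the regularizer are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WassersteinSelfAttention(nn.Module):
    """Single-head self-attention where the score between two tokens is the
    negative squared 2-Wasserstein distance between their Gaussian embeddings.
    With diagonal covariances the distance has a closed form:
        W2^2 = ||mu_q - mu_k||^2 + ||sigma_q - sigma_k||^2
    (Illustrative sketch; not the paper's released code.)
    """

    def __init__(self, dim: int):
        super().__init__()
        self.q_mu = nn.Linear(dim, dim)      # query means
        self.q_logvar = nn.Linear(dim, dim)  # query log-variances
        self.k_mu = nn.Linear(dim, dim)      # key means
        self.k_logvar = nn.Linear(dim, dim)  # key log-variances
        self.v = nn.Linear(dim, dim)         # values (kept deterministic here)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) patch features
        q_mu, k_mu = self.q_mu(x), self.k_mu(x)
        q_sig = torch.exp(0.5 * self.q_logvar(x))
        k_sig = torch.exp(0.5 * self.k_logvar(x))

        # Pairwise squared 2-Wasserstein distances, shape (batch, tokens, tokens).
        w2_sq = torch.cdist(q_mu, k_mu, p=2) ** 2 + torch.cdist(q_sig, k_sig, p=2) ** 2

        # Small distance -> large attention weight.
        attn = F.softmax(-self.scale * w2_sq, dim=-1)
        return attn @ self.v(x)


def wasserstein_alignment_loss(mu_a, logvar_a, mu_b, logvar_b):
    """One plausible distance-aware regularizer for SSL: pull together the
    Gaussian embeddings of two augmented views of the same image (this exact
    form is an assumption, not taken from the paper)."""
    sig_a, sig_b = torch.exp(0.5 * logvar_a), torch.exp(0.5 * logvar_b)
    w2_sq = ((mu_a - mu_b) ** 2).sum(-1) + ((sig_a - sig_b) ** 2).sum(-1)
    return w2_sq.mean()


# Toy usage on random "patch tokens".
tokens = torch.randn(2, 16, 64)
print(WassersteinSelfAttention(64)(tokens).shape)  # torch.Size([2, 16, 64])
```

Under the diagonal-covariance assumption the distance splits into a mean term and a standard-deviation term, so the extra cost over dot-product attention is essentially one additional pairwise-distance computation.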
Related papers
- Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [79.28821338925947]
Domain-Class Incremental Learning is a realistic but challenging continual learning scenario.
To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability.
This incurs a new problem: the knowledge encoded in the pre-trained VLMs may be disturbed when adapting to new tasks, compromising their inherent zero-shot ability.
Existing methods tackle it by tuning VLMs with knowledge distillation on extra datasets, which demands heavy overhead.
We propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, retaining the pre-trained knowledge of VLMs while adapting to new tasks.
arXiv Detail & Related papers (2024-07-07T12:19:37Z)
- ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z)
- Self-Distilled Disentangled Learning for Counterfactual Prediction [49.84163147971955]
We propose the Self-Distilled Disentanglement framework, known as $SD^2$.
Grounded in information theory, it ensures theoretically sound independent disentangled representations without intricate mutual information estimator designs.
Our experiments, conducted on both synthetic and real-world datasets, confirm the effectiveness of our approach.
arXiv Detail & Related papers (2024-06-09T16:58:19Z)
- In-Domain Self-Supervised Learning Improves Remote Sensing Image Scene Classification [5.323049242720532]
Self-supervised learning has emerged as a promising approach for remote sensing image classification.
We present a study of different self-supervised pre-training strategies and evaluate their effect across 14 downstream datasets.
arXiv Detail & Related papers (2023-07-04T10:57:52Z)
- Unsupervised Self-Driving Attention Prediction via Uncertainty Mining and Knowledge Embedding [51.8579160500354]
We propose an unsupervised way to predict self-driving attention by uncertainty modeling and driving knowledge integration.
Results show performance on par with, or better than, fully-supervised state-of-the-art approaches.
arXiv Detail & Related papers (2023-03-17T00:28:33Z)
- Evaluating the Label Efficiency of Contrastive Self-Supervised Learning for Multi-Resolution Satellite Imagery [0.0]
Self-supervised learning has been applied in the remote sensing domain to exploit readily-available unlabeled data.
In this paper, we study self-supervised visual representation learning through the lens of label efficiency.
arXiv Detail & Related papers (2022-10-13T06:54:13Z)
- Uncertainty in Contrastive Learning: On the Predictability of Downstream Performance [7.411571833582691]
We study whether the uncertainty of such a representation can be quantified for a single datapoint in a meaningful way.
We show that this goal can be achieved by directly estimating the distribution of the training data in the embedding space; a small illustrative sketch of this idea is given after this list.
arXiv Detail & Related papers (2022-07-19T15:44:59Z)
- Squeezing Backbone Feature Distributions to the Max for Efficient Few-Shot Learning [3.1153758106426603]
Few-shot classification is a challenging problem due to the uncertainty caused by using few labelled samples.
We propose a novel transfer-based method which aims at processing the feature vectors so that they become closer to Gaussian-like distributions.
In the case of transductive few-shot learning, where unlabelled test samples are available during training, we also introduce an optimal-transport-inspired algorithm to further boost performance; see the second sketch after this list.
arXiv Detail & Related papers (2021-10-18T16:29:17Z)
- InteL-VAEs: Adding Inductive Biases to Variational Auto-Encoders via Intermediary Latents [60.785317191131284]
We introduce a simple and effective method for learning VAEs with controllable biases by using an intermediary set of latent variables.
In particular, it allows us to impose desired properties like sparsity or clustering on learned representations.
We show that this, in turn, allows InteL-VAEs to learn both better generative models and representations.
arXiv Detail & Related papers (2021-06-25T16:34:05Z)
- Ask-n-Learn: Active Learning via Reliable Gradient Representations for Image Classification [29.43017692274488]
Deep predictive models rely on human supervision in the form of labeled training data.
We propose Ask-n-Learn, an active learning approach based on gradient embeddings obtained using the pseudo-labels estimated in each iteration of the algorithm.
arXiv Detail & Related papers (2020-09-30T05:19:56Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
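For the "Uncertainty in Contrastive Learning" entry above, the idea of estimating the training-data distribution in embedding space and reading off per-datapoint uncertainty can be sketched as follows. This is a hedged illustration under the assumption of a single full-covariance Gaussian density model; the paper's actual estimator and any function names below are not taken from it.

```python
import numpy as np

def fit_gaussian(train_emb: np.ndarray):
    """Fit a full-covariance Gaussian to (n_samples, dim) training embeddings."""
    mu = train_emb.mean(axis=0)
    cov = np.cov(train_emb, rowvar=False) + 1e-6 * np.eye(train_emb.shape[1])
    return mu, np.linalg.inv(cov), np.linalg.slogdet(cov)[1]

def log_density(x: np.ndarray, mu, cov_inv, logdet):
    """Gaussian log-density of embeddings x; lower values can be read as
    higher uncertainty / lower expected downstream reliability."""
    d = x - mu
    maha = np.einsum("...i,ij,...j->...", d, cov_inv, d)
    dim = mu.shape[0]
    return -0.5 * (maha + logdet + dim * np.log(2 * np.pi))

# Toy usage with random vectors standing in for a frozen SSL encoder's output.
rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 32))
mu, cov_inv, logdet = fit_gaussian(train)
scores = log_density(rng.normal(size=(5, 32)), mu, cov_inv, logdet)
print(scores.shape)  # (5,)
```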
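For the "Squeezing Backbone Feature Distributions" entry, this second sketch illustrates one common optimal-transport-style routine for transductive few-shot classification: Sinkhorn normalization that softly assigns unlabelled queries to class centroids under a balanced-class prior. The function name, the balanced-mass assumption, and the squared-Euclidean cost are illustrative choices and may differ from the paper's algorithm.

```python
import torch

def sinkhorn_assign(query_feats, class_centroids, n_iters=50, eps=0.1):
    """query_feats: (n_query, dim); class_centroids: (n_way, dim).
    Returns a (n_query, n_way) soft assignment matrix."""
    cost = torch.cdist(query_feats, class_centroids) ** 2   # squared Euclidean cost
    log_p = -cost / eps
    for _ in range(n_iters):
        # Row step: each query distributes one unit of mass over the classes.
        log_p = log_p - torch.logsumexp(log_p, dim=1, keepdim=True)
        # Column step: push every class toward receiving equal total mass.
        log_p = log_p - torch.logsumexp(log_p, dim=0, keepdim=True)
    p = torch.exp(log_p)
    return p / p.sum(dim=1, keepdim=True)  # renormalize rows into probabilities

# Toy 5-way episode with 15 unlabelled queries and 64-dim features.
probs = sinkhorn_assign(torch.randn(15, 64), torch.randn(5, 64))
print(probs.shape, probs.sum(dim=1)[:3])  # torch.Size([15, 5]), ~ones
```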
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.