Random Registers for Cross-Domain Few-Shot Learning
- URL: http://arxiv.org/abs/2506.02843v1
- Date: Tue, 03 Jun 2025 13:13:58 GMT
- Title: Random Registers for Cross-Domain Few-Shot Learning
- Authors: Shuai Yi, Yixiong Zou, Yuhua Li, Ruixuan Li
- Abstract summary: Cross-domain few-shot learning aims to transfer knowledge from a data-sufficient source domain to data-scarce target domains. We find that during source-domain training, prompt tuning, a common way to train ViT, can be harmful to the generalization of ViT in target domains. We propose a simple but effective approach for CDFSL that adds random registers on the semantic regions of image tokens.
- Score: 19.199947811410123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-domain few-shot learning (CDFSL) aims to transfer knowledge from a data-sufficient source domain to data-scarce target domains. Although the Vision Transformer (ViT) has shown superior capability in many vision tasks, its transferability across large domain gaps in CDFSL is still under-explored. In this paper, we find an intriguing phenomenon: during source-domain training, prompt tuning, a common way to train ViT, can be harmful to the generalization of ViT in target domains, but setting the prompts to random noise (i.e., random registers) consistently improves target-domain performance. We then delve into this phenomenon for an interpretation. We find that learnable prompts capture domain information during training on the source dataset, which leads the model to view irrelevant visual patterns as vital cues for recognition. This can be viewed as a kind of overfitting and increases the sharpness of the loss landscape. In contrast, random registers are essentially a novel way of perturbing attention for sharpness-aware minimization, which helps the model find a flat minimum in the loss landscape, increasing transferability. Based on this phenomenon and interpretation, we further propose a simple but effective approach for CDFSL that enhances the perturbation on attention maps by adding random registers on the semantic regions of image tokens, improving the effectiveness and efficiency of random registers. Extensive experiments on four benchmarks validate our rationale and show state-of-the-art performance. Code and models are available at https://github.com/shuaiyi308/REAP.
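As a rough illustration of the core idea, the sketch below appends freshly sampled Gaussian "registers" to a ViT token sequence and strengthens the perturbation by adding noise to the most-attended (i.e., most semantic) image tokens. This is a minimal reading of the abstract, not the released REAP code; the top-k CLS-attention heuristic and the `num_registers`/`std` parameters are assumptions.

```python
# Minimal sketch of random registers for a ViT (not the authors' REAP code).
import torch

def add_random_registers(tokens: torch.Tensor, num_registers: int = 4,
                         std: float = 1.0) -> torch.Tensor:
    """Append non-learnable Gaussian-noise registers to a (B, N, D) sequence.

    The registers are resampled at every forward pass, so they cannot store
    source-domain information; they act purely as an attention perturbation
    (the paper's sharpness-aware-minimization view).
    """
    B, _, D = tokens.shape
    registers = torch.randn(B, num_registers, D, device=tokens.device) * std
    return torch.cat([tokens, registers], dim=1)

def perturb_semantic_tokens(tokens: torch.Tensor, cls_attn: torch.Tensor,
                            top_k: int = 8, std: float = 1.0) -> torch.Tensor:
    """Add noise on the image tokens that receive the highest CLS attention,
    approximating the abstract's "semantic regions" (a heuristic assumption).

    cls_attn: (B, N) attention weights from the CLS token to image tokens.
    """
    B, _, D = tokens.shape
    idx = cls_attn.topk(top_k, dim=-1).indices                   # (B, top_k)
    noise = torch.randn(B, top_k, D, device=tokens.device) * std
    return tokens.scatter_add(1, idx.unsqueeze(-1).expand(-1, -1, D), noise)
```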
Related papers
- Revisiting Continuity of Image Tokens for Cross-domain Few-shot Learning [19.199947811410123]
Vision Transformer (ViT) has achieved remarkable success due to its large-scale pretraining on general domains, but it still faces challenges when applied to downstream distant domains with only scarce training data. Inspired by self-attention's insensitivity to token order, the paper studies an interesting phenomenon neglected in current works: disrupting the continuity of image tokens.
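Since self-attention is permutation-invariant up to positional embeddings, token continuity can be disrupted simply by shuffling the patch tokens. A minimal sketch, assuming a standard ViT layout with a leading CLS token (not the paper's code):

```python
import torch

def shuffle_image_tokens(tokens: torch.Tensor) -> torch.Tensor:
    """Randomly permute the patch tokens of a (B, N, D) sequence, keeping the
    CLS token in place; content is preserved, spatial continuity is not."""
    cls_tok, patches = tokens[:, :1], tokens[:, 1:]
    perm = torch.randperm(patches.shape[1], device=tokens.device)
    return torch.cat([cls_tok, patches[:, perm]], dim=1)
```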
arXiv Detail & Related papers (2025-06-03T17:40:36Z)
- SiamSeg: Self-Training with Contrastive Learning for Unsupervised Domain Adaptation Semantic Segmentation in Remote Sensing [13.549403813487022]
Unsupervised domain adaptation (UDA) enables models to learn from unlabeled target-domain data while leveraging labeled source-domain data. We propose integrating contrastive learning into UDA, enhancing the model's ability to capture semantic information in the target domain. Our method, SiamSeg, outperforms existing approaches and achieves state-of-the-art results.
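The summary does not spell out SiamSeg's loss; as a hedged illustration, self-training pipelines of this kind typically add a standard InfoNCE contrastive term over two augmented views of the same unlabeled target-domain image, as sketched below:

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Standard InfoNCE between two views' embeddings, each of shape (B, D).

    z1[i] and z2[i] come from two augmentations of the same target-domain
    crop; every other pair in the batch serves as a negative.
    """
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                       # (B, B) similarity matrix
    labels = torch.arange(z1.shape[0], device=z1.device)
    return F.cross_entropy(logits, labels)
```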
arXiv Detail & Related papers (2024-10-17T11:59:39Z)
- Generalizing to Unseen Domains in Diabetic Retinopathy with Disentangled Representations [32.7667209371645]
Existing models experience notable performance degradation on unseen domains due to domain shifts.
We propose a novel framework where representations of paired data from different domains are decoupled into semantic features and domain noise.
The resulting augmented representation combines the original retinal semantics with domain noise from other domains, yielding enhanced representations aligned with real-world clinical needs.
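A toy sketch of the decouple-and-recombine idea; the abstract does not specify the architecture, so all module names and dimensions below are hypothetical:

```python
import torch
import torch.nn as nn

class DisentangleEncoder(nn.Module):
    """Splits a backbone feature into a semantic part and a domain-noise part."""
    def __init__(self, in_dim: int = 512, sem_dim: int = 256, dom_dim: int = 256):
        super().__init__()
        self.sem_head = nn.Linear(in_dim, sem_dim)   # retinal semantics
        self.dom_head = nn.Linear(in_dim, dom_dim)   # domain noise / style

    def forward(self, x: torch.Tensor):
        return self.sem_head(x), self.dom_head(x)

def cross_domain_augment(sem_a: torch.Tensor, dom_b: torch.Tensor) -> torch.Tensor:
    """Recombine domain-A semantics with domain-B noise, as the summary describes."""
    return torch.cat([sem_a, dom_b], dim=-1)
```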
arXiv Detail & Related papers (2024-06-10T15:43:56Z)
- Prompt-based Visual Alignment for Zero-shot Policy Transfer [35.784936617675896]
Overfitting has become one of the main obstacles to applying reinforcement learning in practice.
We propose prompt-based visual alignment (PVA) to mitigate the detrimental domain bias in the image for zero-shot policy transfer.
We verify PVA on a vision-based autonomous driving task with the CARLA simulator.
arXiv Detail & Related papers (2024-06-05T13:26:30Z)
- Cross-Domain Policy Adaptation by Capturing Representation Mismatch [53.087413751430255]
In reinforcement learning (RL), it is vital to learn effective policies that can be transferred to domains with dynamics discrepancies.
In this paper, we consider dynamics adaptation settings where there exists dynamics mismatch between the source domain and the target domain.
We perform representation learning only in the target domain and measure the representation deviations on the transitions from the source domain.
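One hedged reading of this recipe: a representation model is fitted on target-domain data only, and its deviation on source transitions is subtracted from the source reward before policy learning. The sketch below illustrates that reading; the names and the exact penalty form are assumptions, and the paper's formulation may differ:

```python
import torch

def adapted_reward(reward: torch.Tensor, z_pred: torch.Tensor,
                   z_obs: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """Penalize source-domain rewards by the representation deviation.

    z_pred: next-state representation predicted by a model trained only on
    target-domain transitions; z_obs: representation of the next state
    actually observed in the source transition; beta: penalty strength.
    """
    deviation = (z_pred - z_obs).pow(2).sum(dim=-1)
    return reward - beta * deviation
```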
arXiv Detail & Related papers (2024-05-24T09:06:12Z)
- On the Transferability of Visually Grounded PCFGs [35.64371385720051]
We study the visually-grounded Compound PCFG (Zhao and Titov, 2020).
We consider a zero-shot transfer learning setting where a model is trained on the source domain and is directly applied to target domains, without any further training.
Our experimental results suggest that the benefits of visual grounding transfer to text in domains similar to the training domain but fail to transfer to remote domains.
arXiv Detail & Related papers (2023-10-21T20:19:51Z)
- CDFSL-V: Cross-Domain Few-Shot Learning for Videos [58.37446811360741]
Few-shot video action recognition is an effective approach to recognizing new categories with only a few labeled examples.
Existing methods in video action recognition rely on large labeled datasets from the same domain.
We propose a novel cross-domain few-shot video action recognition method that leverages self-supervised learning and curriculum learning.
arXiv Detail & Related papers (2023-09-07T19:44:27Z)
- Efficient entity-based reinforcement learning [3.867363075280544]
We propose to combine recent advances in set representations with slot attention and graph neural networks to process structured data.
We show that this combination can significantly improve training time and robustness, and we demonstrate its potential to handle structured as well as purely visual domains.
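For reference, a compact Slot Attention module (Locatello et al., 2020) of the kind this summary refers to; a simplified generic sketch (the residual MLP of the original is omitted), not the authors' implementation:

```python
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    """Iteratively bind a set of input features (B, N, D) to K slots."""
    def __init__(self, num_slots: int, dim: int, iters: int = 3):
        super().__init__()
        self.num_slots, self.iters, self.scale = num_slots, iters, dim ** -0.5
        self.slots_mu = nn.Parameter(torch.randn(1, 1, dim))
        self.slots_sigma = nn.Parameter(torch.ones(1, 1, dim))
        self.to_q, self.to_k, self.to_v = (nn.Linear(dim, dim) for _ in range(3))
        self.gru = nn.GRUCell(dim, dim)
        self.norm_in, self.norm_slots = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        B, _, D = inputs.shape
        inputs = self.norm_in(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)
        slots = self.slots_mu + self.slots_sigma * torch.randn(
            B, self.num_slots, D, device=inputs.device)
        for _ in range(self.iters):
            q = self.to_q(self.norm_slots(slots))
            attn = (q @ k.transpose(1, 2) * self.scale).softmax(dim=1)  # slots compete
            attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-8)       # weighted mean
            updates = attn @ v                                          # (B, K, D)
            slots = self.gru(updates.reshape(-1, D),
                             slots.reshape(-1, D)).reshape(B, -1, D)
        return slots
```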
arXiv Detail & Related papers (2022-06-06T19:02:39Z)
- Self-Promoted Supervision for Few-Shot Transformer [178.52948452353834]
Self-promoted sUpervisioN (SUN) is a few-shot learning framework for vision transformers (ViTs).
SUN pretrains the ViT on the few-shot learning dataset and then uses it to generate individual location-specific supervision for guiding each patch token.
Experiments show that SUN significantly surpasses other ViT-based few-shot learning frameworks and is the first to achieve higher performance than the CNN state of the art.
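A hedged sketch of what "location-specific supervision" for each patch token might look like: the ViT pretrained on the few-shot dataset acts as a teacher whose per-patch predictions become soft targets for the student's patch tokens. Shapes and the distillation form are assumptions:

```python
import torch
import torch.nn.functional as F

def patch_supervision_loss(student_logits: torch.Tensor,
                           teacher_logits: torch.Tensor,
                           tau: float = 1.0) -> torch.Tensor:
    """Cross-entropy between teacher and student per-patch predictions.

    Both tensors have shape (B, N, C): one class distribution per patch
    token, so every location gets its own supervision signal.
    """
    targets = F.softmax(teacher_logits / tau, dim=-1).detach()
    log_probs = F.log_softmax(student_logits / tau, dim=-1)
    return -(targets * log_probs).sum(dim=-1).mean()
```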
arXiv Detail & Related papers (2022-03-14T12:53:27Z)
- Stagewise Unsupervised Domain Adaptation with Adversarial Self-Training for Road Segmentation of Remote Sensing Images [93.50240389540252]
Road segmentation from remote sensing images is a challenging task with a wide range of potential applications.
We propose a novel stagewise domain adaptation model called RoadDA to address the domain shift (DS) issue in this field.
Experiment results on two benchmarks demonstrate that RoadDA can efficiently reduce the domain gap and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-08-28T09:29:14Z)
- TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation [54.61786380919243]
Unsupervised domain adaptation (UDA) aims to transfer the knowledge learnt from a labeled source domain to an unlabeled target domain.
Previous work is mainly built upon convolutional neural networks (CNNs) to learn domain-invariant representations.
Despite the recent surge in applying the Vision Transformer (ViT) to vision tasks, ViT's capability to adapt cross-domain knowledge remains unexplored in the literature.
arXiv Detail & Related papers (2021-08-12T22:37:43Z)
- Background Adaptive Faster R-CNN for Semi-Supervised Convolutional Object Detection of Threats in X-Ray Images [64.39996451133268]
We present a semi-supervised approach for threat recognition which we call Background Adaptive Faster R-CNN.
This approach is a training method for two-stage object detectors which uses Domain Adaptation methods from the field of deep learning.
Two domain discriminators, one for discriminating object proposals and one for image features, are adversarially trained to prevent encoding domain-specific information.
This can reduce threat detection false alarm rates by matching the statistics of features extracted from hand-collected backgrounds to real-world data.
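Adversarial discriminators of this kind are commonly trained through a gradient-reversal layer (Ganin & Lempitsky, 2015), so the detector's features are pushed to be domain-indistinguishable. A generic sketch follows, not this paper's exact architecture:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) the gradient."""
    @staticmethod
    def forward(ctx, x, lam: float):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

class DomainDiscriminator(nn.Module):
    """Binary source/target classifier on (gradient-reversed) features."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, feats: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
        return self.net(GradReverse.apply(feats, lam))
```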
arXiv Detail & Related papers (2020-10-02T21:05:13Z)