Reclaiming Lost Text Layers for Source-Free Cross-Domain Few-Shot Learning
- URL: http://arxiv.org/abs/2603.05235v1
- Date: Thu, 05 Mar 2026 14:51:52 GMT
- Title: Reclaiming Lost Text Layers for Source-Free Cross-Domain Few-Shot Learning
- Authors: Zhenyu Zhang, Guangyao Chen, Yixiong Zou, Yuhua Li, Ruixuan Li
- Abstract summary: Source-Free Cross-Domain Few-Shot Learning focuses on fine-tuning with limited training data from target domains. We find that removing certain middle layers of the text encoder can effectively improve performance. We propose a method that teaches the model to re-utilize information in these lost layers at both the layer and encoder levels.
- Score: 30.807806199034587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Source-Free Cross-Domain Few-Shot Learning (SF-CDFSL) focuses on fine-tuning with limited training data from target domains (e.g., medical or satellite images), where CLIP has recently shown promising results due to its generalizability to downstream tasks. Current works indicate that CLIP's text encoder is more suitable for cross-domain tasks; however, we find that removing certain middle layers of the text encoder can effectively improve performance in SF-CDFSL. We call these the Lost Layers. In this paper, we delve into this phenomenon for a deeper understanding. We discover that instead of being harmful for the SF-CDFSL task, the information in these layers is actually beneficial, but visual gaps prevent this useful information from being fully utilized, making these layers seem redundant. Based on this understanding, unlike current works that simply remove these layers, we propose a method that teaches the model to re-utilize information in these lost layers at both the layer and encoder levels, guiding the re-learning of the visual branch under domain shifts. Our approach effectively addresses the issue of underutilized information in the text encoder. Extensive experiments across various settings, backbones (CLIP, SigLip, PE-Core), and tasks (4 CDFSL datasets and 10 Meta-dataset datasets) demonstrate the effectiveness of our method. Code is available at https://github.com/zhenyuZ-HUST/CVPR26-VtT.
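The layer-removal observation in the abstract can be illustrated with a minimal sketch. This is not the authors' code and not their re-utilization method; it only shows the baseline idea of skipping a contiguous block of middle layers in an encoder stack. The toy layers, the 12-layer depth, and the skipped indices (5 to 7) are hypothetical choices made for illustration.

```python
from typing import Callable, FrozenSet, List, Sequence

# A "layer" here is any function mapping a hidden vector to a hidden vector.
Layer = Callable[[List[float]], List[float]]


def make_toy_layer(scale: float) -> Layer:
    """Hypothetical stand-in for a transformer block: a simple residual update."""
    def layer(hidden: List[float]) -> List[float]:
        return [h + scale * h for h in hidden]
    return layer


def encode(hidden: List[float],
           layers: Sequence[Layer],
           skip: FrozenSet[int] = frozenset()) -> List[float]:
    """Run the layer stack, passing the hidden state through unchanged
    at every index listed in `skip` (i.e. those layers are removed)."""
    for index, layer in enumerate(layers):
        if index in skip:
            continue  # removed layer: hidden state is left as-is
        hidden = layer(hidden)
    return hidden


# Assumed 12-layer toy encoder; indices 5-7 play the role of the removed middle layers.
layers = [make_toy_layer(0.1) for _ in range(12)]
full_output = encode([1.0, 2.0], layers)
pruned_output = encode([1.0, 2.0], layers, skip=frozenset({5, 6, 7}))
```

With a real model, the same pattern would apply to the text encoder's list of transformer blocks, and which layers to skip would have to be chosen empirically on the target domain.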
Related papers
- Privacy-Preserving CNN Training with Transfer Learning: Two Hidden Layers [0.0]
We present the demonstration of training a four-layer neural network entirely using fully homomorphic encryption (FHE). A key contribution of our work is identifying that replacing Softmax with Sigmoid, in conjunction with the Binary Cross-Entropy (BCE) loss function, provides an effective and scalable solution for homomorphic classification.
arXiv Detail & Related papers (2025-04-17T03:58:23Z) - Text-Enhanced Data-free Approach for Federated Class-Incremental Learning [36.70524853012054]
Data-Free Knowledge Transfer plays a crucial role in addressing forgetting and data privacy problems.
Prior approaches lack the crucial synergy between DFKT and the model training phases.
We introduce LANDER to address this issue by utilizing label text embeddings produced by pretrained language models.
arXiv Detail & Related papers (2024-03-21T03:24:01Z) - FDCNet: Feature Drift Compensation Network for Class-Incremental Weakly Supervised Object Localization [10.08410402383604]
This work addresses the task of class-incremental weakly supervised object localization (CI-WSOL).
The goal is to incrementally learn object localization for novel classes using only image-level annotations while retaining the ability to localize previously learned classes.
We first present a strong baseline method for CI-WSOL by adapting the strategies of class-incremental classifiers to catastrophic forgetting.
We then propose the feature drift compensation network to compensate for the effects of feature drifts on class scores and localization maps.
arXiv Detail & Related papers (2023-09-17T01:10:45Z) - GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization [52.55628139825667]
Federated Learning (FL) has emerged as a promising distributed machine learning framework to preserve clients' privacy.
Recent studies find that an attacker can invert the shared gradients and recover sensitive data against an FL system by leveraging pre-trained generative adversarial networks (GAN) as prior knowledge.
We propose Gradient Inversion over Feature Domains (GIFD), which disassembles the GAN model and searches the feature domains of the intermediate layers.
arXiv Detail & Related papers (2023-08-09T04:34:21Z) - Layer-wise Representation Fusion for Compositional Generalization [26.771056871444692]
A key reason for failure on compositional generalization is that the syntactic and semantic representations of sequences in both the uppermost layer of the encoder and decoder are entangled.
We explain why it exists by analyzing the representation evolving mechanism from the bottom to the top of the Transformer layers.
Inspired by this, we propose LRF, a novel Layer-wise Representation Fusion framework for CG, which learns to fuse previous layers' information back into the encoding and decoding process.
arXiv Detail & Related papers (2023-07-20T12:01:40Z) - Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning [51.90524745663737]
A key innovation is our use of explanations as features, which can be used to boost GNN performance on downstream tasks.
Our method achieves state-of-the-art results on well-established TAG datasets.
Our method significantly speeds up training, achieving a 2.88 times improvement over the closest baseline on ogbn-arxiv.
arXiv Detail & Related papers (2023-05-31T03:18:03Z) - Exploring Incompatible Knowledge Transfer in Few-shot Image Generation [107.81232567861117]
Few-shot image generation learns to generate diverse and high-fidelity images from a target domain using a few reference samples.
Existing FSIG methods select, preserve and transfer prior knowledge from a source generator to learn the target generator.
We propose knowledge truncation, which is a complementary operation to knowledge preservation and is implemented by a lightweight pruning-based method.
arXiv Detail & Related papers (2023-04-15T14:57:15Z) - De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects.
We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding.
We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z) - Deep Learning for Cross-Domain Few-Shot Visual Recognition: A Survey [33.00835033658241]
Few-shot learning enables models to perform the target tasks with very few labeled examples. To overcome this limitation, Cross-domain few-shot learning has gained attention. This paper presents the first comprehensive review of Cross-domain Few-shot Learning.
arXiv Detail & Related papers (2023-03-15T12:18:16Z) - Guillotine Regularization: Why removing layers is needed to improve generalization in Self-Supervised Learning [15.009986848506486]
Guillotine Regularization (GR) is a generically applicable method that has been used to improve generalization performance in transfer learning scenarios.
We identify the underlying reasons behind its success and show that the optimal layer to use might change significantly depending on the training setup, the data or the downstream task.
arXiv Detail & Related papers (2022-06-27T15:37:54Z) - Text-Based Person Search with Limited Data [66.26504077270356]
Text-based person search (TBPS) aims at retrieving a target person from an image gallery with a descriptive text query.
We present a framework with two novel components to handle the problems brought by limited data.
arXiv Detail & Related papers (2021-10-20T22:20:47Z) - SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption [72.35532598131176]
We propose SCARF, a technique for contrastive learning, where views are formed by corrupting a random subset of features.
We show that SCARF complements existing strategies and outperforms alternatives like autoencoders.
arXiv Detail & Related papers (2021-06-29T08:08:33Z) - Learning Visual Representations for Transfer Learning by Suppressing Texture [38.901410057407766]
In self-supervised learning, texture as a low-level cue may provide shortcuts that prevent the network from learning higher level representations.
We propose to use classic methods based on anisotropic diffusion to augment training using images with suppressed texture.
We empirically show that our method achieves state-of-the-art results on object detection and image classification.
arXiv Detail & Related papers (2020-11-03T18:27:03Z) - Computationally Efficient NER Taggers with Combined Embeddings and Constrained Decoding [10.643105866460978]
Current State-of-the-Art models in Named Entity Recognition (NER) are neural models with a Conditional Random Field (CRF) as the final network layer, and pre-trained "contextual embeddings".
In this work, we explore two simple techniques that substantially improve NER performance over a strong baseline with negligible cost.
While training a tagger on CoNLL 2003 we find a 786% speed-up over a contextual embeddings-based tagger without sacrificing strong performance.
arXiv Detail & Related papers (2020-01-05T04:50:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.