PS-TTL: Prototype-based Soft-labels and Test-Time Learning for Few-shot Object Detection
- URL: http://arxiv.org/abs/2408.05674v1
- Date: Sun, 11 Aug 2024 02:21:43 GMT
- Title: PS-TTL: Prototype-based Soft-labels and Test-Time Learning for Few-shot Object Detection
- Authors: Yingjie Gao, Yanan Zhang, Ziyue Huang, Nanqing Liu, Di Huang,
- Abstract summary: Few-Shot Object Detection (FSOD) has gained widespread attention and made significant progress.
We propose a new framework for FSOD, namely Prototype-based Soft-labels and Test-Time Learning (PS-TTL)
- Score: 21.443060372419286
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, Few-Shot Object Detection (FSOD) has gained widespread attention and made significant progress due to its ability to build models with a good generalization power using extremely limited annotated data. The fine-tuning based paradigm is currently dominating this field, where detectors are initially pre-trained on base classes with sufficient samples and then fine-tuned on novel ones with few samples, but the scarcity of labeled samples of novel classes greatly interferes precisely fitting their data distribution, thus hampering the performance. To address this issue, we propose a new framework for FSOD, namely Prototype-based Soft-labels and Test-Time Learning (PS-TTL). Specifically, we design a Test-Time Learning (TTL) module that employs a mean-teacher network for self-training to discover novel instances from test data, allowing detectors to learn better representations and classifiers for novel classes. Furthermore, we notice that even though relatively low-confidence pseudo-labels exhibit classification confusion, they still tend to recall foreground. We thus develop a Prototype-based Soft-labels (PS) strategy through assessing similarities between low-confidence pseudo-labels and category prototypes as soft-labels to unleash their potential, which substantially mitigates the constraints posed by few-shot samples. Extensive experiments on both the VOC and COCO benchmarks show that PS-TTL achieves the state-of-the-art, highlighting its effectiveness. The code and model are available at https://github.com/gaoyingjay/PS-TTL.
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z) - Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models [19.683461002518147]
Test-Time Prototype Shifting (TPS) is a pioneering approach designed to adapt vision-language models to test datasets using unlabeled test inputs.
TPS not only facilitates optimization-free prototype reuse for subsequent predictions but also enables seamless integration with current advancements in prompt engineering.
A notable aspect of our framework is its significantly reduced memory and computational demands when compared to conventional text-prompt tuning methods.
arXiv Detail & Related papers (2024-03-19T17:54:34Z) - Decoupled Prototype Learning for Reliable Test-Time Adaptation [50.779896759106784]
Test-time adaptation (TTA) is a task that continually adapts a pre-trained source model to the target domain during inference.
One popular approach involves fine-tuning model with cross-entropy loss according to estimated pseudo-labels.
This study reveals that minimizing the classification error of each sample causes the cross-entropy loss's vulnerability to label noise.
We propose a novel Decoupled Prototype Learning (DPL) method that features prototype-centric loss computation.
arXiv Detail & Related papers (2024-01-15T03:33:39Z) - Augment and Criticize: Exploring Informative Samples for Semi-Supervised
Monocular 3D Object Detection [64.65563422852568]
We improve the challenging monocular 3D object detection problem with a general semi-supervised framework.
We introduce a novel, simple, yet effective Augment and Criticize' framework that explores abundant informative samples from unlabeled data.
The two new detectors, dubbed 3DSeMo_DLE and 3DSeMo_FLEX, achieve state-of-the-art results with remarkable improvements for over 3.5% AP_3D/BEV (Easy) on KITTI.
arXiv Detail & Related papers (2023-03-20T16:28:15Z) - Rethinking Precision of Pseudo Label: Test-Time Adaptation via
Complementary Learning [10.396596055773012]
We propose a novel complementary learning approach to enhance test-time adaptation.
In test-time adaptation tasks, information from the source domain is typically unavailable.
We highlight that the risk function of complementary labels agrees with their Vanilla loss formula.
arXiv Detail & Related papers (2023-01-15T03:36:33Z) - Bridging Non Co-occurrence with Unlabeled In-the-wild Data for
Incremental Object Detection [56.22467011292147]
Several incremental learning methods are proposed to mitigate catastrophic forgetting for object detection.
Despite the effectiveness, these methods require co-occurrence of the unlabeled base classes in the training data of the novel classes.
We propose the use of unlabeled in-the-wild data to bridge the non-occurrence caused by the missing base classes during the training of additional novel classes.
arXiv Detail & Related papers (2021-10-28T10:57:25Z) - Training image classifiers using Semi-Weak Label Data [26.04162590798731]
In Multiple Instance learning (MIL), weak labels are provided at the bag level with only presence/absence information known.
This paper introduces a novel semi-weak label learning paradigm as a middle ground to mitigate the problem.
We propose a two-stage framework to address the problem of learning from semi-weak labels.
arXiv Detail & Related papers (2021-03-19T03:06:07Z) - Contrastive Prototype Learning with Augmented Embeddings for Few-Shot
Learning [58.2091760793799]
We propose a novel contrastive prototype learning with augmented embeddings (CPLAE) model.
With a class prototype as an anchor, CPL aims to pull the query samples of the same class closer and those of different classes further away.
Extensive experiments on several benchmarks demonstrate that our proposed CPLAE achieves new state-of-the-art.
arXiv Detail & Related papers (2021-01-23T13:22:44Z) - UniT: Unified Knowledge Transfer for Any-shot Object Detection and
Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large scale instance-level annotations for training.
We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision.
arXiv Detail & Related papers (2020-06-12T22:45:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.