Semantic-visual Guided Transformer for Few-shot Class-incremental
Learning
- URL: http://arxiv.org/abs/2303.15494v1
- Date: Mon, 27 Mar 2023 15:06:49 GMT
- Title: Semantic-visual Guided Transformer for Few-shot Class-incremental
Learning
- Authors: Wenhao Qiu, Sichao Fu, Jingyi Zhang, Chengxiang Lei, Qinmu Peng
- Abstract summary: We develop a semantic-visual guided Transformer (SV-T) to enhance the feature extraction capacity of the pre-trained feature backbone on incremental classes.
Our SV-T can take full advantage of more supervision information from base classes and further enhance the training robustness of the feature backbone.
- Score: 6.300141694311465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot class-incremental learning (FSCIL) has recently attracted extensive
attention in various areas. Existing FSCIL methods highly depend on the
robustness of the feature backbone pre-trained on base classes. In recent
years, different Transformer variants have made significant progress in
feature representation learning across many fields. Nevertheless, Transformers
have not yet realized in FSCIL scenarios the potential they have shown in
other fields. In this paper, we develop a semantic-visual guided Transformer
(SV-T) to enhance the feature extraction capacity of the
pre-trained feature backbone on incremental classes. Specifically, we first
utilize the visual (image) labels provided by the base classes to supervise the
optimization of the Transformer. Then, a text encoder is introduced to
automatically generate the corresponding semantic (text) labels for each image
from the base classes. Finally, the constructed semantic labels are further
applied to the Transformer to guide its hyperparameter updates. Our SV-T
can take full advantage of more supervision information from base classes and
further enhance the training robustness of the feature backbone. More
importantly, our SV-T is an independent method that can be directly applied to
existing FSCIL architectures to acquire embeddings of various incremental
classes. Extensive experiments on three benchmarks, two FSCIL architectures,
and two Transformer variants show that our proposed SV-T obtains a significant
improvement in comparison to the existing state-of-the-art FSCIL methods.
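For intuition, below is a minimal PyTorch sketch of the dual supervision described in the abstract: cross-entropy on the visual (image) labels plus a feature-alignment term driven by semantic (text) labels produced by a text encoder. The function and variable names, the cosine-similarity alignment loss, and the weighting term alpha are assumptions made for illustration; they are not taken from the paper or its released code.

    import torch
    import torch.nn.functional as F

    def sv_t_training_step(backbone, classifier, text_encoder, images,
                           visual_labels, class_names, optimizer, alpha=0.5):
        """One hypothetical SV-T-style training step on base classes."""
        optimizer.zero_grad()

        # Visual supervision: image labels of the base classes supervise
        # the Transformer backbone through a standard cross-entropy loss.
        feats = backbone(images)        # [B, D] visual features
        logits = classifier(feats)      # [B, num_base_classes]
        loss_visual = F.cross_entropy(logits, visual_labels)

        # Semantic supervision: a text encoder generates a semantic (text)
        # label embedding for each image's class; visual features are pulled
        # toward the corresponding text features (cosine alignment assumed).
        with torch.no_grad():
            text_feats = text_encoder([class_names[y] for y in visual_labels.tolist()])
        loss_semantic = 1.0 - F.cosine_similarity(feats, text_feats, dim=-1).mean()

        # Both supervision signals guide the update of the backbone.
        loss = loss_visual + alpha * loss_semantic
        loss.backward()
        optimizer.step()
        return loss.item()
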
Related papers
- Transformer as Linear Expansion of Learngene [38.16612771203953]
Linear Expansion of learnGene (TLEG) is a novel approach for flexibly producing and initializing Transformers of diverse depths.
Experiments on ImageNet-1K demonstrate that TLEG achieves comparable or better performance than many individual models trained from scratch.
arXiv Detail & Related papers (2023-12-09T17:01:18Z) - Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation [59.91357714415056]
We propose two Transformer variants: Context-Sharing Transformer (CST) and Semantic Gathering-Scattering Transformer (SGST).
CST learns the global-shared contextual information within image frames with a lightweight computation; SGST models the semantic correlation separately for the foreground and background.
Compared with the baseline that uses vanilla Transformers for multi-stage fusion, ours significantly increases the speed by 13 times and achieves new state-of-the-art ZVOS performance.
arXiv Detail & Related papers (2023-08-13T06:12:00Z) - Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches, the first time such a model has done so.
arXiv Detail & Related papers (2023-05-26T00:43:02Z) - Pre-training Transformers for Knowledge Graph Completion [81.4078733132239]
We introduce a novel inductive KG representation model (iHT) for learning transferable representations for knowledge graphs.
iHT consists of an entity encoder (e.g., BERT) and a neighbor-aware relational scoring function, both parameterized by Transformers.
Our approach achieves new state-of-the-art results on matched evaluations, with a relative improvement of more than 25% in mean reciprocal rank over previous SOTA models.
arXiv Detail & Related papers (2023-03-28T02:10:37Z) - Foundation Transformers [105.06915886136524]
We call for the development of Foundation Transformer for true general-purpose modeling.
In this work, we introduce a Transformer variant, named Magneto, to fulfill the goal.
arXiv Detail & Related papers (2022-10-12T17:16:27Z) - TransReID: Transformer-based Object Re-Identification [20.02035310635418]
Vision Transformer (ViT), a pure transformer-based model, is explored for the object re-identification (ReID) task.
With several adaptations, a strong baseline ViT-BoT is constructed with ViT as the backbone.
We propose a pure-transformer framework dubbed TransReID, which is the first work to use a pure Transformer for ReID research.
arXiv Detail & Related papers (2021-02-08T17:33:59Z) - Transformer-based Conditional Variational Autoencoder for Controllable
Story Generation [39.577220559911055]
We investigate large-scale latent variable models (LVMs) for neural story generation with objectives in two threads: generation effectiveness and controllability.
We advocate reviving latent variable modeling, essentially the power of representation learning, in the era of Transformers.
Specifically, we integrate latent representation vectors with a Transformer-based pre-trained architecture to build a conditional variational autoencoder (CVAE).
arXiv Detail & Related papers (2021-01-04T08:31:11Z) - Multi-branch Attentive Transformer [152.07840447196384]
We propose a simple yet effective variant of Transformer called multi-branch attentive Transformer.
The attention layer is the average of multiple branches, and each branch is an independent multi-head attention layer (a minimal sketch appears after this list).
Experiments on machine translation, code generation and natural language understanding demonstrate that such a simple variant of Transformer brings significant improvements.
arXiv Detail & Related papers (2020-06-18T04:24:28Z) - Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)
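As a rough illustration of the multi-branch attentive Transformer summarized above, the following PyTorch sketch averages the outputs of several independent multi-head self-attention branches. The module name, the branch count, and the use of nn.MultiheadAttention are assumptions for illustration, not the authors' implementation.

    import torch
    import torch.nn as nn

    class MultiBranchAttention(nn.Module):
        """Hypothetical multi-branch attentive layer: the output is the
        average of several independent multi-head attention branches."""

        def __init__(self, d_model: int, num_heads: int, num_branches: int = 3):
            super().__init__()
            # Each branch is an independent multi-head attention layer.
            self.branches = nn.ModuleList(
                nn.MultiheadAttention(d_model, num_heads, batch_first=True)
                for _ in range(num_branches)
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Run every branch as self-attention, then average the outputs.
            outs = [attn(x, x, x, need_weights=False)[0] for attn in self.branches]
            return torch.stack(outs, dim=0).mean(dim=0)
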