Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification
- URL: http://arxiv.org/abs/2403.08271v1
- Date: Wed, 13 Mar 2024 05:48:58 GMT
- Title: Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification
- Authors: Long Lan, Fengxiang Wang, Shuyan Li, Xiangtao Zheng, Zengmao Wang and Xinwang Liu
- Abstract summary: Fine-grained ship classification in remote sensing (RS-FGSC) poses a significant challenge due to the high similarity between classes and the limited availability of labeled data.
Recent advancements in large pre-trained Vision-Language Models (VLMs) have demonstrated impressive capabilities in few-shot or zero-shot learning.
This study delves into harnessing the potential of VLMs to enhance classification accuracy for unseen ship categories.
- Score: 62.425462136772666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-grained ship classification in remote sensing (RS-FGSC) poses a
significant challenge due to the high similarity between classes and the
limited availability of labeled data, which constrains the effectiveness of
traditional supervised classification methods. Recent advancements in large pre-trained
Vision-Language Models (VLMs) have demonstrated impressive capabilities in
few-shot or zero-shot learning, particularly in understanding image content.
This study delves into harnessing the potential of VLMs to enhance
classification accuracy for unseen ship categories, which holds considerable
significance in scenarios with restricted data due to cost or privacy
constraints. Directly fine-tuning VLMs for RS-FGSC often encounters the
challenge of overfitting the seen classes, resulting in suboptimal
generalization to unseen classes; this underscores the difficulty of
differentiating complex backgrounds and capturing distinctive ship features. To
address these issues, we introduce a novel prompt tuning technique that employs
a hierarchical, multi-granularity prompt design. Our approach integrates remote
sensing ship priors through bias terms, learned from a small trainable network.
This strategy enhances the model's generalization capabilities while improving
its ability to discern intricate backgrounds and learn discriminative ship
features. Furthermore, we contribute to the field by introducing a
comprehensive dataset, FGSCM-52, significantly expanding existing datasets with
more extensive data and detailed annotations for less common ship classes.
Extensive experimental evaluations demonstrate the superiority of our proposed
method over current state-of-the-art techniques. The source code will be made
publicly available.
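The abstract's core mechanism, learnable prompt context vectors shifted by bias terms that a small trainable network produces from the image feature, can be illustrated with a toy sketch. This is a minimal, simplified illustration only: the paper's hierarchical, multi-granularity prompt design is collapsed to a single level, the encoders are replaced by random vectors, and all names (`meta_bias`, `class_scores`, the dimensions) are hypothetical, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

DIM = 32        # embedding dimension (assumed)
N_CTX = 4       # number of learnable context tokens (assumed)
N_CLASSES = 3   # toy number of ship classes

# Learnable context vectors shared across classes (CoOp-style prompt tuning;
# the paper's multi-granularity hierarchy is simplified to one level here).
context = rng.normal(scale=0.02, size=(N_CTX, DIM))

# Frozen class-name embeddings, stand-ins for the VLM text encoder's output.
class_tokens = rng.normal(size=(N_CLASSES, DIM))

# Toy meta-network weights (these would be learned in practice).
w1 = rng.normal(scale=0.1, size=(DIM, 16))
w2 = rng.normal(scale=0.1, size=(16, DIM))

def meta_bias(image_feat):
    """Small trainable network producing an instance-conditioned bias term,
    analogous to injecting remote-sensing ship priors via bias terms."""
    hidden = np.maximum(0.0, image_feat @ w1)  # ReLU
    return hidden @ w2                         # bias of shape (DIM,)

def class_scores(image_feat):
    """Cosine similarity between the image feature and each class prompt,
    where every prompt token is shifted by the image-conditioned bias."""
    bias = meta_bias(image_feat)
    scores = []
    for c in range(N_CLASSES):
        # prompt = biased context tokens + biased class token, mean-pooled
        prompt = np.vstack([context + bias, class_tokens[c] + bias]).mean(axis=0)
        cos = prompt @ image_feat / (np.linalg.norm(prompt) * np.linalg.norm(image_feat))
        scores.append(cos)
    return np.array(scores)

image_feat = rng.normal(size=DIM)   # stand-in for the image encoder's output
prediction = int(np.argmax(class_scores(image_feat)))
```

During training, only `context`, `w1`, and `w2` would receive gradients while both encoders stay frozen, which is what keeps this family of methods parameter-efficient and less prone to overfitting the seen classes.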
Related papers
- OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning [57.43911113915546]
Few-Shot Class-Incremental Learning (FSCIL) introduces a paradigm in which the problem space expands with limited data.
FSCIL methods inherently face the challenge of catastrophic forgetting as data arrives incrementally.
We propose the OrCo framework, built on two core principles: orthogonality of features in the representation space and contrastive learning.
arXiv Detail & Related papers (2024-03-27T13:30:48Z)
- SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations [76.45009891152178]
The pretraining-finetuning approach can alleviate the labeling burden by fine-tuning a pre-trained backbone across various downstream datasets and tasks.
We show, for the first time, that general representation learning can be achieved through the task of occupancy prediction.
Our findings will facilitate the understanding of LiDAR points and pave the way for future advancements in LiDAR pre-training.
arXiv Detail & Related papers (2023-09-19T11:13:01Z) - Recognizing Unseen Objects via Multimodal Intensive Knowledge Graph
Propagation [68.13453771001522]
We propose a multimodal intensive ZSL framework that matches regions of images with corresponding semantic embeddings.
We conduct extensive experiments and evaluate our model on large-scale real-world data.
arXiv Detail & Related papers (2023-06-14T13:07:48Z)
- Mitigating Forgetting in Online Continual Learning via Contrasting Semantically Distinct Augmentations [22.289830907729705]
Online continual learning (OCL) aims to enable model learning from a non-stationary data stream to continuously acquire new knowledge as well as retain the learnt one.
The main challenge comes from the "catastrophic forgetting" issue: the inability to retain previously learnt knowledge while learning new knowledge.
arXiv Detail & Related papers (2022-11-10T05:29:43Z)
- Consecutive Pretraining: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain [25.84756140221655]
ConSecutive PreTraining (CSPT) is proposed, based on the idea from natural language processing (NLP) of not stopping pretraining.
The proposed CSPT can also release the huge potential of unlabeled data for task-aware model training.
The results show that by utilizing the proposed CSPT for task-aware model training, almost all downstream tasks in RSD can outperform the previous method of supervised pretraining-then-fine-tuning.
arXiv Detail & Related papers (2022-07-08T12:32:09Z)
- 2nd Place Solution for ICCV 2021 VIPriors Image Classification Challenge: An Attract-and-Repulse Learning Approach [41.346232387426944]
Convolutional neural networks (CNNs) have achieved significant success in image classification by utilizing large-scale datasets.
We propose Attract-and-Repulse, which consists of Contrastive Regularization (CR) to enrich the feature representations and Symmetric Cross Entropy (SCE) to balance the fitting for different classes.
Specifically, SCE and CR learn discriminative representations while alleviating over-fitting via an adaptive trade-off between the information of classes (attract) and instances (repulse).
arXiv Detail & Related papers (2022-06-13T13:54:33Z)
- Task-Oriented Image Transmission for Scene Classification in Unmanned Aerial Systems [46.64800170644672]
We propose a new aerial image transmission paradigm for the scene classification task.
A lightweight model is developed on the front-end UAV for semantic blocks transmission with perception of images and channel conditions.
In order to achieve the tradeoff between transmission latency and classification accuracy, deep reinforcement learning is used.
arXiv Detail & Related papers (2021-12-21T02:44:49Z)
- Self-supervised learning for joint SAR and multispectral land cover classification [38.8529535887097]
We present a framework and specific tasks for self-supervised training of multichannel models.
We show that the proposed self-supervised approach is highly effective at learning features that correlate with the labels for land cover classification.
arXiv Detail & Related papers (2021-08-20T09:02:07Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches [67.51747235117]
Fine-grained visual classification (FGVC) is much more challenging than traditional classification tasks.
Recent works mainly tackle this problem by focusing on how to locate the most discriminative parts.
We propose a novel framework for fine-grained visual classification to tackle these problems.
arXiv Detail & Related papers (2020-03-08T19:27:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.