Language-aware Domain Generalization Network for Cross-Scene
Hyperspectral Image Classification
- URL: http://arxiv.org/abs/2209.02700v1
- Date: Tue, 6 Sep 2022 10:06:10 GMT
- Title: Language-aware Domain Generalization Network for Cross-Scene
Hyperspectral Image Classification
- Authors: Yuxiang Zhang, Mengmeng Zhang, Wei Li, Shuai Wang and Ran Tao
- Abstract summary: It is necessary to explore the effectiveness of the linguistic modality in assisting hyperspectral image classification.
Large-scale pre-training image-text foundation models have demonstrated great performance in a variety of downstream applications.
A Language-aware Domain Generalization Network (LDGnet) is proposed to learn cross-domain invariant representation.
- Score: 15.842081807249416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text information, including extensive prior knowledge about land cover
classes, has been ignored in hyperspectral image (HSI) classification tasks. It is
necessary to explore the effectiveness of the linguistic modality in assisting HSI
classification. In addition, large-scale pre-trained image-text foundation
models have demonstrated strong performance in a variety of downstream
applications, including zero-shot transfer. However, most domain generalization
methods have never addressed mining linguistic-modal knowledge to improve the
model's generalization performance. To address these shortcomings, a
Language-aware Domain Generalization Network (LDGnet) is proposed to
learn cross-domain invariant representation from cross-domain shared prior
knowledge. The proposed method only trains on the source domain (SD) and then
transfers the model to the target domain (TD). A dual-stream architecture
with an image encoder and a text encoder is used to extract visual and
linguistic features, in which coarse-grained and fine-grained text
representations are designed to capture two levels of linguistic features.
Furthermore, the linguistic features serve as a cross-domain shared semantic
space, and visual-linguistic alignment is achieved by supervised contrastive
learning in that space. Extensive experiments on three datasets demonstrate
the superiority of the proposed method when compared with state-of-the-art
techniques.
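The visual-linguistic alignment described in the abstract can be sketched as a supervised contrastive objective in a shared embedding space. The snippet below is a minimal illustrative sketch, not the authors' implementation: the function name `alignment_loss`, the use of one text embedding per class as an anchor, and the temperature value are all assumptions.

```python
import numpy as np

def alignment_loss(visual_feats, class_text_feats, labels, temperature=0.07):
    """Hypothetical sketch of LDGnet-style alignment: each visual feature is
    pulled toward the text embedding of its own class and pushed away from
    the text embeddings of other classes, via a contrastive (InfoNCE-style)
    loss over cosine similarities in the shared semantic space."""
    # L2-normalize both modalities so the dot product is cosine similarity
    v = visual_feats / np.linalg.norm(visual_feats, axis=1, keepdims=True)
    t = class_text_feats / np.linalg.norm(class_text_feats, axis=1, keepdims=True)
    # (N_samples, N_classes) similarity logits, sharpened by the temperature
    logits = v @ t.T / temperature
    # numerically stable log-softmax over the class (text-anchor) axis
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # negative log-probability of each sample's own class anchor
    return -log_probs[np.arange(len(labels)), labels].mean()
```

With this formulation, visual features that sit close to their class's text embedding incur a low loss, while mismatched features incur a high one, which is the behavior the shared semantic space is meant to enforce across domains.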
Related papers
- Generalize or Detect? Towards Robust Semantic Segmentation Under Multiple Distribution Shifts [56.57141696245328]
In open-world scenarios, where both novel classes and domains may exist, an ideal segmentation model should detect anomaly classes for safety.
Existing methods often struggle to distinguish between domain-level and semantic-level distribution shifts.
arXiv Detail & Related papers (2024-11-06T11:03:02Z) - WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization [63.98650220772378]
We present WIDIn, Wording Images for Domain-Invariant representation, to disentangle discriminative visual representation.
We first estimate the language embedding with fine-grained alignment, which can be used to adaptively identify and then remove the domain-specific counterpart.
We show that WIDIn can be applied to both pretrained vision-language models like CLIP, and separately trained uni-modal models like MoCo and BERT.
arXiv Detail & Related papers (2024-05-28T17:46:27Z) - Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation [44.008094698200026]
FreeDA is a training-free diffusion-augmented method for open-vocabulary semantic segmentation.
FreeDA achieves state-of-the-art performance on five datasets.
arXiv Detail & Related papers (2024-04-09T18:00:25Z) - VLLaVO: Mitigating Visual Gap through LLMs [7.352822795984628]
Cross-domain learning aims at extracting domain-invariant knowledge to reduce the domain shift between training and testing data.
We propose VLLaVO, combining Vision language models and Large Language models as Visual cross-dOmain learners.
arXiv Detail & Related papers (2024-01-06T16:33:39Z) - TeG-DG: Textually Guided Domain Generalization for Face Anti-Spoofing [8.830873674673828]
Existing methods are dedicated to extracting domain-invariant features from various training domains.
The extracted features inevitably contain residual style feature bias, resulting in inferior generalization performance.
We propose the Textually Guided Domain Generalization framework, which can effectively leverage text information for cross-domain alignment.
arXiv Detail & Related papers (2023-11-30T10:13:46Z) - One-for-All: Towards Universal Domain Translation with a Single StyleGAN [86.33216867136639]
We propose a novel translation model, UniTranslator, for transforming representations between visually distinct domains.
The proposed UniTranslator is versatile and capable of performing various tasks, including style mixing, stylization, and translations.
UniTranslator surpasses the performance of existing general-purpose models and performs well against specialized models in representative tasks.
arXiv Detail & Related papers (2023-10-22T08:02:55Z) - CLIP the Gap: A Single Domain Generalization Approach for Object
Detection [60.20931827772482]
Single Domain Generalization tackles the problem of training a model on a single source domain so that it generalizes to any unseen target domain.
We propose to leverage a pre-trained vision-language model to introduce semantic domain concepts via textual prompts.
We achieve this via a semantic augmentation strategy acting on the features extracted by the detector backbone, as well as a text-based classification loss.
arXiv Detail & Related papers (2023-01-13T12:01:18Z) - Unsupervised Image-to-Image Translation with Generative Prior [103.54337984566877]
Unsupervised image-to-image translation aims to learn the translation between two visual domains without paired data.
We present a novel framework, Generative Prior-guided UNsupervised Image-to-image Translation (GP-UNIT), to improve the overall quality and applicability of the translation algorithm.
arXiv Detail & Related papers (2022-04-07T17:59:23Z) - Structured Latent Embeddings for Recognizing Unseen Classes in Unseen
Domains [108.11746235308046]
We propose a novel approach that learns domain-agnostic structured latent embeddings by projecting images from different domains.
Our experiments on the challenging DomainNet and DomainNet-LS benchmarks show the superiority of our approach over existing methods.
arXiv Detail & Related papers (2021-07-12T17:57:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.