Prompt-based Visual Alignment for Zero-shot Policy Transfer
- URL: http://arxiv.org/abs/2406.03250v1
- Date: Wed, 5 Jun 2024 13:26:30 GMT
- Title: Prompt-based Visual Alignment for Zero-shot Policy Transfer
- Authors: Haihan Gao, Rui Zhang, Qi Yi, Hantao Yao, Haochen Li, Jiaming Guo, Shaohui Peng, Yunkai Gao, QiCheng Wang, Xing Hu, Yuanbo Wen, Zihao Zhang, Zidong Du, Ling Li, Qi Guo, Yunji Chen
- Abstract summary: Overfitting has become one of the main obstacles to applying reinforcement learning in practice.
We propose prompt-based visual alignment (PVA) to mitigate the detrimental domain bias in the image for zero-shot policy transfer.
We verify PVA on a vision-based autonomous driving task with the CARLA simulator.
- Score: 35.784936617675896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Overfitting has become one of the main obstacles to applying reinforcement learning (RL). Existing methods do not provide an explicit semantic constraint for the feature extractor, hindering the agent from learning a unified cross-domain representation and resulting in performance degradation on unseen domains; they also require abundant data from multiple domains. To address these issues, we propose prompt-based visual alignment (PVA), a robust framework that mitigates the detrimental domain bias in images for zero-shot policy transfer. Inspired by the fact that a visual-language model (VLM) can serve as a bridge between the text space and the image space, we leverage the semantic information contained in a text sequence as an explicit constraint to train a visual aligner. The visual aligner can thus map images from multiple domains to a unified domain and achieve good generalization performance. To better capture semantic information, prompt tuning is applied to learn a sequence of learnable tokens. With explicit semantic constraints, PVA can learn a unified cross-domain representation under limited access to cross-domain data and achieves strong zero-shot generalization on unseen domains. We verify PVA on a vision-based autonomous driving task with the CARLA simulator. Experiments show that the agent generalizes well to unseen domains under limited access to multi-domain data.
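The abstract describes three components: a frozen VLM as the semantic bridge, a sequence of learnable prompt tokens, and a visual aligner trained under the VLM's semantic constraint. The sketch below is a minimal PyTorch rendering of our reading of that pipeline, not the authors' code; the stand-in encoder, the aligner architecture, and the loss are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 512

class StandInImageEncoder(nn.Module):
    """Placeholder for a frozen VLM image encoder (e.g. CLIP's ViT)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, EMBED_DIM))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

class VisualAligner(nn.Module):
    """Image-to-image network meant to strip domain-specific appearance."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1))

    def forward(self, x):
        return self.net(x)

# Learnable prompt tokens stand in for prompt tuning: optimized so their
# pooled embedding captures the task-relevant semantics.
prompt_tokens = nn.Parameter(torch.randn(8, EMBED_DIM))
prompt_proj = nn.Linear(EMBED_DIM, EMBED_DIM)  # pools tokens into one text-side embedding

image_encoder = StandInImageEncoder()          # frozen VLM side
for p in image_encoder.parameters():
    p.requires_grad_(False)
aligner = VisualAligner()

params = list(aligner.parameters()) + [prompt_tokens] + list(prompt_proj.parameters())
opt = torch.optim.Adam(params, lr=1e-4)

def alignment_loss(images):
    """Pull the VLM embedding of the aligned image toward the prompt embedding."""
    img_emb = image_encoder(aligner(images))                           # (B, D)
    txt_emb = F.normalize(prompt_proj(prompt_tokens.mean(0)), dim=-1)  # (D,)
    return (1.0 - img_emb @ txt_emb).mean()                            # cosine distance

batch = torch.rand(4, 3, 64, 64)   # images from one source domain
loss = alignment_loss(batch)
loss.backward()
opt.step()
```

In the real method the text side would come from a VLM text encoder over the learned prompt tokens; the linear pooling above is only a self-contained stand-in.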
Related papers
- WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization [63.98650220772378]
We present WIDIn, Wording Images for Domain-Invariant representation, to disentangle discriminative visual representations.
We first estimate the language embedding with fine-grained alignment, which can be used to adaptively identify and then remove the domain-specific counterpart.
We show that WIDIn can be applied to both pretrained vision-language models like CLIP, and separately trained uni-modal models like MoCo and BERT.
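As a rough illustration of the disentangling step (our interpretation, not the paper's implementation): estimate a domain direction in the joint embedding space from language, then project it out of the visual feature.

```python
import torch
import torch.nn.functional as F

def remove_domain_component(visual_feat: torch.Tensor,
                            domain_text_emb: torch.Tensor) -> torch.Tensor:
    """Subtract the projection of the visual feature onto the
    language-estimated domain direction, keeping the remainder."""
    d = F.normalize(domain_text_emb, dim=-1)          # unit domain direction
    proj = (visual_feat @ d).unsqueeze(-1) * d        # domain-specific part
    return visual_feat - proj                         # domain-invariant part

feat = torch.randn(4, 512)        # visual features from one domain
domain_dir = torch.randn(512)     # e.g. embed("a sketch of X") - embed("a photo of X")
invariant = remove_domain_component(feat, domain_dir)
```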
arXiv Detail & Related papers (2024-05-28T17:46:27Z)
- Cross-Domain Policy Adaptation by Capturing Representation Mismatch [53.087413751430255]
It is vital to learn effective policies that can be transferred to different domains with dynamics discrepancies in reinforcement learning (RL).
In this paper, we consider dynamics adaptation settings where there exists dynamics mismatch between the source domain and the target domain.
We perform representation learning only in the target domain and measure the representation deviations on the transitions from the source domain.
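One hedged way to read this: train a state encoder and latent dynamics model on target-domain data only, then score each source-domain transition by its deviation under that model, e.g. as a reward penalty. The sketch below is our construction; module names and the penalty weight are illustrative.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 32))
latent_dynamics = nn.Sequential(nn.Linear(32 + 2, 64), nn.ReLU(), nn.Linear(64, 32))

def representation_deviation(s, a, s_next):
    """Deviation of a (source) transition under the target-domain model."""
    z, z_next = encoder(s), encoder(s_next)
    z_pred = latent_dynamics(torch.cat([z, a], dim=-1))
    return (z_pred - z_next).pow(2).sum(dim=-1)

s, a, s_next = torch.randn(16, 8), torch.randn(16, 2), torch.randn(16, 8)
penalty = representation_deviation(s, a, s_next).detach()
adjusted_reward = torch.randn(16) - 0.1 * penalty   # penalty weight is our choice
```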
arXiv Detail & Related papers (2024-05-24T09:06:12Z)
- Domain-Controlled Prompt Learning [49.45309818782329]
Existing prompt learning methods often lack domain-awareness or domain-transfer mechanisms.
We propose Domain-Controlled Prompt Learning for specific domains.
Our method achieves state-of-the-art performance on domain-specific image recognition datasets.
arXiv Detail & Related papers (2023-09-30T02:59:49Z)
- Using Language to Extend to Unseen Domains [81.37175826824625]
It is expensive to collect training data for every possible domain that a vision model may encounter when deployed.
We consider how simply verbalizing the training domain, as well as the domains we want to extend to but have no data for, can improve robustness.
Using a multimodal model with a joint image and language embedding space, our method LADS learns a transformation of the image embeddings from the training domain to each unseen test domain.
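A minimal sketch of that transformation step, under our assumptions about the losses (a domain-alignment term toward the target domain's text embedding plus a consistency term); the embeddings here are random stand-ins for CLIP features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D = 512
transform = nn.Sequential(nn.Linear(D, D), nn.ReLU(), nn.Linear(D, D))
opt = torch.optim.Adam(transform.parameters(), lr=1e-4)

img_emb = F.normalize(torch.randn(32, D), dim=-1)   # training-domain image embeddings
tgt_text = F.normalize(torch.randn(D), dim=-1)      # e.g. embed("a sketch")

moved = F.normalize(transform(img_emb), dim=-1)
# Domain-alignment term: moved embeddings should sit near the target
# domain's text embedding.
domain_loss = (1 - moved @ tgt_text).mean()
# Class-consistency term: don't drift too far from the original embedding.
consistency_loss = (1 - (moved * img_emb).sum(-1)).mean()
loss = domain_loss + consistency_loss
loss.backward()
opt.step()
```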
arXiv Detail & Related papers (2022-10-18T01:14:02Z)
- Feature Representation Learning for Unsupervised Cross-domain Image Retrieval [73.3152060987961]
Current supervised cross-domain image retrieval methods achieve excellent performance, but the cost of data collection and labeling imposes an intractable barrier to practical deployment.
We introduce a new cluster-wise contrastive learning mechanism to help extract class semantic-aware features.
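A rough sketch of a cluster-wise contrastive loss as we read it: cluster features into pseudo-classes, then treat same-cluster samples as positives in an InfoNCE-style objective. The details below are our assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def cluster_contrastive_loss(feats, cluster_ids, tau=0.1):
    f = F.normalize(feats, dim=-1)
    sim = f @ f.t() / tau                              # pairwise similarities
    n = f.size(0)
    mask_self = torch.eye(n, dtype=torch.bool)
    pos = (cluster_ids[:, None] == cluster_ids[None, :]) & ~mask_self
    log_prob = sim - torch.logsumexp(sim.masked_fill(mask_self, -1e9), dim=1, keepdim=True)
    has_pos = pos.any(dim=1)                           # anchors without positives are skipped
    loss = -(log_prob * pos).sum(1)[has_pos] / pos.sum(1)[has_pos]
    return loss.mean()

feats = torch.randn(16, 128)
ids = torch.randint(0, 4, (16,))    # pseudo-labels, e.g. from k-means
loss = cluster_contrastive_loss(feats, ids)
```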
arXiv Detail & Related papers (2022-07-20T07:52:14Z)
- TridentAdapt: Learning Domain-invariance via Source-Target Confrontation and Self-induced Cross-domain Augmentation [0.0]
The key challenge is to learn a domain-agnostic representation of the inputs in order to benefit from virtual data.
We propose a novel trident-like architecture that enforces a shared feature encoder to satisfy confrontational source and target constraints simultaneously.
We also introduce a novel training pipeline enabling self-induced cross-domain data augmentation during the forward pass.
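Schematically, the trident can be pictured as one shared encoder feeding a source branch and a target branch, with the target branch decoding source features into target-styled images that are fed back in the same forward pass. The sketch below is our reading; all module shapes are placeholders.

```python
import torch
import torch.nn as nn

shared_encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
source_decoder = nn.Conv2d(16, 3, 3, padding=1)   # source-side constraint
target_decoder = nn.Conv2d(16, 3, 3, padding=1)   # target-side constraint

src_img = torch.rand(2, 3, 64, 64)
feat = shared_encoder(src_img)
src_recon = source_decoder(feat)                  # source reconstruction branch
cross_aug = target_decoder(feat).detach()         # target-styled source image
# The augmented image re-enters the shared encoder in the same forward
# pass, pushing the encoder toward domain-agnostic features.
feat_aug = shared_encoder(cross_aug)
consistency = (feat - feat_aug).pow(2).mean()
```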
arXiv Detail & Related papers (2021-11-30T11:25:46Z)
- SPCL: A New Framework for Domain Adaptive Semantic Segmentation via Semantic Prototype-based Contrastive Learning [6.705297811617307]
Domain adaptation can help in transferring knowledge from a labeled source domain to an unlabeled target domain.
We propose a novel semantic prototype-based contrastive learning framework for fine-grained class alignment.
Our method is easy to implement and attains superior results compared to state-of-the-art approaches.
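A compact sketch of prototype-based contrastive alignment as we read it: keep one prototype per semantic class and pull each feature toward its class prototype while pushing it away from the others. Prototype maintenance (e.g. EMA updates from source labels) is elided; all names are illustrative.

```python
import torch
import torch.nn.functional as F

def prototype_contrastive_loss(feats, labels, prototypes, tau=0.1):
    f = F.normalize(feats, dim=-1)
    p = F.normalize(prototypes, dim=-1)
    logits = f @ p.t() / tau              # (N, num_classes)
    return F.cross_entropy(logits, labels)

feats = torch.randn(32, 256)              # pixel or sample features
labels = torch.randint(0, 19, (32,))      # class (or pseudo-) labels
prototypes = torch.randn(19, 256)         # one prototype per semantic class
loss = prototype_contrastive_loss(feats, labels, prototypes)
```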
arXiv Detail & Related papers (2021-11-24T09:26:07Z)
- AFAN: Augmented Feature Alignment Network for Cross-Domain Object Detection [90.18752912204778]
Unsupervised domain adaptation for object detection is a challenging problem with many real-world applications.
We propose a novel augmented feature alignment network (AFAN) which integrates intermediate domain image generation and domain-adversarial training.
Our approach significantly outperforms the state-of-the-art methods on standard benchmarks for both similar and dissimilar domain adaptations.
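AFAN's adversarial half can be illustrated with the classic gradient-reversal trick; the sketch below shows only that piece (the intermediate-domain image generation is omitted, and all names are ours, not the paper's).

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity forward, negated gradient backward."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lamb * grad_out, None

features = torch.randn(8, 256, requires_grad=True)   # detector backbone features
domain_head = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 2))
domain_labels = torch.randint(0, 2, (8,))            # 0 = source, 1 = target

reversed_feats = GradReverse.apply(features, 1.0)
loss = nn.functional.cross_entropy(domain_head(reversed_feats), domain_labels)
loss.backward()   # backbone gradients are flipped -> domain-confusing features
```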
arXiv Detail & Related papers (2021-06-10T05:01:20Z)
- Variational Interaction Information Maximization for Cross-domain Disentanglement [34.08140408283391]
Cross-domain disentanglement is the problem of learning representations partitioned into domain-invariant and domain-specific representations.
We cast the simultaneous learning of domain-invariant and domain-specific representations as a joint objective of multiple information constraints.
We show that our model achieves state-of-the-art performance in the zero-shot sketch-based image retrieval task.
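The paper's variational interaction-information objective is not reproduced here; the sketch below shows only the generic partitioned-latent setup it operates on (our construction), with simple stand-in constraints in place of the information terms.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 64))
class_head = nn.Linear(32, 10)     # reads only the domain-invariant half
domain_head = nn.Linear(32, 2)     # reads only the domain-specific half

x = torch.randn(16, 128)
z = encoder(x)
z_inv, z_spec = z[:, :32], z[:, 32:]   # partitioned representation
class_loss = nn.functional.cross_entropy(class_head(z_inv), torch.randint(0, 10, (16,)))
domain_loss = nn.functional.cross_entropy(domain_head(z_spec), torch.randint(0, 2, (16,)))
loss = class_loss + domain_loss    # placeholders for the information constraints
```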
arXiv Detail & Related papers (2020-12-08T07:11:35Z)