DRPT: Disentangled and Recurrent Prompt Tuning for Compositional
Zero-Shot Learning
- URL: http://arxiv.org/abs/2305.01239v1
- Date: Tue, 2 May 2023 07:42:47 GMT
- Title: DRPT: Disentangled and Recurrent Prompt Tuning for Compositional
Zero-Shot Learning
- Authors: Xiaocheng Lu, Ziming Liu, Song Guo, Jingcai Guo, Fushuo Huo, Sikai Bai
and Tao Han
- Abstract summary: State and object primitives are deemed as learnable tokens of vocabulary embedded in prompts and tuned on seen compositions.
We develop a progressive fine-tuning procedure that allows for incremental updates to the prompts.
We quantify and analyze the entanglement in Compositional Zero-shot Learning.
- Score: 15.580557941267095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compositional Zero-shot Learning (CZSL) aims to recognize novel concepts
composed of known knowledge without training samples. Standard CZSL either
identifies visual primitives or enhances unseen composed entities, and as a
result, entanglement between state and object primitives cannot be fully
utilized. Admittedly, vision-language models (VLMs) could naturally cope with
CZSL through prompt tuning, but uneven entanglement drags the prompts into
local optima. In this paper, we take a further step to introduce
a novel Disentangled and Recurrent Prompt Tuning framework termed DRPT to
better tap the potential of VLMs in CZSL. Specifically, the state and object
primitives are deemed as learnable tokens of vocabulary embedded in prompts and
tuned on seen compositions. Instead of jointly tuning state and object, we
devise a disentangled and recurrent tuning strategy to suppress the traction
force caused by entanglement and gradually optimize the token parameters,
leading to a better prompt space. Notably, we develop a progressive fine-tuning
procedure that allows for incremental updates to the prompts, optimizing the
object first, then the state, and then alternating. Meanwhile, state and
object are optimized independently, so clearer features can be learned,
further alleviating the misleading optimization caused by entanglement. Moreover, we
quantify and analyze the entanglement in CZSL and supplement entanglement
rebalancing optimization schemes. DRPT surpasses representative
state-of-the-art methods on extensive benchmark datasets, demonstrating
superiority in both accuracy and efficiency.
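The disentangled, recurrent tuning procedure described in the abstract can be illustrated with a minimal coordinate-descent sketch: object tokens are updated while state tokens are frozen, then the roles swap, round after round. This is a toy analogue, not the paper's implementation; the quadratic loss, the coupling coefficient (standing in for state-object entanglement), and all function names are hypothetical.

```python
import numpy as np

def loss(s, o):
    # Coupled quadratic stand-in for the prompt-tuning objective;
    # the cross term mimics state-object entanglement.
    return float((s ** 2).sum() + (o ** 2).sum() + 0.5 * (s * o).sum())

def grad_s(s, o):
    return 2 * s + 0.5 * o

def grad_o(s, o):
    return 2 * o + 0.5 * s

def recurrent_tune(s, o, rounds=10, inner_steps=20, lr=0.1):
    """Alternate phases: tune object tokens with state frozen, then swap."""
    for _ in range(rounds):
        for _ in range(inner_steps):      # phase 1: object tokens only
            o = o - lr * grad_o(s, o)
        for _ in range(inner_steps):      # phase 2: state tokens only
            s = s - lr * grad_s(s, o)
    return s, o

rng = np.random.default_rng(0)
s0, o0 = rng.normal(size=4), rng.normal(size=4)
s_fin, o_fin = recurrent_tune(s0.copy(), o0.copy())
print(loss(s0, o0), "->", loss(s_fin, o_fin))
```

Because each phase touches only one parameter group, the gradient of one primitive never drags the other, which is the intuition behind suppressing the "traction force" the abstract mentions.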
Related papers
- Look before Transcription: End-to-End SlideASR with Visually-Anchored Policy Optimization [28.984638316524464]
We propose Visually-Anchored Policy Optimization (VAPO) to control the model's reasoning process. VAPO enforces a structured "Look before Transcription" procedure using a think-then-answer format. This reasoning process is optimized via reinforcement learning with four distinct rewards targeting format compliance, OCR accuracy, ASR quality, and visual anchoring consistency.
arXiv Detail & Related papers (2025-10-08T08:18:47Z) - Prompt and Parameter Co-Optimization for Large Language Models [35.72638351230096]
We introduce MetaTuner, a novel framework that jointly integrates prompt optimization and fine-tuning for Large Language Model (LLM) training. Our framework is optimized to discover the optimal combinations between prompts and parameters. Experiments across diverse benchmarks show that our method consistently outperforms the baselines.
arXiv Detail & Related papers (2025-09-29T03:38:25Z) - TopoSizing: An LLM-aided Framework of Topology-based Understanding and Sizing for AMS Circuits [7.615431299673158]
Traditional black-box optimization achieves sampling efficiency but lacks circuit understanding. We propose TopoSizing, an end-to-end framework that performs robust circuit understanding directly from raw netlists.
arXiv Detail & Related papers (2025-09-17T16:52:46Z) - OAT-Rephrase: Optimization-Aware Training Data Rephrasing for Zeroth-Order LLM Fine-Tuning [25.76983801886268]
This paper introduces OAT-Rephrase, an Optimization-Aware Training data rephrasing strategy. We show that OAT-Rephrase consistently improves MeZO fine-tuning performance. Our findings suggest that optimization-aware rephrasing serves as a reusable and low-overhead enhancement for zeroth-order tuning regimes.
arXiv Detail & Related papers (2025-06-10T02:53:04Z) - Prompt-OT: An Optimal Transport Regularization Paradigm for Knowledge Preservation in Vision-Language Model Adaptation [5.296260279593993]
Vision-language models (VLMs) such as CLIP demonstrate strong performance but struggle when adapted to downstream tasks.
We propose an optimal transport (OT)-guided prompt learning framework that mitigates forgetting by preserving the structural consistency of feature distributions.
Our approach enforces joint constraints on both vision and text representations, ensuring a holistic feature alignment.
arXiv Detail & Related papers (2025-03-11T21:38:34Z) - Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback [50.84142264245052]
This work introduces the Align-SLM framework to enhance the semantic understanding of textless Spoken Language Models (SLMs).
Our approach generates multiple speech continuations from a given prompt and uses semantic metrics to create preference data for Direct Preference Optimization (DPO).
We evaluate the framework using ZeroSpeech 2021 benchmarks for lexical and syntactic modeling, the spoken version of the StoryCloze dataset for semantic coherence, and other speech generation metrics, including the GPT4-o score and human evaluation.
arXiv Detail & Related papers (2024-11-04T06:07:53Z) - In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement [71.60563181678323]
Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality. To handle these challenges, a direct solution is to generate "high-confidence" data from unsupervised downstream tasks. We propose a novel approach, the pseudo-supervised demonstrations aligned prompt optimization (PAPO) algorithm, which jointly refines both the prompt and the overall pseudo-supervision.
arXiv Detail & Related papers (2024-10-04T03:39:28Z) - Preference Alignment Improves Language Model-Based TTS [76.70693823683091]
Preference alignment algorithms adjust LMs to align with the preferences of reward models, enhancing the desirability of the generated content.
With a 1.15B parameter LM-based TTS model, we demonstrate that preference alignment consistently improves intelligibility, speaker similarity, and proxy subjective evaluation scores.
arXiv Detail & Related papers (2024-09-19T01:58:19Z) - Towards Explainable Evolution Strategies with Large Language Models [0.0]
This paper introduces an approach that integrates self-adaptive Evolution Strategies (ES) with Large Language Models (LLMs).
By employing a self-adaptive ES equipped with a restart mechanism, we effectively navigate the challenging landscapes of benchmark functions.
An LLM is then utilized to process these logs, generating concise, user-friendly summaries.
arXiv Detail & Related papers (2024-07-11T09:28:27Z) - Boosting Vision-Language Models with Transduction [12.281505126587048]
We present TransCLIP, a novel and computationally efficient transductive approach for vision-language models.
TransCLIP is applicable as a plug-and-play module on top of popular inductive zero- and few-shot models.
arXiv Detail & Related papers (2024-06-03T23:09:30Z) - SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models [19.005364038603204]
We introduce a novel fine-tuning paradigm named Self-Consistency Tuning (SC-Tune).
SC-Tune features the synergistic learning of a cyclic describer-locator system.
We demonstrate that SC-Tune significantly elevates performance across a spectrum of object-level vision-language benchmarks.
arXiv Detail & Related papers (2024-03-20T03:00:21Z) - Understanding Prompt Tuning for V-L Models Through the Lens of Neural
Collapse [47.89674843370092]
We propose Neural-collapse-anchored Prompt Tuning (NPT), a novel method that learns prompts with text and image representations.
NPT incorporates two regularization terms: language-modality collapse and multi-modality isomorphism; and it is compatible with other prompt tuning methods.
arXiv Detail & Related papers (2023-06-28T06:37:03Z) - Scalable Learning of Latent Language Structure With Logical Offline
Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z) - Gradient-Regulated Meta-Prompt Learning for Generalizable
Vision-Language Models [137.74524357614285]
We introduce a novel Gradient-RegulAted Meta-prompt learning framework.
It helps pre-training models adapt to downstream tasks in a parameter- and data-efficient way.
GRAM can be easily incorporated into various prompt tuning methods in a model-agnostic way.
arXiv Detail & Related papers (2023-03-12T05:03:37Z) - Improving Self-Supervised Learning by Characterizing Idealized
Representations [155.1457170539049]
We prove necessary and sufficient conditions for any task invariant to given data augmentations.
For contrastive learning, our framework prescribes simple but significant improvements to previous methods.
For non-contrastive learning, we use our framework to derive a simple and novel objective.
arXiv Detail & Related papers (2022-09-13T18:01:03Z) - Generalized Zero-Shot Learning via VAE-Conditioned Generative Flow [83.27681781274406]
Generalized zero-shot learning aims to recognize both seen and unseen classes by transferring knowledge from semantic descriptions to visual representations.
Recent generative methods formulate GZSL as a missing data problem, mainly adopting GANs or VAEs to generate visual features for unseen classes.
We propose a conditional version of generative flows for GZSL, i.e., VAE-Conditioned Generative Flow (VAE-cFlow).
arXiv Detail & Related papers (2020-09-01T09:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.