Federated Cross-Modal Style-Aware Prompt Generation
- URL: http://arxiv.org/abs/2508.12399v1
- Date: Sun, 17 Aug 2025 15:23:45 GMT
- Title: Federated Cross-Modal Style-Aware Prompt Generation
- Authors: Suraj Prasad, Navyansh Mahla, Sunny Gupta, Amit Sethi
- Abstract summary: FedCSAP produces context-aware prompt tokens that are both distinct and non-redundant. Our framework harnesses low-, mid-, and high-level features from CLIP's vision encoder alongside client-specific style indicators.
- Score: 2.4472081831862655
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prompt learning has propelled vision-language models like CLIP to excel in diverse tasks, making them well suited to federated learning thanks to their computational efficiency. However, conventional approaches that rely solely on final-layer features miss rich multi-scale visual cues and the domain-specific style variations present in decentralized client data. To bridge this gap, we introduce FedCSAP (Federated Cross-Modal Style-Aware Prompt Generation). Our framework harnesses low-, mid-, and high-level features from CLIP's vision encoder alongside client-specific style indicators derived from batch-level statistics. By merging intricate visual details with textual context, FedCSAP produces robust, context-aware prompt tokens that are both distinct and non-redundant, thereby boosting generalization across seen and unseen classes. Operating within a federated learning paradigm, our approach preserves data privacy through local training and global aggregation, adeptly handling non-IID class distributions and diverse domain-specific styles. Comprehensive experiments on multiple image classification datasets confirm that FedCSAP outperforms existing federated prompt learning methods in both accuracy and overall generalization.
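The abstract names two concrete mechanisms, sketched below in PyTorch: a generator that fuses pooled multi-scale visual features with batch-level style statistics into prompt tokens, and FedAvg-style aggregation of the generator's weights. This is a minimal illustration, not the authors' code: the feature dimensions, the AdaIN-style mean/std choice for the "style indicator", and the single-MLP fusion are all assumptions.

```python
import torch
import torch.nn as nn

class StyleAwarePromptGenerator(nn.Module):
    """Fuses multi-scale CLIP visual features with batch-level style
    statistics into prompt tokens. Hypothetical architecture: the abstract
    does not specify layer sizes or the fusion operator."""

    def __init__(self, feat_dims=(256, 512, 768), prompt_len=4, prompt_dim=512):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, prompt_dim) for d in feat_dims)
        style_dim = 2 * sum(feat_dims)  # per-level (mean, std), concatenated
        self.fuse = nn.Sequential(
            nn.Linear(len(feat_dims) * prompt_dim + style_dim, prompt_dim),
            nn.ReLU(),
            nn.Linear(prompt_dim, prompt_len * prompt_dim),
        )
        self.prompt_len, self.prompt_dim = prompt_len, prompt_dim

    def forward(self, feats):
        # feats: list of (B, N_i, D_i) token maps from low/mid/high ViT blocks
        pooled = [f.mean(dim=1) for f in feats]  # per-image content, (B, D_i)
        # "Style indicator": batch-level first/second moments. The mean/std
        # choice is an assumption; the paper says only "batch-level statistics".
        style = torch.cat([torch.cat([f.mean(dim=(0, 1)), f.std(dim=(0, 1))])
                           for f in feats])                     # (2 * sum D_i,)
        style = style.unsqueeze(0).expand(feats[0].size(0), -1)
        content = torch.cat([p(x) for p, x in zip(self.proj, pooled)], dim=-1)
        tokens = self.fuse(torch.cat([content, style], dim=-1))
        return tokens.view(-1, self.prompt_len, self.prompt_dim)

def fedavg(client_states, weights):
    """Weighted average of client prompt-generator parameters (plain FedAvg);
    only the small generator travels between rounds, never raw client data."""
    total = sum(weights)
    return {k: sum(w * s[k] for w, s in zip(weights, client_states)) / total
            for k in client_states[0]}
```

In use, `feats` would presumably be collected via forward hooks on three of CLIP's vision transformer blocks, and the generated tokens prepended to the text encoder's input; each client trains only the generator locally, and the server averages the returned state dicts.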
Related papers
- Unleashing the Power of Vision-Language Models for Long-Tailed Multi-Label Visual Recognition [55.189113121465816]
We propose a novel correlation adaptation prompt network (CAPNET) for long-tailed multi-label visual recognition. CAPNET explicitly models correlations from CLIP's textual encoder. It improves generalization through test-time ensembling and realigns visual-textual modalities.
arXiv Detail & Related papers (2025-11-25T18:57:28Z)
- FedMGP: Personalized Federated Learning with Multi-Group Text-Visual Prompts [31.907894865146385]
FedMGP is a new paradigm for personalized federated prompt learning in vision-language models. A diversity loss is introduced to drive each prompt group to specialize in distinct and complementary semantic aspects (a sketch of one such loss follows this entry). FedMGP consistently outperforms prior approaches in both personalization and domain generalization.
arXiv Detail & Related papers (2025-11-01T10:15:04Z)
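As a concrete reading of the diversity loss mentioned above, here is a hedged sketch that penalizes pairwise cosine similarity between the mean embeddings of the prompt groups; this particular form is an assumption, not necessarily FedMGP's published objective.

```python
import torch
import torch.nn.functional as F

def prompt_diversity_loss(prompt_groups: torch.Tensor) -> torch.Tensor:
    """prompt_groups: (G, L, D) - G groups of L prompt tokens of width D.
    Returns the mean off-diagonal cosine similarity between group embeddings,
    so minimizing it pushes the groups toward complementary directions."""
    group_emb = F.normalize(prompt_groups.mean(dim=1), dim=-1)   # (G, D)
    sim = group_emb @ group_emb.t()                              # (G, G)
    mask = ~torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    return sim[mask].mean()
```

A training objective would then combine this with the task loss, e.g. `loss = task_loss + lam * prompt_diversity_loss(groups)`, with `lam` a tuning knob.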
- FedAPT: Federated Adversarial Prompt Tuning for Vision-Language Models [97.35577473867296]
Federated Adversarial Prompt Tuning (FedAPT) is a novel method designed to enhance the adversarial robustness of federated prompt tuning (FPT). A class-aware prompt generator is proposed that produces visual prompts from text prompts (see the sketch after this entry). Experiments on multiple image classification datasets demonstrate the superiority of FedAPT in improving adversarial robustness.
arXiv Detail & Related papers (2025-09-03T03:46:35Z)
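The abstract only states that visual prompts are generated from text prompts; one plausible, purely illustrative realization is cross-attention from learned visual-prompt queries to the per-class text embeddings. Dimensions, head count, and the attention design itself are assumptions.

```python
import torch
import torch.nn as nn

class ClassAwarePromptGenerator(nn.Module):
    """Hypothetical generator: learned queries attend over class text
    embeddings to produce visual prompt tokens for the vision encoder."""

    def __init__(self, text_dim=512, vis_dim=768, n_visual_prompts=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_visual_prompts, vis_dim))
        self.attn = nn.MultiheadAttention(vis_dim, num_heads=8, kdim=text_dim,
                                          vdim=text_dim, batch_first=True)

    def forward(self, text_prompts):  # (C, text_dim): one embedding per class
        q = self.queries.unsqueeze(0)             # (1, P, vis_dim)
        kv = text_prompts.unsqueeze(0)            # (1, C, text_dim)
        visual_prompts, _ = self.attn(q, kv, kv)  # (1, P, vis_dim)
        return visual_prompts.squeeze(0)          # prepend to ViT patch tokens
```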
- Personalized Federated Learning via Dual-Prompt Optimization and Cross Fusion [44.8670376715096]
Federated learning (FL) enables collaborative model training across decentralized clients without sharing local data. We propose a personalized FL framework based on dual-prompt learning and cross fusion, termed pFedDC.
arXiv Detail & Related papers (2025-06-26T10:59:14Z)
- FedSC: Federated Learning with Semantic-Aware Collaboration [12.366529890744822]
Federated learning (FL) aims to train models collaboratively across clients without sharing data, thereby preserving privacy. We propose Federated Learning with Semantic-Aware Collaboration (FedSC) to capture client-specific and class-relevant knowledge across heterogeneous clients.
arXiv Detail & Related papers (2025-06-26T05:04:55Z)
- FedRSClip: Federated Learning for Remote Sensing Scene Classification Using Vision-Language Models [23.830133838392964]
We propose FedRSCLIP, the first federated learning framework for remote sensing image classification based on a VLM, specifically CLIP. FedRSCLIP addresses the challenges of data heterogeneity and large-scale model transmission in federated environments by introducing prompt learning. To validate the effectiveness of our proposed model, we construct a Fed-RSIC dataset based on three existing remote sensing image classification datasets.
arXiv Detail & Related papers (2025-01-05T07:10:27Z)
- Beyond Mask: Rethinking Guidance Types in Few-shot Segmentation [67.35274834837064]
We develop a universal vision-language framework (UniFSS) to integrate prompts from text, mask, box, and image.
UniFSS significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-07-16T08:41:01Z)
- UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding [90.74967596080982]
This paper extends Contrastive Language-Image Pre-training (CLIP) with multi-granularity alignment.
We develop a Unified Multi-Granularity learning framework, termed UMG-CLIP, which simultaneously empowers the model with versatile perception abilities.
With parameter-efficient tuning, UMG-CLIP surpasses current widely used CLIP variants and achieves state-of-the-art performance on diverse image understanding benchmarks.
arXiv Detail & Related papers (2024-01-12T06:35:09Z)
- CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts [11.752632557524969]
We propose contrastive learning with data augmentation to disentangle content features from the original representations (a generic sketch of such a contrastive objective follows this entry). Our experiments across diverse datasets demonstrate significant improvements in zero-shot and few-shot classification tasks.
arXiv Detail & Related papers (2023-11-28T03:00:59Z)
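The disentanglement objective above is contrastive; a generic InfoNCE sketch over two augmented views shows the shape of such a loss. The InfoNCE form and treating augmentations as positives are standard choices, and CLAP's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.07):
    """z1, z2: (B, D) embeddings of two augmented views of the same inputs.
    Matching rows are positives, so content must survive the style-changing
    augmentation for the loss to be low."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature        # (B, B) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)
```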
- Personalized Federated Learning via Amortized Bayesian Meta-Learning [21.126405589760367]
We introduce a new perspective on personalized federated learning through Amortized Bayesian Meta-Learning.
Specifically, we propose a novel algorithm called FedABML, which employs hierarchical variational inference across clients.
Our theoretical analysis provides an upper bound on the average generalization error and guarantees the generalization performance on unseen data.
arXiv Detail & Related papers (2023-07-05T11:58:58Z)
- Exploiting Shared Representations for Personalized Federated Learning [54.65133770989836]
We propose a novel federated learning framework and algorithm for learning a shared data representation across clients and unique local heads for each client.
Our algorithm harnesses the distributed computational power across clients to perform many local updates of the low-dimensional local parameters for every update of the shared representation (see the alternating-update sketch after this entry).
This result is of interest beyond federated learning to a broad class of problems in which we aim to learn a shared low-dimensional representation among data distributions.
arXiv Detail & Related papers (2021-02-14T05:36:25Z)
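The alternating scheme described above (many cheap head updates per shared-representation update, with only the representation aggregated) can be sketched as follows; the optimizer choices, epoch counts, and plain FedAvg averaging are illustrative assumptions, not the paper's exact implementation.

```python
import copy
import torch
import torch.nn as nn

def client_update(rep, head, loader, head_epochs=5, rep_epochs=1, lr=0.01):
    loss_fn = nn.CrossEntropyLoss()
    # Phase 1: many updates of the small personal head; representation frozen.
    opt = torch.optim.SGD(head.parameters(), lr=lr)
    for _ in range(head_epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(head(rep(x).detach()), y).backward()
            opt.step()
    # Phase 2: one pass over the shared representation; head held fixed.
    opt = torch.optim.SGD(rep.parameters(), lr=lr)
    for _ in range(rep_epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(head(rep(x)), y).backward()
            opt.step()
    return rep.state_dict()

def server_round(global_rep, heads, loaders):
    """One communication round: clients personalize their heads, refine the
    shared representation, and the server averages only the representation."""
    states = [client_update(copy.deepcopy(global_rep), h, dl)
              for h, dl in zip(heads, loaders)]
    avg = {k: torch.stack([s[k] for s in states]).float().mean(0)
           for k in states[0]}
    global_rep.load_state_dict(avg)
```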
- DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning [83.48587570246231]
Visual similarity plays an important role in many computer vision applications.
Deep metric learning (DML) is a powerful framework for learning such similarities.
We propose and study multiple complementary learning tasks, targeting conceptually different data relationships.
We learn a single model to aggregate their training signals, resulting in strong generalization and state-of-the-art performance.
arXiv Detail & Related papers (2020-04-28T12:26:50Z)