Federated Prompt-Tuning with Heterogeneous and Incomplete Multimodal Client Data
- URL: http://arxiv.org/abs/2602.07081v1
- Date: Fri, 06 Feb 2026 03:53:35 GMT
- Title: Federated Prompt-Tuning with Heterogeneous and Incomplete Multimodal Client Data
- Authors: Thu Hang Phung, Duong M. Nguyen, Thanh Trung Huynh, Quoc Viet Hung Nguyen, Trong Nghia Hoang, Phi Le Nguyen
- Abstract summary: This paper bridges the gap between federated learning and multi-modal prompt-tuning. A key challenge in this setting arises from the lack of semantic alignment between prompt instructions. Our framework introduces specialized client-tuning and server-aggregation designs that simultaneously optimize, align, and aggregate prompt-tuning instructions.
- Score: 22.933465523798475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a generalized federated prompt-tuning framework for practical scenarios where local datasets are multi-modal and exhibit different distributional patterns of missing features at the input level. The proposed framework bridges the gap between federated learning and multi-modal prompt-tuning, which have traditionally focused on either uni-modal or centralized data. A key challenge in this setting arises from the lack of semantic alignment between prompt instructions that encode similar distributional patterns of missing data across different clients. To address this, our framework introduces specialized client-tuning and server-aggregation designs that simultaneously optimize, align, and aggregate prompt-tuning instructions across clients and data modalities. This allows prompt instructions to complement one another and be combined effectively. Extensive evaluations on diverse multimodal benchmark datasets demonstrate that our work consistently outperforms state-of-the-art (SOTA) baselines.
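To make the setup concrete, below is a minimal sketch of the general recipe the abstract describes: each client keeps one learnable prompt per missing-modality pattern, tunes only those prompts locally, and the server aligns and averages the prompts that encode the same pattern. All names, shapes, and the Procrustes-style alignment step are illustrative assumptions, not the authors' actual algorithm.

```python
# Minimal sketch of federated prompt-tuning over missing-modality patterns.
# Everything here (names, alignment choice, placeholder gradients) is assumed.
import numpy as np

PROMPT_LEN, DIM = 4, 16                              # tokens per prompt, embedding width
PATTERNS = ["img+txt", "img_only", "txt_only"]       # missing-feature patterns

class Client:
    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)
        # one learnable prompt per missing-modality pattern seen locally;
        # the frozen multimodal backbone is omitted in this sketch
        self.prompts = {p: self.rng.normal(0, 0.02, (PROMPT_LEN, DIM)) for p in PATTERNS}

    def local_tune(self, lr=0.1):
        # placeholder for prompt-only gradient steps on the local data
        for p in self.prompts:
            pseudo_grad = self.rng.normal(0, 1e-3, self.prompts[p].shape)
            self.prompts[p] -= lr * pseudo_grad
        return self.prompts

def align(reference, prompt):
    """Rotate `prompt` toward `reference` (orthogonal Procrustes) so that prompts
    encoding the same pattern are comparable before averaging -- one plausible
    reading of the paper's alignment step."""
    u, _, vt = np.linalg.svd(prompt.T @ reference)
    return prompt @ (u @ vt)

def server_round(clients):
    updates = [c.local_tune() for c in clients]
    global_prompts = {}
    for p in PATTERNS:
        ref = updates[0][p]                              # anchor for alignment
        aligned = [align(ref, u[p]) for u in updates]
        global_prompts[p] = np.mean(aligned, axis=0)     # aggregate per pattern
    return global_prompts

clients = [Client(seed=s) for s in range(3)]
print({k: v.shape for k, v in server_round(clients).items()})
```

The per-pattern grouping is the point of the sketch: averaging prompts blindly across clients would mix instructions that encode different missing-data patterns, which is exactly the semantic-alignment problem the abstract highlights.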
Related papers
- UniDiff: A Unified Diffusion Framework for Multimodal Time Series Forecasting [90.47915032778366]
We propose UniDiff, a unified diffusion framework for multimodal time series forecasting.
At its core lies a unified and parallel fusion module, where a single cross-attention mechanism integrates structural information from timestamps and semantic context from texts.
Experiments on real-world benchmark datasets across eight domains demonstrate that the proposed UniDiff model achieves state-of-the-art performance.
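As a rough illustration of the fusion idea described above, the sketch below lets time-series tokens attend to a single shared context built from timestamp and text embeddings; all shapes and names are assumptions rather than UniDiff's implementation.

```python
# Toy single cross-attention fusion: series tokens query a shared context made
# of timestamp and text embeddings. Illustrative only, not the paper's code.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(series_tokens, timestamp_emb, text_emb, d=32, seed=0):
    rng = np.random.default_rng(seed)
    width = series_tokens.shape[-1]
    Wq, Wk, Wv = (rng.normal(0, 0.05, (width, d)) for _ in range(3))
    Wo = rng.normal(0, 0.05, (d, width))
    context = np.concatenate([timestamp_emb, text_emb], axis=0)  # one shared context
    q = series_tokens @ Wq                     # queries come from the series tokens
    k, v = context @ Wk, context @ Wv          # keys/values from timestamps + text
    attn = softmax(q @ k.T / np.sqrt(d))       # (series_len, context_len)
    return series_tokens + attn @ v @ Wo       # residual fusion back into the series

out = cross_attention(np.random.randn(24, 16),   # 24 time-series tokens
                      np.random.randn(24, 16),   # timestamp embeddings
                      np.random.randn(8, 16))    # text tokens
print(out.shape)  # (24, 16)
```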
arXiv Detail & Related papers (2025-12-08T05:36:14Z) - Learning Reconfigurable Representations for Multimodal Federated Learning with Missing Data [24.083180898232417]
We propose a new federated learning framework featuring locally adaptive representations based on learnable client-side embedding controls.
These embeddings serve as reconfiguration signals that align the globally aggregated representation with each client's local context.
The proposed method achieves up to 36.45% performance improvement under severe data incompleteness.
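A minimal sketch of how such client-side embedding controls could work, assuming a FiLM-style scale-and-shift parameterization (the paper's actual control design is not specified in the summary):

```python
# Assumed FiLM-style "embedding control": each client learns a scale and shift
# that reconfigure the globally aggregated representation toward its context.
import numpy as np

DIM = 32

class ClientControl:
    def __init__(self, seed):
        rng = np.random.default_rng(seed)
        self.gamma = 1.0 + rng.normal(0, 0.01, DIM)   # learnable scale
        self.beta = rng.normal(0, 0.01, DIM)          # learnable shift

    def reconfigure(self, global_repr):
        # align the shared representation with this client's local context
        return self.gamma * global_repr + self.beta

global_repr = np.random.randn(5, DIM)        # 5 samples of aggregated server features
local_view = ClientControl(seed=1).reconfigure(global_repr)
print(local_view.shape)                      # (5, 32)
```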
arXiv Detail & Related papers (2025-10-27T00:09:58Z) - Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence [83.15764564701706]
We propose a novel framework that performs vision-language alignment by integrating Cauchy-Schwarz divergence with mutual information.
We find that the CS divergence seamlessly addresses InfoNCE's alignment-uniformity conflict and serves complementary roles with InfoNCE.
Experiments on text-to-image generation and cross-modality retrieval tasks demonstrate the effectiveness of our method on vision-language alignment.
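For reference, the kernel form of the Cauchy-Schwarz divergence can be estimated directly from two sets of embeddings; the snippet below shows that standard estimator only, not how the paper combines it with mutual information or InfoNCE.

```python
# Empirical Cauchy-Schwarz divergence with a Gaussian kernel:
#   D_CS(p, q) = -log( (E_pq[k])^2 / (E_pp[k] * E_qq[k]) )
import numpy as np

def gaussian_gram(a, b, sigma=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def cs_divergence(x, y, sigma=1.0):
    pq = gaussian_gram(x, y, sigma).mean()    # cross term E_pq[k]
    pp = gaussian_gram(x, x, sigma).mean()    # within-p term
    qq = gaussian_gram(y, y, sigma).mean()    # within-q term
    return -np.log(pq ** 2 / (pp * qq))       # >= 0, near 0 when the sets match

img = np.random.randn(64, 8)                  # e.g. image embeddings
txt = img + 0.1 * np.random.randn(64, 8)      # nearby text embeddings
print(cs_divergence(img, txt))                # small positive value
```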
arXiv Detail & Related papers (2025-02-24T10:29:15Z) - FedRSClip: Federated Learning for Remote Sensing Scene Classification Using Vision-Language Models [23.830133838392964]
We propose FedRSCLIP, the first federated learning framework for remote sensing image classification based on a VLM, specifically CLIP.
FedRSCLIP addresses the challenges of data heterogeneity and large-scale model transmission in federated environments by introducing Prompt Learning.
To validate the effectiveness of our proposed model, we construct a Fed-RSIC dataset based on three existing remote sensing image classification datasets.
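A hedged sketch of the general pattern this suggests (federated averaging over a small set of learnable prompt vectors around a frozen CLIP-style model), with all names and shapes assumed rather than taken from FedRSCLIP:

```python
# Only a small prompt-context tensor is trained locally and averaged on the
# server, which is what keeps transmission cheap. Placeholder gradients stand
# in for real prompt-tuning; the frozen CLIP encoders are omitted.
import numpy as np

CTX_TOKENS, DIM, NUM_CLIENTS = 8, 512, 4       # CLIP-like text embedding width

def local_prompt_update(global_ctx, seed, lr=0.05):
    # placeholder for a few epochs of prompt-only training on local scenes
    rng = np.random.default_rng(seed)
    pseudo_grad = rng.normal(0, 0.01, global_ctx.shape)
    return global_ctx - lr * pseudo_grad

global_ctx = np.zeros((CTX_TOKENS, DIM))       # shared learnable prompt context
for round_id in range(3):
    client_ctx = [local_prompt_update(global_ctx, seed=round_id * 10 + c)
                  for c in range(NUM_CLIENTS)]
    global_ctx = np.mean(client_ctx, axis=0)   # FedAvg over prompts only
print(global_ctx.shape)                        # (8, 512)
```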
arXiv Detail & Related papers (2025-01-05T07:10:27Z) - Towards Personalized Federated Multi-Scenario Multi-Task Recommendation [22.095138650857436]
PF-MSMTrec is a novel framework for personalized federated multi-scenario multi-task recommendation.
We introduce a bottom-up joint learning mechanism to address the unique challenges of multiple optimization conflicts.
Our proposed method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2024-06-27T07:10:37Z) - NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose NativE, a comprehensive framework to achieve multi-modal knowledge graph completion (MMKGC) in the wild.
NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion for any modalities.
We construct a new benchmark called WildKGC with five datasets to evaluate our method.
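One plausible, purely illustrative reading of relation-guided adaptive fusion is a relation-conditioned gate over per-modality entity features, as sketched below; NativE's actual module is not reproduced here.

```python
# Assumed relation-guided gating: the relation embedding weights each modality
# differently before the entity features are fused.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relation_guided_fusion(modal_feats, relation_emb, seed=0):
    """modal_feats: dict of modality name -> (dim,) entity features."""
    rng = np.random.default_rng(seed)
    names = sorted(modal_feats)
    W = rng.normal(0, 0.05, (relation_emb.size, len(names)))
    gates = sigmoid(relation_emb @ W)                 # one gate per modality
    fused = sum(g * modal_feats[n] for g, n in zip(gates, names))
    return fused / gates.sum()                        # normalized adaptive fusion

feats = {"graph": np.random.randn(16),
         "image": np.random.randn(16),
         "text": np.random.randn(16)}
print(relation_guided_fusion(feats, np.random.randn(16)).shape)  # (16,)
```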
arXiv Detail & Related papers (2024-03-28T03:04:00Z) - Decoupled Subgraph Federated Learning [57.588938805581044]
We address the challenge of federated learning on graph-structured data distributed across multiple clients.
We present a novel framework for this scenario, named FedStruct, that harnesses deep structural dependencies.
We validate the effectiveness of FedStruct through experimental results conducted on six datasets for semi-supervised node classification.
arXiv Detail & Related papers (2024-02-29T13:47:23Z) - 3FM: Multi-modal Meta-learning for Federated Tasks [2.117841684082203]
We introduce a meta-learning framework specifically designed for multimodal federated tasks.
Our approach is motivated by the need to enable federated models to robustly adapt when exposed to new modalities.
We demonstrate that the proposed algorithm achieves better performance than the baseline on a subset of missing modality scenarios.
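The sketch below shows the generic shape of such an approach: MAML-style inner adaptation on tasks where a modality is randomly dropped, followed by an outer update of the shared initialization. It is fitted to the abstract, not to the 3FM algorithm itself.

```python
# First-order meta-learning toy with simulated missing modalities.
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
theta = np.zeros(2 * DIM)                            # weights for [image | text] features

def loss(w, x_img, x_txt, y):
    pred = x_img @ w[:DIM] + x_txt @ w[DIM:]
    return ((pred - y) ** 2).mean()

def grad(w, x_img, x_txt, y):
    err = x_img @ w[:DIM] + x_txt @ w[DIM:] - y
    return 2 * np.concatenate([x_img.T @ err, x_txt.T @ err]) / len(y)

for meta_step in range(50):
    x_img, x_txt = rng.normal(size=(32, DIM)), rng.normal(size=(32, DIM))
    y = x_img.sum(1) + x_txt.sum(1)
    if rng.random() < 0.5:                           # simulate a missing modality
        x_img = np.zeros_like(x_img)
    inner = theta - 0.05 * grad(theta, x_img, x_txt, y)   # client adaptation step
    theta -= 0.01 * grad(inner, x_img, x_txt, y)          # first-order meta update

print(loss(theta, rng.normal(size=(4, DIM)), rng.normal(size=(4, DIM)), np.zeros(4)))
```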
arXiv Detail & Related papers (2023-12-15T20:03:24Z) - Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
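A toy illustration of single-stream early fusion, assuming modality embeddings added to each token before concatenation and a shared encoder (a simple stand-in below); this is not UmURL's implementation.

```python
# Tokens from each skeleton modality get a modality embedding and are
# concatenated into one sequence for a single shared encoder.
import numpy as np

DIM = 32
rng = np.random.default_rng(0)
modality_emb = {m: rng.normal(0, 0.02, DIM) for m in ["joint", "bone", "motion"]}

def single_stream_encode(modal_tokens):
    """modal_tokens: dict of modality -> (num_tokens, DIM) features."""
    seq = np.concatenate(
        [toks + modality_emb[m] for m, toks in modal_tokens.items()], axis=0)
    W = rng.normal(0, 0.05, (DIM, DIM))          # stand-in for a shared encoder
    return np.tanh(seq @ W).mean(axis=0)         # one joint representation

tokens = {m: rng.normal(size=(25, DIM)) for m in modality_emb}  # 25 tokens each
print(single_stream_encode(tokens).shape)        # (32,)
```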
arXiv Detail & Related papers (2023-11-06T13:56:57Z) - Preserving Modality Structure Improves Multi-Modal Learning [64.10085674834252]
Self-supervised learning on large-scale multi-modal datasets allows learning semantically meaningful embeddings without relying on human annotations.
These methods often struggle to generalize well on out-of-domain data as they ignore the semantic structure present in modality-specific embeddings.
We propose a novel Semantic-Structure-Preserving Consistency approach to improve generalizability by preserving the modality-specific relationships in the joint embedding space.
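One simple way to express such a constraint is to penalize the gap between pairwise similarity matrices computed in the modality-specific and joint spaces, as in the hedged sketch below; the paper's exact loss may differ.

```python
# Structure-preserving consistency: the similarity pattern of a batch in a
# modality-specific space should survive the mapping into the joint space.
import numpy as np

def cosine_sim_matrix(z):
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return z @ z.T

def structure_consistency_loss(modality_specific, joint):
    s_modal = cosine_sim_matrix(modality_specific)   # relationships within one modality
    s_joint = cosine_sim_matrix(joint)               # relationships after fusion
    return ((s_modal - s_joint) ** 2).mean()         # penalize structure distortion

video_feats = np.random.randn(16, 64)                # modality-specific embeddings
joint_feats = video_feats @ np.random.randn(64, 32)  # same samples in the joint space
print(structure_consistency_loss(video_feats, joint_feats))
```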
arXiv Detail & Related papers (2023-08-24T20:46:48Z) - Multimodal Federated Learning via Contrastive Representation Ensemble [17.08211358391482]
Federated learning (FL) serves as a privacy-conscious alternative to centralized machine learning.
Existing FL methods all rely on model aggregation at the single-modality level.
We propose Contrastive Representation Ensemble and Aggregation for Multimodal FL (CreamFL).
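A simplified, assumption-laden sketch of representation-level aggregation with a contrastive objective (clients share embeddings of a common probe set, the server ensembles them, and a global encoder is pulled toward the ensemble):

```python
# Representation-level ensemble plus an InfoNCE-style alignment loss.
# The shared probe set and the exact ensembling rule are assumptions.
import numpy as np

rng = np.random.default_rng(0)
N, DIM = 16, 32
client_reprs = [rng.normal(size=(N, DIM)) for _ in range(3)]   # per-client embeddings
ensemble = np.mean(client_reprs, axis=0)                       # server-side ensemble

def contrastive_to_ensemble(global_repr, ensemble, tau=0.1):
    g = global_repr / np.linalg.norm(global_repr, axis=1, keepdims=True)
    e = ensemble / np.linalg.norm(ensemble, axis=1, keepdims=True)
    logits = g @ e.T / tau                        # similarity to every ensemble row
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(g))                       # positive = same sample's ensemble row
    return -log_probs[idx, idx].mean()            # InfoNCE-style alignment loss

global_repr = ensemble + 0.3 * rng.normal(size=(N, DIM))       # current global encoder output
print(contrastive_to_ensemble(global_repr, ensemble))
```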
arXiv Detail & Related papers (2023-02-17T14:17:44Z) - Support-set based Multi-modal Representation Enhancement for Video Captioning [121.70886789958799]
We propose a Support-set based Multi-modal Representation Enhancement (SMRE) model to mine rich information in a semantic subspace shared between samples.
Specifically, we propose a Support-set Construction (SC) module to construct a support-set to learn underlying connections between samples and obtain semantic-related visual elements.
During this process, we design a Semantic Space Transformation (SST) module to constrain relative distance and administrate multi-modal interactions in a self-supervised way.
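The sketch below is an illustrative guess at the mechanics: each sample attends over its k most similar batch neighbours (a toy support set), and a relative-distance term keeps cross-sample geometry stable after the transformation; SMRE's actual SC and SST modules are more involved.

```python
# Toy support-set enhancement with a relative-distance (geometry) constraint.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def support_set_enhance(feats, k=4):
    sims = feats @ feats.T                        # semantic similarity within the batch
    np.fill_diagonal(sims, -np.inf)               # exclude the sample itself
    support = np.argsort(-sims, axis=1)[:, :k]    # k most related samples = support set
    weights = softmax(np.take_along_axis(sims, support, axis=1))
    enhanced = feats + (weights[..., None] * feats[support]).sum(axis=1)
    # relative-distance constraint: geometry before vs. after enhancement
    d_before, d_after = feats @ feats.T, enhanced @ enhanced.T
    sst_loss = ((d_before - d_after) ** 2).mean()
    return enhanced, sst_loss

enhanced, loss = support_set_enhance(np.random.randn(8, 16))
print(enhanced.shape, float(loss))
```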
arXiv Detail & Related papers (2022-05-19T03:40:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.