Federated Vision-Language-Recommendation with Personalized Fusion
- URL: http://arxiv.org/abs/2410.08478v4
- Date: Sun, 02 Nov 2025 00:09:30 GMT
- Title: Federated Vision-Language-Recommendation with Personalized Fusion
- Authors: Zhiwei Li, Guodong Long, Jing Jiang, Chengqi Zhang, Qiang Yang
- Abstract summary: This paper introduces FedVLR, a federated VLR framework specially designed for user-specific personalized fusion of vision-language representations. The effectiveness of our proposed FedVLR has been validated on seven benchmark datasets.
- Score: 48.25209840295838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Applying large pre-trained Vision-Language Models to recommendation is a burgeoning field, a direction we term Vision-Language-Recommendation (VLR). Bringing VLR to user-oriented on-device intelligence within a federated learning framework is a crucial step for enhancing user privacy and delivering personalized experiences. This paper introduces FedVLR, a federated VLR framework specially designed for user-specific personalized fusion of vision-language representations. At its core is a novel bi-level fusion mechanism: The server-side multi-view fusion module first generates a diverse set of pre-fused multimodal views. Subsequently, each client employs a user-specific mixture-of-expert mechanism to adaptively integrate these views based on individual user interaction history. This designed lightweight personalized fusion module provides an efficient solution to implement a federated VLR system. The effectiveness of our proposed FedVLR has been validated on seven benchmark datasets.
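The bi-level fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`server_views`, `client_fuse`), the three fusion operators, and the fixed gate logits are all assumptions chosen for illustration. In the paper's setting the gate would be learned on-device from each user's interaction history.

```python
import math
from typing import List

Vector = List[float]


def server_views(vision: Vector, text: Vector) -> List[Vector]:
    """Hypothetical server-side step: build several pre-fused views of one item.

    The three operators below are illustrative assumptions, not the paper's.
    """
    return [
        [v + t for v, t in zip(vision, text)],      # sum fusion
        [v * t for v, t in zip(vision, text)],      # element-wise product fusion
        [max(v, t) for v, t in zip(vision, text)],  # element-wise max fusion
    ]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def client_fuse(views: List[Vector], gate_logits: Vector) -> Vector:
    """Hypothetical client-side step: mix the views with user-specific weights.

    gate_logits stands in for the output of a per-user gating network.
    """
    weights = softmax(gate_logits)
    dim = len(views[0])
    return [sum(w * view[i] for w, view in zip(weights, views)) for i in range(dim)]


# Two users with different (assumed) gate logits receive different item
# representations from the same server-provided views.
item_views = server_views([1.0, 2.0], [3.0, 0.5])
user_a = client_fuse(item_views, [2.0, 0.0, 0.0])  # favors the sum view
user_b = client_fuse(item_views, [0.0, 0.0, 2.0])  # favors the max view
```

In this sketch only the small gate is user-specific, which matches the abstract's description of a lightweight personalized module.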
Related papers
- Synthetic Interaction Data for Scalable Personalization in Large Language Models [67.31884245564086]
We introduce a high-fidelity synthetic data generation framework called PersonaGym. Unlike prior work that treats personalization as static persona-preference pairs, PersonaGym models a dynamic preference process. We release PersonaAtlas, a large-scale, high-quality, and diverse synthetic dataset of high-fidelity multi-turn personalized interaction trajectories.
arXiv Detail & Related papers (2026-02-12T20:41:22Z)
- pFedMMA: Personalized Federated Fine-Tuning with Multi-Modal Adapter for Vision-Language Models [12.270878920401948]
pFedMMA is the first personalized federated learning framework that leverages multi-modal adapters for vision-language tasks. We show that pFedMMA achieves state-of-the-art trade-offs between personalization and generalization, outperforming recent federated prompt tuning methods.
arXiv Detail & Related papers (2025-07-07T18:26:34Z)
- FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation [50.438552588818]
We propose FindRec (Flexible unified information disentanglement for multi-modal sequential Recommendation). A Stein kernel-based Integrated Information Coordination Module (IICM) theoretically guarantees distribution consistency between multimodal features and ID streams. A cross-modal expert routing mechanism adaptively filters and combines multimodal features based on their contextual relevance.
arXiv Detail & Related papers (2025-07-07T04:09:45Z)
- Hyper-modal Imputation Diffusion Embedding with Dual-Distillation for Federated Multimodal Knowledge Graph Completion [59.54067771781552]
We propose a framework named MMFeD3-HidE for addressing multimodal uncertain unavailability and multimodal client heterogeneity challenges of FedMKGC. We propose a FedMKGC benchmark for a comprehensive evaluation, consisting of a general FedMKGC backbone named MMFedE, datasets with heterogeneous multimodal information, and three groups of constructed baselines.
arXiv Detail & Related papers (2025-06-27T09:32:58Z)
- BiXFormer: A Robust Framework for Maximizing Modality Effectiveness in Multi-Modal Semantic Segmentation [55.486872677160015]
We reformulate multi-modal semantic segmentation as a mask-level classification task. We propose BiXFormer, which integrates Unified Modality Matching (UMM) and Cross Modality Alignment (CMA). Experiments on both synthetic and real-world multi-modal benchmarks demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2025-06-04T08:04:58Z)
- MC-LLaVA: Multi-Concept Personalized Vision-Language Model [51.645660375766575]
This paper proposes the first multi-concept personalization paradigm, MC-LLaVA. MC-LLaVA employs a multi-concept instruction tuning strategy, effectively integrating multiple concepts in a single training step. Comprehensive qualitative and quantitative experiments demonstrate that MC-LLaVA can achieve impressive multi-concept personalized responses.
arXiv Detail & Related papers (2025-03-24T16:32:17Z)
- Enhancing User Intent for Recommendation Systems via Large Language Models [0.0]
DUIP is a novel framework that combines LSTM networks with Large Language Models (LLMs) to dynamically capture user intent and generate personalized item recommendations. Our findings suggest that DUIP is a promising approach for next-generation recommendation systems, with potential for further improvements in cross-modal recommendations and scalability.
arXiv Detail & Related papers (2025-01-18T20:35:03Z)
- Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation [4.518104756199573]
Molar is a sequential recommendation framework that integrates multiple content modalities with ID information to capture collaborative signals effectively.
By seamlessly combining multimodal content with collaborative filtering insights, Molar captures both user interests and contextual semantics, leading to superior recommendation accuracy.
arXiv Detail & Related papers (2024-12-24T05:23:13Z)
- MC-LLaVA: Multi-Concept Personalized Vision-Language Model [51.645660375766575]
This paper proposes the first multi-concept personalization paradigm, MC-LLaVA. MC-LLaVA employs a multi-concept instruction tuning strategy, effectively integrating multiple concepts in a single training step. Comprehensive qualitative and quantitative experiments demonstrate that MC-LLaVA can achieve impressive multi-concept personalized responses.
arXiv Detail & Related papers (2024-11-18T16:33:52Z)
- FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation [22.281467168796645]
Federated learning (FL) is a collaborative machine learning approach that enables multiple clients to train models without sharing their private data. We propose FedMoE-DA, a new FL model training framework that incorporates a novel domain-aware, fine-grained aggregation strategy to enhance the robustness, personalizability, and communication efficiency simultaneously.
arXiv Detail & Related papers (2024-11-04T14:29:04Z)
- Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding [51.96911650437978]
Multi-modal fusion has played a vital role in multi-modal scene understanding.
Most existing methods focus on cross-modal fusion involving two modalities, often overlooking more complex multi-modal fusion.
We propose a Part-Whole Relational Fusion (PWRF) framework for multi-modal scene understanding.
arXiv Detail & Related papers (2024-10-19T02:27:30Z)
- FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts [4.412721048192925]
We present FedMoE, an efficient personalized Federated Learning framework to address data heterogeneity.
FedMoE is composed of two fine-tuning stages. In the first stage, FedMoE simplifies the problem by conducting a search based on observed activation patterns.
In the second stage, these submodels are distributed to clients for further training and returned to the server for aggregation.
arXiv Detail & Related papers (2024-08-21T03:16:12Z)
- Personalized Federated Collaborative Filtering: A Variational AutoEncoder Approach [49.63614966954833]
Federated Collaborative Filtering (FedCF) is an emerging field focused on developing a new recommendation framework that preserves privacy.
This paper proposes a novel personalized FedCF method by preserving users' personalized information into a latent variable and a neural model simultaneously.
To effectively train the proposed framework, we model the problem as a specialized Variational AutoEncoder (VAE) task by integrating user interaction vector reconstruction with missing value prediction.
arXiv Detail & Related papers (2024-08-16T05:49:14Z)
- Towards Personalized Federated Multi-Scenario Multi-Task Recommendation [22.095138650857436]
PF-MSMTrec is a novel framework for personalized federated multi-scenario multi-task recommendation.
We introduce a bottom-up joint learning mechanism to address the unique challenges of multiple optimization conflicts.
Our proposed method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2024-06-27T07:10:37Z)
- MMGRec: Multimodal Generative Recommendation with Transformer Model [81.61896141495144]
MMGRec aims to introduce a generative paradigm into multimodal recommendation.
We first devise a hierarchical quantization method Graph CF-RQVAE to assign Rec-ID for each item from its multimodal information.
We then train a Transformer-based recommender to generate the Rec-IDs of user-preferred items based on historical interaction sequences.
arXiv Detail & Related papers (2024-04-25T12:11:27Z)
- PMG: Personalized Multimodal Generation with Large Language Models [20.778869086174137]
This paper proposes the first method for personalized multimodal generation using large language models (LLMs).
It showcases its applications and validates its performance via an extensive experimental study on two datasets.
PMG improves personalization by up to 8% in terms of LPIPS while retaining the accuracy of generation.
arXiv Detail & Related papers (2024-04-07T03:05:57Z)
- Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond [87.1712108247199]
Our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP).
We develop a generic and personalized generative framework that can handle a wide range of personalized needs.
Our methodology enhances the capabilities of foundational language models for personalized tasks.
arXiv Detail & Related papers (2024-03-15T20:21:31Z)
- All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment [39.54689489555342]
Current vision-language (VL) tracking frameworks consist of three parts, i.e., a visual feature extractor, a language feature extractor, and a fusion model. We propose an All-in-One framework, which learns joint feature extraction and interaction by adopting a unified transformer backbone.
arXiv Detail & Related papers (2023-07-07T03:51:21Z)
- Dual Personalization on Federated Recommendation [50.4115315992418]
Federated recommendation is a new Internet service architecture that aims to provide privacy-preserving recommendation services in federated settings.
This paper proposes a novel Personalized Federated Recommendation (PFedRec) framework to learn many user-specific lightweight models.
We also propose a new dual personalization mechanism to effectively learn fine-grained personalization on both users and items.
arXiv Detail & Related papers (2023-01-16T05:26:07Z)
- Diversely Regularized Matrix Factorization for Accurate and Aggregately Diversified Recommendation [15.483426620593013]
DivMF (Diversely Regularized Matrix Factorization) is a novel matrix factorization method for aggregately diversified recommendation.
We show that DivMF achieves the state-of-the-art performance in aggregately diversified recommendation.
arXiv Detail & Related papers (2022-10-19T08:49:39Z)
- FedSPLIT: One-Shot Federated Recommendation System Based on Non-negative Joint Matrix Factorization and Knowledge Distillation [7.621960305708476]
We present the first unsupervised one-shot federated CF implementation, named FedSPLIT, based on NMF joint factorization.
FedSPLIT can obtain results similar to the state of the art (and even outperform it in certain situations) with a substantial decrease in the number of communications.
arXiv Detail & Related papers (2022-05-04T23:42:14Z)
- Federated Multi-view Matrix Factorization for Personalized Recommendations [53.74747022749739]
We introduce the federated multi-view matrix factorization method that extends the federated learning framework to matrix factorization with multiple data sources.
Our method is able to learn the multi-view model without transferring the user's personal data to a central server.
arXiv Detail & Related papers (2020-04-08T21:07:50Z)
- Meta Matrix Factorization for Federated Rating Predictions [84.69112252208468]
Federated recommender systems have distinct advantages in terms of privacy protection over traditional recommender systems.
Previous work on federated recommender systems does not fully consider the limitations of storage, RAM, energy and communication bandwidth in a mobile environment.
Our goal in this paper is to design a novel federated learning framework for rating prediction (RP) for mobile environments.
arXiv Detail & Related papers (2019-10-22T16:29:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.