FRIEREN: Federated Learning with Vision-Language Regularization for Segmentation
- URL: http://arxiv.org/abs/2510.02114v1
- Date: Thu, 02 Oct 2025 15:21:49 GMT
- Title: FRIEREN: Federated Learning with Vision-Language Regularization for Segmentation
- Authors: Ding-Ruei Shen,
- Abstract summary: Federated Learning (FL) offers a privacy-preserving solution for Semantic Segmentation (SS) tasks to adapt to new domains. Most existing FL methods assume access to labeled data on remote clients or fail to leverage the power of modern Vision Foundation Models (VFMs). Here, we propose a novel and challenging task, FFREEDG, in which a model is pretrained on a server's labeled source dataset and subsequently trained across clients using only their unlabeled data.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated Learning (FL) offers a privacy-preserving solution for Semantic Segmentation (SS) tasks to adapt to new domains, but it faces significant challenges from domain shifts, particularly when client data is unlabeled. Moreover, most existing FL methods unrealistically assume access to labeled data on remote clients or fail to leverage the power of modern Vision Foundation Models (VFMs). Here, we propose a novel and challenging task, FFREEDG, in which a model is pretrained on a server's labeled source dataset and subsequently trained across clients using only their unlabeled data, without ever re-accessing the source. To solve FFREEDG, we propose FRIEREN, a framework that leverages the knowledge of a VFM by integrating vision and language modalities. Our approach employs a Vision-Language decoder guided by CLIP-based text embeddings to improve semantic disambiguation and uses a weak-to-strong consistency learning strategy for robust local training on pseudo-labels. Our experiments on synthetic-to-real and clear-to-adverse-weather benchmarks demonstrate that our framework effectively tackles this new task, achieving competitive performance against established domain generalization and adaptation methods and setting a strong baseline for future research.
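The weak-to-strong consistency strategy described in the abstract can be sketched in a few lines: pseudo-labels come from the model's prediction on a weakly augmented view, and only high-confidence pixels supervise the strongly augmented view. The NumPy sketch below is a minimal illustration under assumed details (hard argmax pseudo-labels, a fixed 0.9 confidence threshold), not the paper's actual implementation:

```python
import numpy as np

def weak_to_strong_targets(weak_logits, threshold=0.9):
    """Turn per-pixel logits (H, W, C) from the weakly augmented view
    into hard pseudo-labels, masking out low-confidence pixels."""
    # numerically stable softmax over the class axis
    e = np.exp(weak_logits - weak_logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    conf = probs.max(axis=-1)
    pseudo = probs.argmax(axis=-1)
    mask = conf >= threshold          # supervise only confident pixels
    return pseudo, mask

def masked_cross_entropy(strong_logits, pseudo, mask):
    """Cross-entropy of the strongly augmented view against the
    pseudo-labels, averaged over confident pixels only."""
    e = np.exp(strong_logits - strong_logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    h, w, c = probs.shape
    picked = probs.reshape(-1, c)[np.arange(h * w), pseudo.ravel()]
    nll = -np.log(picked + 1e-8).reshape(h, w)
    return float(nll[mask].mean()) if mask.any() else 0.0
```

In practice the two views come from the same image under different augmentation strengths; the confidence mask keeps noisy pseudo-labels from dominating local client updates.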
Related papers
- PANER: A Paraphrase-Augmented Framework for Low-Resource Named Entity Recognition [9.164874578520722]
We present a lightweight few-shot NER framework that combines principles from prior IT approaches to leverage the large context window of recent state-of-the-art LLMs. Experiments on benchmark datasets show that our method achieves performance comparable to state-of-the-art models on few-shot and zero-shot tasks.
arXiv Detail & Related papers (2025-10-20T16:36:18Z) - An Empirical Study of Federated Prompt Learning for Vision Language Model [89.2963764404892]
This paper systematically investigates the behavioral differences between language prompt learning and vision prompt learning in Vision-Language Models (VLMs). We evaluate the impact of various FL and prompt configurations, such as client scale, aggregation strategies, and prompt length, to assess the robustness of Federated Prompt Learning (FPL).
arXiv Detail & Related papers (2025-05-29T03:09:15Z) - VLMs meet UDA: Boosting Transferability of Open Vocabulary Segmentation with Unsupervised Domain Adaptation [3.776249047528669]
This paper proposes enhancing segmentation accuracy across diverse domains by integrating Vision-Language reasoning with key strategies for Unsupervised Domain Adaptation (UDA). We improve the fine-grained segmentation capabilities of VLMs through multi-scale contextual data, robust text embeddings with prompt augmentation, and layer-wise fine-tuning in our proposed Foundational-Retaining Open Vocabulary (FROVSS) framework. The resulting UDA-FROV framework is the first UDA approach to effectively adapt across domains without requiring shared categories.
arXiv Detail & Related papers (2024-12-12T12:49:42Z) - Empowering Source-Free Domain Adaptation via MLLM-Guided Reliability-Based Curriculum Learning [7.2523602603838535]
Source-Free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to a target domain using only unlabeled target data. We propose Reliability-based Curriculum Learning (RCL), a novel framework that integrates multiple MLLMs for knowledge exploitation via pseudo-labeling in SFDA. RCL achieves state-of-the-art (SOTA) performance on multiple SFDA benchmarks, e.g., +9.4% on DomainNet, demonstrating its effectiveness in enhancing adaptability and robustness without requiring access to source data.
arXiv Detail & Related papers (2024-05-28T17:18:17Z) - Text-Video Retrieval with Global-Local Semantic Consistent Learning [122.15339128463715]
We propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL)
GLSCL capitalizes on latent shared semantics across modalities for text-video retrieval.
Our method achieves performance comparable to SOTA methods while being nearly 220 times faster in terms of computational cost.
arXiv Detail & Related papers (2024-05-21T11:59:36Z) - FedEGG: Federated Learning with Explicit Global Guidance [90.04705121816185]
Federated Learning (FL) holds great potential for diverse applications owing to its privacy-preserving nature. Existing methods help address these challenges via optimization-based client constraints, adaptive client selection, or the use of pre-trained models or synthetic data. We present FedEGG, a new FL algorithm that constructs a global guiding task using a well-defined, easy-to-converge learning task.
arXiv Detail & Related papers (2024-04-18T04:25:21Z) - 3FM: Multi-modal Meta-learning for Federated Tasks [2.117841684082203]
We introduce a meta-learning framework specifically designed for multimodal federated tasks.
Our approach is motivated by the need to enable federated models to robustly adapt when exposed to new modalities.
We demonstrate that the proposed algorithm achieves better performance than the baseline on a subset of missing modality scenarios.
arXiv Detail & Related papers (2023-12-15T20:03:24Z) - The Unreasonable Effectiveness of Large Language-Vision Models for Source-free Video Domain Adaptation [56.61543110071199]
The Source-Free Video Unsupervised Domain Adaptation (SFVUDA) task consists of adapting an action recognition model, trained on a labelled source dataset, to an unlabelled target dataset.
Previous approaches have attempted to address SFVUDA by leveraging self-supervision derived from the target data itself.
We instead exploit "web-supervision" from Large Language-Vision Models (LLVMs), driven by the rationale that LLVMs contain a rich world prior that is surprisingly robust to domain shift.
arXiv Detail & Related papers (2023-08-17T18:12:05Z) - Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER)
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - Learning Across Domains and Devices: Style-Driven Source-Free Domain Adaptation in Clustered Federated Learning [32.098954477227046]
We propose a novel task in which the clients' data is unlabeled and the server accesses a source labeled dataset for pre-training only.
Our experiments show that our algorithm is able to efficiently tackle the new task, outperforming existing approaches.
arXiv Detail & Related papers (2022-10-05T15:23:52Z) - Novel Class Discovery in Semantic Segmentation [104.30729847367104]
We introduce a new setting of Novel Class Discovery in Semantic Segmentation (NCDSS).
It aims at segmenting unlabeled images containing new classes given prior knowledge from a labeled set of disjoint classes.
In NCDSS, we need to distinguish objects from the background and handle the presence of multiple classes within an image.
We propose the Entropy-based Uncertainty Modeling and Self-training (EUMS) framework to overcome noisy pseudo-labels.
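The entropy-based uncertainty modeling behind EUMS can be illustrated schematically: pixels are ranked by the entropy of their predicted class distribution, and only the lowest-entropy fraction is kept as clean pseudo-labels. The split rule and the 50% clean fraction below are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def entropy_split(probs, clean_fraction=0.5):
    """Rank pixels by prediction entropy over the class axis of a
    (H, W, C) probability map and keep the lowest-entropy fraction
    as 'clean' pseudo-labels (a simplified stand-in for EUMS's
    entropy ranking)."""
    ent = -(probs * np.log(probs + 1e-8)).sum(axis=-1)   # per-pixel entropy
    flat = ent.ravel()
    k = max(1, int(clean_fraction * flat.size))
    cutoff = np.partition(flat, k - 1)[k - 1]            # k-th smallest entropy
    clean_mask = ent <= cutoff
    return clean_mask, ent
```

Low-entropy (confident) pixels would then be used for self-training directly, while the high-entropy remainder is handled separately, e.g. by relabeling or discarding.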
arXiv Detail & Related papers (2021-12-03T13:31:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences arising from its use.