Prototypical Progressive Alignment and Reweighting for Generalizable Semantic Segmentation
- URL: http://arxiv.org/abs/2507.11955v1
- Date: Wed, 16 Jul 2025 06:42:21 GMT
- Title: Prototypical Progressive Alignment and Reweighting for Generalizable Semantic Segmentation
- Authors: Yuhang Zhang, Zhengyu Zhang, Muxin Liao, Shishun Tian, Wenbin Zou, Lu Zhang, Chen Xu
- Abstract summary: Generalizable semantic segmentation aims to perform well on unseen target domains. Class-wise prototypes serve as domain-invariant cues that benefit generalization due to their stability and semantic consistency. We propose a novel framework for generalizable semantic segmentation: Prototypical Progressive Alignment and Reweighting.
- Score: 13.24093379138835
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generalizable semantic segmentation aims to perform well on unseen target domains, a critical challenge due to real-world applications requiring high generalizability. Class-wise prototypes, representing class centroids, serve as domain-invariant cues that benefit generalization due to their stability and semantic consistency. However, this approach faces three challenges. First, existing methods often adopt coarse prototypical alignment strategies, which may hinder performance. Second, naive prototypes computed by averaging source batch features are prone to overfitting and may be negatively affected by unrelated source data. Third, most methods treat all source samples equally, ignoring the fact that different features have varying adaptation difficulties. To address these limitations, we propose a novel framework for generalizable semantic segmentation: Prototypical Progressive Alignment and Reweighting (PPAR), leveraging the strong generalization ability of the CLIP model. Specifically, we define two prototypes: the Original Text Prototype (OTP) and Visual Text Prototype (VTP), generated via CLIP to serve as a solid base for alignment. We then introduce a progressive alignment strategy that aligns features in an easy-to-difficult manner, reducing domain gaps gradually. Furthermore, we propose a prototypical reweighting mechanism that estimates the reliability of source data and adjusts its contribution, mitigating the effect of irrelevant or harmful features (i.e., reducing negative transfer). We also provide a theoretical analysis showing the alignment between our method and domain generalization theory. Extensive experiments across multiple benchmarks demonstrate that PPAR achieves state-of-the-art performance, validating its effectiveness.
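The abstract's notions of a "naive" class-wise prototype and of reweighting source features can be illustrated with a minimal sketch. This is not the PPAR implementation: the function names, the toy 2-D features, the hand-written `reference` vectors (standing in for CLIP-derived prototypes such as OTP/VTP), and the clamped-cosine weighting rule are all assumptions made for illustration only.

```python
import math

def class_prototypes(features, labels):
    """Naive prototypes: average the feature vectors of each class."""
    sums, counts = {}, {}
    for feat, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(feat)
            counts[lab] = 0
        sums[lab] = [s + f for s, f in zip(sums[lab], feat)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def reliability_weights(features, labels, reference):
    """Hypothetical rule: weight = cosine similarity to the class reference,
    clamped below at 0 so dissimilar features contribute nothing."""
    return [max(0.0, cosine(f, reference[l])) for f, l in zip(features, labels)]

# Toy source features (one vector per pixel or region) with class labels.
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [-1.0, 0.0]]
labels = [0, 0, 1, 0]

# Assumed reference prototypes, standing in for CLIP-generated ones.
reference = {0: [1.0, 0.0], 1: [0.0, 1.0]}

protos = class_prototypes(features, labels)
weights = reliability_weights(features, labels, reference)
```

In this toy example the last feature is labeled class 0 but points away from the class-0 reference. It still pulls the averaged prototype toward itself, while the reweighting rule assigns it zero weight, which is the kind of negative-transfer mitigation the abstract describes in general terms.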
Related papers
- Probabilistic Prototype Calibration of Vision-Language Models for Generalized Few-shot Semantic Segmentation [75.18058114915327]
Generalized Few-Shot Semantic Segmentation (GFSS) aims to extend a segmentation model to novel classes with only a few annotated examples. We propose FewCLIP, a probabilistic prototype calibration framework over multi-modal prototypes from the pretrained CLIP. We show FewCLIP significantly outperforms state-of-the-art approaches across both GFSS and class-incremental settings.
arXiv Detail & Related papers (2025-06-28T18:36:22Z) - Partial Transportability for Domain Generalization [56.37032680901525]
Building on the theory of partial identification and transportability, this paper introduces new results for bounding the value of a functional of the target distribution. Our contribution is to provide the first general estimation technique for transportability problems. We propose a gradient-based optimization scheme for making scalable inferences in practice.
arXiv Detail & Related papers (2025-03-30T22:06:37Z) - FedORGP: Guiding Heterogeneous Federated Learning with Orthogonality Regularization on Global Prototypes [31.93057335216804]
Federated Learning (FL) has emerged as an essential framework for distributed machine learning. Current approaches face limitations in achieving separation between classes. This paper introduces FedORGP, which encourages intra-class prototype similarity and expands the inter-class angular separation.
arXiv Detail & Related papers (2025-02-22T07:02:51Z) - PromptSync: Bridging Domain Gaps in Vision-Language Models through Class-Aware Prototype Alignment and Discrimination [14.50214193838818]
Zero-shot generalization in vision-language (V-L) models such as CLIP has spurred their widespread adoption.
Previous methods have employed test-time prompt tuning to adapt the model to unseen domains, but they overlooked the issue of imbalanced class distributions.
In this study, we employ class-aware prototype alignment weighted by mean class probabilities obtained for a test sample and filtered augmented views.
arXiv Detail & Related papers (2024-04-11T07:26:00Z) - Prototypical Contrastive Learning through Alignment and Uniformity for Recommendation [6.790779112538357]
We present Prototypical contrastive learning through Alignment and Uniformity for recommendation.
Specifically, we first propose prototypes as a latent space to ensure consistency across different augmentations from the origin graph.
The absence of explicit negatives means that directly optimizing the consistency loss between instance and prototype could easily result in dimensional collapse issues.
arXiv Detail & Related papers (2024-02-03T08:19:26Z) - A Robust Negative Learning Approach to Partial Domain Adaptation Using Source Prototypes [0.8895157045883034]
This work proposes a robust Partial Domain Adaptation (PDA) framework that mitigates the negative transfer problem.
It includes diverse, complementary label feedback, alleviating the effect of incorrect feedback and promoting pseudo-label refinement.
We conducted a series of comprehensive experiments, including an ablation analysis, covering a range of partial domain adaptation tasks.
arXiv Detail & Related papers (2023-09-07T07:26:27Z) - Rethinking Prototypical Contrastive Learning through Alignment, Uniformity and Correlation [24.794022951873156]
We propose to learn Prototypical representation through Alignment, Uniformity and Correlation (PAUC).
Specifically, the ordinary ProtoNCE loss is revised with: (1) an alignment loss that pulls embeddings from positive prototypes together; (2) a loss that distributes the prototypical level features uniformly; (3) a correlation loss that increases the diversity and discriminability between prototypical level features.
arXiv Detail & Related papers (2022-10-18T22:33:12Z) - BMD: A General Class-balanced Multicentric Dynamic Prototype Strategy for Source-free Domain Adaptation [74.93176783541332]
Source-free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to the unlabeled target domain without accessing the well-labeled source data.
To make up for the absence of source data, most existing methods introduced feature prototype based pseudo-labeling strategies.
We propose a general class-Balanced Multicentric Dynamic prototype strategy for the SFDA task.
arXiv Detail & Related papers (2022-04-06T13:23:02Z) - Attentional Prototype Inference for Few-Shot Segmentation [128.45753577331422]
We propose attentional prototype inference (API), a probabilistic latent variable framework for few-shot segmentation.
We define a global latent variable to represent the prototype of each object category, which we model as a probabilistic distribution.
We conduct extensive experiments on four benchmarks, where our proposal obtains at least competitive and often better performance than state-of-the-art prototype-based methods.
arXiv Detail & Related papers (2021-05-14T06:58:44Z) - Learning Invariant Representations and Risks for Semi-supervised Domain Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA).
We introduce the LIRR algorithm for jointly Learning Invariant Representations and Risks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z) - Prior Guided Feature Enrichment Network for Few-Shot Segmentation [64.91560451900125]
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results.
Few-shot segmentation is proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples.
These frameworks still face the challenge of generalization ability reduction on unseen classes due to inappropriate use of high-level semantic information.
arXiv Detail & Related papers (2020-08-04T10:41:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.