EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning
- URL: http://arxiv.org/abs/2506.20986v1
- Date: Thu, 26 Jun 2025 04:00:55 GMT
- Title: EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning
- Authors: Xiao Zhang, Yongqiang Ma, Haodong Jing, Nanning Zheng
- Abstract summary: We propose EVA, a Mixture-of-Experts Semantic Variant Alignment framework for Compositional Zero-Shot Learning (CZSL). Specifically, we introduce domain-expert adaptation, leveraging multiple experts to achieve token-aware learning and model high-quality primitive representations. Our method significantly outperforms other state-of-the-art CZSL methods on three popular benchmarks in both closed- and open-world settings.
- Score: 31.95599022275838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compositional Zero-Shot Learning (CZSL) investigates compositional generalization: recognizing unknown state-object pairs based on learned primitive concepts. Existing CZSL methods typically derive primitive features through a simple composition-prototype mapping, which is suboptimal when a primitive's instances fall into distinct semantic subsets. Moreover, all-to-one cross-modal primitive matching neglects compositional divergence within identical states or objects, limiting fine-grained image-composition alignment. In this study, we propose EVA, a Mixture-of-Experts Semantic Variant Alignment framework for CZSL. Specifically, we introduce domain-expert adaptation, leveraging multiple experts to achieve token-aware learning and model high-quality primitive representations. To enable accurate compositional generalization, we further present semantic variant alignment, which selects semantically relevant representations for image-primitive matching. Our method significantly outperforms other state-of-the-art CZSL methods on three popular benchmarks in both closed- and open-world settings, demonstrating the efficacy of the proposed approach.
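As a concrete illustration of the two components named in the abstract, below is a minimal sketch of a token-aware mixture-of-experts adapter and a variant-based image-primitive matching score. Everything here (module names, sizes, the soft routing scheme, and a fixed number of variants per primitive) is an assumption made for illustration, not the authors' released implementation.

```python
# Hedged sketch of the two ideas from the abstract; all names, dimensions,
# and the soft-routing scheme are illustrative assumptions, not EVA's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenAwareMoEAdapter(nn.Module):
    """Domain-expert adaptation (assumed form): each token is routed to a
    soft mixture of small expert MLPs instead of one shared projection."""
    def __init__(self, dim: int, num_experts: int = 4, hidden: int = 64):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # per-token router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        ])

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, dim)
        weights = self.gate(tokens).softmax(dim=-1)                # (B, L, E)
        outs = torch.stack([e(tokens) for e in self.experts], -1)  # (B, L, D, E)
        mixed = (outs * weights.unsqueeze(2)).sum(dim=-1)          # (B, L, D)
        return tokens + mixed  # residual adapter

def variant_alignment_logits(img: torch.Tensor, variants: torch.Tensor) -> torch.Tensor:
    """Semantic variant alignment (assumed form): score each primitive by the
    image's similarity to its closest variant, not to one averaged prototype.

    img:      (B, D) image features
    variants: (P, V, D) V variant embeddings for each of P primitives
    returns:  (B, P) matching logits
    """
    img = F.normalize(img, dim=-1)
    variants = F.normalize(variants, dim=-1)
    sims = torch.einsum("bd,pvd->bpv", img, variants)  # image-variant similarities
    return sims.max(dim=-1).values  # keep only the semantically closest variant
```

Taking the max over variants is one simple way to read "select semantically relevant representation for image-primitives matching"; a soft top-k or attention-weighted pooling over variants would be an equally plausible interpretation.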
Related papers
- Learning Clustering-based Prototypes for Compositional Zero-shot Learning [56.57299428499455]
ClusPro is a robust clustering-based prototype mining framework for Compositional Zero-Shot Learning.
It defines the conceptual boundaries of primitives through a set of diversified prototypes.
ClusPro efficiently performs prototype clustering in a non-parametric fashion, without introducing additional learnable parameters.
arXiv Detail & Related papers (2025-02-10T14:20:01Z)
- Graph-guided Cross-composition Feature Disentanglement for Compositional Zero-shot Learning [54.08741382593959]
Disentanglement of visual features of primitives (i.e., attributes and objects) has shown exceptional results in Compositional Zero-Shot Learning (CZSL).
However, it is challenging to learn disentangled primitive features that are general across different compositions.
We propose cross-composition feature disentanglement, which takes multiple primitive-sharing compositions as inputs.
arXiv Detail & Related papers (2024-08-19T08:23:09Z)
- Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions with prior knowledge of known primitives (attributes and objects).
We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues.
Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
arXiv Detail & Related papers (2023-08-08T03:24:21Z)
- Prompting Language-Informed Distribution for Compositional Zero-Shot Learning [73.49852821602057]
The compositional zero-shot learning (CZSL) task aims to recognize unseen compositional visual concepts.
We propose PLID, a model that prompts the language-informed distribution for the task.
Experimental results on the MIT-States, UT-Zappos, and C-GQA datasets show the superior performance of PLID over the prior arts.
arXiv Detail & Related papers (2023-05-23T18:00:22Z)
- Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning [37.445883075993414]
Recent compositional zero-shot learning (CZSL) methods adapt pre-trained vision-language models (VLMs) by constructing trainable prompts only for composed state-object pairs.
We propose a novel paradigm for CZSL models that establishes three identification branches (i.e., Multi-Path) to jointly model the state, object, and composition.
We conduct extensive experiments on three popular benchmarks, where our method significantly outperforms existing methods in both closed-world and open-world settings.
arXiv Detail & Related papers (2023-03-27T14:10:26Z)
- ProCC: Progressive Cross-primitive Compatibility for Open-World Compositional Zero-Shot Learning [29.591615811894265]
Open-World Compositional Zero-shot Learning (OW-CZSL) aims to recognize novel compositions of state and object primitives in images with no priors on the compositional space.
We propose a novel method, termed Progressive Cross-primitive Compatibility (ProCC), to mimic the human learning process for OW-CZSL tasks.
arXiv Detail & Related papers (2022-11-19T10:09:46Z)
- Simple Primitives with Feasibility- and Contextuality-Dependence for Open-World Compositional Zero-shot Learning [86.5258816031722]
The task of Compositional Zero-Shot Learning (CZSL) is to recognize images of novel state-object compositions that are absent during the training stage.
Previous methods of learning compositional embedding have shown effectiveness in closed-world CZSL.
In Open-World CZSL (OW-CZSL), their performance tends to degrade significantly due to the large cardinality of possible compositions.
arXiv Detail & Related papers (2022-11-05T12:57:06Z)
- Learning Invariant Visual Representations for Compositional Zero-Shot Learning [30.472541551048508]
Compositional Zero-Shot Learning (CZSL) aims to recognize novel compositions using knowledge learned from seen state-object compositions in the training set.
We propose an invariant feature learning framework to align different domains at the representation and gradient levels.
Experiments on two CZSL benchmarks demonstrate that the proposed method significantly outperforms the previous state-of-the-art.
arXiv Detail & Related papers (2022-06-01T11:33:33Z)
- HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning [74.76431541169342]
Zero-shot learning (ZSL) tackles the unseen class recognition problem, transferring semantic knowledge from seen classes to unseen ones.
We propose a novel hierarchical semantic-visual adaptation (HSVA) framework to align semantic and visual domains.
Experiments on four benchmark datasets demonstrate HSVA achieves superior performance on both conventional and generalized ZSL.
arXiv Detail & Related papers (2021-09-30T14:27:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.