Fugu-MT paper translation (abstract): FlowComposer: Composable Flows for Compositional Zero-Shot Learning

Paper overview: FlowComposer: Composable Flows for Compositional Zero-Shot Learning

arxiv url: http://arxiv.org/abs/2603.16641v1
Date: Tue, 17 Mar 2026 15:12:39 GMT
Status: translation complete
Last updated in system: 2026-03-18 17:42:07.359639
Title: FlowComposer: Composable Flows for Compositional Zero-Shot Learning
Title (reference translation): FlowComposer: Composable Flows for Compositional Zero-Shot Learning
Authors: Zhenqi He, Lin Li, Long Chen
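Abstract summary: Compositional zero-shot learning (CZSL) aims to recognize unseen attribute-object compositions by recombining primitives learned from seen pairs. Recent CZSL methods based on vision-language models (VLMs) typically adopt parameter-efficient fine-tuning (PEFT). FlowComposer is a model-agnostic framework that learns two primitive flows to transport visual features toward attribute and object text embeddings.
Reference score (independently computed attention measure): 10.977642730831361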
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Compositional zero-shot learning (CZSL) aims to recognize unseen attribute-object compositions by recombining primitives learned from seen pairs. Recent CZSL methods built on vision-language models (VLMs) typically adopt parameter-efficient fine-tuning (PEFT). They apply visual disentanglers for decomposition and manipulate token-level prompts or prefixes to encode compositions. However, such PEFT-based designs suffer from two fundamental limitations: (1) Implicit Composition Construction, where composition is realized only via token concatenation or branch-wise prompt tuning rather than an explicit operation in the embedding space; (2) Remained Feature Entanglement, where imperfect disentanglement leaves attribute, object, and composition features mutually contaminated. Together, these issues limit the generalization ability of current CZSL models. In this paper, we are the first to systematically study flow matching for CZSL and introduce FlowComposer, a model-agnostic framework that learns two primitive flows to transport visual features toward attribute and object text embeddings, and a learnable Composer that explicitly fuses their velocity fields into a composition flow. To exploit the inevitable residual entanglement, we further devise a leakage-guided augmentation scheme that reuses leaked features as auxiliary signals. We thoroughly evaluate FlowComposer on three public CZSL benchmarks by integrating it as a plug-and-play component into various baselines, consistently achieving significant improvements.
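Abstract (reference translation): Compositional zero-shot learning (CZSL) aims to recognize unseen attribute-object compositions by recombining primitives learned from seen pairs. Recent CZSL methods built on vision-language models (VLMs) commonly adopt parameter-efficient fine-tuning (PEFT): they apply visual disentanglers for decomposition and manipulate token-level prompts or prefixes to encode compositions. However, PEFT-based designs face two fundamental limitations: (1) Implicit Composition Construction, in which composition is realized only through token concatenation or branch-wise prompt tuning rather than through an explicit operation in the embedding space; and (2) Remained Feature Entanglement, in which imperfect disentanglement leaves attribute, object, and composition features mutually contaminated. These issues limit the generalization ability of current CZSL models. The paper systematically studies flow matching for CZSL and introduces FlowComposer, a model-agnostic framework that learns two primitive flows to transport visual features toward attribute and object text embeddings, together with a learnable Composer that explicitly fuses their velocity fields into a composition flow. To exploit the residual entanglement that inevitably remains, a leakage-guided augmentation scheme reuses leaked features as auxiliary signals. FlowComposer is evaluated on three public CZSL benchmarks by integrating it as a plug-and-play component into various baselines, where it consistently yields improvements.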
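The idea of transporting a visual feature along two primitive flows and fusing their velocity fields can be illustrated with a toy sketch. The code below is not the paper's implementation. It assumes the straight-line interpolation path commonly used in flow matching, replaces the learned velocity networks with their ideal constant targets, and stands in for the learnable Composer with a fixed convex mix. All function names, the mixing weight, and the 2-D vectors are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point on the straight path from x0 (t=0) to x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Velocity of the straight path, x1 - x0 (the usual regression target)."""
    return [b - a for a, b in zip(x0, x1)]


def compose(v_attr: Vector, v_obj: Vector, weight: float) -> Vector:
    """Hypothetical Composer stand-in: convex mix of two primitive velocities."""
    return [weight * a + (1.0 - weight) * o for a, o in zip(v_attr, v_obj)]


def euler_transport(
    x: Vector, velocity_fn: Callable[[Vector, float], Vector], steps: int
) -> Vector:
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy 2-D "embeddings" (assumed values, for illustration only).
visual = [0.0, 0.0]
attr_text = [1.0, 0.0]
obj_text = [0.0, 1.0]

# Ideal primitive velocities; in a real model these would be network outputs.
v_attr = target_velocity(visual, attr_text)
v_obj = target_velocity(visual, obj_text)


def composed_field(x: Vector, t: float) -> Vector:
    """Composition flow: fuse the two primitive velocities at every step."""
    return compose(v_attr, v_obj, weight=0.5)


result = euler_transport(visual, composed_field, steps=4)
print(result)
```

With equal weighting, the visual feature ends up halfway between the attribute and object targets. In a trained system the fusion would be learned rather than fixed, and the velocities would depend on the current feature and time.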

Related papers