Fugu-MT Paper Translation (Abstract): EVTAR: End-to-End Try on with Additional Unpaired Visual Reference

Paper overview: EVTAR: End-to-End Try on with Additional Unpaired Visual Reference

arXiv URL: http://arxiv.org/abs/2511.00956v1
Date: Sun, 02 Nov 2025 14:32:31 GMT
Status: Translation complete
System update date: 2025-11-05 16:37:27.004977
Title: EVTAR: End-to-End Try on with Additional Unpaired Visual Reference
Authors: Liuzhuozheng Li, Yue Gong, Shanyuan Liu, Bo Cheng, Yuhang Ma, Liebucha Wu, Dengyang Jiang, Zanyi Wang, Dawei Leng, Yuhui Yin,
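Abstract summary: This paper proposes EVTAR, an end-to-end virtual try-on model with additional reference. The model generates try-on results without masks, densepose, or segmentation maps. The training data is enriched with supplementary references and unpaired person images to support these capabilities.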
Reference score (independently computed attention score): 16.702488896886845
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We propose EVTAR, an End-to-End Virtual Try-on model with Additional Reference, that directly fits the target garment onto the person image while incorporating reference images to enhance try-on accuracy. Most existing virtual try-on approaches rely on complex inputs such as agnostic person images, human pose, densepose, or body keypoints, making them labor-intensive and impractical for real-world applications. In contrast, EVTAR adopts a two-stage training strategy, enabling simple inference with only the source image and the target garment inputs. Our model generates try-on results without masks, densepose, or segmentation maps. Moreover, EVTAR leverages additional reference images of different individuals wearing the same clothes to preserve garment texture and fine-grained details better. This mechanism is analogous to how humans consider reference models when choosing outfits, thereby simulating a more realistic and high-quality dressing effect. We enrich the training data with supplementary references and unpaired person images to support these capabilities. We evaluate EVTAR on two widely used benchmarks and diverse tasks, and the results consistently validate the effectiveness of our approach.

Related papers