Fugu-MT Paper Translation (Abstract): DiT-VTON: Diffusion Transformer Framework for Unified Multi-Category Virtual Try-On and Virtual Try-All with Integrated Image Editing

Paper summary: DiT-VTON: Diffusion Transformer Framework for Unified Multi-Category Virtual Try-On and Virtual Try-All with Integrated Image Editing

arxiv url: http://arxiv.org/abs/2510.04797v1
Date: Fri, 03 Oct 2025 16:27:53 GMT
Status: Translation complete
Last updated in system: 2025-10-07 16:52:59.878525
Title: DiT-VTON: Diffusion Transformer Framework for Unified Multi-Category Virtual Try-On and Virtual Try-All with Integrated Image Editing
Authors: Qi Li, Shuwen Qiu, Julien Han, Xingzi Xu, Mehmet Saygin Seyfioglu, Kee Kiat Koo, Karim Bouyarmane
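Abstract summary: We present DiT-VTON, a novel VTO framework that leverages a Diffusion Transformer (DiT). Our model surpasses state-of-the-art methods on VITON-HD, achieving superior detail preservation and robustness without relying on additional condition encoders. It also outperforms models with VTA and image editing capabilities on a diverse dataset spanning thousands of product categories.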
Reference score (independently computed attention score): 11.550777201655393
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rapid growth of e-commerce has intensified the demand for Virtual Try-On (VTO) technologies, enabling customers to realistically visualize products overlaid on their own images. Despite recent advances, existing VTO models face challenges with fine-grained detail preservation, robustness to real-world imagery, efficient sampling, image editing capabilities, and generalization across diverse product categories. In this paper, we present DiT-VTON, a novel VTO framework that leverages a Diffusion Transformer (DiT), renowned for its performance on text-conditioned image generation, adapted here for the image-conditioned VTO task. We systematically explore multiple DiT configurations, including in-context token concatenation, channel concatenation, and ControlNet integration, to determine the best setup for VTO image conditioning. To enhance robustness, we train the model on an expanded dataset encompassing varied backgrounds, unstructured references, and non-garment categories, demonstrating the benefits of data scaling for VTO adaptability. DiT-VTON also redefines the VTO task beyond garment try-on, offering a versatile Virtual Try-All (VTA) solution capable of handling a wide range of product categories and supporting advanced image editing functionalities such as pose preservation, localized editing, texture transfer, and object-level customization. Experimental results show that our model surpasses state-of-the-art methods on VITON-HD, achieving superior detail preservation and robustness without reliance on additional condition encoders. It also outperforms models with VTA and image editing capabilities on a diverse dataset spanning thousands of product categories.
