Fugu-MT Paper Translation (Abstract): Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals

Paper overview: Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals

arxiv url: http://arxiv.org/abs/2505.21062v1
Date: Tue, 27 May 2025 11:47:51 GMT
Status: Translation complete
Last updated in system: 2025-05-28 17:05:58.617085
Title: Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals
Title (reference translation): Inverse Virtual Try-On: Generating product-style images across multiple categories from people wearing clothes
Authors: Davide Lobba, Fulvio Sanguigni, Bin Ren, Marcella Cornia, Rita Cucchiara, Nicu Sebe
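Abstract summary: We propose Text-Enhanced MUlti-category Virtual Try-Off (TEMU-VTOFF). Our architecture is designed to receive garment information from multiple modalities, such as images, text, and masks, and to work in a multi-category setting. Experiments on the VITON-HD and Dress Code datasets show that TEMU-VTOFF sets a new state of the art on the VTOFF task.
Reference score (independently computed attention score): 76.96387718150542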
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While virtual try-on (VTON) systems aim to render a garment onto a target person image, this paper tackles the novel task of virtual try-off (VTOFF), which addresses the inverse problem: generating standardized product images of garments from real-world photos of clothed individuals. Unlike VTON, which must resolve diverse pose and style variations, VTOFF benefits from a consistent and well-defined output format -- typically a flat, lay-down-style representation of the garment -- making it a promising tool for data generation and dataset enhancement. However, existing VTOFF approaches face two major limitations: (i) difficulty in disentangling garment features from occlusions and complex poses, often leading to visual artifacts, and (ii) restricted applicability to single-category garments (e.g., upper-body clothes only), limiting generalization. To address these challenges, we present Text-Enhanced MUlti-category Virtual Try-Off (TEMU-VTOFF), a novel architecture featuring a dual DiT-based backbone with a modified multimodal attention mechanism for robust garment feature extraction. Our architecture is designed to receive garment information from multiple modalities like images, text, and masks to work in a multi-category setting. Finally, we propose an additional alignment module to further refine the generated visual details. Experiments on VITON-HD and Dress Code datasets show that TEMU-VTOFF sets a new state-of-the-art on the VTOFF task, significantly improving both visual quality and fidelity to the target garments.
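Abstract (reference translation): Virtual try-on (VTON) systems aim to render a garment onto a target person image, whereas this paper tackles the new task of virtual try-off (VTOFF), which addresses the inverse problem: generating standardized product images of garments from real-world photos of clothed individuals. Unlike VTON, which must resolve diverse pose and style variations, VTOFF benefits from a consistent, well-defined output format, typically a flat, lay-down-style representation of the garment, which makes it a promising tool for data generation and dataset enhancement. However, existing VTOFF approaches have two major limitations: (i) difficulty in disentangling garment features from occlusions and complex poses, which often leads to visual artifacts, and (ii) applicability restricted to single-category garments (e.g., upper-body clothes only), which limits generalization. To address these challenges, we propose Text-Enhanced MUlti-category Virtual Try-Off (TEMU-VTOFF), a novel architecture featuring a dual DiT-based backbone with a modified multimodal attention mechanism for robust garment feature extraction. Our architecture is designed to receive garment information from multiple modalities, such as images, text, and masks, and to work in a multi-category setting. Finally, we propose an additional alignment module to further refine the generated visual details. Experiments on the VITON-HD and Dress Code datasets show that TEMU-VTOFF sets a new state of the art on the VTOFF task, significantly improving both visual quality and fidelity to the target garments.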
