Fugu-MT Paper Translation (Abstract): FastFit: Accelerating Multi-Reference Virtual Try-On via Cacheable Diffusion Models

Paper summary: FastFit: Accelerating Multi-Reference Virtual Try-On via Cacheable Diffusion Models

arxiv url: http://arxiv.org/abs/2508.20586v1
Date: Thu, 28 Aug 2025 09:25:52 GMT
Status: Translation complete
Last updated in system: 2025-08-29 18:12:02.280028
Title: FastFit: Accelerating Multi-Reference Virtual Try-On via Cacheable Diffusion Models
Title (reference translation): FastFit: Accelerating Multi-Reference Virtual Try-On with Cacheable Diffusion Models
Authors: Zheng Chong, Yanwei Lei, Shiyue Zhang, Zhuandi He, Zhen Wang, Xujie Zhang, Xiao Dong, Yiling Wu, Dongmei Jiang, Xiaodan Liang
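Abstract summary: FastFit is a high-speed multi-reference virtual try-on framework based on a novel cacheable diffusion architecture. The model fully decouples reference feature encoding from the denoising process with negligible parameter overhead. As a result, reference features are computed only once and reused losslessly across all denoising steps.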
Reference score (independently computed attention score): 59.8871829077739
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Despite its great potential, virtual try-on technology is hindered from real-world application by two major challenges: the inability of current methods to support multi-reference outfit compositions (including garments and accessories), and their significant inefficiency caused by the redundant re-computation of reference features in each denoising step. To address these challenges, we propose FastFit, a high-speed multi-reference virtual try-on framework based on a novel cacheable diffusion architecture. By employing a Semi-Attention mechanism and substituting traditional timestep embeddings with class embeddings for reference items, our model fully decouples reference feature encoding from the denoising process with negligible parameter overhead. This allows reference features to be computed only once and losslessly reused across all steps, fundamentally breaking the efficiency bottleneck and achieving an average 3.5x speedup over comparable methods. Furthermore, to facilitate research on complex, multi-reference virtual try-on, we introduce DressCode-MR, a new large-scale dataset. It comprises 28,179 sets of high-quality, paired images covering five key categories (tops, bottoms, dresses, shoes, and bags), constructed through a pipeline of expert models and human feedback refinement. Extensive experiments on the VITON-HD, DressCode, and our DressCode-MR datasets show that FastFit surpasses state-of-the-art methods on key fidelity metrics while offering its significant advantage in inference efficiency.
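Abstract (reference translation): Despite its great potential, virtual try-on technology is held back from real-world application by two major challenges: current methods cannot support multi-reference outfit compositions (including garments and accessories), and they are highly inefficient because reference features are redundantly recomputed at every denoising step. To address these challenges, we propose FastFit, a high-speed multi-reference virtual try-on framework based on a novel cacheable diffusion architecture. By introducing a Semi-Attention mechanism and replacing the traditional timestep embeddings with class embeddings for reference items, the model fully decouples reference feature encoding from the denoising process with negligible parameter overhead. Reference features are therefore computed only once and reused losslessly across all steps, which fundamentally breaks the efficiency bottleneck and yields an average 3.5x speedup over comparable methods. Furthermore, to facilitate research on complex multi-reference virtual try-on, we introduce DressCode-MR, a new large-scale dataset. It comprises 28,179 sets of high-quality paired images covering five key categories (tops, bottoms, dresses, shoes, and bags), constructed through a pipeline of expert models and human feedback refinement. Extensive experiments on the VITON-HD, DressCode, and DressCode-MR datasets show that FastFit surpasses state-of-the-art methods on key fidelity metrics while offering a significant advantage in inference efficiency.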
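
The caching idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation and does not model Semi-Attention or any real diffusion network. Every name here (ToyReferenceEncoder, toy_denoise_step, run_without_cache, run_with_cache) is hypothetical and exists only for illustration. The sketch assumes only what the abstract states: reference features are conditioned on a class label rather than on the timestep, so they do not change between denoising steps and can be computed once and reused.

```python
from dataclasses import dataclass


@dataclass
class ToyReferenceEncoder:
    """Hypothetical stand-in for a reference-feature encoder; counts its calls."""
    calls: int = 0

    def encode(self, reference, class_id):
        # Features depend only on the item and its class label, never on the
        # timestep. That independence is what makes them safe to cache.
        self.calls += 1
        return [x + class_id for x in reference]


def toy_denoise_step(latent, ref_features, t):
    # Placeholder update (an assumption, not a real denoiser): move each latent
    # value toward the mean of all reference features.
    flat = [v for feats in ref_features for v in feats]
    target = sum(flat) / len(flat)
    return [x + (target - x) / (t + 1) for x in latent]


def run_without_cache(latent, references, encoder, num_steps):
    # Baseline: reference features are re-encoded at every denoising step.
    for t in reversed(range(num_steps)):
        feats = [encoder.encode(ref, cls) for ref, cls in references]
        latent = toy_denoise_step(latent, feats, t)
    return latent


def run_with_cache(latent, references, encoder, num_steps):
    # Cached variant: encode each reference once, then reuse across all steps.
    feats = [encoder.encode(ref, cls) for ref, cls in references]
    for t in reversed(range(num_steps)):
        latent = toy_denoise_step(latent, feats, t)
    return latent


if __name__ == "__main__":
    refs = [([1.0, 2.0], 0), ([3.0, 4.0], 1)]  # (toy item, class id) pairs
    baseline, cached = ToyReferenceEncoder(), ToyReferenceEncoder()
    out_a = run_without_cache([0.0, 0.0], refs, baseline, num_steps=5)
    out_b = run_with_cache([0.0, 0.0], refs, cached, num_steps=5)
    print(out_a == out_b, baseline.calls, cached.calls)
```

In this toy setting both loops produce identical outputs, while the cached variant calls the encoder once per reference item instead of once per item per step. The 3.5x speedup reported in the abstract is a measurement on the real model and cannot be derived from this sketch.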

