Fugu-MT Paper Translation (Summary): FashionChameleon: Towards Real-Time and Interactive Human-Garment Video Customization

Paper summary: FashionChameleon: Towards Real-Time and Interactive Human-Garment Video Customization

arxiv url: http://arxiv.org/abs/2605.15824v1
Date: Fri, 15 May 2026 10:25:06 GMT
Status: Translation complete
Last updated (in system): 2026-05-18 21:22:26.253059
Title: FashionChameleon: Towards Real-Time and Interactive Human-Garment Video Customization
Authors: Quanjian Song, Yefeng Shen, Mengting Chen, Hao Sun, Jinsong Lan, Xiaoyong Zhu, Bo Zheng, Liujuan Cao,
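Abstract summary: FashionChameleon is a real-time, interactive framework for human-garment customization in autoregressive video generation. The paper describes how to achieve interactive multi-garment video customization while preserving motion coherence, using only single-garment video data.
Reference score (independently computed attention score): 35.648761912138795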
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Human-centric video customization, particularly at the garment level, has shown significant commercial value. However, existing approaches cannot support low-latency and interactive garment control, which is crucial for applications such as e-commerce and content creation. This paper studies how to achieve interactive multi-garment video customization while preserving motion coherence using only single-garment video data. We present FashionChameleon, a real-time and interactive framework for human-garment customization in autoregressive video generation, where users can interactively switch garments during generation. FashionChameleon consists of three key techniques: (i) Instead of training on multi-garment video data, we train a Teacher Model with In-Context Learning on a single reference-garment pair. By retaining the image-to-video training paradigm while enforcing a mismatch between the reference and garment images, the model is encouraged to implicitly preserve coherence during single-garment switching. (ii) To achieve consistency and efficiency during generation, we introduce Streaming Distillation with In-Context Learning, which fine-tunes the model with in-context teacher forcing and improves extrapolation consistency via gradient-reweighted distribution matching distillation. (iii) To extend the model to interactive multi-garment video customization, we propose Training-Free KV Cache Rescheduling, which includes garment KV refresh, historical KV withdraw, and reference KV disentangle to achieve garment switching while preserving motion coherence. FashionChameleon uniquely supports interactive customization and consistent long-video extrapolation, while achieving real-time generation at 23.8 FPS on a single GPU, 30-180× faster than existing baselines.
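The following is a quick sanity check on the reported speed. It rests on an assumption the abstract does not state: that the 30 to 180 times speedup compares frame throughput on comparable hardware. Under that assumption, 23.8 FPS corresponds to roughly 42 ms per frame, and the baselines would run at roughly 0.13 to 0.79 FPS.

```latex
\begin{aligned}
t_{\text{frame}} &= \frac{1}{23.8\,\text{FPS}} \approx 0.042\,\text{s} \approx 42\,\text{ms},\\
\text{FPS}_{\text{baseline}} &\approx \frac{23.8}{180} \approx 0.13
  \quad\text{to}\quad \frac{23.8}{30} \approx 0.79 .
\end{aligned}
```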
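The abstract names the three steps of Training-Free KV Cache Rescheduling but does not say how they work. The code below is a hypothetical toy model, not the authors' implementation. It makes four assumptions:

- The attention cache for autoregressive generation can be split into reference, garment, and history segments.
- "Garment KV refresh" means replacing the garment segment.
- "Historical KV withdraw" means dropping all but the most recent generated chunks.
- "Reference KV disentangle" is modeled only by keeping the reference segment separate and untouched.

The names `SegmentedKVCache`, `switch_garment`, `keep_chunks`, and `fake_kv` are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical KV entry: a (key, value) pair; float lists stand in for tensors.
KV = Tuple[List[float], List[float]]


@dataclass
class SegmentedKVCache:
    """Toy attention cache split into reference, garment, and history segments."""
    reference: List[KV] = field(default_factory=list)
    garment: List[KV] = field(default_factory=list)
    history: List[List[KV]] = field(default_factory=list)  # one list per generated chunk

    def context(self) -> List[KV]:
        # Entries the next chunk would attend to: reference, garment, then history.
        flat_history = [kv for chunk in self.history for kv in chunk]
        return self.reference + self.garment + flat_history

    def append_chunk(self, chunk_kv: List[KV]) -> None:
        # After each autoregressive step, cache the new chunk's KV entries.
        self.history.append(list(chunk_kv))

    def switch_garment(self, new_garment_kv: List[KV], keep_chunks: int) -> None:
        # Assumed "garment KV refresh": replace the garment segment wholesale.
        self.garment = list(new_garment_kv)
        # Assumed "historical KV withdraw": keep only the newest `keep_chunks`
        # chunks (slicing with -0 would keep everything, hence the guard).
        self.history = self.history[-keep_chunks:] if keep_chunks > 0 else []
        # The reference segment is left untouched; the paper's
        # "reference KV disentangle" step is not modeled here.


def fake_kv(tag: float, n: int) -> List[KV]:
    # Hypothetical helper: n identical entries labelled by `tag`.
    return [([tag], [tag]) for _ in range(n)]


cache = SegmentedKVCache(reference=fake_kv(0.0, 2), garment=fake_kv(1.0, 3))
for step in range(4):
    cache.append_chunk(fake_kv(10.0 + step, 1))
print(len(cache.context()))  # 2 + 3 + 4 = 9

cache.switch_garment(fake_kv(2.0, 3), keep_chunks=2)
print(len(cache.context()))  # 2 + 3 + 2 = 7
print(cache.garment[0][0])   # [2.0]
```

Keeping a few recent chunks is one way to trade motion continuity against leakage of the previous garment's appearance. The abstract does not say how much history, if any, the method actually retains.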

Related papers