Fugu-MT Paper Translation (Abstract): Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision

Paper abstract: Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision

arxiv url: http://arxiv.org/abs/2604.04934v1
Date: Mon, 06 Apr 2026 17:59:59 GMT
Status: Translation complete
Last updated in system: 2026-04-07 15:49:19.341189
Title: Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision
Authors: Hyunsoo Cha, Wonjung Woo, Byungjun Kim, Hanbyul Joo
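Abstract summary: Vanast is a framework that generates garment-transferred human animation videos directly from a single human image, garment images, and a pose guidance video. The model performs the entire process in a single unified step to achieve coherent synthesis.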
Reference score (independently computed attention measure): 23.145506516223126
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present Vanast, a unified framework that generates garment-transferred human animation videos directly from a single human image, garment images, and a pose guidance video. Conventional two-stage pipelines treat image-based virtual try-on and pose-driven animation as separate processes, which often results in identity drift, garment distortion, and front-back inconsistency. Our model addresses these issues by performing the entire process in a single unified step to achieve coherent synthesis. To enable this setting, we construct large-scale triplet supervision. Our data generation pipeline includes generating identity-preserving human images in alternative outfits that differ from garment catalog images, capturing full upper and lower garment triplets to overcome the single-garment-posed video pair limitation, and assembling diverse in-the-wild triplets without requiring garment catalog images. We further introduce a Dual Module architecture for video diffusion transformers to stabilize training, preserve pretrained generative quality, and improve garment accuracy, pose adherence, and identity preservation while supporting zero-shot garment interpolation. Together, these contributions allow Vanast to produce high-fidelity, identity-consistent animation across a wide range of garment types.

Related papers