Fugu-MT Paper Translation (Abstract): OmniTryOn: Video Try-On Anything at Once!

Paper abstract: OmniTryOn: Video Try-On Anything at Once!

arxiv url: http://arxiv.org/abs/2606.08514v1
Date: Sun, 07 Jun 2026 08:40:43 GMT
Status: Translation complete
Last updated in system: 2026-06-09 14:42:06.180494
Title: OmniTryOn: Video Try-On Anything at Once!
Title (reference translation): OmniTryOn: Video Try-On Anything at Once!
Authors: Changliang Xia, Chengyou Jia, Minnan Luo, Zhuohang Dang, Xin Shen, Bowen Ping,
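Abstract summary: This paper proposes the novel Try-On Anything task, which aims to simultaneously transfer diverse wearable objects onto a person in a video in a single inference pass. OmniTryOn is an external-prior-free generative framework designed to address this task. Experiments on TryAny-Bench show that OmniTryOn significantly outperforms existing video virtual try-on models.
Reference score (independently computed attention score): 32.91851342240063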
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Although video virtual try-on (VVT) has achieved significant progress, existing methods still exhibit two fundamental limitations: first, they are restricted to single-garment transfer, rendering simultaneous multi-object try-on highly impractical; second, their heavy reliance on explicit external priors (e.g., garment masks) inevitably destroys crucial physical dynamics and degrades visual quality. To bridge this gap, this paper proposes the novel Try-On Anything task, which aims to simultaneously transfer diverse wearable objects onto a person in a video in a single inference pass. To support and standardize this paradigm, we introduce TryAny-Bench, a comprehensive benchmark encompassing a paired video dataset alongside a tailored evaluation protocol. Furthermore, we present OmniTryOn, an external-prior-free generative framework designed to tackle this task. Specifically, OmniTryOn employs a First Frame Wearable Cache strategy, which directly provides diverse wearable objects for the generation process through the initial video frame. To maintain consistency, we propose the Spatiotemporally Consistent RoPE (STC-RoPE), which inherently establishes robust spatiotemporal anchors to strictly preserve complex human motions and background dynamics. Optimized by the proposed Gradual Try-On (GTO) training strategy, our model progressively masters robust multi-object synthesis. Extensive experiments on TryAny-Bench demonstrate that OmniTryOn significantly outperforms existing specialized video virtual try-on models and general video editing baselines, establishing a powerful new standard for the Try-On Anything task. Our dataset, code, and models are available at https://github.com/xcltql666/OminTryOn.
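Abstract (reference translation): Although video virtual try-on (VVT) has made great progress, existing methods have two fundamental limitations. First, they are limited to single-garment transfer, which makes simultaneous try-on of multiple objects impractical. Second, their heavy dependence on explicit external priors (for example, garment masks) inevitably destroys important physical dynamics and lowers visual quality. To fill this gap, this work proposes the new Try-On Anything task, which aims to transfer diverse wearable objects onto a person in a video simultaneously in a single inference pass. To support and standardize this paradigm, we introduce TryAny-Bench. We further introduce OmniTryOn, an external-prior-free generative framework designed to address this task. Specifically, OmniTryOn adopts a First Frame Wearable Cache strategy. To maintain consistency, we propose the Spatiotemporally Consistent RoPE (STC-RoPE), which inherently establishes robust spatiotemporal anchors to strictly preserve complex human motion and background dynamics. Optimized with the proposed Gradual Try-On (GTO) training strategy, the model progressively masters robust multi-object synthesis. Extensive experiments on TryAny-Bench show that OmniTryOn substantially outperforms existing video virtual try-on models and general video editing baselines, establishing a strong new standard for the Try-On Anything task. Our dataset, code, and models are available at https://github.com/xcltql666/OminTryOn.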

List of related papers