Fugu-MT Paper Translation (Abstract): PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask

Paper overview: PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask

arxiv url: http://arxiv.org/abs/2412.16978v1
Date: Sun, 22 Dec 2024 11:38:04 GMT
Status: Translation complete
Last updated in system: 2024-12-24 19:42:48.206402
Title: PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask
Authors: Jeongho Kim, Hoiyeong Jin, Sunghyun Park, Jaegul Choo
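Abstract summary: This paper tackles a text-editable virtual try-on task that changes the clothing item based on the provided clothing image. In text-editable virtual try-on, three key aspects exist: (i) designing rich text descriptions for paired person-clothing data to train the model, (ii) addressing conflicts where textual information about the person's existing clothing interferes with the generation of the new clothing, and (iii) adaptively adjusting the inpainting mask to align with the text descriptions. The approach uses LMMs via in-context learning to generate detailed descriptions of person and clothing images independently.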
Reference score (attention score computed independently by this site): 35.052909478338115
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Recent virtual try-on approaches have advanced by fine-tuning pre-trained text-to-image diffusion models to leverage their powerful generative ability. However, the use of text prompts in virtual try-on is still underexplored. This paper tackles a text-editable virtual try-on task that changes the clothing item based on the provided clothing image while editing the wearing style (e.g., tucking style, fit) according to the text descriptions. In text-editable virtual try-on, three key aspects exist: (i) designing rich text descriptions for paired person-clothing data to train the model, (ii) addressing conflicts where textual information about the person's existing clothing interferes with the generation of the new clothing, and (iii) adaptively adjusting the inpainting mask to align with the text descriptions, ensuring proper editing areas while preserving the parts of the original person's appearance that are irrelevant to the new clothing. To address these aspects, we propose PromptDresser, a text-editable virtual try-on model that leverages large multimodal model (LMM) assistance to enable high-quality and versatile manipulation based on generative text prompts. Our approach utilizes LMMs via in-context learning to generate detailed text descriptions for person and clothing images independently, including pose details and editing attributes, with minimal human cost. Moreover, to ensure proper editing areas, we adaptively adjust the inpainting mask depending on the text prompts. We found that our approach, utilizing detailed text prompts, not only enhances text editability but also effectively conveys clothing details that are difficult to capture through images alone, thereby enhancing image quality. Our code is available at https://github.com/rlawjdghek/PromptDresser.
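The idea of adjusting the inpainting mask according to the text prompt can be illustrated with a small toy sketch. This is not the authors' implementation: the function names (`extend_mask_down`, `prompt_aware_mask`), the `"tucking"` attribute key, and the row counts in `EXTRA_ROWS` are all hypothetical, chosen only to show how a wearing-style attribute might enlarge the editable region.

```python
from typing import Dict, List

Mask = List[List[int]]  # 1 = region to inpaint, 0 = region to preserve


def extend_mask_down(mask: Mask, rows: int) -> Mask:
    """Return a copy of `mask` where each masked pixel is also copied
    up to `rows` rows downward (clipped at the image border)."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    out = [row[:] for row in mask]  # copy so the input mask is not modified
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                for dy in range(1, rows + 1):
                    if y + dy < height:
                        out[y + dy][x] = 1
    return out


# Hypothetical mapping from a wearing-style attribute to extra mask rows.
# An untucked top hangs lower, so the editable area is extended downward.
EXTRA_ROWS: Dict[str, int] = {"tucked": 0, "untucked": 2}


def prompt_aware_mask(mask: Mask, attributes: Dict[str, str]) -> Mask:
    """Adjust a coarse mask using attributes parsed from a text prompt."""
    style = attributes.get("tucking", "tucked")
    return extend_mask_down(mask, EXTRA_ROWS.get(style, 0))


coarse = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
tucked = prompt_aware_mask(coarse, {"tucking": "tucked"})
untucked = prompt_aware_mask(coarse, {"tucking": "untucked"})
```

In this toy example the "tucked" prompt leaves the mask unchanged, while the "untucked" prompt extends it two rows downward, leaving the bottom row preserved. A real system would operate on image-resolution masks and derive the adjustment from the model's own prompt handling.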

Related papers