Fugu-MT Paper Translation (Abstract): Hierarchical Fashion Design with Multi-stage Diffusion Models

Paper overview: Hierarchical Fashion Design with Multi-stage Diffusion Models

arxiv url: http://arxiv.org/abs/2401.07450v3
Date: Sat, 20 Jan 2024 05:21:13 GMT
Status: Translation complete
Last updated in system: 2024-01-23 19:18:25.769412
Title: Hierarchical Fashion Design with Multi-stage Diffusion Models
Title (reference translation): Hierarchical Fashion Design Using Multi-stage Diffusion Models
Authors: Zhifeng Xie, Hao Li, Huiming Ding, Mengtian Li, Ying Cao
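Abstract summary: Cross-modal fashion synthesis and editing offer intelligent support to fashion designers. Current diffusion models demonstrate commendable stability and controllability in image synthesis. We propose HieraFashDiff, a novel fashion design method using a shared multi-stage diffusion model.
Reference score (attention score computed independently by the site): 17.848891542772446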
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Cross-modal fashion synthesis and editing offer intelligent support to fashion designers by enabling the automatic generation and local modification of design drafts. While current diffusion models demonstrate commendable stability and controllability in image synthesis, they still face significant challenges in generating fashion designs from abstract design elements and in fine-grained editing. Abstract sensory expressions, e.g., office, business, and party, form the high-level design concepts, while measurable aspects such as sleeve length, collar type, and pant length are considered the low-level attributes of clothing. Controlling and editing fashion images with lengthy text descriptions is difficult. In this paper, we propose HieraFashDiff, a novel fashion design method using a shared multi-stage diffusion model that encompasses high-level design concepts and low-level clothing attributes in a hierarchical structure. Specifically, we categorize the input text into different levels and feed them to the diffusion model at different time steps, according to the criteria of professional clothing designers. HieraFashDiff lets designers incrementally add low-level attributes after high-level prompts for interactive editing. In addition, we design a differentiable loss function, applied with a mask during sampling, to preserve non-edited areas. Comprehensive experiments on our newly constructed hierarchical fashion dataset demonstrate that the proposed method outperforms other state-of-the-art competitors.
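The hierarchical conditioning described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `switch_fraction` parameter, and the single hard switch between prompt levels are all assumptions made for illustration. The mask blend at the end is a simpler stand-in for the differentiable loss the abstract mentions, shown only to make the idea of preserving non-edited areas concrete.

```python
from typing import List, Sequence


def prompt_for_step(step: int, total_steps: int,
                    high_level: str, low_level: str,
                    switch_fraction: float = 0.5) -> str:
    """Pick the text prompt that conditions one denoising step.

    Steps run from 0 (noisiest) to total_steps - 1 (cleanest).
    Hypothetical rule: early steps see the high-level concept,
    later steps see the low-level attributes.
    """
    if not 0 <= step < total_steps:
        raise ValueError("step out of range")
    switch_step = int(total_steps * switch_fraction)
    return high_level if step < switch_step else low_level


def keep_non_edit_area(original: Sequence[float],
                       edited: Sequence[float],
                       mask: Sequence[float]) -> List[float]:
    """Blend per pixel: mask 1.0 keeps the edit, mask 0.0 keeps the original.

    Assumption: a hard blend, not the paper's differentiable loss.
    """
    return [m * e + (1.0 - m) * o for o, e, m in zip(original, edited, mask)]


# Example schedule over four denoising steps.
schedule = [prompt_for_step(t, 4, "office style", "long sleeves")
            for t in range(4)]
```

With `switch_fraction=0.5`, the first half of the steps is conditioned on the high-level concept and the second half on the low-level attribute. A designer adding another attribute would, under this sketch, rerun only the later steps with a new `low_level` string.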

Related papers

Learning to Synthesize Compatible Fashion Items Using Semantic Alignment and Collocation Classification: An Outfit Generation Framework [59.09707044733695]
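We propose OutfitGAN, a novel outfit generation framework that aims to synthesize a complete outfit. OutfitGAN includes a semantic alignment module that characterizes the mapping correspondence between existing fashion items and synthesized ones. To evaluate the proposed model, we built a large-scale dataset of 20,000 fashion outfits.
Paper reference translation (metadata) (2025-02-05T12:13:53Z)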
EditAR: Unified Conditional Generation with Autoregressive Models [58.093860528672735]
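We propose EditAR, a single unified autoregressive framework for conditional image generation tasks. The model takes both an image and an instruction as input and predicts the edited image tokens in a vanilla next-token paradigm. We evaluate its effectiveness across diverse tasks on established benchmarks and show competitive performance against various task-specific methods.
Paper reference translation (metadata) (2025-01-08T18:59:35Z)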
AIpparel: A Multimodal Foundation Model for Digital Garments [71.12933771326279]
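We introduce AIpparel, a multimodal foundation model for generating and editing sewing patterns. Our model fine-tunes a state-of-the-art large multimodal model on a custom-curated large-scale dataset of over 120,000 unique garments. We propose a novel tokenization scheme that concisely encodes these complex sewing patterns so that LLMs can predict them efficiently.
Paper reference translation (metadata) (2024-12-05T07:35:19Z)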
DiCTI: Diffusion-based Clothing Designer via Text-guided Input [5.275658744475251]
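DiCTI (Diffusion-based Clothing Designer via Text-guided Input) lets designers quickly visualize fashion-related ideas using only text input. By leveraging a powerful diffusion-based inpainting model conditioned on text input, DiCTI can synthesize convincing, high-quality images with diverse clothing designs.
Paper reference translation (metadata) (2024-07-04T12:48:36Z)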
MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis [65.78359025027457]
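MetaDesigner revolutionizes artistic typography by leveraging the strengths of Large Language Models (LLMs) to drive a design paradigm centered on user engagement. A comprehensive feedback mechanism uses insights from multimodal models and user evaluations to iteratively refine and enhance the design process. Empirical validation highlights MetaDesigner's ability to serve diverse WordArt applications effectively, producing aesthetically appealing and context-sensitive results.
Paper reference translation (metadata) (2024-06-28T11:58:26Z)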
PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
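This work proposes a unified framework for automatic graphic layout generation. In a data-driven manner, it uses structured text (JSON format) and visual instruction tuning to generate layouts. We conduct extensive experiments and achieve state-of-the-art (SOTA) performance on public multimodal layout generation benchmarks.
Paper reference translation (metadata) (2024-06-05T03:05:52Z)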
FashionSD-X: Multimodal Fashion Garment Synthesis using Latent Diffusion [11.646594594565098]
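This work proposes a novel generative pipeline that uses latent diffusion models to transform the fashion design process. By integrating sketch data, we leverage and enhance state-of-the-art virtual try-on datasets, including Multimodal Dress Code and VITON-HD.
Paper reference translation (metadata) (2024-04-26T14:59:42Z)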
Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models [81.6240188672294]
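In graphic design, non-professional users often struggle to create visually appealing layouts because of limited skills and resources. We introduce a novel multimodal instruction-following framework for layout planning that lets users easily arrange visual elements into customized layouts. Our method not only simplifies the design process for non-professionals but also surpasses few-shot GPT-4V models, with mIoU 12% higher on Crello.
Paper reference translation (metadata) (2024-04-23T17:58:33Z)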
Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing [40.70752781891058]
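This paper tackles the task of multimodal fashion image editing. Our goal is to create human-centric fashion images guided by multimodal prompts, including text, human body poses, garment sketches, and fabric textures.
Paper reference translation (metadata) (2024-03-21T20:43:10Z)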
HAIFIT: Human-to-AI Fashion Image Translation [6.034505799418777]
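This paper introduces HAIFIT, a novel approach that transforms sketches into high-fidelity, lifelike clothing images. Our method excels at preserving the distinctive style and intricate details that are essential to fashion design.
Paper reference translation (metadata) (2024-03-13T16:06:07Z)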
Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints [53.66698106829144]
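We propose a unified model that handles a broad range of layout generation tasks. The model is based on a continuous diffusion model. Experimental results show that LACE generates high-quality layouts.
Paper reference translation (metadata) (2024-02-07T11:12:41Z)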
HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced Diffusion Models [84.12784265734238]
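The goal of Arbitrary Style Transfer (AST) is to inject the artistic features of a style reference into a given image or video. We propose HiCAST, which can explicitly customize stylization results according to various semantic clues. A novel learning objective is used in video diffusion model training and significantly improves temporal consistency across frames.
Paper reference translation (metadata) (2024-01-11T12:26:23Z)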
FashionSAP: Symbols and Attributes Prompt for Fine-grained Fashion Vision-Language Pre-training [12.652002299515864]
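We propose a fine-grained fashion vision-language pre-training method based on fashion Symbols and Attributes Prompt (FashionSAP). First, we propose fashion symbols, a novel abstract fashion concept layer, to represent different fashion items. Second, we propose an attributes prompt method that makes the model explicitly learn specific attributes of fashion items.
Paper reference translation (metadata) (2023-04-11T08:20:17Z)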
Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing [40.70752781891058]
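This paper proposes the task of multimodal fashion image editing, which guides the generation of human-centric fashion images. We address this problem by proposing a new architecture based on latent diffusion models. Because no existing dataset suits the task, we also extend two existing fashion datasets.
Paper reference translation (metadata) (2023-04-04T18:03:04Z)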
FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning [66.38951790650887]
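Multimodal tasks in the fashion domain hold significant potential for e-commerce. This paper proposes a fashion-specific pre-training framework based on weakly supervised triplets constructed from fashion-text pairs. We show that the triplet-based tasks are an effective addition to standard multimodal pre-training tasks.
Paper reference translation (metadata) (2022-10-26T21:01:19Z)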
Modeling Artistic Workflows for Image Generation and Editing [83.43047077223947]
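We propose a generative model that follows a given artistic workflow. It enables both multi-stage image generation and multi-stage image editing of existing artwork.
Paper reference translation (metadata) (2020-07-14T17:54:26Z)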

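The related papers list is generated automatically from the titles and abstracts of papers on this site.

This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use.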