Fugu-MT Paper Translation (Overview): GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting

Paper overview: GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting

arxiv url: http://arxiv.org/abs/2410.05259v1
Date: Mon, 07 Oct 2024 17:58:20 GMT
Status: Error
Last updated in system: 2024-10-08 12:53:47.352781
Title: GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting
Title (reference translation):
Authors: Yukang Cao, Masoud Hadi, Liang Pan, Ziwei Liu
Abstract summary:
Reference score (attention score computed in-house): 0.0
License:
Abstract: Diffusion-based 2D virtual try-on (VTON) techniques have recently demonstrated strong performance, while the development of 3D VTON has largely lagged behind. Despite recent advances in text-guided 3D scene editing, integrating 2D VTON into these pipelines to achieve vivid 3D VTON remains challenging. The reasons are twofold. First, text prompts cannot provide sufficient details in describing clothing. Second, 2D VTON results generated from different viewpoints of the same 3D scene lack coherence and spatial relationships, hence frequently leading to appearance inconsistencies and geometric distortions. To resolve these problems, we introduce an image-prompted 3D VTON method (dubbed GS-VTON) which, by leveraging 3D Gaussian Splatting (3DGS) as the 3D representation, enables the transfer of pre-trained knowledge from 2D VTON models to 3D while improving cross-view consistency. (1) Specifically, we propose a personalized diffusion model that utilizes low-rank adaptation (LoRA) fine-tuning to incorporate personalized information into pre-trained 2D VTON models. To achieve effective LoRA training, we introduce a reference-driven image editing approach that enables the simultaneous editing of multi-view images while ensuring consistency. (2) Furthermore, we propose a persona-aware 3DGS editing framework to facilitate effective editing while maintaining consistent cross-view appearance and high-quality 3D geometry. (3) Additionally, we have established a new 3D VTON benchmark, 3D-VTONBench, which facilitates comprehensive qualitative and quantitative 3D VTON evaluations. Through extensive experiments and comparative analyses with existing methods, the proposed GS-VTON has demonstrated superior fidelity and advanced editing capabilities, affirming its effectiveness for 3D VTON.
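The abstract says GS-VTON personalizes a pre-trained 2D VTON model through LoRA fine-tuning. As a rough illustration of what LoRA does in general, and not the authors' code, the sketch below assumes the standard formulation in which a frozen weight W is augmented by a trainable low-rank product B A scaled by alpha / r. The helper names `matmul`, `lora_forward`, and `merged_weight` are hypothetical, and plain Python lists stand in for tensors to keep it self-contained.

```python
# Minimal, illustrative LoRA sketch in pure Python (NOT the GS-VTON code).
# Assumption: the standard LoRA form W' = W + (alpha / r) * B A, where the
# pretrained weight W (d_out x d_in) stays frozen and only the low-rank
# factors A (r x d_in) and B (d_out x r) would be trained.
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product: (n x k) @ (k x m) -> (n x m)."""
    k = len(y)
    m = len(y[0])
    return [[sum(x[i][t] * y[t][j] for t in range(k)) for j in range(m)]
            for i in range(len(x))]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Matrix, alpha: float) -> Matrix:
    """Apply the frozen weight plus the scaled low-rank update to column vector x."""
    r = len(A)                        # LoRA rank = number of rows of A
    base = matmul(W, x)               # frozen pretrained path
    delta = matmul(B, matmul(A, x))   # low-rank path, never forms the full B A
    scale = alpha / r
    return [[base[i][0] + scale * delta[i][0]] for i in range(len(base))]


def merged_weight(W: Matrix, A: Matrix, B: Matrix, alpha: float) -> Matrix:
    """Fold the low-rank update into W once training is done."""
    r = len(A)
    ba = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * ba[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 weight
    A = [[1.0, 1.0]]               # rank r = 1
    B = [[0.5], [0.0]]
    x = [[2.0], [3.0]]
    print(lora_forward(W, A, B, x, alpha=1.0))            # [[4.5], [3.0]]
    print(matmul(merged_weight(W, A, B, alpha=1.0), x))   # [[4.5], [3.0]]
```

Only r * (d_in + d_out) values in A and B change during training instead of all d_in * d_out values in W. Initializing B to zeros, as in the original LoRA formulation, makes the adapted layer start out identical to the pretrained one, and merging afterwards removes the extra inference cost.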
Abstract (reference translation):

Related papers

DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation [51.43837087865105]
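Vision foundation models (VFMs) trained on large-scale image datasets provide high-quality features that have greatly advanced 2D visual recognition. Despite the widespread availability of images alongside 3D point cloud datasets, their potential for 3D vision remains largely untapped. We introduce DITR, a simple and effective approach that extracts features from 2D foundation models, projects them into 3D, and finally injects them into a 3D point cloud segmentation model.
Paper reference translation (metadata) (2025-03-24T17:59:11Z)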
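DITR is summarized as extracting 2D foundation-model features, projecting them into 3D, and injecting them into a point cloud segmentation model. The sketch below illustrates only a generic version of the projection step, under assumptions that are not taken from the paper: a single pinhole camera with intrinsics fx, fy, cx, cy, points already in camera coordinates, and nearest-pixel lookup without interpolation. The functions `project_point` and `lift_features` are hypothetical.

```python
# Illustrative sketch of lifting 2D image features onto 3D points.
# Assumptions (not taken from the DITR paper): one pinhole camera with
# intrinsics fx, fy, cx, cy; points already expressed in camera coordinates
# (Z pointing forward); nearest-pixel lookup without interpolation.
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
FeatureMap = List[List[List[float]]]  # indexed as [row][col][channel]


def project_point(p: Point, fx: float, fy: float,
                  cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Pinhole projection to pixel coordinates (u, v); None if behind the camera."""
    x, y, z = p
    if z <= 0:
        return None
    return fx * x / z + cx, fy * y / z + cy


def lift_features(points: Sequence[Point], feat: FeatureMap,
                  fx: float, fy: float, cx: float, cy: float) -> List[Optional[List[float]]]:
    """Give each 3D point the feature vector of the pixel it projects onto.

    Points that fall behind the camera or outside the image get None, so a
    downstream model could treat them as having no 2D feature.
    """
    height, width = len(feat), len(feat[0])
    lifted: List[Optional[List[float]]] = []
    for p in points:
        uv = project_point(p, fx, fy, cx, cy)
        if uv is None:
            lifted.append(None)
            continue
        col = math.floor(uv[0] + 0.5)  # nearest pixel column
        row = math.floor(uv[1] + 0.5)  # nearest pixel row
        if 0 <= row < height and 0 <= col < width:
            lifted.append(list(feat[row][col]))
        else:
            lifted.append(None)
    return lifted


if __name__ == "__main__":
    feat = [[[0.0], [1.0]], [[2.0], [3.0]]]  # 2x2 feature map, one channel
    pts = [(1.0, 1.0, 1.0), (2.0, 0.0, 2.0), (0.0, 0.0, -1.0), (5.0, 0.0, 1.0)]
    print(lift_features(pts, feat, fx=1.0, fy=1.0, cx=0.0, cy=0.0))
    # [[3.0], [1.0], None, None]
```

With several calibrated views, the same lookup could be repeated per view and the resulting features averaged per point; that aggregation choice is also an assumption here, not something the summary above specifies.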
VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction [103.0918705283309]
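Virtual try-on (VTON) is a transformative technology in e-commerce and fashion design that enables realistic digital visualization of clothing on individuals. VTON 360 is a novel 3D VTON method that addresses the open challenge of achieving high-fidelity VTON that supports rendering from arbitrary viewpoints.
Paper reference translation (metadata) (2025-03-15T15:08:48Z)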
Unifying 2D and 3D Vision-Language Understanding [85.84054120018625]
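We introduce UniVLG, a unified architecture for 2D and 3D vision-language learning. UniVLG bridges the gap between existing 2D-centric models and the rich 3D sensor data available in embodied systems.
Paper reference translation (metadata) (2025-03-13T17:56:22Z)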
SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding [10.81711535075112]
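3D visual grounding (3DVG) aims to locate objects in 3D scenes based on textual descriptions, which is essential for applications such as augmented reality (AR) and robotics. We introduce SeeGround, a zero-shot 3DVG framework that leverages 2D vision-language models (VLMs) trained on large-scale 2D data. SeeGround represents 3D scenes as a hybrid of query-aligned images and spatially rich text descriptions, bridging the gap between 3D data and the input formats of 2D VLMs.
Paper reference translation (metadata) (2024-12-05T17:58:43Z)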
DreamVTON: Customizing 3D Virtual Try-on with Personalized Diffusion Models [56.55549019625362]
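Image-based 3D virtual try-on (VTON) aims to sculpt a 3D human from images of a person and a garment. Recent text-to-3D methods have achieved remarkable improvements in high-fidelity 3D human generation. We propose DreamVTON, a novel 3D human try-on model that optimizes the geometry and texture of the 3D human separately.
Paper reference translation (metadata) (2024-07-23T14:25:28Z)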
GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting [2.2975420753582028]
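E-commerce has highlighted the importance of virtual try-on (VTON). Research on 3D VTON has mainly focused on consistency between garment and body shape. Advances in 3D scene editing have made it possible to apply 2D diffusion models to 3D editing through multi-view editing.
Paper reference translation (metadata) (2024-05-13T05:18:07Z)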
SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion [33.69006364120861]
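We present Stable Video 3D (SV3D), a latent video diffusion model for high-resolution, image-to-multi-view generation of views around a 3D object.
Paper reference translation (metadata) (2024-03-18T17:46:06Z)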
Weakly Supervised Monocular 3D Detection with a Single-View Image [58.57978772009438]
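Monocular 3D detection aims to accurately localize 3D objects from a single-view image. SKD-WM3D is a weakly supervised monocular 3D detection framework. We show that SKD-WM3D surpasses the state of the art and is on par with many fully supervised methods.
Paper reference translation (metadata) (2024-02-29T13:26:47Z)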
3D-to-2D Distillation for Indoor Scene Parsing [78.36781565047656]
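We propose a method that effectively leverages 3D features extracted from large-scale 3D data repositories to enhance 2D features extracted from RGB images. First, we distill 3D knowledge from a pretrained 3D network to supervise a 2D network, which learns simulated 3D features from 2D features during training. Second, we design a normalization scheme to calibrate the 2D and 3D features. Third, we design a semantic-aware adversarial training model to extend the framework to training with unpaired 3D data.
Paper reference translation (metadata) (2021-04-06T02:22:24Z)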

The related-paper list is generated automatically from the titles and abstracts of papers on this site.