Fugu-MT Paper Translation (Summary): Towards Detailed Text-to-Motion Synthesis via Basic-to-Advanced Hierarchical Diffusion Model

Paper summary: Towards Detailed Text-to-Motion Synthesis via Basic-to-Advanced Hierarchical Diffusion Model

arxiv url: http://arxiv.org/abs/2312.10960v1
Date: Mon, 18 Dec 2023 06:30:39 GMT
Status: Translation complete
Last updated in system: 2023-12-20 21:02:49.707393
Title: Towards Detailed Text-to-Motion Synthesis via Basic-to-Advanced Hierarchical Diffusion Model
Title (reference translation): Towards detailed text-to-motion synthesis via a basic-to-advanced hierarchical diffusion model
Authors: Zhenyu Xie and Yang Wu and Xuehao Gao and Zhongqian Sun and Wei Yang and Xiaodan Liang
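Abstract summary: This paper proposes a new hierarchical diffusion model called B2A-HDM. Specifically, the basic diffusion model in a low-dimensional latent space provides an intermediate denoising result that is consistent with the text description. The advanced diffusion model in a high-dimensional latent space focuses on the subsequent detail-enhancing denoising process.
Reference score (attention score computed by this site): 60.27825196999742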
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-guided motion synthesis aims to generate 3D human motion that not only precisely reflects the textual description but also reveals as much motion detail as possible. Pioneering methods explore diffusion models for text-to-motion synthesis and show clear superiority. However, these methods conduct the diffusion process either on the raw data distribution or in a low-dimensional latent space, and so typically suffer from modality inconsistency or scarce detail. To tackle this problem, we propose a novel Basic-to-Advanced Hierarchical Diffusion Model, named B2A-HDM, which collaboratively exploits low-dimensional and high-dimensional diffusion models for high-quality, detailed motion synthesis. Specifically, the basic diffusion model in the low-dimensional latent space provides an intermediate denoising result that is consistent with the textual description, while the advanced diffusion model in the high-dimensional latent space focuses on the subsequent detail-enhancing denoising process. In addition, we introduce a multi-denoiser framework for the advanced diffusion model to ease the learning of the high-dimensional model and fully explore the generative potential of the diffusion model. Quantitative and qualitative experimental results on two text-to-motion benchmarks (HumanML3D and KIT-ML) demonstrate that B2A-HDM can outperform existing state-of-the-art methods in terms of fidelity, modality consistency, and diversity.
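Abstract (reference translation): Text-guided motion synthesis aims to generate 3D motion that not only reflects the text description accurately but also reveals the details of the motion as far as possible. Pioneering methods explore diffusion models for text-to-motion synthesis and gain a large advantage. However, these methods run the diffusion process on either the raw data distribution or a low-dimensional latent space, and usually suffer from modality inconsistency or scarce detail. This paper therefore proposes B2A-HDM, a new hierarchical diffusion model that uses low-dimensional and high-dimensional diffusion models together to achieve high-quality, detailed motion synthesis. Specifically, the basic diffusion model in the low-dimensional latent space yields an intermediate denoising result consistent with the text description, and the advanced diffusion model in the high-dimensional latent space focuses on the subsequent detail-enhancing process. The paper also proposes a multi-denoiser framework for the advanced diffusion model, which eases learning of the high-dimensional model and fully explores the generative potential of the diffusion model. Quantitative and qualitative experimental results on two text-to-motion benchmarks (HumanML3D and KIT-ML) show that B2A-HDM outperforms existing state-of-the-art methods in fidelity, modality consistency, and diversity.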
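The basic-to-advanced hand-off described in the abstract can be illustrated with a toy sampling loop. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not say how the low-dimensional result is passed to the high-dimensional model or how the multiple denoisers divide the work. Here the hand-off is a hypothetical `lift` function, the switch point `switch_step` is a free parameter, and each advanced denoiser is assumed to own a contiguous range of timesteps. The denoisers themselves are stand-ins that simply shrink their input.

```python
import numpy as np

def hierarchical_sample(basic_denoise, advanced_denoisers, lift,
                        low_dim, total_steps, switch_step, rng):
    """Toy basic-to-advanced reverse process; returns (sample, trace).

    All names and the schedule are illustrative assumptions.
    """
    if not 0 < switch_step < total_steps:
        raise ValueError("switch_step must lie strictly between 0 and total_steps")
    trace = []
    # Start from Gaussian noise in the low-dimensional latent space.
    z = rng.standard_normal(low_dim)
    # Stage 1: the basic model handles the early (noisy) timesteps.
    for t in range(total_steps - 1, switch_step - 1, -1):
        z = basic_denoise(z, t)
        trace.append(("basic", 0, t))
    # Hand-off: map the intermediate result into the high-dimensional space.
    h = lift(z)
    # Stage 2: each advanced denoiser is assumed to own a contiguous
    # range of the remaining timesteps.
    n = len(advanced_denoisers)
    for t in range(switch_step - 1, -1, -1):
        k = t * n // switch_step  # always in [0, n - 1] because t < switch_step
        h = advanced_denoisers[k](h, t)
        trace.append(("advanced", k, t))
    return h, trace

def shrink(x, t):
    """Stand-in denoiser: scales its input and ignores the timestep."""
    return 0.9 * x

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 2))  # hypothetical linear lift from 2-D to 8-D

sample, trace = hierarchical_sample(
    shrink, [shrink, shrink], lambda z: W @ z,
    low_dim=2, total_steps=10, switch_step=4, rng=rng,
)
print(sample.shape)
print(trace[-4:])
```

With `total_steps=10` and `switch_step=4`, the basic model runs six steps and the two advanced denoisers split the last four between them. In a real system the stand-ins would be trained, text-conditioned networks.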

Related papers

Emergence and Evolution of Interpretable Concepts in Diffusion Models [24.5360032541275]
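We use sparse autoencoders (SAEs) to probe the inner workings of a popular text-to-image diffusion model. We find that the final composition of the scene can be predicted surprisingly well even before the first reverse diffusion step is completed. We show that the discovered concepts have a causal effect on the model output and can be used to control the generation process.
Paper reference translation (metadata) (2025-04-21T22:48:37Z)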
Language-Informed Hyperspectral Image Synthesis for Imbalanced-Small Sample Classification via Semi-Supervised Conditional Diffusion Model [1.9746060146273674]
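This paper proposes Txt2HSI-LDM(VAE), a new language-informed hyperspectral image synthesis method. To address the high dimensionality of hyperspectral data, a universal variational autoencoder (VAE) is designed to map the data into a low-dimensional latent space. The VAE decodes the HSI from the latent space generated by the diffusion model, with the language condition as input.
Paper reference translation (metadata) (2025-02-27T02:35:49Z)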
Accelerating Video Diffusion Models via Distribution Matching [26.475459912686986]
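This work introduces a new framework for diffusion distillation and distribution matching. The proposed method focuses on distilling a pre-trained diffusion model into a more efficient few-step generator. By combining a video GAN loss with a new 2D score distribution matching loss, we demonstrate the potential to generate high-quality video frames.
Paper reference translation (metadata) (2024-12-08T11:36:32Z)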
Diffusion Models in Low-Level Vision: A Survey [82.77962165415153]
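Diffusion-model-based solutions are widely praised for their ability to produce samples of excellent quality and diversity. This paper presents three generalized diffusion modeling frameworks and examines their correlation with other deep generative models. It also summarizes extended diffusion models applied to other tasks, including medical, remote sensing, and video scenarios.
Paper reference translation (metadata) (2024-06-17T01:49:27Z)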
4Diffusion: Multi-view Video Diffusion Model for 4D Generation [55.82208863521353]
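Current 4D generation methods achieve notable effectiveness with the help of advanced diffusion generative models. We propose a new 4D generation pipeline, namely 4Diffusion, which aims to generate spatially and temporally consistent 4D content from monocular video.
Paper reference translation (metadata) (2024-05-31T08:18:39Z)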
Neural Flow Diffusion Models: Learnable Forward Process for Improved Diffusion Modelling [2.1779479916071067]
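We propose a new framework that enhances diffusion models by supporting a broader range of forward processes. We also propose a new parameterization technique for learning the forward process. The results underscore the versatility of NFDM and its potential for a wide range of applications.
Paper reference translation (metadata) (2024-04-19T15:10:54Z)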
An Overview of Diffusion Models: Applications, Guided Generation, Statistical Rates and Optimization [59.63880337156392]
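Diffusion models have achieved great success in computer vision, audio, reinforcement learning, and computational biology. Despite this empirical success, the theory of diffusion models is very limited. This paper provides a theoretical exposition intended to stimulate forward-looking theories and methods for diffusion models.
Paper reference translation (metadata) (2024-04-11T14:07:25Z)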
PrimDiffusion: Volumetric Primitives Diffusion for 3D Human Generation [47.15358646320958]
PrimDiffusion is the first diffusion-based framework for 3D human generation. Our framework supports real-time rendering of high-quality 3D humans at a resolution of $512 \times 512$.
Paper reference translation (metadata) (2023-12-07T18:59:33Z)
Modiff: Action-Conditioned 3D Motion Generation with Denoising Diffusion Probabilistic Models [58.357180353368896]
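This paper proposes a conditional paradigm that exploits the advantages of denoising diffusion probabilistic models (DDPM) to address the problem of realistic and diverse 3D skeleton-based motion generation. Ours is a pioneering attempt to use DDPM to synthesize a variable number of motion sequences conditioned on a categorical action.
Paper reference translation (metadata) (2023-01-10T13:15:42Z)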
Diffusion Models in Vision: A Survey [80.82832715884597]
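Diffusion models are deep generative models based on two stages: a forward diffusion stage and a reverse diffusion stage. They are widely appreciated for the quality and diversity of the generated samples, despite their known computational burden.
Paper reference translation (metadata) (2022-09-10T22:00:30Z)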

The related papers list is generated automatically from the titles and abstracts of the papers on this site.
