Fugu-MT paper translation (abstract): PHAC: Promptable Human Amodal Completion

Paper overview: PHAC: Promptable Human Amodal Completion

arxiv url: http://arxiv.org/abs/2603.14741v1
Date: Mon, 16 Mar 2026 02:29:33 GMT
Status: translation complete
Last updated in system: 2026-03-17 16:19:36.001472
Title: PHAC: Promptable Human Amodal Completion
Title (reference translation): PHAC: Promptable Human Amodal Completion
Authors: Seung Young Noh, Ju Yong Chang
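Abstract summary: This paper introduces promptable human amodal completion (PHAC), a new task that completes occluded human images while satisfying visible appearance constraints and multiple user prompts. Experiments on the HAC and PGPIS benchmarks show that the proposed method yields more physically plausible and higher-quality completions.
Reference score (attention score computed independently by the site): 4.687712818521871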
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Conditional image generation methods are increasingly used in human-centric applications, yet existing human amodal completion (HAC) models offer users limited control over the completed content. Given an occluded person image, they hallucinate invisible regions while preserving visible ones, but cannot reliably incorporate user-specified constraints such as a desired pose or spatial extent. As a result, users often resort to repeatedly sampling the model until they obtain a satisfactory output. Pose-guided person image synthesis (PGPIS) methods allow explicit pose conditioning, but frequently fail to preserve the instance-specific visible appearance and tend to be biased toward the training distribution, even when built on strong diffusion model priors. To address these limitations, we introduce promptable human amodal completion (PHAC), a new task that completes occluded human images while satisfying both visible appearance constraints and multiple user prompts. Users provide simple point-based prompts, such as additional joints for the target pose or bounding boxes for desired regions; these prompts are encoded using ControlNet modules specialized for each prompt type. These modules inject the prompt signals into a pre-trained diffusion model, and we fine-tune only the cross-attention blocks to obtain strong prompt alignment without degrading the underlying generative prior. To further preserve visible content, we propose an inpainting-based refinement module that starts from a slightly noised coarse completion, faithfully preserves the visible regions, and ensures seamless blending at occlusion boundaries. Extensive experiments on the HAC and PGPIS benchmarks show that our approach yields more physically plausible and higher-quality completions, while significantly improving prompt alignment compared with existing amodal completion and pose-guided synthesis methods.
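Abstract (reference translation): Conditional image generation methods are increasingly used in human-centric applications, but existing human amodal completion (HAC) models give users only limited control over the completed content. Given an image of an occluded person, they hallucinate the invisible regions while preserving the visible ones, but they cannot reliably incorporate user-specified constraints such as a desired pose or spatial extent. As a result, users often sample the model repeatedly until they obtain a satisfactory output. Pose-guided person image synthesis (PGPIS) methods allow explicit pose conditioning, but they often fail to preserve the instance-specific visible appearance and tend to be biased toward the training distribution, even when built on strong diffusion model priors. To address these limitations, we introduce promptable human amodal completion (PHAC), a new task that completes occluded human images while satisfying both visible appearance constraints and multiple user prompts. Users provide simple point-based prompts, such as additional joints for the target pose or bounding boxes for desired regions, and these prompts are encoded by ControlNet modules specialized for each prompt type. These modules inject the prompt signals into a pre-trained diffusion model, and we fine-tune only the cross-attention blocks to obtain strong prompt alignment without degrading the underlying generative prior. To further preserve visible content, we propose an inpainting-based refinement module that starts from a slightly noised coarse completion, faithfully preserves the visible regions, and ensures seamless blending at occlusion boundaries. Extensive experiments on the HAC and PGPIS benchmarks show that our approach yields more physically plausible and higher-quality completions while significantly improving prompt alignment compared with existing amodal completion and pose-guided synthesis methods.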
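The abstract does not spell out how the refinement module works, so the following is only a minimal sketch of a generic mask-guided diffusion inpainting loop that matches its description: start from a slightly noised coarse completion, denoise, and keep re-imposing the visible pixels. All names here (`add_noise`, `composite`, `refine`, `denoise_step`, `alpha_bars`) are hypothetical, images are flattened to plain lists of floats, and `denoise_step` is a stand-in for a real diffusion model, not the authors' network.

```python
import math
import random


def add_noise(x0, alpha_bar, rng):
    """Forward-diffuse x0: sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise."""
    a = math.sqrt(alpha_bar)
    s = math.sqrt(1.0 - alpha_bar)
    return [a * v + s * rng.gauss(0.0, 1.0) for v in x0]


def composite(visible, generated, mask):
    """Take visible pixels where mask is 1.0 and generated pixels where it is 0.0."""
    return [m * v + (1.0 - m) * g for v, g, m in zip(visible, generated, mask)]


def refine(visible, coarse, mask, denoise_step, alpha_bars, seed=0):
    """Refine a coarse completion while keeping the visible region fixed.

    alpha_bars is an assumed increasing schedule that ends at 1.0 (no noise).
    A first value close to 1.0 means the coarse completion is only slightly noised.
    """
    rng = random.Random(seed)
    x = add_noise(coarse, alpha_bars[0], rng)
    for next_alpha_bar in alpha_bars[1:]:
        # Hypothetical model call: move x to the next (less noisy) level.
        x = denoise_step(x, next_alpha_bar)
        # Re-impose the visible pixels, noised to the same level as x.
        known = add_noise(visible, next_alpha_bar, rng)
        x = composite(known, x, mask)
    return x


if __name__ == "__main__":
    visible = [0.2, 0.8, 0.0, 0.0]   # last two pixels are occluded
    coarse = [0.25, 0.75, 0.5, 0.6]  # coarse completion from an earlier stage
    mask = [1.0, 1.0, 0.0, 0.0]      # 1.0 marks visible pixels
    identity = lambda x, alpha_bar: x  # placeholder for a real denoiser
    print(refine(visible, coarse, mask, identity, [0.9, 0.95, 1.0]))
```

Because the schedule ends at `alpha_bar = 1.0`, the last re-imposed copy of the visible region carries no noise, so the visible pixels come back unchanged and only the occluded pixels are left to the denoiser.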


List of related papers