Fugu-MT Paper Translation (Summary): HOComp: Interaction-Aware Human-Object Composition

Paper summary: HOComp: Interaction-Aware Human-Object Composition

arxiv url: http://arxiv.org/abs/2507.16813v1
Date: Tue, 22 Jul 2025 17:59:21 GMT
Status: Translation complete
Last updated in system: 2025-07-23 21:34:14.248185
Title: HOComp: Interaction-Aware Human-Object Composition
Title (reference translation): HOComp: Interaction-Aware Human-Object Composition
Authors: Dong Liang, Jinyuan Jia, Yuhao Liu, Rynson W. H. Lau
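Abstract summary: HOComp is a novel approach for compositing a foreground object onto a human-centric background image. Experimental results show that HOComp effectively generates human-object interactions with consistent appearances.
Reference score (independently computed attention score): 62.93211305213214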
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While existing image-guided composition methods may help insert a foreground object onto a user-specified region of a background image, achieving natural blending inside the region with the rest of the image unchanged, we observe that these existing methods often struggle in synthesizing seamless interaction-aware compositions when the task involves human-object interactions. In this paper, we first propose HOComp, a novel approach for compositing a foreground object onto a human-centric background image, while ensuring harmonious interactions between the foreground object and the background person and their consistent appearances. Our approach includes two key designs: (1) MLLMs-driven Region-based Pose Guidance (MRPG), which utilizes MLLMs to identify the interaction region as well as the interaction type (e.g., holding and lifting) to provide coarse-to-fine constraints to the generated pose for the interaction while incorporating human pose landmarks to track action variations and enforcing fine-grained pose constraints; and (2) Detail-Consistent Appearance Preservation (DCAP), which unifies a shape-aware attention modulation mechanism, a multi-view appearance loss, and a background consistency loss to ensure consistent shapes/textures of the foreground and faithful reproduction of the background human. We then propose the first dataset, named Interaction-aware Human-Object Composition (IHOC), for the task. Experimental results on our dataset show that HOComp effectively generates harmonious human-object interactions with consistent appearances, and outperforms relevant methods qualitatively and quantitatively.
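Abstract (reference translation): Existing image-guided composition methods can insert a foreground object into a user-specified region of a background image and blend it naturally with the rest of the image, but they often struggle to synthesize seamless interaction-aware compositions when the task involves human-object interactions. In this paper, we first propose HOComp, an approach for compositing a foreground object onto a human-centric background image. The approach includes two key designs: (1) MLLMs-driven Region-based Pose Guidance (MRPG), which uses MLLMs to identify the interaction region and applies coarse-to-fine constraints to the generated pose, tracking action variations and enforcing fine-grained pose constraints; and (2) Detail-Consistent Appearance Preservation (DCAP), which combines a shape-aware attention modulation mechanism, a multi-view appearance loss, and a background consistency loss to preserve the foreground's appearance and faithfully reproduce the background. We then propose the first dataset for the task, Interaction-aware Human-Object Composition (IHOC). Experimental results show that HOComp effectively generates harmonious human-object interactions with consistent appearances and outperforms relevant methods qualitatively and quantitatively.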
