Related papers: HOComp: Interaction-Aware Human-Object Composition

HOComp: Interaction-Aware Human-Object Composition

URL: http://arxiv.org/abs/2507.16813v1
Date: Tue, 22 Jul 2025 17:59:21 GMT
Title: HOComp: Interaction-Aware Human-Object Composition
Authors: Dong Liang, Jinyuan Jia, Yuhao Liu, Rynson W. H. Lau,
Abstract summary: HOComp is a novel approach for compositing a foreground object onto a human-centric background image.<n> Experimental results on our dataset show that HOComp effectively generates human-object interactions with consistent appearances.
Score: 62.93211305213214
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While existing image-guided composition methods may help insert a foreground object onto a user-specified region of a background image, achieving natural blending inside the region with the rest of the image unchanged, we observe that these existing methods often struggle in synthesizing seamless interaction-aware compositions when the task involves human-object interactions. In this paper, we first propose HOComp, a novel approach for compositing a foreground object onto a human-centric background image, while ensuring harmonious interactions between the foreground object and the background person and their consistent appearances. Our approach includes two key designs: (1) MLLMs-driven Region-based Pose Guidance (MRPG), which utilizes MLLMs to identify the interaction region as well as the interaction type (e.g., holding and lefting) to provide coarse-to-fine constraints to the generated pose for the interaction while incorporating human pose landmarks to track action variations and enforcing fine-grained pose constraints; and (2) Detail-Consistent Appearance Preservation (DCAP), which unifies a shape-aware attention modulation mechanism, a multi-view appearance loss, and a background consistency loss to ensure consistent shapes/textures of the foreground and faithful reproduction of the background human. We then propose the first dataset, named Interaction-aware Human-Object Composition (IHOC), for the task. Experimental results on our dataset show that HOComp effectively generates harmonious human-object interactions with consistent appearances, and outperforms relevant methods qualitatively and quantitatively.

Related papers

Dynamic Avatar-Scene Rendering from Human-centric Context [75.95641456716373]
We propose bf Separate-then-Map (StM) strategy to bridge separately defined and optimized models.<n>StM significantly outperforms existing state-of-the-art methods in both visual quality and rendering accuracy.
arXiv Detail & Related papers (2025-11-13T17:39:06Z)
Interact-Custom: Customized Human Object Interaction Image Generation [55.8752300907835]
Existing approaches mainly concentrate on the target entity's appearance preservation, while neglecting the fine-grained interaction control among target entities.<n>We propose the task of Customized Human Object Interaction Image Generation, which simultaneously requires identity preservation for target human object and the interaction semantic control between them.<n>Two primary challenges exist for CHOI: (1)simultaneous identity preservation and interaction control demands require the model to decompose the human object into self-contained identity features and pose-oriented interaction features, while the current HOI image datasets fail to provide ideal samples for such feature-decomposed learning.
arXiv Detail & Related papers (2025-08-27T05:15:16Z)
Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues. Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding. Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction. Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
Modelling Spatio-Temporal Interactions for Compositional Action Recognition [21.8767024220287]
Humans have the natural ability to recognize actions even if the objects involved in the action or the background are changed. We show the effectiveness of our interaction-centric approach on the compositional Something-Else dataset. Our approach of explicit human-object-stuff interaction modeling is effective even for standard action recognition datasets.
arXiv Detail & Related papers (2023-05-04T09:37:45Z)
Full-Body Articulated Human-Object Interaction [61.01135739641217]
CHAIRS is a large-scale motion-captured f-AHOI dataset consisting of 16.2 hours of versatile interactions. CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process. By learning the geometrical relationships in HOI, we devise the very first model that leverage human pose estimation.
arXiv Detail & Related papers (2022-12-20T19:50:54Z)
Effective Actor-centric Human-object Interaction Detection [20.564689533862524]
We propose a novel actor-centric framework to detect Human-Object Interaction in images. Our method achieves the state-of-the-art on the challenging V-COCO and HICO-DET benchmarks.
arXiv Detail & Related papers (2022-02-24T10:24:44Z)
Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data. Our method achieves the best or competitive performance with the state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z)
DRG: Dual Relation Graph for Human-Object Interaction Detection [65.50707710054141]
We tackle the challenging problem of human-object interaction (HOI) detection. Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features. In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
arXiv Detail & Related papers (2020-08-26T17:59:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.