Related papers: Incremental Human-Object Interaction Detection with Invariant Relation Representation Learning

URL: http://arxiv.org/abs/2510.27020v1
Date: Thu, 30 Oct 2025 21:59:34 GMT
Title: Incremental Human-Object Interaction Detection with Invariant Relation Representation Learning
Authors: Yana Wei, Zeen Chi, Chongyu Wang, Yu Wu, Shipeng Yan, Yongfei Liu, Xuming He
Abstract summary: In open-world environments, human-object interactions (HOIs) evolve continuously, challenging conventional closed-world HOI detection models. Inspired by humans' ability to progressively acquire knowledge, we explore incremental HOI detection (IHOID) to develop agents capable of discerning human-object relations in such dynamic environments. We propose a novel exemplar-free incremental relation distillation (IRD) framework. IRD decouples the learning of objects and relations, and introduces two unique distillation losses for learning invariant relation features across different HOI combinations that share the same relation.
Score: 24.134958315913888
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In open-world environments, human-object interactions (HOIs) evolve continuously, challenging conventional closed-world HOI detection models. Inspired by humans' ability to progressively acquire knowledge, we explore incremental HOI detection (IHOID) to develop agents capable of discerning human-object relations in such dynamic environments. This setup confronts not only the common issue of catastrophic forgetting in incremental learning but also distinct challenges posed by interaction drift and detecting zero-shot HOI combinations with sequentially arriving data. Therefore, we propose a novel exemplar-free incremental relation distillation (IRD) framework. IRD decouples the learning of objects and relations, and introduces two unique distillation losses for learning invariant relation features across different HOI combinations that share the same relation. Extensive experiments on HICO-DET and V-COCO datasets demonstrate the superiority of our method over state-of-the-art baselines in mitigating forgetting, strengthening robustness against interaction drift, and generalization on zero-shot HOIs. Code is available at https://github.com/weiyana/ContinualHOI

Related papers

Zero-shot HOI Detection with MLLM-based Detector-agnostic Interaction Recognition [71.5328300638085]
Zero-shot Human-object interaction (HOI) detection aims to locate humans and objects in images and recognize their interactions. Existing methods, including two-stage methods, tightly couple interaction recognition with a specific detector. We propose a decoupled framework that separates object detection from IR and leverages multi-modal large language models (MLLMs) for zero-shot IR.
arXiv Detail & Related papers (2026-02-16T19:01:31Z)
Learning Whole-Body Human-Humanoid Interaction from Human-Human Demonstrations [63.80827184637476]
We introduce D-STAR, a hierarchical policy that disentangles when to act from where to act. We validate our framework through extensive and rigorous simulations.
arXiv Detail & Related papers (2026-01-14T14:37:06Z)
Improving Deepfake Detection with Reinforcement Learning-Based Adaptive Data Augmentation [60.04281435591454]
CRDA (Curriculum Reinforcement-Learning Data Augmentation) is a novel framework guiding detectors to progressively master multi-domain forgery features. Central to our approach is integrating reinforcement learning and causal inference. Our method significantly improves detector generalizability, outperforming SOTA methods across multiple cross-domain datasets.
arXiv Detail & Related papers (2025-11-10T12:45:52Z)
Learning Human-Object Interaction as Groups [52.28258599873394]
GroupHOI is a framework that propagates contextual information in terms of geometric proximity and semantic similarity. It exhibits leading performance on the more challenging Nonverbal Interaction Detection task.
arXiv Detail & Related papers (2025-10-21T07:25:10Z)
Funnel-HOI: Top-Down Perception for Zero-Shot HOI Detection [3.656114607436271]
Human-object interaction detection (HOID) refers to localizing interactive human-object pairs in images and identifying the interactions. We build a top-down framework named Funnel-HOI inspired by the human tendency to grasp well-defined concepts first and then associate them with abstract concepts during scene understanding. A novel asymmetric co-attention mechanism mines these cues utilizing multimodal information (incorporating zero-shot capabilities) and yields stronger interaction representations at the encoder level.
arXiv Detail & Related papers (2025-07-16T20:47:24Z)
Reinforced Interactive Continual Learning via Real-time Noisy Human Feedback [59.768119380109084]
This paper introduces an interactive continual learning paradigm where AI models dynamically learn new skills from real-time human feedback. We propose RiCL, a Reinforced interactive Continual Learning framework leveraging Large Language Models (LLMs). Our RiCL approach substantially outperforms existing combinations of state-of-the-art online continual learning and noisy-label learning methods.
arXiv Detail & Related papers (2025-05-15T03:22:03Z)
Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues. Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
Neural-Logic Human-Object Interaction Detection [67.4993347702353]
We present LogicHOI, a new HOI detector that leverages neural-logic reasoning and Transformer to infer feasible interactions between entities. Specifically, we modify the self-attention mechanism in vanilla Transformer, enabling it to reason over the <human, action, object> triplet and constitute novel interactions. We formulate these two properties in first-order logic and ground them into continuous space to constrain the learning process of our approach, leading to improved performance and zero-shot generalization capabilities.
arXiv Detail & Related papers (2023-11-16T11:47:53Z)
Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator. This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z)
EiHi Net: Out-of-Distribution Generalization Paradigm [6.6157730528755065]
EiHi net is a model learning paradigm that can be blessed on any visual backbone. This paper develops a new EiHi net to solve the out-of-distribution (OoD) generalization problem in deep learning.
arXiv Detail & Related papers (2022-09-29T17:08:12Z)
RR-Net: Injecting Interactive Semantics in Human-Object Interaction Detection [40.65483058890176]
Latest end-to-end HOI detectors are short of relation reasoning, which leads to inability to learn HOI-specific interactive semantics for predictions. We first present a progressive Relation-aware Frame, which brings a new structure and parameter sharing pattern for interaction inference. Based on modules above, we construct an end-to-end trainable framework named Relation Reasoning Network (abbr. RR-Net).
arXiv Detail & Related papers (2021-04-30T14:03:10Z)
RobustFusion: Robust Volumetric Performance Reconstruction under Human-object Interactions from Monocular RGBD Stream [27.600873320989276]
High-quality 4D reconstruction of human performance with complex interactions to various objects is essential in real-world scenarios. Recent advances still fail to provide reliable performance reconstruction. We propose RobustFusion, a robust volumetric performance reconstruction system for human-object interaction scenarios.
arXiv Detail & Related papers (2021-04-30T08:41:45Z)
Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding. At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network. With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.