RoHOI: Robustness Benchmark for Human-Object Interaction Detection
- URL: http://arxiv.org/abs/2507.09111v3
- Date: Mon, 13 Oct 2025 07:17:45 GMT
- Title: RoHOI: Robustness Benchmark for Human-Object Interaction Detection
- Authors: Di Wen, Kunyu Peng, Kailun Yang, Yufan Chen, Ruiping Liu, Junwei Zheng, Alina Roitberg, Danda Pani Paudel, Luc Van Gool, Rainer Stiefelhagen,
- Abstract summary: Human-Object Interaction (HOI) detection is crucial for robot-human assistance, enabling context-aware support.<n>We introduce the first benchmark for HOI detection, evaluating model resilience under diverse challenges.<n>Our benchmark, RoHOI, includes 20 corruption types based on the HICO-DET and V-COCO datasets and a new robustness-focused metric.
- Score: 84.78366452133514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-Object Interaction (HOI) detection is crucial for robot-human assistance, enabling context-aware support. However, models trained on clean datasets degrade in real-world conditions due to unforeseen corruptions, leading to inaccurate predictions. To address this, we introduce the first robustness benchmark for HOI detection, evaluating model resilience under diverse challenges. Despite advances, current models struggle with environmental variability, occlusions, and noise. Our benchmark, RoHOI, includes 20 corruption types based on the HICO-DET and V-COCO datasets and a new robustness-focused metric. We systematically analyze existing models in the HOI field, revealing significant performance drops under corruptions. To improve robustness, we propose a Semantic-Aware Masking-based Progressive Learning (SAMPL) strategy to guide the model to be optimized based on holistic and partial cues, thus dynamically adjusting the model's optimization to enhance robust feature learning. Extensive experiments show that our approach outperforms state-of-the-art methods, setting a new standard for robust HOI detection. Benchmarks, datasets, and code are available at https://github.com/KratosWen/RoHOI.
Related papers
- A Validation Strategy for Deep Learning Models: Evaluating and Enhancing Robustness [0.8532585403388676]
We propose a validation approach that extracts "weak robust" samples directly from the training dataset via local analysis.<n>These samples, being the most susceptible to perturbations, serve as an early and sensitive indicator of the model's vulnerabilities.<n>We demonstrate the effectiveness of our approach on models trained with CIFAR-10, CIFAR-100, and ImageNet.
arXiv Detail & Related papers (2025-09-23T16:14:14Z) - Rethinking Evaluation of Infrared Small Target Detection [105.59753496831739]
This paper introduces a hybrid-level metric incorporating pixel- and target-level performance, proposing a systematic error analysis method, and emphasizing the importance of cross-dataset evaluation.<n>An open-source toolkit has be released to facilitate standardized benchmarking.
arXiv Detail & Related papers (2025-09-21T02:45:07Z) - On the Robustness of Human-Object Interaction Detection against Distribution Shift [27.40641711088878]
Human-Object Interaction (HOI) detection has seen substantial advances in recent years.<n>Existing works focus on the standard setting with ideal images and natural distribution, far from practical scenarios with inevitable distribution shifts.<n>In this work, we investigate this issue by benchmarking, analyzing, and enhancing the robustness of HOI detection models under various distribution shifts.
arXiv Detail & Related papers (2025-06-22T13:01:34Z) - Towards Robust Universal Information Extraction: Benchmark, Evaluation, and Solution [66.11004226578771]
Existing robust benchmark datasets have two key limitations.<n>They generate only a limited range of perturbations for a single Information Extraction (IE) task.<n>Considering the powerful generation capabilities of Large Language Models (LLMs), we introduce a new benchmark dataset for Robust UIE, called RUIE-Bench.<n>We show that training with only textbf15% of the data leads to an average textbf7.5% relative performance improvement across three IE tasks.
arXiv Detail & Related papers (2025-03-05T05:39:29Z) - Exploring Test-Time Adaptation for Object Detection in Continually Changing Environments [20.307151769610087]
Continual Test-Time Adaptation (CTTA) has emerged as a promising technique to gradually adapt a source-trained model to continually changing target domains.<n>We present AMROD, featuring three core components, to tackle these challenges for detection models in CTTA scenarios.<n>We demonstrate the effectiveness of AMROD on four CTTA object detection tasks, where AMROD outperforms existing methods.
arXiv Detail & Related papers (2024-06-24T08:30:03Z) - Open World Object Detection in the Era of Foundation Models [53.683963161370585]
We introduce a new benchmark that includes five real-world application-driven datasets.
We introduce a novel method, Foundation Object detection Model for the Open world, or FOMO, which identifies unknown objects based on their shared attributes with the base known objects.
arXiv Detail & Related papers (2023-12-10T03:56:06Z) - QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15% points relative.
arXiv Detail & Related papers (2023-11-06T00:21:44Z) - Incremental Outlier Detection Modelling Using Streaming Analytics in Finance & Health Care [0.0]
In the era of real-time data, traditional methods often struggle to keep pace with the dynamic nature of streaming environments.<n>In this paper, we proposed a hybrid framework where the model is built once and evaluated in a real-time environment.<n>We employed 8 distinct state-of-the-art outlier detection models, including one-class support vector machine (OCSVM), isolation forest adaptive sliding window approach (IForest ASD), exact storm (ES), angle-based outlier detection (ABOD), local outlier factor (LOF), Kitsunes online algorithm (KitNet), and K-nearest neighbour
arXiv Detail & Related papers (2023-05-17T02:30:28Z) - GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models [60.48306899271866]
We present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models.
We show high correlation and significantly reduced cost of GREAT Score when compared to the attack-based model ranking on RobustBench.
GREAT Score can be used for remote auditing of privacy-sensitive black-box models.
arXiv Detail & Related papers (2023-04-19T14:58:27Z) - On the Robustness of Aspect-based Sentiment Analysis: Rethinking Model,
Data, and Training [109.9218185711916]
Aspect-based sentiment analysis (ABSA) aims at automatically inferring the specific sentiment polarities toward certain aspects of products or services behind social media texts or reviews.
We propose to enhance the ABSA robustness by systematically rethinking the bottlenecks from all possible angles, including model, data, and training.
arXiv Detail & Related papers (2023-04-19T11:07:43Z) - Benchmarking the Robustness of LiDAR Semantic Segmentation Models [78.6597530416523]
In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions.
We propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise and cross-device discrepancy.
We design a robust LiDAR segmentation model (RLSeg) which greatly boosts the robustness with simple but effective modifications.
arXiv Detail & Related papers (2023-01-03T06:47:31Z) - CausalAgents: A Robustness Benchmark for Motion Forecasting using Causal
Relationships [8.679073301435265]
We construct a new benchmark for evaluating and improving model robustness by applying perturbations to existing data.
We use these labels to perturb the data by deleting non-causal agents from the scene.
Under non-causal perturbations, we observe a $25$-$38%$ relative change in minADE as compared to the original.
arXiv Detail & Related papers (2022-07-07T21:28:23Z) - RobustBench: a standardized adversarial robustness benchmark [84.50044645539305]
Key challenge in benchmarking robustness is that its evaluation is often error-prone leading to robustness overestimation.
We evaluate adversarial robustness with AutoAttack, an ensemble of white- and black-box attacks.
We analyze the impact of robustness on the performance on distribution shifts, calibration, out-of-distribution detection, fairness, privacy leakage, smoothness, and transferability.
arXiv Detail & Related papers (2020-10-19T17:06:18Z) - Benchmarking Robustness of Machine Reading Comprehension Models [29.659586787812106]
We construct AdvRACE, a new model-agnostic benchmark for evaluating the robustness of MRC models under four different types of adversarial attacks.
We show that state-of-the-art (SOTA) models are vulnerable to all of these attacks.
We conclude that there is substantial room for building more robust MRC models and our benchmark can help motivate and measure progress in this area.
arXiv Detail & Related papers (2020-04-29T08:05:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.