$\text{R}^2$-Bench: Benchmarking the Robustness of Referring Perception
Models under Perturbations
- URL: http://arxiv.org/abs/2403.04924v1
- Date: Thu, 7 Mar 2024 22:18:12 GMT
- Title: $\text{R}^2$-Bench: Benchmarking the Robustness of Referring Perception
Models under Perturbations
- Authors: Xiang Li, Kai Qiu, Jinglu Wang, Xiaohao Xu, Rita Singh, Kashu Yamazaki,
Hao Chen, Xiaonan Huang, Bhiksha Raj
- Abstract summary: We present a comprehensive taxonomy of perturbations, and then develop a versatile toolbox for synthesizing and evaluating the effects of composite disturbances.
We propose the $\text{R}^2$-Agent, an LLM-based agent that simplifies and automates model evaluation via natural language instructions.
- Score: 36.74309198908876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Referring perception, which aims at grounding visual objects with multimodal
referring guidance, is essential for bridging the gap between humans, who
provide instructions, and the environment where intelligent systems perceive.
Despite progress in this field, the robustness of referring perception models
(RPMs) against disruptive perturbations is not well explored. This work
thoroughly assesses the resilience of RPMs against various perturbations in
both general and specific contexts. Recognizing the complex nature of referring
perception tasks, we present a comprehensive taxonomy of perturbations, and
then develop a versatile toolbox for synthesizing and evaluating the effects of
composite disturbances. Employing this toolbox, we construct
$\text{R}^2$-Bench, a benchmark for assessing the Robustness of Referring
perception models under noisy conditions across five key tasks. Moreover, we
propose the $\text{R}^2$-Agent, an LLM-based agent that simplifies and
automates model evaluation via natural language instructions. Our investigation
uncovers the vulnerabilities of current RPMs to various perturbations and
provides tools for assessing model robustness, potentially promoting the safe
and resilient integration of intelligent systems into complex real-world
scenarios.
Related papers
- Investigating the Role of Instruction Variety and Task Difficulty in Robotic Manipulation Tasks [50.75902473813379]
This work introduces a comprehensive evaluation framework that systematically examines the role of instructions and inputs in the generalisation abilities of such models.
The proposed framework uncovers the resilience of multimodal models to extreme instruction perturbations and their vulnerability to observational changes.
arXiv Detail & Related papers (2024-07-04T14:36:49Z) - From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking [32.52171076424419]
Embodied agents require robust navigation systems to operate in unstructured environments.
We propose a novel, customizable pipeline for noisy data synthesis.
Our analysis uncovers the susceptibilities of both neural (NeRF) and non-neural SLAM models to disturbances.
arXiv Detail & Related papers (2024-06-24T17:57:05Z) - Towards Evaluating the Robustness of Visual State Space Models [63.14954591606638]
Vision State Space Models (VSSMs) have demonstrated remarkable performance in visual perception tasks.
However, their robustness under natural and adversarial perturbations remains a critical concern.
We present a comprehensive evaluation of VSSMs' robustness under various perturbation scenarios.
arXiv Detail & Related papers (2024-06-13T17:59:44Z) - Learning Latent Dynamic Robust Representations for World Models [9.806852421730165]
Visual Model-Based Reinforcement Learning (MBRL) promises to encapsulate an agent's knowledge about the underlying dynamics of the environment.
Top MBRL agents such as Dreamer often struggle with visual pixel-based inputs in the presence of irrelevant noise in the observation space.
We apply a spatio-temporal masking strategy, combined with latent reconstruction, to capture endogenous task-specific aspects of the environment for world models.
arXiv Detail & Related papers (2024-05-10T06:28:42Z) - Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty [40.55653383218379]
This work focuses on learning in distributionally robust Markov games (RMGs).
We propose a sample-efficient model-based algorithm (DRNVI) with finite-sample complexity guarantees for learning robust variants of various notions of game-theoretic equilibria.
arXiv Detail & Related papers (2024-04-29T17:51:47Z) - Speech Robust Bench: A Robustness Benchmark For Speech Recognition [20.758654420612793]
Speech Robust Bench (SRB) is a benchmark for evaluating the robustness of ASR models to diverse corruptions.
SRB is composed of 114 input perturbations that simulate a heterogeneous range of corruptions that ASR models may encounter when deployed in the wild.
arXiv Detail & Related papers (2024-03-08T08:10:29Z) - Improving the Robustness of Knowledge-Grounded Dialogue via Contrastive
Learning [71.8876256714229]
We propose an entity-based contrastive learning framework for improving the robustness of knowledge-grounded dialogue systems.
Our method achieves new state-of-the-art performance in terms of automatic evaluation scores.
arXiv Detail & Related papers (2024-01-09T05:16:52Z) - On the Robustness of Aspect-based Sentiment Analysis: Rethinking Model,
Data, and Training [109.9218185711916]
Aspect-based sentiment analysis (ABSA) aims at automatically inferring the specific sentiment polarities toward certain aspects of products or services behind social media texts or reviews.
We propose to enhance the ABSA robustness by systematically rethinking the bottlenecks from all possible angles, including model, data, and training.
arXiv Detail & Related papers (2023-04-19T11:07:43Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood
Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)