Context-Awareness and Interpretability of Rare Occurrences for Discovery and Formalization of Critical Failure Modes
- URL: http://arxiv.org/abs/2504.16117v1
- Date: Fri, 18 Apr 2025 17:12:37 GMT
- Title: Context-Awareness and Interpretability of Rare Occurrences for Discovery and Formalization of Critical Failure Modes
- Authors: Sridevi Polavaram, Xin Zhou, Meenu Ravi, Mohammad Zarei, Anmol Srivastava
- Abstract summary: Vision systems are increasingly deployed in critical domains such as surveillance, law enforcement, and transportation, yet they remain vulnerable to rare or unforeseen scenarios. To address these challenges, we introduce Context-Awareness and Interpretability of Rare Occurrences (CAIRO), which incentivizes human-in-the-loop testing and evaluation of criticality that arises from misdetections, adversarial attacks, and hallucinations in black-box AI models.
- Score: 3.140125449151061
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Vision systems are increasingly deployed in critical domains such as surveillance, law enforcement, and transportation. However, their vulnerabilities to rare or unforeseen scenarios pose significant safety risks. To address these challenges, we introduce Context-Awareness and Interpretability of Rare Occurrences (CAIRO), an ontology-based, human-assistive framework for discovering and formalizing failure cases, or Critical Phenomena (CP). By design, CAIRO incentivizes human-in-the-loop testing and evaluation of the criticality that arises from misdetections, adversarial attacks, and hallucinations in black-box AI models. Our analysis of object detection model failures in automated driving systems (ADS) showcases scalable and interpretable ways of formalizing the observed gaps between camera perception and real-world contexts, resulting in test cases stored as explicit knowledge graphs (in OWL/XML format) amenable to sharing, downstream analysis, logical reasoning, and accountability.
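The test cases CAIRO produces are stored as OWL/XML knowledge graphs. As a minimal sketch of how such an artifact could be consumed downstream (not the authors' tooling; the file path and the assumed contents of the ontology are placeholders), the snippet below loads a hypothetical exported ontology with the owlready2 library and enumerates its classes and individuals:

```python
# Minimal sketch: inspecting a CAIRO-style OWL knowledge graph downstream.
# owlready2 can parse RDF/XML, OWL/XML, and NTriples serializations.
from owlready2 import get_ontology

# Placeholder path -- substitute the actual exported test-case file.
onto = get_ontology("file:///path/to/cairo_test_cases.owl").load()

# Concepts defined in the ontology (e.g., context, perception, or failure classes).
for cls in onto.classes():
    print("Class:", cls.name)

# Individuals -- each formalized critical phenomenon (CP) would appear here
# along with its asserted types.
for ind in onto.individuals():
    print("Individual:", ind.name, "| types:", ind.is_a)
```

From there, standard OWL reasoners or SPARQL queries can support the sharing, logical reasoning, and accountability uses the abstract mentions.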
Related papers
- RealHarm: A Collection of Real-World Language Model Application Failures [1.2820953788225848]
We introduce RealHarm, a dataset of annotated problematic interactions with AI agents.
We analyze harms, causes, and hazards specifically from the deployer's perspective.
We evaluate state-of-the-art guardrails and content moderation systems to probe whether such systems would have prevented the incidents.
arXiv Detail & Related papers (2025-04-14T14:44:41Z)
- Benchmarking the Spatial Robustness of DNNs via Natural and Adversarial Localized Corruptions [49.546479320670464]
This paper introduces specialized metrics for benchmarking the spatial robustness of segmentation models.
We propose region-aware multi-attack adversarial analysis, a method that enables a deeper understanding of model robustness.
The results reveal that models respond differently to natural and adversarial localized corruptions.
arXiv Detail & Related papers (2025-04-02T11:37:39Z)
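To make the notion of a localized corruption concrete (a loose illustration only, not the paper's benchmark, corruptions, or metrics), the sketch below perturbs a single square region of an image; a segmentation model's accuracy inside versus outside that region could then be compared:

```python
import numpy as np

def corrupt_region(image, top, left, size, severity=0.3, seed=None):
    """Add Gaussian noise to one square region of an H x W x C image in [0, 1].

    A simple stand-in for a natural localized corruption; an adversarial
    variant would optimize the patch contents instead of sampling them.
    """
    rng = np.random.default_rng(seed)
    out = image.astype(np.float64, copy=True)
    patch = out[top:top + size, left:left + size]
    patch = np.clip(patch + rng.normal(0.0, severity, patch.shape), 0.0, 1.0)
    out[top:top + size, left:left + size] = patch
    return out
```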
- Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage.
Models may behave unreliably due to poorly explored failure modes.
Causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z)
- Towards More Robust Retrieval-Augmented Generation: Evaluating RAG Under Adversarial Poisoning Attacks [45.07581174558107]
Retrieval-Augmented Generation (RAG) systems have emerged as a promising solution to mitigate hallucinations.
RAG systems are vulnerable to adversarial poisoning attacks, where malicious passages injected into retrieval databases can mislead the model into generating factually incorrect outputs.
This paper investigates both the retrieval and the generation components of RAG systems to understand how to enhance their robustness against such attacks.
arXiv Detail & Related papers (2024-12-21T17:31:52Z)
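As a toy illustration of the poisoning mechanism described above (not the paper's experimental setup), the sketch below injects a single adversarial passage into a small corpus; under a plain TF-IDF retriever it outranks the factual passage for the targeted query, so a generator reading only the top hit would be conditioned on the false claim:

```python
# Toy retrieval-poisoning demo with a TF-IDF retriever (scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "The Eiffel Tower is located in Paris, France.",
    "Mount Everest is the highest mountain on Earth.",
]
# Adversarial passage: mimics the query's wording while asserting a falsehood.
corpus.append("Eiffel Tower location: the Eiffel Tower is located in Berlin, Germany.")

query = "Where is the Eiffel Tower located?"
vectorizer = TfidfVectorizer().fit(corpus + [query])
scores = cosine_similarity(vectorizer.transform([query]),
                           vectorizer.transform(corpus))[0]

# The injected passage receives the highest similarity score in this toy setup.
for score, passage in sorted(zip(scores, corpus), reverse=True):
    print(f"{score:.3f}  {passage}")
```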
- Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
- A Survey and Evaluation of Adversarial Attacks for Object Detection [11.48212060875543]
Deep learning models are vulnerable to adversarial examples that can deceive them into making confident but incorrect predictions.
This vulnerability poses significant risks in high-stakes applications such as autonomous vehicles, security surveillance, and safety-critical inspection systems.
This paper presents a novel taxonomic framework for categorizing adversarial attacks specific to object detection architectures.
arXiv Detail & Related papers (2024-08-04T05:22:08Z)
- Analyzing Adversarial Inputs in Deep Reinforcement Learning [53.3760591018817]
We present a comprehensive analysis of the characterization of adversarial inputs, through the lens of formal verification.
We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations.
Our analysis empirically demonstrates how adversarial inputs can affect the safety of a given DRL system with respect to such perturbations.
arXiv Detail & Related papers (2024-02-07T21:58:40Z)
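The Adversarial Rate above is computed via formal verification; as a rough, sampling-based stand-in only (the function name, the uniform-noise model, and the epsilon budget are all assumptions), one could estimate how often small perturbations flip a trained policy's action:

```python
import torch

@torch.no_grad()
def empirical_adversarial_rate(policy, state, epsilon=0.05, n_samples=1000):
    """Fraction of uniform L-infinity perturbations (radius epsilon) of `state`
    under which `policy` (state -> action logits) changes its greedy action.

    A Monte-Carlo approximation in the spirit of, but not identical to,
    the verification-based metric proposed in the paper.
    """
    base_action = policy(state.unsqueeze(0)).argmax(dim=-1)
    noise = (torch.rand(n_samples, *state.shape) * 2.0 - 1.0) * epsilon
    actions = policy(state.unsqueeze(0) + noise).argmax(dim=-1)
    return (actions != base_action).float().mean().item()
```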
- A Survey on Transferability of Adversarial Examples across Deep Neural Networks [53.04734042366312]
Adversarial examples can manipulate machine learning models into making erroneous predictions.
The transferability of adversarial examples enables black-box attacks which circumvent the need for detailed knowledge of the target model.
This survey explores the landscape of the transferability of adversarial examples.
arXiv Detail & Related papers (2023-10-26T17:45:26Z)
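To make the transfer setting concrete, here is an illustrative sketch (not taken from the survey): an FGSM example is crafted against a white-box surrogate model and then evaluated on a separate, black-box target model; the models, data, and epsilon budget are placeholders:

```python
import torch
import torch.nn.functional as F

def fgsm_transfer_rate(surrogate, target, x, y, epsilon=8 / 255):
    """Craft FGSM examples on `surrogate` and measure how often they also fool
    `target` (black-box transfer). x: images in [0, 1]; y: ground-truth labels."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(surrogate(x_adv), y)
    loss.backward()
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0).detach()
    with torch.no_grad():
        surrogate_err = (surrogate(x_adv).argmax(dim=1) != y).float().mean().item()
        target_err = (target(x_adv).argmax(dim=1) != y).float().mean().item()
    return surrogate_err, target_err  # high target_err indicates strong transfer
```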
- Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation [74.05906222376608]
We propose adversarial self-supervision UDA (or ASSUDA) that maximizes the agreement between clean images and their adversarial examples by a contrastive loss in the output space.
This paper is rooted in two observations: (i) the robustness of UDA methods in semantic segmentation remains unexplored, which poses a security concern in this field; and (ii) although commonly used self-supervision tasks (e.g., rotation and jigsaw) benefit image tasks such as classification and recognition, they fail to provide the critical supervision signals needed to learn discriminative representations for segmentation.
arXiv Detail & Related papers (2021-05-23T01:50:44Z)
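As a rough sketch of the agreement objective described above (not the authors' implementation; the InfoNCE formulation and temperature are assumptions), one could penalize disagreement between a model's output-space features on clean images and on their adversarial counterparts:

```python
import torch
import torch.nn.functional as F

def output_space_agreement_loss(clean_out, adv_out, temperature=0.1):
    """InfoNCE-style contrastive loss: each clean output should be most similar
    to the adversarial output of the same image (matched along the batch dim)."""
    z_clean = F.normalize(clean_out.flatten(start_dim=1), dim=1)
    z_adv = F.normalize(adv_out.flatten(start_dim=1), dim=1)
    logits = z_clean @ z_adv.t() / temperature  # (B, B) cosine similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)
```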
- Understanding Object Detection Through An Adversarial Lens [14.976840260248913]
This paper presents a framework for analyzing and evaluating vulnerabilities of deep object detectors under an adversarial lens.
We demonstrate that the proposed framework can serve as a methodical benchmark for analyzing adversarial behaviors and risks in real-time object detection systems.
We conjecture that this framework can also serve as a tool to assess the security risks and the adversarial robustness of deep object detectors to be deployed in real-world applications.
arXiv Detail & Related papers (2020-07-11T18:41:47Z)