Discovering and Validating AI Errors With Crowdsourced Failure Reports
- URL: http://arxiv.org/abs/2109.11690v1
- Date: Thu, 23 Sep 2021 23:26:59 GMT
- Title: Discovering and Validating AI Errors With Crowdsourced Failure Reports
- Authors: Ángel Alexander Cabrera, Abraham J. Druck, Jason I. Hong, Adam Perer
- Abstract summary: We introduce crowdsourced failure reports, end-user descriptions of how or why a model failed, and show how developers can use them to detect AI errors.
We also design and implement Deblinder, a visual analytics system for synthesizing failure reports.
In semi-structured interviews and think-aloud studies with 10 AI practitioners, we explore the affordances of the Deblinder system and the applicability of failure reports in real-world settings.
- Score: 10.4818618376202
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI systems can fail to learn important behaviors, leading to real-world
issues like safety concerns and biases. Discovering these systematic failures
often requires significant developer attention, from hypothesizing potential
edge cases to collecting evidence and validating patterns. To scale and
streamline this process, we introduce crowdsourced failure reports, end-user
descriptions of how or why a model failed, and show how developers can use them
to detect AI errors. We also design and implement Deblinder, a visual analytics
system for synthesizing failure reports that developers can use to discover and
validate systematic failures. In semi-structured interviews and think-aloud
studies with 10 AI practitioners, we explore the affordances of the Deblinder
system and the applicability of failure reports in real-world settings. Lastly,
we show how collecting additional data from the groups identified by developers
can improve model performance.
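To make the core idea concrete, below is a minimal sketch of how end-user failure reports might be represented, grouped into candidate systematic failures, and validated against labeled data. The field names, the tag-based grouping heuristic, and the error-rate threshold are illustrative assumptions for this sketch, not the actual design of the Deblinder system.

```python
# Illustrative sketch only (not the Deblinder implementation):
# end users file failure reports, a developer groups similar reports,
# and groups with unusually high error rates are flagged for validation.
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FailureReport:
    input_id: str     # identifier of the example the user flagged
    description: str  # free-text account of how or why the model failed
    tag: str          # coarse user-chosen category (assumed field)


def group_reports(reports):
    """Group reports by tag as a stand-in for richer semantic clustering."""
    groups = defaultdict(list)
    for r in reports:
        groups[r.tag].append(r)
    return groups


def validate_group(group, labels, predictions, baseline_error):
    """Flag a group as a candidate systematic failure if its error rate
    is well above the model's overall error rate (threshold is assumed)."""
    ids = [r.input_id for r in group]
    errors = sum(predictions[i] != labels[i] for i in ids)
    return errors / max(len(ids), 1) > 2 * baseline_error


# Hypothetical usage with made-up data:
reports = [
    FailureReport("img_017", "misreads handwritten zip codes", "handwriting"),
    FailureReport("img_042", "fails on blurry handwritten digits", "handwriting"),
]
labels = {"img_017": 7, "img_042": 3}
predictions = {"img_017": 1, "img_042": 3}

for tag, group in group_reports(reports).items():
    if validate_group(group, labels, predictions, baseline_error=0.05):
        print(f"candidate systematic failure: {tag} ({len(group)} reports)")
```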
Related papers
- Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
- Analyzing Adversarial Inputs in Deep Reinforcement Learning [53.3760591018817]
We present a comprehensive analysis of the characterization of adversarial inputs, through the lens of formal verification.
We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations.
Our analysis empirically demonstrates how adversarial inputs can affect the safety of a given DRL system with respect to such perturbations.
arXiv Detail & Related papers (2024-02-07T21:58:40Z)
- Develop End-to-End Anomaly Detection System [3.130722489512822]
Anomaly detection plays a crucial role in ensuring network robustness.
We propose an end-to-end anomaly detection model development pipeline.
We demonstrate the efficacy of the framework by introducing and benchmarking a new forecasting model.
arXiv Detail & Related papers (2024-02-01T09:02:44Z)
- A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning [73.77088902676306]
We take a closer look at the self-verification abilities of large language models (LLMs) in the context of logical reasoning.
Our main findings suggest that existing LLMs could struggle to identify fallacious reasoning steps accurately and may fall short of guaranteeing the validity of self-verification methods.
arXiv Detail & Related papers (2023-11-14T07:13:10Z)
- Generalizable Error Modeling for Human Data Annotation: Evidence From an Industry-Scale Search Data Annotation Program [0.0]
This paper presents a predictive error model trained to detect potential errors in search relevance annotation tasks.
We show that errors can be predicted with moderate model performance (AUC=0.65-0.75) and that model performance generalizes well across applications.
We demonstrate the usefulness of the model in the context of auditing, where prioritizing tasks with high predicted error probabilities considerably increases the number of corrected annotation errors.
arXiv Detail & Related papers (2023-10-08T21:21:19Z)
- Representing Timed Automata and Timing Anomalies of Cyber-Physical Production Systems in Knowledge Graphs [51.98400002538092]
This paper aims to improve model-based anomaly detection in CPPS by combining the learned timed automaton with a formal knowledge graph about the system.
Both the model and the detected anomalies are described in the knowledge graph so that operators can interpret them more easily.
arXiv Detail & Related papers (2023-08-25T15:25:57Z)
- Interactive System-wise Anomaly Detection [66.3766756452743]
Anomaly detection plays a fundamental role in various applications.
Existing methods struggle to handle scenarios where the instances are systems whose characteristics are not readily observed as data.
We develop an end-to-end approach which includes an encoder-decoder module that learns system embeddings.
arXiv Detail & Related papers (2023-04-21T02:20:24Z)
- IRJIT: A Simple, Online, Information Retrieval Approach for Just-In-Time Software Defect Prediction [10.084626547964389]
Just-in-Time software defect prediction (JIT-SDP) prevents the introduction of defects into the software by identifying them at commit check-in time.
Current software defect prediction approaches rely on manually crafted features such as change metrics and involve machine learning or deep learning models that are expensive to train.
We propose an approach called IRJIT that employs information retrieval on source code and labels new commits as buggy or clean based on their similarity to past buggy or clean commits.
arXiv Detail & Related papers (2022-10-05T17:54:53Z)
- Capturing Failures of Large Language Models via Human Cognitive Biases [18.397404180932373]
We show that OpenAI's Codex errs based on how the input prompt is framed, adjusts outputs towards anchors, and is biased towards outputs that mimic frequent training examples.
Our experiments suggest that cognitive science can be a useful jumping-off point to better understand how contemporary machine learning systems behave.
arXiv Detail & Related papers (2022-02-24T18:58:52Z)
- Causal Scene BERT: Improving object detection by searching for challenging groups of data [125.40669814080047]
Computer vision applications rely on learning-based perception modules parameterized with neural networks for tasks like object detection.
These modules frequently have low expected error overall but high error on atypical groups of data due to biases inherent in the training process.
Our main contribution is a pseudo-automatic method to discover such groups in foresight by performing causal interventions on simulated scenes.
arXiv Detail & Related papers (2022-02-08T05:14:16Z)
- Accountable Error Characterization [7.830479195591646]
We propose an accountable error characterization method, AEC, to understand when and where errors occur.
We perform error detection for a sentiment analysis task using AEC as a case study.
arXiv Detail & Related papers (2021-05-10T23:40:01Z)