RECAST: Interactive Auditing of Automatic Toxicity Detection Models
- URL: http://arxiv.org/abs/2001.01819v2
- Date: Wed, 1 Jul 2020 15:36:18 GMT
- Title: RECAST: Interactive Auditing of Automatic Toxicity Detection Models
- Authors: Austin P. Wright, Omar Shaikh, Haekyu Park, Will Epperson, Muhammed Ahmed, Stephane Pinel, Diyi Yang, Duen Horng Chau
- Abstract summary: We present our ongoing work, RECAST, an interactive tool for examining toxicity detection models by visualizing explanations for predictions and providing alternative wordings for detected toxic speech.
- Score: 39.621867230707814
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As toxic language becomes nearly pervasive online, there has been increasing
interest in leveraging advances in natural language processing (NLP), such as
very large transformer models, to automatically detect and remove toxic
comments. Despite fairness concerns, a lack of adversarial robustness, and
limited prediction explainability in deep learning systems, there is currently
little work on auditing these systems and on helping both developers and users
understand how they work. We present our ongoing work, RECAST, an
interactive tool for examining toxicity detection models by visualizing
explanations for predictions and providing alternative wordings for detected
toxic speech.
Related papers
- LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed towards understanding the internal mechanisms of large vision-language models.
Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer.
We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z)
- Recourse for reclamation: Chatting with generative language models [2.877217169371665]
We extend the concept of algorithmic recourse to generative language models.
We provide users a novel mechanism to achieve their desired prediction by dynamically setting thresholds for toxicity filtering.
A pilot study supports the potential of our proposed recourse mechanism.
arXiv Detail & Related papers (2024-03-21T15:14:25Z)
- ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation [43.356758428820626]
We introduce ToxicChat, a novel benchmark based on real user queries from an open-source chatbot.
Our systematic evaluation of models trained on existing toxicity datasets has shown their shortcomings when applied to this unique domain of ToxicChat.
In the future, ToxicChat can be a valuable resource to drive further advancements toward building a safe and healthy environment for user-AI interactions.
arXiv Detail & Related papers (2023-10-26T13:35:41Z)
- HuntGPT: Integrating Machine Learning-Based Anomaly Detection and Explainable AI with Large Language Models (LLMs) [0.09208007322096533]
We present HuntGPT, a specialized intrusion detection dashboard applying a Random Forest classifier.
The paper delves into the system's architecture, components, and technical accuracy, assessed through Certified Information Security Manager (CISM) Practice Exams.
The results demonstrate that conversational agents, supported by LLM and integrated with XAI, provide robust, explainable, and actionable AI solutions in intrusion detection.
arXiv Detail & Related papers (2023-09-27T20:58:13Z)
- Exploiting Multi-Object Relationships for Detecting Adversarial Attacks in Complex Scenes [51.65308857232767]
Vision systems that deploy Deep Neural Networks (DNNs) are known to be vulnerable to adversarial examples.
Recent research has shown that checking the intrinsic consistencies in the input data is a promising way to detect adversarial attacks.
We develop a novel approach to perform context consistency checks using language models.
arXiv Detail & Related papers (2021-08-19T00:52:10Z)
- RECAST: Enabling User Recourse and Interpretability of Toxicity Detection Models with Interactive Visualization [16.35961310670002]
We present our work, RECAST, an interactive, open-sourced web tool for visualizing toxicity detection models' predictions.
We found that RECAST was highly effective at helping users reduce toxicity as detected through the model.
This opens a discussion for how toxicity detection models work and should work, and their effect on the future of online discourse.
arXiv Detail & Related papers (2021-02-08T18:37:50Z)
- Challenges in Automated Debiasing for Toxic Language Detection [81.04406231100323]
Biased associations have been a challenge in the development of classifiers for detecting toxic language.
We investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection.
Our focus is on lexical (e.g., swear words, slurs, identity mentions) and dialectal markers (specifically African American English).
arXiv Detail & Related papers (2021-01-29T22:03:17Z)
- A Controllable Model of Grounded Response Generation [122.7121624884747]
Current end-to-end neural conversation models inherently lack the flexibility to impose semantic control in the response generation process.
We propose a framework that we call controllable grounded response generation (CGRG).
We show that using this framework, a transformer based model with a novel inductive attention mechanism, trained on a conversation-like Reddit dataset, outperforms strong generation baselines.
arXiv Detail & Related papers (2020-05-01T21:22:08Z)
- Adversarial vs behavioural-based defensive AI with joint, continual and active learning: automated evaluation of robustness to deception, poisoning and concept drift [62.997667081978825]
Recent advancements in Artificial Intelligence (AI) have brought new capabilities to behavioural analysis (UEBA) for cyber-security.
In this paper, we present a solution to effectively mitigate this attack by improving the detection process and efficiently leveraging human expertise.
arXiv Detail & Related papers (2020-01-13T13:54:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.