Toxicity Inspector: A Framework to Evaluate Ground Truth in Toxicity
Detection Through Feedback
- URL: http://arxiv.org/abs/2305.10433v1
- Date: Thu, 11 May 2023 11:56:42 GMT
- Authors: Huriyyah Althunayan, Rahaf Bahlas, Manar Alharbi, Lena Alsuwailem,
Abeer Aldayel, Rehab ALahmadi
- Abstract summary: This paper introduces a toxicity inspector framework that incorporates a human-in-the-loop pipeline.
It aims to enhance the reliability of toxicity benchmark datasets by centering the evaluator's values through an iterative feedback cycle.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Toxic language is difficult to define, as it is not monolithic and
perceptions of toxicity vary widely. Detecting toxic language is made harder
by the highly contextual and subjective nature of its interpretation, which
can degrade the reliability of datasets and negatively affect detection model
performance. To fill this void, this paper introduces a toxicity inspector
framework that incorporates a human-in-the-loop pipeline with the aim of
enhancing the reliability of toxicity benchmark datasets by centering the
evaluator's values through an iterative feedback cycle. The centerpiece of
this framework is the iterative feedback process, which is guided by two
metric types (hard and soft) that give evaluators and dataset creators the
insight needed to balance the tradeoff between performance gains and toxicity
avoidance.
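The paper does not ship reference code, but the feedback cycle it describes can be pictured as a loop in which low-agreement items are flagged, relabeled by evaluators, and folded back into training, with a hard metric tracking model performance and a soft metric tracking evaluator agreement. The sketch below is a minimal illustration under those assumptions; all names (`hard_metric`, `soft_metric`, `feedback_cycle`) and the sklearn-style `fit`/`predict` interface are hypothetical, not taken from the paper.

```python
from collections import Counter

def hard_metric(labels, predictions):
    """Hypothetical 'hard' metric: plain accuracy of the current model."""
    correct = sum(l == p for l, p in zip(labels, predictions))
    return correct / len(labels)

def soft_metric(annotations):
    """Hypothetical 'soft' metric: mean inter-evaluator agreement per item."""
    per_item = []
    for votes in annotations:  # votes: labels from several evaluators
        top_count = Counter(votes).most_common(1)[0][1]
        per_item.append(top_count / len(votes))
    return sum(per_item) / len(per_item)

def feedback_cycle(dataset, model, evaluators, rounds=3, agree_thresh=0.7):
    """Sketch of the human-in-the-loop cycle: collect fresh labels, stop
    once evaluators agree, otherwise relabel by majority vote and retrain.
    `model` is assumed to be sklearn-style; `evaluators` are callables."""
    for _ in range(rounds):
        annotations = [[rate(x) for rate in evaluators] for x, _ in dataset]
        if soft_metric(annotations) >= agree_thresh:
            break  # evaluator values have converged; stop iterating
        dataset = [(x, Counter(votes).most_common(1)[0][0])  # majority vote
                   for (x, _), votes in zip(dataset, annotations)]
        model.fit([x for x, _ in dataset], [y for _, y in dataset])
    predictions = model.predict([x for x, _ in dataset])
    return hard_metric([y for _, y in dataset], predictions), dataset
```

The design point is that the soft metric decides when to stop: the loop ends once evaluator agreement, not raw accuracy, has converged.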
Related papers
- A Collaborative Content Moderation Framework for Toxicity Detection based on Conformalized Estimates of Annotation Disagreement [7.345136916791223]
We introduce a novel content moderation framework that emphasizes the importance of capturing annotation disagreement.
We leverage uncertainty estimation techniques, specifically Conformal Prediction, to account for both the ambiguity in comment annotations and the model's inherent uncertainty in predicting toxicity and disagreement.
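As a rough picture of how conformal prediction can surface annotation disagreement, the sketch below implements plain split conformal prediction: a calibration set fixes a nonconformity threshold, and test items whose prediction set contains both labels are flagged as ambiguous. It is a generic sketch under those assumptions, not the paper's pipeline, and the variable names are invented here.

```python
import numpy as np

def split_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Plain split conformal prediction. Returns, per test item, the set
    of labels that contains the true label with probability ~(1 - alpha)
    on exchangeable data. Nonconformity score: 1 - P(true label)."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile, clipped to a valid level.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    # Keep every label whose score is below the threshold; a set holding
    # both {non-toxic, toxic} marks an ambiguous, disagreement-prone item.
    return [np.where(1.0 - p <= q)[0].tolist() for p in test_probs]

# Toy usage with 2 classes (0 = non-toxic, 1 = toxic):
cal_probs = np.array([[0.9, 0.1], [0.3, 0.7], [0.45, 0.55], [0.6, 0.4]])
cal_labels = np.array([0, 1, 0, 1])
test_probs = np.array([[0.5, 0.5], [0.95, 0.05]])
print(split_conformal_sets(cal_probs, cal_labels, test_probs))
# -> [[0, 1], [0]]  (the first item is flagged as ambiguous)
```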
arXiv Detail & Related papers (2024-11-06T18:08:57Z)
- VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models [57.43276586087863]
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs.
Existing benchmarks are often limited in scope, focusing mainly on object hallucinations.
We introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases.
arXiv Detail & Related papers (2024-04-22T04:49:22Z)
- KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models [53.84677081899392]
KIEval is a Knowledge-grounded Interactive Evaluation framework for large language models.
It incorporates an LLM-powered "interactor" role for the first time to accomplish a dynamic contamination-resilient evaluation.
Extensive experiments on seven leading LLMs across five datasets validate KIEval's effectiveness and generalization.
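A minimal sketch of what such an interactor-driven loop could look like is given below; the three LLM roles are assumed to be plain `prompt -> text` callables, and the prompts are illustrative rather than KIEval's actual templates.

```python
def interactive_eval(seed_question, candidate_llm, interactor_llm,
                     evaluator_llm, turns=3):
    """Sketch of a dynamic, contamination-resilient evaluation: an
    'interactor' LLM deepens the conversation so a memorized benchmark
    answer is not enough to score well. All three callables
    (prompt -> text) are assumed interfaces, not a real API."""
    transcript = [("interactor", seed_question)]
    question = seed_question
    for _ in range(turns):
        answer = candidate_llm(question)
        transcript.append(("candidate", answer))
        # The interactor probes the answer with a knowledge-grounded follow-up.
        question = interactor_llm(
            "Given this answer, ask a follow-up question that tests whether "
            "the reasoning is genuinely understood:\n" + answer)
        transcript.append(("interactor", question))
    # The evaluator scores the whole multi-turn transcript, not one reply.
    dialogue = "\n".join(f"{role}: {text}" for role, text in transcript)
    return evaluator_llm("Score this dialogue from 1-5 for correctness "
                         "and depth:\n" + dialogue)
```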
arXiv Detail & Related papers (2024-02-23T01:30:39Z)
- Can LLMs Recognize Toxicity? A Structured Investigation Framework and Toxicity Metric [16.423707276483178]
We introduce a robust metric grounded in Large Language Models (LLMs) to flexibly measure toxicity according to a given definition.
Our results demonstrate outstanding performance in measuring toxicity within verified factors, improving on conventional metrics by 12 points in the F1 score.
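In spirit, a definition-conditioned LLM toxicity metric can be as simple as prompting the model with an explicit definition and parsing a numeric rating. The sketch below assumes a generic `llm` callable and an illustrative prompt; neither is taken from the paper.

```python
def llm_toxicity_score(text, definition, llm):
    """Sketch of a definition-conditioned toxicity metric: the LLM rates
    the text against an explicit, caller-supplied definition of toxicity.
    `llm` is an assumed prompt -> text callable; the prompt wording is
    illustrative, not the paper's."""
    prompt = (
        f"Definition of toxicity: {definition}\n"
        f"Text: {text}\n"
        "On a scale of 0 (not toxic) to 1 (clearly toxic under the "
        "definition above), answer with a single number."
    )
    reply = llm(prompt)
    try:
        return max(0.0, min(1.0, float(reply.strip())))
    except ValueError:
        return None  # the model did not return a parseable score
```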
arXiv Detail & Related papers (2024-02-10T07:55:27Z)
- On the definition of toxicity in NLP [2.1830650692803863]
This work suggests a new, stress-level-based definition of toxicity designed to be objective and context-aware.
Alongside it, we also describe possible ways of applying this new definition to dataset creation and model training.
arXiv Detail & Related papers (2023-10-03T18:32:34Z)
- Toxicity Detection can be Sensitive to the Conversational Context [64.28043776806213]
We construct and publicly release a dataset of 10,000 posts with two kinds of toxicity labels.
We introduce a new task, context sensitivity estimation, which aims to identify posts whose perceived toxicity changes if the context is also considered.
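A straightforward way to estimate context sensitivity, sketched below under the assumption of a generic `text -> probability` toxicity model, is to score each post with and without its parent comment and take the gap:

```python
def context_sensitivity(post, parent, toxicity_model):
    """Sketch of context sensitivity estimation: score the post in
    isolation and together with its parent comment; a large gap marks a
    context-sensitive post. `toxicity_model` is an assumed
    text -> probability callable."""
    score_alone = toxicity_model(post)
    score_in_context = toxicity_model(parent + "\n" + post)
    return abs(score_in_context - score_alone)

# Posts whose perceived toxicity flips with context get high values, e.g.
# "Yes, do it!" under a self-harm thread vs. under a recipe thread.
```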
arXiv Detail & Related papers (2021-11-19T13:57:26Z)
- Mitigating Biases in Toxic Language Detection through Invariant Rationalization [70.36701068616367]
Biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection.
We propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out the spurious correlation of certain syntactic patterns.
Our method yields a lower false positive rate for both lexical and dialectal attributes than previous debiasing methods.
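The sketch below gives a loose, simplified reading of the InvRat idea in PyTorch: a generator soft-masks the input, one predictor sees only the rationale, a second predictor additionally sees the environment (e.g., dialect), and the generator is penalized whenever the environment helps, since that signals a spurious, environment-specific rationale. Dimensions, layers, and the exact penalty are toy choices, not the paper's objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InvRatSketch(nn.Module):
    """Toy reading of invariant rationalization: `generator` soft-masks
    the input, `predictor` classifies from the rationale alone, and
    `env_predictor` additionally sees the environment. The generator is
    penalized when the environment helps, because that means the
    rationale leans on a spurious, environment-tied signal."""
    def __init__(self, dim=64, n_env=2, n_classes=2):
        super().__init__()
        self.generator = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.predictor = nn.Linear(dim, n_classes)
        self.env_predictor = nn.Linear(dim + n_env, n_classes)

    def losses(self, x, y, env_onehot):
        rationale = x * self.generator(x)        # soft feature selection
        loss_inv = F.cross_entropy(self.predictor(rationale), y)
        loss_env = F.cross_entropy(
            self.env_predictor(torch.cat([rationale, env_onehot], -1)), y)
        # Penalize the generator if knowing the environment lowers the loss.
        invariance_gap = torch.relu(loss_inv - loss_env)
        return loss_inv + invariance_gap, loss_env
```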
arXiv Detail & Related papers (2021-06-14T08:49:52Z)
- ToxCCIn: Toxic Content Classification with Interpretability [16.153683223016973]
Explanations are important for tasks like offensive language or toxicity detection on social media.
We propose a technique to improve the interpretability of transformer models, based on a simple and powerful assumption.
We find this approach effective, and it can produce explanations that exceed the quality of those provided by logistic regression analysis.
arXiv Detail & Related papers (2021-03-01T22:17:10Z)
- Non-Singular Adversarial Robustness of Neural Networks [58.731070632586594]
Adversarial robustness has become an emerging challenge for neural networks owing to their over-sensitivity to small input perturbations.
We formalize the notion of non-singular adversarial robustness for neural networks through the lens of joint perturbations to data inputs as well as model weights.
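One way to picture joint perturbations, sketched below in PyTorch, is a single FGSM-like ascent step applied simultaneously to the input and the weights before re-evaluating the loss; the step sizes are arbitrary toy values and this is not the paper's formal robustness analysis.

```python
import torch

def joint_perturbation_loss(model, loss_fn, x, y, eps_x=0.01, eps_w=1e-3):
    """Evaluate the loss after one FGSM-like ascent step on BOTH the
    input and the weights, then restore the weights. Step sizes are
    arbitrary toy values."""
    x = x.clone().requires_grad_(True)
    loss = loss_fn(model(x), y)
    params = list(model.parameters())
    grads_w = torch.autograd.grad(loss, params, retain_graph=True)
    (grad_x,) = torch.autograd.grad(loss, x)
    with torch.no_grad():
        x_adv = x + eps_x * grad_x.sign()             # perturb the input
        for p, g in zip(params, grads_w):
            p.add_(eps_w * g.sign())                  # perturb the weights
        perturbed_loss = loss_fn(model(x_adv), y).item()
        for p, g in zip(params, grads_w):
            p.sub_(eps_w * g.sign())                  # undo weight change
    return perturbed_loss
```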
arXiv Detail & Related papers (2021-02-23T20:59:30Z)
- Challenges in Automated Debiasing for Toxic Language Detection [81.04406231100323]
Biased associations have been a challenge in the development of classifiers for detecting toxic language.
We investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection.
Our focus is on lexical (e.g., swear words, slurs, identity mentions) and dialectal markers (specifically African American English).
arXiv Detail & Related papers (2021-01-29T22:03:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.