HLE-UPC at SemEval-2021 Task 5: Multi-Depth DistilBERT for Toxic Spans Detection
- URL: http://arxiv.org/abs/2104.00639v1
- Date: Thu, 1 Apr 2021 17:37:38 GMT
- Title: HLE-UPC at SemEval-2021 Task 5: Multi-Depth DistilBERT for Toxic Spans Detection
- Authors: Rafel Palliser, Albert Rial
- Abstract summary: The purpose of this task is to detect the spans that make a text toxic.
Toxicity does not always come from single words such as insults or offensive terms, but sometimes from whole expressions formed by words that may not be toxic individually.
We study the impact of using a multi-depth DistilBERT model, which uses embeddings from different layers to estimate the final per-token toxicity.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents our submission to SemEval-2021 Task 5: Toxic Spans
Detection. The purpose of this task is to detect the spans that make a text toxic, which is a complex task for several reasons: first, the intrinsic subjectivity of toxicity, and second, the fact that toxicity does not always come from single words such as insults or offensive terms, but sometimes from whole expressions formed by words that may not be toxic individually. Following this
idea of focusing on both single words and multi-word expressions, we study the
impact of using a multi-depth DistilBERT model, which uses embeddings from
different layers to estimate the final per-token toxicity. Our quantitative
results show that using information from multiple depths boosts the performance
of the model. Finally, we also analyze our best model qualitatively.
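To make the multi-depth idea more concrete, below is a minimal sketch of a token-level toxicity tagger that concatenates DistilBERT hidden states from several layers before a per-token classification head. This is an illustration only, not the authors' implementation: the class name `MultiDepthToxicTagger`, the choice of the last four layers, the concatenation pooling, and the binary head are all assumptions.

```python
# Minimal sketch (not the authors' code): multi-depth DistilBERT token tagger.
# Assumptions: last four hidden layers, concatenation pooling, binary labels.
import torch
import torch.nn as nn
from transformers import DistilBertModel, DistilBertTokenizerFast


class MultiDepthToxicTagger(nn.Module):
    def __init__(self, layers=(-1, -2, -3, -4), num_labels=2):
        super().__init__()
        self.layers = layers
        self.encoder = DistilBertModel.from_pretrained("distilbert-base-uncased")
        hidden = self.encoder.config.dim  # 768 for distilbert-base
        # Per-token head over the concatenated multi-depth embeddings.
        self.classifier = nn.Linear(hidden * len(layers), num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(
            input_ids=input_ids,
            attention_mask=attention_mask,
            output_hidden_states=True,  # expose every layer's embeddings
        )
        # hidden_states is a tuple of [batch, seq_len, hidden] tensors,
        # one per transformer layer plus the embedding layer.
        stacked = torch.cat([out.hidden_states[i] for i in self.layers], dim=-1)
        return self.classifier(stacked)  # per-token toxicity logits


tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
model = MultiDepthToxicTagger()
enc = tokenizer("Example sentence to tag.", return_offsets_mapping=True,
                return_tensors="pt")
logits = model(enc["input_ids"], enc["attention_mask"])
toxic_tokens = logits.argmax(-1)[0]  # meaningful only after fine-tuning
# enc["offset_mapping"] maps predicted toxic tokens back to character
# offsets, the span format required by the task.
```

After fine-tuning with a standard token-classification loss, tokens predicted as toxic can be merged into contiguous character spans for submission.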
Related papers
- PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models [27.996123856250065]
Existing toxicity benchmarks are overwhelmingly focused on English.
We introduce PolygloToxicityPrompts (PTP), the first large-scale multilingual toxicity evaluation benchmark of 425K naturally occurring prompts spanning 17 languages.
arXiv Detail & Related papers (2024-05-15T14:22:33Z)
- Unveiling the Implicit Toxicity in Large Language Models [77.90933074675543]
The open-endedness of large language models (LLMs) combined with their impressive capabilities may lead to new safety issues when they are exploited for malicious use.
We show that LLMs can generate diverse implicit toxic outputs that are exceptionally difficult to detect via simple zero-shot prompting.
We propose a reinforcement learning (RL) based attacking method to further induce the implicit toxicity in LLMs.
arXiv Detail & Related papers (2023-11-29T06:42:36Z)
- Comprehensive Assessment of Toxicity in ChatGPT [49.71090497696024]
We evaluate the toxicity in ChatGPT by utilizing instruction-tuning datasets.
Prompts in creative writing tasks can be 2x more likely to elicit toxic responses.
Certain deliberately toxic prompts, designed in earlier studies, no longer yield harmful responses.
arXiv Detail & Related papers (2023-11-03T14:37:53Z)
- Facilitating Fine-grained Detection of Chinese Toxic Language: Hierarchical Taxonomy, Resources, and Benchmarks [18.44630180661091]
Existing datasets lack fine-grained annotation of toxic types and expressions.
It is crucial to introduce lexical knowledge to detect the toxicity of posts.
In this paper, we facilitate the fine-grained detection of Chinese toxic language.
arXiv Detail & Related papers (2023-05-08T03:50:38Z)
- Text Detoxification using Large Pre-trained Neural Models [57.72086777177844]
We present two novel unsupervised methods for eliminating toxicity in text.
The first method combines guidance of the generation process with small style-conditional language models.
The second method uses BERT to replace toxic words with their non-offensive synonyms.
arXiv Detail & Related papers (2021-09-18T11:55:32Z)
- Mitigating Biases in Toxic Language Detection through Invariant Rationalization [70.36701068616367]
Biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection.
We propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out the spurious correlation of certain syntactic patterns.
Our method yields a lower false positive rate on both lexical and dialectal attributes than previous debiasing methods.
arXiv Detail & Related papers (2021-06-14T08:49:52Z)
- UoT-UWF-PartAI at SemEval-2021 Task 5: Self Attention Based Bi-GRU with Multi-Embedding Representation for Toxicity Highlighter [3.0586855806896045]
We propose a self-attention-based gated recurrent unit with a multi-embedding representation of the tokens.
Experimental results show that our proposed approach is very effective in detecting span tokens.
arXiv Detail & Related papers (2021-04-27T13:18:28Z)
- UIT-E10dot3 at SemEval-2021 Task 5: Toxic Spans Detection with Named Entity Recognition and Question-Answering Approaches [0.32228025627337864]
This task asks competitors to extract spans that have toxicity from the given texts, and we performed several analyses to understand its structure before running experiments.
We solve this task with two approaches: named entity recognition with the spaCy library, and question answering with RoBERTa combined with ToxicBERT; the former achieves the highest F1-score of 66.99%.
arXiv Detail & Related papers (2021-04-15T11:07:56Z)
- Lone Pine at SemEval-2021 Task 5: Fine-Grained Detection of Hate Speech Using BERToxic [2.4815579733050153]
This paper describes our approach to the Toxic Spans Detection problem.
We propose BERToxic, a system that fine-tunes a pre-trained BERT model to locate toxic text spans in a given text.
Our system significantly outperformed the provided baseline and achieved an F1-score of 0.683, placing Lone Pine in the 17th place out of 91 teams in the competition.
arXiv Detail & Related papers (2021-04-08T04:46:14Z)
- Challenges in Automated Debiasing for Toxic Language Detection [81.04406231100323]
Biased associations have been a challenge in the development of classifiers for detecting toxic language.
We investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection.
Our focus is on lexical (e.g., swear words, slurs, identity mentions) and dialectal markers (specifically African American English).
arXiv Detail & Related papers (2021-01-29T22:03:17Z)
- RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models [93.151822563361]
Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment.
We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration.
arXiv Detail & Related papers (2020-09-24T03:17:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.