UIT-ISE-NLP at SemEval-2021 Task 5: Toxic Spans Detection with
BiLSTM-CRF and Toxic Bert Comment Classification
- URL: http://arxiv.org/abs/2104.10100v1
- Date: Tue, 20 Apr 2021 16:32:56 GMT
- Title: UIT-ISE-NLP at SemEval-2021 Task 5: Toxic Spans Detection with
BiLSTM-CRF and Toxic Bert Comment Classification
- Authors: Son T. Luu, Ngan Luu-Thuy Nguyen
- Abstract summary: This task aims to build a model for identifying toxic words in whole posts.
We use a BiLSTM-CRF model combined with Toxic Bert Classification to train the detection model.
Our model achieved an F1-score of 62.23% on the Toxic Spans Detection task.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present our work on SemEval-2021 Task 5, Toxic Spans Detection. This
task aims to build a model for identifying toxic words in whole posts. We use
a BiLSTM-CRF model combined with Toxic Bert Classification to train the
detection model for identifying toxic words in the posts. Our model achieved an
F1-score of 62.23% on the Toxic Spans Detection task.
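
As a rough illustration of the architecture named in the abstract, the sketch below builds a BiLSTM-CRF token tagger that marks each token of a post as toxic or not. This is not the authors' released implementation: the tag set, hyperparameters, and the use of the third-party pytorch-crf package are assumptions made only for this example.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party package: pip install pytorch-crf (assumed here)


class BiLSTMCRFTagger(nn.Module):
    """Tags each token of a post as toxic (1) or non-toxic (0)."""

    def __init__(self, vocab_size, num_tags=2, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim // 2,
                              batch_first=True, bidirectional=True)
        self.emission = nn.Linear(hidden_dim, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def _emissions(self, token_ids):
        lstm_out, _ = self.bilstm(self.embedding(token_ids))
        return self.emission(lstm_out)

    def loss(self, token_ids, tags, mask):
        # The CRF returns a log-likelihood; negate it to obtain a loss to minimise.
        return -self.crf(self._emissions(token_ids), tags, mask=mask, reduction='mean')

    def predict(self, token_ids, mask):
        # Viterbi decoding returns the most likely tag sequence for each post.
        return self.crf.decode(self._emissions(token_ids), mask=mask)


# Toy usage with illustrative shapes: 2 posts, 10 tokens each.
model = BiLSTMCRFTagger(vocab_size=5000)
tokens = torch.randint(1, 5000, (2, 10))
tags = torch.randint(0, 2, (2, 10))
mask = torch.ones(2, 10, dtype=torch.bool)
print(model.loss(tokens, tags, mask))   # training objective
print(model.predict(tokens, mask))      # per-token 0/1 predictions per post
```

The Toxic Bert Classification component mentioned in the abstract would be a separate comment-level toxicity classifier; the abstract states the two are combined but does not detail how, so this sketch covers only the span-tagging half.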
Related papers
- Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks are proven to be vulnerable to data poisoning attacks.
Detecting poisoned samples within a mixed dataset is both valuable and challenging.
We propose an Iterative Filtering approach for identifying unlearnable examples (UEs).
arXiv Detail & Related papers (2024-08-15T13:26:13Z)
- Potion: Towards Poison Unlearning [47.00450933765504]
Adversarial attacks by malicious actors on machine learning systems pose significant risks.
The challenge in resolving such an attack arises in practice when only a subset of the poisoned data can be identified.
Our work addresses two key challenges to advance the state of the art in poison unlearning.
arXiv Detail & Related papers (2024-06-13T14:35:11Z)
- Detoxifying Large Language Models via Knowledge Editing [57.0669577257301]
This paper investigates using knowledge editing techniques to detoxify Large Language Models (LLMs).
We construct a benchmark, SafeEdit, which covers nine unsafe categories with various powerful attack prompts.
We conduct experiments with several knowledge editing approaches, indicating that knowledge editing has the potential to efficiently detoxify LLMs with limited impact on general performance.
arXiv Detail & Related papers (2024-03-21T15:18:30Z)
- Unveiling the Implicit Toxicity in Large Language Models [77.90933074675543]
The open-endedness of large language models (LLMs), combined with their impressive capabilities, may lead to new safety issues when they are exploited for malicious use.
We show that LLMs can generate diverse implicit toxic outputs that are exceptionally difficult to detect through zero-shot prompting alone.
We propose a reinforcement learning (RL) based attacking method to further induce the implicit toxicity in LLMs.
arXiv Detail & Related papers (2023-11-29T06:42:36Z)
- ToxiSpanSE: An Explainable Toxicity Detection in Code Review Comments [4.949881799107062]
ToxiSpanSE is the first tool to detect toxic spans in the Software Engineering (SE) domain.
Our model achieved the best score with 0.88 $F1$, 0.87 precision, and 0.93 recall for toxic class tokens.
arXiv Detail & Related papers (2023-07-07T04:55:11Z)
- UPB at SemEval-2021 Task 5: Virtual Adversarial Training for Toxic Spans Detection [0.7197592390105455]
SemEval-2021 Task 5 - Toxic Spans Detection is based on a novel annotation of a subset of the Jigsaw Unintended Bias dataset.
For this task, participants had to automatically detect character spans in short comments that render the message as toxic.
Our approach applies Virtual Adversarial Training in a semi-supervised setting during the fine-tuning of several Transformer-based models.
arXiv Detail & Related papers (2021-04-17T19:42:12Z)
- UIT-E10dot3 at SemEval-2021 Task 5: Toxic Spans Detection with Named Entity Recognition and Question-Answering Approaches [0.32228025627337864]
This task asks competitors to extract toxic spans from the given texts, and we performed several analyses to understand its structure before running experiments.
We solve this task with two approaches: named entity recognition with the spaCy library, and Question-Answering with RoBERTa combined with ToxicBERT; the former achieves the highest F1-score of 66.99%.
arXiv Detail & Related papers (2021-04-15T11:07:56Z)
- Lone Pine at SemEval-2021 Task 5: Fine-Grained Detection of Hate Speech Using BERToxic [2.4815579733050153]
This paper describes our approach to the Toxic Spans Detection problem.
We propose BERToxic, a system that fine-tunes a pre-trained BERT model to locate toxic text spans in a given text.
Our system significantly outperformed the provided baseline and achieved an F1-score of 0.683, placing Lone Pine in the 17th place out of 91 teams in the competition.
arXiv Detail & Related papers (2021-04-08T04:46:14Z)
- HLE-UPC at SemEval-2021 Task 5: Multi-Depth DistilBERT for Toxic Spans Detection [0.0]
The purpose of this task is to detect the spans that make a text toxic.
Toxicity does not always come from single words such as insults or offenses; it sometimes arises from whole expressions formed by words that may not be toxic individually.
We therefore study the impact of using a multi-depth DistilBERT model, which uses embeddings from different layers to estimate the final per-token toxicity.
arXiv Detail & Related papers (2021-04-01T17:37:38Z)
- Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder [78.01180944665089]
This paper demonstrates a fatal vulnerability in natural language inference (NLI) and text classification systems.
We present a 'backdoor poisoning' attack on NLP models.
arXiv Detail & Related papers (2020-10-06T13:03:49Z)
- RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models [93.151822563361]
Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment.
We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration.
arXiv Detail & Related papers (2020-09-24T03:17:19Z)