UIT-E10dot3 at SemEval-2021 Task 5: Toxic Spans Detection with Named
Entity Recognition and Question-Answering Approaches
- URL: http://arxiv.org/abs/2104.07376v1
- Date: Thu, 15 Apr 2021 11:07:56 GMT
- Title: UIT-E10dot3 at SemEval-2021 Task 5: Toxic Spans Detection with Named
Entity Recognition and Question-Answering Approaches
- Authors: Phu Gia Hoang, Luan Thanh Nguyen, Kiet Van Nguyen
- Abstract summary: This task asks competitors to extract toxic spans from the given texts, and we performed several analyses to understand its structure before running experiments.
We solve this task with two approaches, Named Entity Recognition with the spaCy library and Question Answering with RoBERTa combined with ToxicBERT, and the former achieves the highest F1-score of 66.99%.
- Score: 0.32228025627337864
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The increase of toxic comments in online spaces is having a
tremendous effect on vulnerable users. For this reason, considerable efforts
have been made to deal with this problem, and SemEval-2021 Task 5: Toxic Spans
Detection is one of them. This task asks competitors to extract toxic spans
from the given texts, and we performed several analyses to understand its
structure before running experiments. We solve this task with two approaches,
Named Entity Recognition with the spaCy library and Question Answering with
RoBERTa combined with ToxicBERT, and the former achieves the highest F1-score
of 66.99%.
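The abstract does not include code, but the task's scoring can be sketched: systems return the character offsets of the toxic spans in each text, and predictions are compared against gold offsets with an F1 measure. The function names, example text, and offsets below are illustrative, not taken from the authors' system.

```python
# Minimal sketch of the Toxic Spans Detection output format and its
# character-offset F1 scoring. Hypothetical names and examples; not the
# authors' implementation.

def toxic_char_offsets(token_spans):
    """Expand predicted toxic (start, end) character spans into the
    sorted list of character offsets the task is scored on."""
    offsets = set()
    for start, end in token_spans:
        offsets.update(range(start, end))
    return sorted(offsets)

def char_f1(pred, gold):
    """Character-offset F1 between predicted and gold offset sets."""
    pred, gold = set(pred), set(gold)
    if not pred and not gold:
        return 1.0  # both empty: perfect agreement
    tp = len(pred & gold)
    if tp == 0:
        return 0.0  # no overlap (or one side empty)
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

text = "You are a stupid and ignorant person."
# Suppose an NER-style tagger flags "stupid" (chars 10-16) and
# "ignorant" (chars 21-29) as toxic, but gold annotates only "stupid".
pred = toxic_char_offsets([(10, 16), (21, 29)])
gold = toxic_char_offsets([(10, 16)])
print(round(char_f1(pred, gold), 3))  # → 0.6
```

Scoring on character offsets rather than token indices is what lets both an NER-style tagger and a QA-style span extractor be evaluated with the same metric.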
Related papers
- Comprehensive Assessment of Toxicity in ChatGPT [49.71090497696024]
We evaluate the toxicity in ChatGPT by utilizing instruction-tuning datasets.
Prompts in creative writing tasks can be 2x more likely to elicit toxic responses.
Certain deliberately toxic prompts, designed in earlier studies, no longer yield harmful responses.
arXiv Detail & Related papers (2023-11-03T14:37:53Z)
- Collaborative Anomaly Detection [66.51075412012581]
We propose collaborative anomaly detection (CAD) to jointly learn all tasks with an embedding encoding correlations among tasks.
We explore CAD with conditional density estimation and conditional likelihood ratio estimation.
It is beneficial to select a small number of tasks in advance to learn a task embedding model, and then use it to warm-start all task embeddings.
arXiv Detail & Related papers (2022-09-20T18:01:07Z)
- Toxicity Detection can be Sensitive to the Conversational Context [64.28043776806213]
We construct and publicly release a dataset of 10,000 posts with two kinds of toxicity labels.
We introduce a new task, context sensitivity estimation, which aims to identify posts whose perceived toxicity changes if the context is also considered.
arXiv Detail & Related papers (2021-11-19T13:57:26Z)
- Cisco at SemEval-2021 Task 5: What's Toxic?: Leveraging Transformers for
Multiple Toxic Span Extraction from Online Comments [1.332560004325655]
This paper describes the system proposed by team Cisco for SemEval-2021 Task 5: Toxic Spans Detection.
We approach this problem primarily in two ways: a sequence tagging approach and a dependency parsing approach.
Our best performing architecture in this approach also proved to be our best overall, with an F1 score of 0.6922.
arXiv Detail & Related papers (2021-05-28T16:27:49Z)
- UoT-UWF-PartAI at SemEval-2021 Task 5: Self Attention Based Bi-GRU with
Multi-Embedding Representation for Toxicity Highlighter [3.0586855806896045]
We propose a self-attention-based gated recurrent unit with a multi-embedding representation of the tokens.
Experimental results show that our proposed approach is very effective in detecting span tokens.
arXiv Detail & Related papers (2021-04-27T13:18:28Z)
- UIT-ISE-NLP at SemEval-2021 Task 5: Toxic Spans Detection with
BiLSTM-CRF and Toxic Bert Comment Classification [0.0]
This task aims to build a model for identifying toxic words in whole posts.
We use a BiLSTM-CRF model combined with Toxic Bert Classification to train the detection model.
Our model achieved an F1-score of 62.23% on the Toxic Spans Detection task.
arXiv Detail & Related papers (2021-04-20T16:32:56Z)
- Lone Pine at SemEval-2021 Task 5: Fine-Grained Detection of Hate Speech
Using BERToxic [2.4815579733050153]
This paper describes our approach to the Toxic Spans Detection problem.
We propose BERToxic, a system that fine-tunes a pre-trained BERT model to locate toxic text spans in a given text.
Our system significantly outperformed the provided baseline and achieved an F1-score of 0.683, placing Lone Pine in the 17th place out of 91 teams in the competition.
arXiv Detail & Related papers (2021-04-08T04:46:14Z)
- HLE-UPC at SemEval-2021 Task 5: Multi-Depth DistilBERT for Toxic Spans
Detection [0.0]
The purpose of this task is to detect the spans that make a text toxic.
Toxicity does not always come from single words such as insults or offenses, but sometimes from whole expressions formed by words that are not toxic individually.
We study the impact of using a multi-depth DistilBERT model, which uses embeddings from different layers to estimate the final per-token toxicity.
arXiv Detail & Related papers (2021-04-01T17:37:38Z)
- RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language
Models [93.151822563361]
Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment.
We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration.
arXiv Detail & Related papers (2020-09-24T03:17:19Z)
- CO-Search: COVID-19 Information Retrieval with Semantic Search, Question
Answering, and Abstractive Summarization [53.67205506042232]
CO-Search is a retriever-ranker semantic search engine designed to handle complex queries over the COVID-19 literature.
To account for the domain-specific and relatively limited dataset, we generate a bipartite graph of document paragraphs and citations.
We evaluate our system on the data of the TREC-COVID information retrieval challenge.
arXiv Detail & Related papers (2020-06-17T01:32:48Z)
- Filtering before Iteratively Referring for Knowledge-Grounded Response
Selection in Retrieval-Based Chatbots [56.52403181244952]
This paper proposes a method named Filtering before Iteratively REferring (FIRE) for this task.
We show that FIRE outperforms previous methods by margins larger than 2.8% and 4.1% on the PERSONA-CHAT dataset.
We also show that FIRE is more interpretable by visualizing the knowledge grounding process.
arXiv Detail & Related papers (2020-04-30T02:27:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.