Cisco at SemEval-2021 Task 5: What's Toxic?: Leveraging Transformers for
Multiple Toxic Span Extraction from Online Comments
- URL: http://arxiv.org/abs/2105.13959v1
- Date: Fri, 28 May 2021 16:27:49 GMT
- Authors: Sreyan Ghosh, Sonal Kumar
- Abstract summary: This paper describes the system proposed by team Cisco for SemEval-2021 Task 5: Toxic Spans Detection.
We approach this problem primarily in two ways: a sequence tagging approach and a dependency parsing approach.
Our best performing architecture in the sequence tagging approach also proved to be our best performing architecture overall, with an F1 score of 0.6922.
- Score: 1.332560004325655
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Social network platforms are generally used to share positive, constructive,
and insightful content. However, in recent times, people often get exposed to
objectionable content such as threats, identity attacks, hate speech, insults,
obscene texts, offensive remarks or bullying. Existing work on toxic speech
detection focuses on binary classification or on differentiating toxic speech
among a small set of categories. This paper describes the system proposed by
team Cisco for SemEval-2021 Task 5: Toxic Spans Detection, the first shared
task focusing on detecting the spans in the text that contribute to its
toxicity, in the English language. We approach this problem primarily in two ways:
a sequence tagging approach and a dependency parsing approach. In our sequence
tagging approach we tag each token in a sentence under a particular tagging
scheme. Our best performing architecture in this approach also proved to be our
best performing architecture overall with an F1 score of 0.6922, thereby
placing us 7th on the final evaluation phase leaderboard. We also explore a
dependency parsing approach where we extract spans from the input sentence
under the supervision of target span boundaries and rank our spans using a
biaffine model. Finally, we also provide a detailed analysis of our results and
model performance in our paper.
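The sequence tagging idea described in the abstract can be illustrated with a minimal sketch. The abstract does not name the tagging scheme, the tokenizer, or the exact F1 definition, so everything below is an assumption made for illustration: a BIO scheme, whitespace tokenization, and an F1 computed over sets of toxic character offsets. The function names `spans_to_bio`, `bio_to_offsets`, and `char_f1` are hypothetical and are not taken from the paper.

```python
import re

def spans_to_bio(text, toxic_offsets):
    """Tag whitespace-delimited tokens as B/I/O from toxic character offsets.

    Assumption: a token is toxic if any of its characters is marked toxic,
    and consecutive toxic tokens are treated as one span.
    """
    toxic = set(toxic_offsets)
    tokens, tags = [], []
    prev_toxic = False
    for m in re.finditer(r"\S+", text):
        is_toxic = any(i in toxic for i in range(m.start(), m.end()))
        if not is_toxic:
            tag = "O"
        elif prev_toxic:
            tag = "I"
        else:
            tag = "B"
        tokens.append(m.group())
        tags.append(tag)
        prev_toxic = is_toxic
    return tokens, tags

def bio_to_offsets(text, tags):
    """Convert token-level BIO tags back to a list of character offsets."""
    offsets = []
    prev_end = None
    for m, tag in zip(re.finditer(r"\S+", text), tags):
        if tag == "O":
            prev_end = None
            continue
        # For an I tag, also cover the whitespace since the previous token.
        start = prev_end if (tag == "I" and prev_end is not None) else m.start()
        offsets.extend(range(start, m.end()))
        prev_end = m.end()
    return offsets

def char_f1(pred, gold):
    """F1 over sets of character offsets (assumed metric definition)."""
    pred, gold = set(pred), set(gold)
    if not pred and not gold:
        return 1.0  # assumption: two empty sets count as a perfect match
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

text = "total stupid moron here"
gold = list(range(6, 18))            # characters of "stupid moron"
tokens, tags = spans_to_bio(text, gold)
predicted = bio_to_offsets(text, tags)
score = char_f1(predicted, gold)
```

In this sketch a tagger would predict `tags` for each token, and `bio_to_offsets` would map the predictions back to character offsets before scoring. A real system would use the subword tokenizer of the chosen transformer rather than whitespace splitting.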
Related papers
- PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis [74.41260927676747]
This paper bridges the gaps by introducing a multimodal conversational Aspect-based Sentiment Analysis (ABSA).
To benchmark the tasks, we construct PanoSent, a dataset annotated both manually and automatically, featuring high quality, large scale, multimodality, multilingualism, multi-scenarios, and covering both implicit and explicit sentiment elements.
To effectively address the tasks, we devise a novel Chain-of-Sentiment reasoning framework, together with a novel multimodal large language model (namely Sentica) and a paraphrase-based verification mechanism.
arXiv Detail & Related papers (2024-08-18T13:51:01Z)
- Understanding writing style in social media with a supervised
contrastively pre-trained transformer [57.48690310135374]
Online Social Networks serve as fertile ground for harmful behavior, ranging from hate speech to the dissemination of disinformation.
We introduce the Style Transformer for Authorship Representations (STAR), trained on a large corpus derived from public sources of 4.5 x 10^6 authored texts.
Using a support base of 8 documents of 512 tokens, we can discern authors from sets of up to 1616 authors with at least 80% accuracy.
arXiv Detail & Related papers (2023-10-17T09:01:17Z)
- A New Generation of Perspective API: Efficient Multilingual
Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z)
- Toxicity Detection for Indic Multilingual Social Media Content [0.0]
This paper describes the system proposed by team 'Moj Masti' using the data provided by ShareChat/Moj in the IIIT-D Abusive Comment Identification challenge.
We focus on how we can leverage multilingual transformer based pre-trained and fine-tuned models to approach code-mixed/code-switched classification tasks.
arXiv Detail & Related papers (2022-01-03T12:01:47Z)
- UoT-UWF-PartAI at SemEval-2021 Task 5: Self Attention Based Bi-GRU with
Multi-Embedding Representation for Toxicity Highlighter [3.0586855806896045]
We propose a self-attention-based gated recurrent unit with a multi-embedding representation of the tokens.
Experimental results show that our proposed approach is very effective in detecting span tokens.
arXiv Detail & Related papers (2021-04-27T13:18:28Z)
- WLV-RIT at SemEval-2021 Task 5: A Neural Transformer Framework for
Detecting Toxic Spans [2.4737119633827174]
In recent years, the widespread use of social media has led to an increase in the generation of toxic and offensive content on online platforms.
Social media platforms have worked on developing automatic detection methods and employing human moderators to cope with this deluge of offensive content.
arXiv Detail & Related papers (2021-04-09T22:52:26Z)
- Lone Pine at SemEval-2021 Task 5: Fine-Grained Detection of Hate Speech
Using BERToxic [2.4815579733050153]
This paper describes our approach to the Toxic Spans Detection problem.
We propose BERToxic, a system that fine-tunes a pre-trained BERT model to locate toxic text spans in a given text.
Our system significantly outperformed the provided baseline and achieved an F1-score of 0.683, placing Lone Pine 17th out of 91 teams in the competition.
arXiv Detail & Related papers (2021-04-08T04:46:14Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint
Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- Dynamic Semantic Matching and Aggregation Network for Few-shot Intent
Detection [69.2370349274216]
Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances.
Semantic components are distilled from utterances via multi-head self-attention.
Our method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances.
arXiv Detail & Related papers (2020-10-06T05:16:38Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
- WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection [0.0]
We propose an original framework, based on the Wikipedia Comment corpus, with comment-level annotations of different types.
This large corpus of more than 380k annotated messages opens perspectives for online abuse detection and especially for context-based approaches.
We also propose, in addition to this corpus, a complete benchmarking platform to stimulate and fairly compare scientific works around the problem of content abuse detection.
arXiv Detail & Related papers (2020-03-13T10:26:45Z)