UoT-UWF-PartAI at SemEval-2021 Task 5: Self Attention Based Bi-GRU with
Multi-Embedding Representation for Toxicity Highlighter
- URL: http://arxiv.org/abs/2104.13164v1
- Date: Tue, 27 Apr 2021 13:18:28 GMT
- Title: UoT-UWF-PartAI at SemEval-2021 Task 5: Self Attention Based Bi-GRU with
Multi-Embedding Representation for Toxicity Highlighter
- Authors: Hamed Babaei Giglou, Taher Rahgooy, Mostafa Rahgouy and Jafar Razmara
- Abstract summary: We propose a self-attention-based gated recurrent unit with a multi-embedding representation of the tokens.
Experimental results show that our proposed approach is very effective in detecting span tokens.
- Score: 3.0586855806896045
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The Toxic Spans Detection (TSD) task is defined as highlighting the spans that
make a text toxic. Much work has been done on classifying a given comment or
document as toxic or non-toxic; however, none of those models operates at the
token level. In this paper, we propose a self-attention-based bidirectional
gated recurrent unit (BiGRU) with a multi-embedding representation of the tokens. Our
proposed model enriches the representation by a combination of GPT-2, GloVe,
and RoBERTa embeddings, which led to promising results. Experimental results
show that our proposed approach is very effective in detecting span tokens.
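The abstract's core idea, concatenating several pretrained embeddings per token and contextualizing them with self-attention before scoring each token for toxicity, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the BiGRU encoder is omitted, the GPT-2/GloVe/RoBERTa embedders are replaced with random stand-ins, and all dimensions and the linear toxicity head are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(X):
    """Scaled dot-product self-attention over token vectors X: (seq_len, d)."""
    scores = X @ X.T / np.sqrt(X.shape[1])        # (seq_len, seq_len)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # each row sums to 1
    return weights @ X                            # contextualized tokens

# Multi-embedding representation: concatenate per-token vectors from
# several embedders (random stand-ins for GPT-2, GloVe, RoBERTa).
seq_len = 5
gpt2    = rng.normal(size=(seq_len, 8))
glove   = rng.normal(size=(seq_len, 4))
roberta = rng.normal(size=(seq_len, 8))

tokens = np.concatenate([gpt2, glove, roberta], axis=1)  # (5, 20)
context = self_attention(tokens)                         # (5, 20)

# Per-token toxicity probability from a hypothetical linear head.
w = rng.normal(size=(tokens.shape[1],))
toxicity = 1.0 / (1.0 + np.exp(-(context @ w)))          # (5,) in (0, 1)
```

In the paper's actual model, the concatenated embeddings would first pass through the BiGRU; the point here is only that the enriched representation is a simple per-token concatenation over which attention and a token-level classifier operate.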
Related papers
- PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis [74.41260927676747]
This paper bridges the gaps by introducing a benchmark for multimodal conversational Aspect-based Sentiment Analysis (ABSA).
To benchmark the tasks, we construct PanoSent, a dataset annotated both manually and automatically, featuring high quality, large scale, multimodality, multilingualism, multi-scenarios, and covering both implicit and explicit sentiment elements.
To effectively address the tasks, we devise a novel Chain-of-Sentiment reasoning framework, together with a novel multimodal large language model (namely Sentica) and a paraphrase-based verification mechanism.
arXiv Detail & Related papers (2024-08-18T13:51:01Z)
- Forging the Forger: An Attempt to Improve Authorship Verification via Data Augmentation [52.72682366640554]
Authorship Verification (AV) is a text classification task concerned with inferring whether a candidate text has been written by one specific author or by someone else.
It has been shown that many AV systems are vulnerable to adversarial attacks, where a malicious author actively tries to fool the classifier by either concealing their writing style, or by imitating the style of another author.
arXiv Detail & Related papers (2024-03-17T16:36:26Z)
- Cheap Ways of Extracting Clinical Markers from Texts [0.0]
This paper describes the work of the UniBuc Archaeology team for CLPsych's 2024 Shared Task.
It involved finding evidence within the text supporting the assigned suicide risk level.
Two types of evidence were required: highlights and summaries.
arXiv Detail & Related papers (2024-03-17T14:21:42Z)
- Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation on NYU Depth V2 and KITTI, and in semantic segmentation on Cityscapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z)
- Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation [59.37587762543934]
This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS).
Existing methods suffer from a granularity inconsistency regarding the usage of group tokens.
We propose the prototypical guidance network (PGSeg) that incorporates multi-modal regularization.
arXiv Detail & Related papers (2023-10-29T13:18:00Z)
- Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens [76.40196364163663]
We revisit multimodal representation in contrastive vision-language pre-training approaches such as CLIP.
We show that our method can learn more comprehensive representations and capture meaningful cross-modal correspondence.
arXiv Detail & Related papers (2023-03-27T00:58:39Z)
- Neighborhood Contrastive Learning for Novel Class Discovery [79.14767688903028]
We build a new framework, named Neighborhood Contrastive Learning, to learn discriminative representations that are important to clustering performance.
We experimentally demonstrate that these two ingredients significantly contribute to clustering performance and lead our model to outperform state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2021-06-20T17:34:55Z)
- Cisco at SemEval-2021 Task 5: What's Toxic?: Leveraging Transformers for Multiple Toxic Span Extraction from Online Comments [1.332560004325655]
This paper describes the system proposed by team Cisco for SemEval-2021 Task 5: Toxic Spans Detection.
We approach this problem primarily in two ways: a sequence tagging approach and a dependency parsing approach.
Our best-performing architecture in this approach also proved to be our best-performing architecture overall, with an F1 score of 0.6922.
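The sequence-tagging formulation mentioned above reduces span extraction to per-token labels. A minimal sketch of that reduction, assuming a simple whitespace tokenizer and the TSD data format of character-level toxic offsets (the actual tokenizer and tagging scheme used by the team are not specified here):

```python
def spans_to_bio(text, toxic_offsets):
    """Convert character-level toxic offsets into token-level BIO tags
    over a simple whitespace tokenization."""
    toxic = set(toxic_offsets)
    tokens = text.split()
    tags, cursor, prev_toxic = [], 0, False
    for tok in tokens:
        start = text.index(tok, cursor)  # character span of this token
        end = start + len(tok)
        cursor = end
        is_toxic = any(c in toxic for c in range(start, end))
        if not is_toxic:
            tags.append("O")
        elif prev_toxic:
            tags.append("I")   # continues a toxic span
        else:
            tags.append("B")   # begins a toxic span
        prev_toxic = is_toxic
    return tokens, tags

text = "you are a total idiot here"
offsets = list(range(10, 21))  # characters covering "total idiot"
tokens, tags = spans_to_bio(text, offsets)
# tags → ["O", "O", "O", "B", "I", "O"]
```

A tagger trained on such labels predicts per-token tags at inference time, and the predicted B/I tokens are mapped back to character offsets for scoring.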
arXiv Detail & Related papers (2021-05-28T16:27:49Z)
- WLV-RIT at SemEval-2021 Task 5: A Neural Transformer Framework for Detecting Toxic Spans [2.4737119633827174]
In recent years, the widespread use of social media has led to an increase in the generation of toxic and offensive content on online platforms.
Social media platforms have worked on developing automatic detection methods and employing human moderators to cope with this deluge of offensive content.
arXiv Detail & Related papers (2021-04-09T22:52:26Z)
- Lone Pine at SemEval-2021 Task 5: Fine-Grained Detection of Hate Speech Using BERToxic [2.4815579733050153]
This paper describes our approach to the Toxic Spans Detection problem.
We propose BERToxic, a system that fine-tunes a pre-trained BERT model to locate toxic text spans in a given text.
Our system significantly outperformed the provided baseline and achieved an F1-score of 0.683, placing Lone Pine 17th out of 91 teams in the competition.
arXiv Detail & Related papers (2021-04-08T04:46:14Z)
- HLE-UPC at SemEval-2021 Task 5: Multi-Depth DistilBERT for Toxic Spans Detection [0.0]
The purpose of this task is to detect the spans that make a text toxic. Toxicity does not always come from single toxic words such as insults; it can also arise from whole expressions formed by words that are not toxic individually.
We study the impact of using a multi-depth DistilBERT model, which uses embeddings from different layers to estimate the final per-token toxicity.
arXiv Detail & Related papers (2021-04-01T17:37:38Z)
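The multi-depth idea above, combining hidden states from several transformer layers before the per-token toxicity head, can be sketched as follows. The layer states are random stand-ins for DistilBERT outputs, and the layer choice, hidden size, and linear head are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(1)

def multi_depth_features(layer_states, layer_ids):
    """Concatenate per-token hidden states from the selected layers."""
    return np.concatenate([layer_states[i] for i in layer_ids], axis=1)

# Stand-ins for DistilBERT's 6 transformer layers (hidden size 16 here).
seq_len, hidden = 4, 16
layer_states = [rng.normal(size=(seq_len, hidden)) for _ in range(6)]

# Combine the last three layers, then score each token.
feats = multi_depth_features(layer_states, layer_ids=[3, 4, 5])  # (4, 48)
w = rng.normal(size=(feats.shape[1],))
per_token_toxicity = 1.0 / (1.0 + np.exp(-(feats @ w)))          # (4,)
```

The design choice here is that earlier layers carry more lexical information and later layers more contextual information, so letting the token classifier see several depths at once gives it both.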
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.