bitsa_nlp@LT-EDI-ACL2022: Leveraging Pretrained Language Models for
Detecting Homophobia and Transphobia in Social Media Comments
- URL: http://arxiv.org/abs/2203.14267v1
- Date: Sun, 27 Mar 2022 10:15:34 GMT
- Title: bitsa_nlp@LT-EDI-ACL2022: Leveraging Pretrained Language Models for
Detecting Homophobia and Transphobia in Social Media Comments
- Authors: Vitthal Bhandari and Poonam Goyal
- Abstract summary: We present our system for the LT-EDI shared task on detecting homophobia and transphobia in social media comments.
We experiment with a number of monolingual and multilingual transformer-based models such as mBERT.
We observe their performance on a carefully annotated, real-life dataset of YouTube comments in English as well as Tamil.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online social networks are ubiquitous and user-friendly. Nevertheless, it is
vital to detect and moderate offensive content to maintain decency and empathy.
However, mining social media texts is a complex task since users don't adhere
to any fixed patterns. Comments can be written in any combination of languages
and many of them may be low-resource.
In this paper, we present our system for the LT-EDI shared task on detecting
homophobia and transphobia in social media comments. We experiment with a
number of monolingual and multilingual transformer-based models such as mBERT
along with a data augmentation technique for tackling class imbalance. Such
pretrained large models have recently shown tremendous success on a variety of
benchmark tasks in natural language processing. We observe their performance on
a carefully annotated, real-life dataset of YouTube comments in English as well
as Tamil.
Our submission achieved ranks 9, 6, and 3 with macro-averaged F1-scores
of 0.42, 0.64, and 0.58 in the English, Tamil, and Tamil-English subtasks,
respectively. The code for the system has been open-sourced.
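The abstract mentions a data augmentation technique for tackling class imbalance but does not specify it. As a hedged illustration only, the sketch below shows one common baseline, naive minority-class oversampling by duplication; the function name, label names, and class counts are illustrative assumptions, not the paper's actual method.

```python
import random
from collections import Counter

def oversample_minority(examples, labels, seed=0):
    """Duplicate minority-class examples until every class matches the
    majority-class count. A crude stand-in for the paper's (unspecified)
    augmentation step, useful as a class-imbalance baseline."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(examples, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(texts) for texts in by_class.values())
    balanced = []
    for label, texts in by_class.items():
        # Sample duplicates (with replacement) from the existing examples.
        extra = [rng.choice(texts) for _ in range(target - len(texts))]
        balanced.extend((text, label) for text in texts + extra)
    rng.shuffle(balanced)  # avoid feeding the model class-sorted batches
    return balanced

# Toy skew mimicking hate-speech corpora: most comments are benign.
# (Label names here are illustrative, not the shared task's taxonomy.)
texts = [f"comment {i}" for i in range(100)]
labels = ["non-anti-LGBT+"] * 90 + ["homophobic"] * 7 + ["transphobic"] * 3
balanced = oversample_minority(texts, labels)
print(Counter(label for _, label in balanced))  # every class now has 90 items
```

In practice a trained system would pass the balanced pairs to a tokenizer and fine-tune a multilingual encoder; class-weighted loss is a common alternative that avoids duplicating data.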
Related papers
- Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment [50.27950279695363]
The transfer performance is often hindered when a low-resource target language is written in a different script than the high-resource source language.
Inspired by recent work that uses transliteration to address this problem, our paper proposes a transliteration-based post-pretraining alignment (PPA) method.
arXiv Detail & Related papers (2024-06-28T08:59:24Z) - Cordyceps@LT-EDI: Patching Language-Specific Homophobia/Transphobia
Classifiers with a Multilingual Understanding [0.0]
We present a joint multilingual (M-L) and language-specific (L-S) approach to homophobic and transphobic hate speech detection.
M-L models are needed to catch words, phrases, and concepts that are less common or missing in a particular language.
L-S models are better situated to understand the cultural and linguistic context of the users who typically write in a particular language.
arXiv Detail & Related papers (2023-09-24T06:37:54Z) - Chain-of-Dictionary Prompting Elicits Translation in Large Language Models [100.47154959254937]
Large language models (LLMs) have shown surprisingly good performance in multilingual neural machine translation (MNMT)
We present a novel method, CoD, which augments LLMs with prior knowledge with the chains of multilingual dictionaries for a subset of input words to elicit translation abilities.
arXiv Detail & Related papers (2023-05-11T05:19:47Z) - Detection of Homophobia & Transphobia in Dravidian Languages: Exploring
Deep Learning Methods [1.5687561161428403]
Homophobia and transphobia constitute offensive comments against the LGBT+ community.
The paper explores the applicability of different deep learning models for classifying social media comments in the Malayalam and Tamil languages.
arXiv Detail & Related papers (2023-04-03T12:15:27Z) - BERTuit: Understanding Spanish language in Twitter through a native
transformer [70.77033762320572]
We present BERTuit, the largest transformer proposed so far for the Spanish language, pre-trained on a massive dataset of 230M Spanish tweets.
Our motivation is to provide a powerful resource to better understand Spanish Twitter and to be used on applications focused on this social network.
arXiv Detail & Related papers (2022-04-07T14:28:51Z) - A New Generation of Perspective API: Efficient Multilingual
Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z) - Toxicity Detection for Indic Multilingual Social Media Content [0.0]
This paper describes the system proposed by team 'Moj Masti' using the data provided by ShareChat/Moj in the IIIT-D Abusive Comment Identification challenge.
We focus on how we can leverage multilingual transformer based pre-trained and fine-tuned models to approach code-mixed/code-switched classification tasks.
arXiv Detail & Related papers (2022-01-03T12:01:47Z) - Exploiting BERT For Multimodal Target Sentiment Classification Through
Input Space Translation [75.82110684355979]
We introduce a two-stream model that translates images in input space using an object-aware transformer.
We then leverage the translation to construct an auxiliary sentence that provides multimodal information to a language model.
We achieve state-of-the-art performance on two multimodal Twitter datasets.
arXiv Detail & Related papers (2021-08-03T18:02:38Z) - NLP-CUET@DravidianLangTech-EACL2021: Offensive Language Detection from
Multilingual Code-Mixed Text using Transformers [0.0]
This paper presents an automated system that can identify offensive text from multilingual code-mixed data.
The datasets were provided in three languages: Tamil, Malayalam, and Kannada, each code-mixed with English.
arXiv Detail & Related papers (2021-02-28T11:10:32Z) - Evaluation of Deep Learning Models for Hostility Detection in Hindi Text [2.572404739180802]
We present approaches for hostile text detection in the Hindi language.
The proposed approaches are evaluated on the Constraint@AAAI 2021 Hindi hostility detection dataset.
We evaluate a host of deep learning approaches based on CNN, LSTM, and BERT for this multi-label classification problem.
arXiv Detail & Related papers (2021-01-11T19:10:57Z) - Emergent Communication Pretraining for Few-Shot Machine Translation [66.48990742411033]
We pretrain neural networks via emergent communication from referential games.
Our key assumption is that grounding communication on images, as a crude approximation of real-world environments, inductively biases the model towards learning natural languages.
arXiv Detail & Related papers (2020-11-02T10:57:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.