Cordyceps@LT-EDI: Patching Language-Specific Homophobia/Transphobia Classifiers with a Multilingual Understanding
- URL: http://arxiv.org/abs/2309.13561v1
- Date: Sun, 24 Sep 2023 06:37:54 GMT
- Title: Cordyceps@LT-EDI: Patching Language-Specific Homophobia/Transphobia Classifiers with a Multilingual Understanding
- Authors: Dean Ninalga
- Abstract summary: We present a joint multilingual (M-L) and language-specific (L-S) approach to homophobic and transphobic hate speech detection.
M-L models are needed to catch words, phrases, and concepts that are less common or missing in a particular language.
L-S models are better situated to understand the cultural and linguistic context of the users who typically write in a particular language.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting transphobia, homophobia, and various other forms of hate speech is
difficult. Signals can vary depending on factors such as language, culture,
geographical region, and the particular online platform. Here, we present a
joint multilingual (M-L) and language-specific (L-S) approach to homophobic and
transphobic hate speech detection (HSD). M-L models are needed to catch words,
phrases, and concepts that are less common or missing in a particular language
and are therefore overlooked by L-S models. Nonetheless, L-S models are better
situated to understand the cultural and linguistic context of the users who
typically write in a particular language. We construct a simple and successful
way to merge the M-L and L-S approaches through weight interpolation that is
both interpretable and data-driven. We demonstrate our system on Task A of the
'Shared Task on Homophobia/Transphobia Detection in social media comments'
dataset for homophobic and transphobic HSD. Our system achieves the best
results in three of five languages and a macro-averaged F1-score of 0.997 on
Malayalam texts.
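The abstract describes the merge only as "simple weight interpolation". The following is a minimal sketch of one common form of it, not the author's implementation. It assumes that both classifiers share an architecture and parameter names, that every parameter is mixed with a single coefficient alpha (theta = alpha * theta_LS + (1 - alpha) * theta_ML), and that alpha is picked by grid search on validation macro-F1, which is one possible reading of "data-driven". The dict-of-lists weight format, the function names `interpolate_weights`, `macro_f1`, and `select_alpha`, and the toy linear classifier are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical weight format: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def interpolate_weights(ls: Weights, ml: Weights, alpha: float) -> Weights:
    """Linearly interpolate two models with identical parameter layouts.

    alpha = 1.0 returns the language-specific (L-S) weights,
    alpha = 0.0 returns the multilingual (M-L) weights.
    """
    if ls.keys() != ml.keys():
        raise ValueError("models must share the same parameter names")
    merged: Weights = {}
    for name in ls:
        a, b = ls[name], ml[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]
    return merged


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over classes seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def select_alpha(
    ls: Weights,
    ml: Weights,
    predict: Callable[[Weights, Sequence[List[float]]], List[int]],
    x_val: Sequence[List[float]],
    y_val: Sequence[int],
    grid: Sequence[float] = tuple(i / 10 for i in range(11)),
) -> Tuple[float, float]:
    """Grid-search alpha on validation data; return (best_alpha, best_macro_f1).

    Ties keep the smallest alpha in the grid that reaches the best score.
    """
    best_alpha, best_score = grid[0], -1.0
    for alpha in grid:
        merged = interpolate_weights(ls, ml, alpha)
        score = macro_f1(y_val, predict(merged, x_val))
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score


def toy_predict(w: Weights, xs: Sequence[List[float]]) -> List[int]:
    """Toy binary linear classifier: label 1 if w . x + b > 0, else 0."""
    return [
        int(sum(wi * xi for wi, xi in zip(w["w"], x)) + w["b"][0] > 0)
        for x in xs
    ]


# Toy usage: two "models" that each look at a different input feature.
ls_model: Weights = {"w": [1.0, 0.0], "b": [0.0]}
ml_model: Weights = {"w": [0.0, 1.0], "b": [0.0]}
x_val = [[1.0, -1.0], [-1.0, 1.0], [2.0, -1.0]]
y_val = [1, 0, 1]

best_alpha, best_score = select_alpha(ls_model, ml_model, toy_predict, x_val, y_val)
print(best_alpha, best_score)
```

Because the mix is governed by one number, the selected alpha is easy to read: values near 1 keep mostly the L-S weights, values near 0 mostly the M-L weights. With real models, the same loop would run over framework tensors rather than Python lists.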
Related papers
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
(2024-06-23T15:15:17Z)
- SLABERT Talk Pretty One Day: Modeling Second Language Acquisition with BERT [0.0]
Cross-linguistic transfer is the influence of the linguistic structure of a speaker's native language on the successful acquisition of a foreign language.
We find that NLP literature has not given enough attention to the phenomenon of negative transfer.
Our findings call for further research using our novel Transformer-based SLA models.
(2023-05-31T06:22:07Z)
- Model-Agnostic Meta-Learning for Multilingual Hate Speech Detection [23.97444551607624]
Hate speech in social media is a growing phenomenon, and detecting such toxic content has gained significant traction.
HateMAML is a model-agnostic meta-learning-based framework that effectively performs hate speech detection in low-resource languages.
Extensive experiments are conducted on five datasets across eight different low-resource languages.
(2023-03-04T22:28:29Z)
- Multi-level Contrastive Learning for Cross-lingual Spoken Language Understanding [90.87454350016121]
We develop novel code-switching schemes to generate hard negative examples for contrastive learning at all levels.
We develop a label-aware joint model to leverage label semantics for cross-lingual knowledge transfer.
(2022-05-07T13:44:28Z)
- GL-CLeF: A Global-Local Contrastive Learning Framework for Cross-lingual Spoken Language Understanding [74.39024160277809]
We present the Global-Local Contrastive Learning Framework (GL-CLeF) to address the limitation that existing models align languages only implicitly through shared parameters.
Specifically, we employ contrastive learning, leveraging bilingual dictionaries to construct multilingual views of the same utterance.
GL-CLeF achieves the best performance and successfully pulls representations of similar sentences across languages closer.
(2022-04-18T13:56:58Z)
- bitsa_nlp@LT-EDI-ACL2022: Leveraging Pretrained Language Models for Detecting Homophobia and Transphobia in Social Media Comments [0.9981479937152642]
We present our system for the LT-EDI shared task on detecting homophobia and transphobia in social media comments.
We experiment with a number of monolingual and multilingual transformer-based models, such as mBERT.
We evaluate their performance on a carefully annotated, real-life dataset of YouTube comments in English and Tamil.
(2022-03-27T10:15:34Z)
- Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure [54.01613740115601]
We study three language properties: constituent order, composition and word co-occurrence.
Our main conclusion is that the contribution of constituent order and word co-occurrence is limited, while the composition is more crucial to the success of cross-linguistic transfer.
(2022-03-16T07:09:35Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We also investigate label imbalance in hate speech datasets, since a high ratio of non-hate to hate examples often leads to poor model performance.
(2022-01-15T20:48:14Z)
- Cross-lingual hate speech detection based on multilingual domain-specific word embeddings [4.769747792846004]
We propose to address the problem of multilingual hate speech detection from the perspective of transfer learning.
Our goal is to determine whether knowledge from one particular language can be used to classify text in other languages.
We show that the use of our simple yet specific multilingual hate representations improves classification results.
(2021-04-30T02:24:50Z)
- Multilingual Contextual Affective Analysis of LGBT People Portrayals in Wikipedia [34.183132688084534]
Specific lexical choices in narrative text both reflect the writer's attitudes towards people in the narrative and influence the audience's reactions.
We show how word connotations differ across languages and cultures, highlighting the difficulty of generalizing existing English datasets.
We then demonstrate the usefulness of our method by analyzing Wikipedia biography pages of members of the LGBT community across three languages.
(2020-10-21T08:27:36Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
(2020-05-02T04:34:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.