NLP-CUET@LT-EDI-EACL2021: Multilingual Code-Mixed Hope Speech Detection
using Cross-lingual Representation Learner
- URL: http://arxiv.org/abs/2103.00464v1
- Date: Sun, 28 Feb 2021 11:30:52 GMT
- Title: NLP-CUET@LT-EDI-EACL2021: Multilingual Code-Mixed Hope Speech Detection
using Cross-lingual Representation Learner
- Authors: Eftekhar Hossain, Omar Sharif, Mohammed Moshiul Hoque
- Abstract summary: We propose three models to identify hope speech in English, Tamil and Malayalam.
Our team achieved $1^{st}$, $2^{nd}$ and $1^{st}$ rank in these three tasks, respectively.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In recent years, several systems have been developed to regulate the spread
of negativity and remove aggressive, offensive or abusive content from online
platforms. Nevertheless, little research has been carried out to identify
positive, encouraging and supportive content. In this work, our goal is to
determine whether a social media post or comment contains hope speech. To serve
this purpose, we propose three distinct models to identify hope speech in
English, Tamil and Malayalam. We employed various machine learning (support
vector machine, logistic regression, ensemble), deep learning (convolutional
neural network + long short-term memory) and transformer-based (m-BERT,
Indic-BERT, XLNet, XLM-Roberta) methods. Results indicate that XLM-Roberta
outperforms all other techniques, attaining weighted $f_1$-scores of $0.93$,
$0.60$ and $0.85$ for English, Tamil and Malayalam, respectively. Our team
achieved $1^{st}$, $2^{nd}$ and $1^{st}$ rank in these three tasks, respectively.
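The weighted $f_1$-score used to rank systems averages the per-class $f_1$, weighting each class by its share of the true labels. A minimal pure-Python sketch of the metric (the two-label example below is illustrative, not the shared task's actual label set):

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to true-class support."""
    classes = set(y_true) | set(y_pred)
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (support[c] / total) * f1
    return score

# e.g. weighted_f1(["hope", "hope", "none", "none"],
#                  ["hope", "none", "none", "none"])  # ≈ 0.733
```

This matches scikit-learn's `f1_score(..., average="weighted")`, which is the standard implementation used for this kind of class-imbalanced evaluation.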
Related papers
- Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages [55.157295899188476]
Neural machine translation systems learn to map sentences from different languages into a common representation space.
In this work, we test this hypothesis by zero-shot translating from unseen languages.
We demonstrate that this setup enables zero-shot translation from entirely unseen languages.
arXiv Detail & Related papers (2024-08-05T07:58:58Z)
- How do Large Language Models Handle Multilingualism? [81.15060972112563]
This study explores how large language models (LLMs) handle multilingualism.
LLMs initially understand the query, converting multilingual inputs into English for task-solving.
In the intermediate layers, they employ English for thinking and incorporate multilingual knowledge with self-attention and feed-forward structures.
arXiv Detail & Related papers (2024-02-29T02:55:26Z)
- BeAts: Bengali Speech Acts Recognition using Multimodal Attention Fusion [0.0]
We develop a novel approach combining two models, wav2vec2.0 for audio and MarianMT for text translation, to predict speech acts.
Our model is named BeAts ($\underline{\textbf{Be}}$ngali speech acts recognition using Multimodal $\underline{\textbf{At}}$tention Fu$\underline{\textbf{s}}$ion).
arXiv Detail & Related papers (2023-06-05T08:12:17Z)
- Translate to Disambiguate: Zero-shot Multilingual Word Sense Disambiguation with Pretrained Language Models [67.19567060894563]
Pretrained Language Models (PLMs) learn rich cross-lingual knowledge and can be finetuned to perform well on diverse tasks.
We present a new study investigating how well PLMs capture cross-lingual word sense knowledge with Contextual Word-Level Translation (C-WLT).
We find that as the model size increases, PLMs encode more cross-lingual word sense knowledge and better use context to improve WLT performance.
arXiv Detail & Related papers (2023-04-26T19:55:52Z)
- Language Is Not All You Need: Aligning Perception with Language Models [110.51362453720458]
We introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context, and follow instructions.
We train Kosmos-1 from scratch on web-scale multimodal corpora, including arbitrarily interleaved text and images, image-caption pairs, and text data.
Experimental results show that Kosmos-1 achieves impressive performance on language understanding, generation, and even OCR-free NLP.
We also show that MLLMs can benefit from cross-modal transfer, i.e., transferring knowledge from language to multimodal settings and from multimodal settings back to language.
arXiv Detail & Related papers (2023-02-27T18:55:27Z)
- XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models [100.29953199404905]
We introduce a new approach for scaling to very large multilingual vocabularies by de-emphasizing token sharing between languages with little lexical overlap.
We train XLM-V, a multilingual language model with a one million token vocabulary.
XLM-V is particularly effective on low-resource language tasks and outperforms XLM-R by 11.2% and 5.8% absolute on MasakhaNER and Americas NLI, respectively.
arXiv Detail & Related papers (2023-01-25T09:15:17Z)
- bitsa_nlp@LT-EDI-ACL2022: Leveraging Pretrained Language Models for Detecting Homophobia and Transphobia in Social Media Comments [0.9981479937152642]
We present our system for the LT-EDI shared task on detecting homophobia and transphobia in social media comments.
We experiment with a number of monolingual and multilingual transformer based models such as mBERT.
We evaluate their performance on a carefully annotated, real-life dataset of YouTube comments in English as well as Tamil.
arXiv Detail & Related papers (2022-03-27T10:15:34Z)
- NLP-CUET@DravidianLangTech-EACL2021: Investigating Visual and Textual Features to Identify Trolls from Multimodal Social Media Memes [0.0]
A shared task is organized to develop models that can identify trolls from multimodal social media memes.
This work presents a computational model that we have developed as part of our participation in the task.
We investigated the visual and textual features using CNN, VGG16, Inception, Multilingual-BERT, XLM-Roberta, XLNet models.
arXiv Detail & Related papers (2021-02-28T11:36:50Z)
- NLP-CUET@DravidianLangTech-EACL2021: Offensive Language Detection from Multilingual Code-Mixed Text using Transformers [0.0]
This paper presents an automated system that can identify offensive text from multilingual code-mixed data.
Datasets were provided in three languages: Tamil, Malayalam and Kannada, each code-mixed with English.
arXiv Detail & Related papers (2021-02-28T11:10:32Z)
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z)