Related papers: Optimize_Prime@DravidianLangTech-ACL2022: Abusive Comment Detection in Tamil

URL: http://arxiv.org/abs/2204.09675v1
Date: Tue, 19 Apr 2022 18:55:18 GMT
Title: Optimize_Prime@DravidianLangTech-ACL2022: Abusive Comment Detection in Tamil
Authors: Shantanu Patankar, Omkar Gokhale, Onkar Litake, Aditya Mandke, Dipali Kadam
Abstract summary: This paper tries to address the problem of abusive comment detection in low-resource Indic languages. The task is to detect and classify YouTube comments in Tamil and Tamil-English code-mixed format into multiple categories.
Score: 1.0066310107046081
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: This paper tries to address the problem of abusive comment detection in low-resource Indic languages. Abusive comments are statements that are offensive to a person or a group of people. These comments are targeted toward individuals belonging to specific ethnicities, genders, castes, races, sexualities, etc. Abusive comment detection is a significant problem, especially with the recent rise in social media users. This paper presents the approach used by our team, Optimize_Prime, in the ACL 2022 shared task "Abusive Comment Detection in Tamil." The task is to detect and classify YouTube comments in Tamil and Tamil-English code-mixed format into multiple categories. We used three methods to optimize our results: ensemble models, recurrent neural networks, and transformers. On the Tamil data, MuRIL and XLM-RoBERTa were our best-performing models, with a macro-averaged F1 score of 0.43. For the code-mixed data, MuRIL and M-BERT provided sublime results, with a macro-averaged F1 score of 0.45.
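The abstract reports macro-averaged F1, which averages per-class F1 scores without weighting by class frequency, so rare abuse categories count as much as the majority class. The following is a minimal sketch of that metric in plain Python; the function name `macro_f1` and the example labels are illustrative assumptions, not the authors' evaluation code or the shared task's actual label set.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for illustration only.
y_true = ["none", "none", "none", "none", "misogyny", "xenophobia"]
y_pred = ["none", "none", "none", "none", "none", "xenophobia"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.3f}, macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy example, missing the single "misogyny" comment barely affects accuracy but sets that class's F1 to zero, which pulls the macro average down sharply. This is one reason macro-averaged scores on imbalanced multi-class data tend to look low compared with accuracy.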

Related papers

YouTube Comments Decoded: Leveraging LLMs for Low Resource Language Classification [0.0]
We introduce a novel gold standard corpus designed for sarcasm and sentiment detection within code-mixed texts. The primary objective of this task is to identify sarcasm and sentiment polarity within a code-mixed dataset of Tamil-English and Malayalam-English comments and posts collected from social media platforms. We experiment with state-of-the-art large language models like GPT-3.5 Turbo via prompting to classify comments into sarcastic or non-sarcastic categories.
arXiv Detail & Related papers (2024-11-06T17:58:01Z)
Understanding writing style in social media with a supervised contrastively pre-trained transformer [57.48690310135374]
Online Social Networks serve as fertile ground for harmful behavior, ranging from hate speech to the dissemination of disinformation. We introduce the Style Transformer for Authorship Representations (STAR), trained on a large corpus derived from public sources of 4.5 × 10^6 authored texts. Using a support base of 8 documents of 512 tokens, we can discern authors from sets of up to 1616 authors with at least 80% accuracy.
arXiv Detail & Related papers (2023-10-17T09:01:17Z)
Analyzing Norm Violations in Live-Stream Chat [49.120561596550395]
We present the first NLP study dedicated to detecting norm violations in conversations on live-streaming platforms. We define norm violation categories in live-stream chats and annotate 4,583 moderated comments from Twitch. Our results show that appropriate contextual information can boost moderation performance by 35%.
arXiv Detail & Related papers (2023-05-18T05:58:27Z)
Detection of Homophobia & Transphobia in Dravidian Languages: Exploring Deep Learning Methods [1.5687561161428403]
Homophobia and transphobia constitute offensive comments against the LGBT+ community. The paper attempts to explore the applicability of different deep learning models for classification of social media comments in the Malayalam and Tamil languages.
arXiv Detail & Related papers (2023-04-03T12:15:27Z)
Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems. This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evasion of content.
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021 [50.591267188664666]
We present two shared tasks of abusive and threatening language detection for the Urdu language. We present two manually annotated datasets containing tweets labelled as (i) Abusive and Non-Abusive, and (ii) Threatening and Non-Threatening. For both subtasks, the m-BERT-based transformer model showed the best performance.
arXiv Detail & Related papers (2022-07-14T07:38:13Z)
Optimize_Prime@DravidianLangTech-ACL2022: Emotion Analysis in Tamil [1.0066310107046081]
This paper aims to perform an emotion analysis of social media comments in Tamil. The task aimed to classify social media comments into categories of emotion like Joy, Anger, Trust, Disgust, etc.
arXiv Detail & Related papers (2022-04-19T18:47:18Z)
bitsa_nlp@LT-EDI-ACL2022: Leveraging Pretrained Language Models for Detecting Homophobia and Transphobia in Social Media Comments [0.9981479937152642]
We present our system for the LT-EDI shared task on detecting homophobia and transphobia in social media comments. We experiment with a number of monolingual and multilingual transformer-based models such as mBERT. We observe their performance on a carefully annotated, real-life dataset of YouTube comments in English as well as Tamil.
arXiv Detail & Related papers (2022-03-27T10:15:34Z)
COLD: A Benchmark for Chinese Offensive Language Detection [54.60909500459201]
We use COLDataset, a Chinese offensive language dataset with 37k annotated sentences. We also propose COLDetector to study output offensiveness of popular Chinese language models. Our resources and analyses are intended to help detoxify the Chinese online communities and evaluate the safety performance of generative language models.
arXiv Detail & Related papers (2022-01-16T11:47:23Z)
Toxicity Detection for Indic Multilingual Social Media Content [0.0]
This paper describes the system proposed by team 'Moj Masti' using the data provided by ShareChat/Moj in the IIIT-D Abusive Comment Identification challenge. We focus on how we can leverage multilingual transformer-based pre-trained and fine-tuned models to approach code-mixed/code-switched classification tasks.
arXiv Detail & Related papers (2022-01-03T12:01:47Z)
NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching language using a simple deep-learning classifier [63.137661897716555]
Code-switching is a phenomenon in which two or more languages are used in the same message. We use a standard convolutional neural network model to predict the sentiment of tweets in a blend of Spanish and English languages.
arXiv Detail & Related papers (2020-09-07T19:57:09Z)
Keystroke Biometrics in Response to Fake News Propagation in a Global Pandemic [77.79066811371978]
This work proposes and analyzes the use of keystroke biometrics for content de-anonymization. Fake news has become a powerful tool to manipulate public opinion, especially during major events.
arXiv Detail & Related papers (2020-05-15T17:56:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.