A Unified System for Aggression Identification in English Code-Mixed and
Uni-Lingual Texts
- URL: http://arxiv.org/abs/2001.05493v2
- Date: Sat, 18 Jan 2020 06:50:54 GMT
- Title: A Unified System for Aggression Identification in English Code-Mixed and
Uni-Lingual Texts
- Authors: Anant Khandelwal, Niraj Kumar
- Abstract summary: We introduce a unified and robust deep learning architecture which works for English code-mixed dataset and uni-lingual English dataset.
The devised system, uses psycho-linguistic features and very ba-sic linguistic features.
Our proposed system outperforms all the previous approaches on English code-mixed dataset and uni-lingual English dataset.
- Score: 25.15521897068512
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Wide usage of social media platforms has increased the risk of aggression,
which results in mental stress and affects the lives of people negatively like
psychological agony, fighting behavior, and disrespect to others. Majority of
such conversations contains code-mixed languages[28]. Additionally, the way
used to express thought or communication style also changes from one social
media plat-form to another platform (e.g., communication styles are different
in twitter and Facebook). These all have increased the complexity of the
problem. To solve these problems, we have introduced a unified and robust
multi-modal deep learning architecture which works for English code-mixed
dataset and uni-lingual English dataset both.The devised system, uses
psycho-linguistic features and very ba-sic linguistic features. Our multi-modal
deep learning architecture contains, Deep Pyramid CNN, Pooled BiLSTM, and
Disconnected RNN(with Glove and FastText embedding, both). Finally, the system
takes the decision based on model averaging. We evaluated our system on English
Code-Mixed TRAC 2018 dataset and uni-lingual English dataset obtained from
Kaggle. Experimental results show that our proposed system outperforms all the
previous approaches on English code-mixed dataset and uni-lingual English
dataset.
Related papers
- An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios [76.11409260727459]
This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system.
We demonstrate that the similarity in phonetics between the pre-training and target languages, as well as the language category, affects the target language's adaptation performance.
arXiv Detail & Related papers (2024-06-13T08:16:52Z) - Multilingual Diversity Improves Vision-Language Representations [66.41030381363244]
Pre-training on this dataset outperforms using English-only or English-dominated datasets on ImageNet.
On a geographically diverse task like GeoDE, we also observe improvements across all regions, with the biggest gain coming from Africa.
arXiv Detail & Related papers (2024-05-27T08:08:51Z) - SADAS: A Dialogue Assistant System Towards Remediating Norm Violations
in Bilingual Socio-Cultural Conversations [56.31816995795216]
Socially-Aware Dialogue Assistant System (SADAS) is designed to ensure that conversations unfold with respect and understanding.
Our system's novel architecture includes: (1) identifying the categories of norms present in the dialogue, (2) detecting potential norm violations, (3) evaluating the severity of these violations, and (4) implementing targeted remedies to rectify the breaches.
arXiv Detail & Related papers (2024-01-29T08:54:21Z) - Leveraging Language Identification to Enhance Code-Mixed Text
Classification [0.7340017786387767]
Existing deep-learning models do not take advantage of the implicit language information in code-mixed text.
Our study aims to improve BERT-based models performance on low-resource Code-Mixed Hindi-English datasets.
arXiv Detail & Related papers (2023-06-08T06:43:10Z) - A Comprehensive Understanding of Code-mixed Language Semantics using
Hierarchical Transformer [28.3684494647968]
We propose a hierarchical transformer-based architecture (HIT) to learn the semantics of code-mixed languages.
We evaluate the proposed method across 6 Indian languages and 9 NLP tasks on 17 datasets.
arXiv Detail & Related papers (2022-04-27T07:50:18Z) - BERTuit: Understanding Spanish language in Twitter through a native
transformer [70.77033762320572]
We present bfBERTuit, the larger transformer proposed so far for Spanish language, pre-trained on a massive dataset of 230M Spanish tweets.
Our motivation is to provide a powerful resource to better understand Spanish Twitter and to be used on applications focused on this social network.
arXiv Detail & Related papers (2022-04-07T14:28:51Z) - Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation [70.81596088969378]
Cross-lingual Outline-based Dialogue dataset (termed COD) enables natural language understanding.
COD enables dialogue state tracking, and end-to-end dialogue modelling and evaluation in 4 diverse languages.
arXiv Detail & Related papers (2022-01-31T18:11:21Z) - Integrating Knowledge in End-to-End Automatic Speech Recognition for
Mandarin-English Code-Switching [41.88097793717185]
Code-Switching (CS) is a common linguistic phenomenon in multilingual communities.
This paper presents our investigations on end-to-end speech recognition for Mandarin-English CS speech.
arXiv Detail & Related papers (2021-12-19T17:31:15Z) - Improving Classifier Training Efficiency for Automatic Cyberbullying
Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z) - Role of Artificial Intelligence in Detection of Hateful Speech for
Hinglish Data on Social Media [1.8899300124593648]
Prevalence of Hindi-English code-mixed data (Hinglish) is on the rise with most of the urban population all over the world.
Hate speech detection algorithms deployed by most social networking platforms are unable to filter out offensive and abusive content posted in these code-mixed languages.
We propose a methodology for efficient detection of unstructured code-mix Hinglish language.
arXiv Detail & Related papers (2021-05-11T10:02:28Z) - ULD@NUIG at SemEval-2020 Task 9: Generative Morphemes with an Attention
Model for Sentiment Analysis in Code-Mixed Text [1.4926515182392508]
We present the Generative Morphemes with Attention (GenMA) Model sentiment analysis system contributed to SemEval 2020 Task 9 SentiMix.
The system aims to predict the sentiments of the given English-Hindi code-mixed tweets without using word-level language tags.
arXiv Detail & Related papers (2020-07-27T23:58:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.