Hindi/Bengali Sentiment Analysis Using Transfer Learning and Joint Dual
Input Learning with Self Attention
- URL: http://arxiv.org/abs/2202.05457v1
- Date: Fri, 11 Feb 2022 05:36:11 GMT
- Authors: Shahrukh Khan and Mahnoor Shahid
- Abstract summary: Our work explores how deep neural networks can be used in transfer learning and joint dual input learning settings to classify sentiment and detect hate speech in Hindi and Bengali data.
We use a BiLSTM with self-attention in a joint dual input learning setting, where we train a single neural network on the Hindi and Bengali datasets simultaneously using their respective embeddings.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sentiment analysis typically refers to using natural language processing,
text analysis and computational linguistics to extract affect- and emotion-based
information from text data. Our work explores how deep neural networks can be
used in transfer learning and joint dual input learning settings to classify
sentiment and detect hate speech in Hindi and Bengali data. We start by training
Word2Vec word embeddings on the Hindi HASOC dataset and a Bengali hate speech
dataset, and then train LSTM classifiers. Subsequently, we apply
parameter-sharing-based transfer learning to the Bengali sentiment classifiers by
reusing and fine-tuning the trained weights of the Hindi classifiers, with both
classifiers serving as baselines in our study. Finally, we use a BiLSTM with
self-attention in a joint dual input learning setting, where we train a single
neural network on the Hindi and Bengali datasets simultaneously using their
respective embeddings.
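The joint dual input idea described in the abstract can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: the class name `JointDualInputSketch`, the language keys `"hi"` and `"bn"`, the randomly initialised embedding tables (stand-ins for the trained Word2Vec vectors) and the simplified single-query attention are all assumptions made for illustration, and the BiLSTM encoder is omitted so that the example stays short. What it shows is the routing: each language has its own embedding table, while the attention and classifier parameters are shared.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class JointDualInputSketch:
    """Hypothetical sketch: per-language embeddings, shared attention and classifier."""

    def __init__(self, vocab_sizes, dim, seed=0):
        rng = random.Random(seed)

        def rand_vec():
            return [rng.uniform(-0.1, 0.1) for _ in range(dim)]

        # One embedding table per language (assumed stand-ins for Word2Vec vectors).
        self.embeddings = {
            lang: [rand_vec() for _ in range(size)]
            for lang, size in vocab_sizes.items()
        }
        # Parameters shared by both languages.
        self.attn_query = rand_vec()
        self.out_weights = rand_vec()
        self.out_bias = 0.0

    def attention(self, states):
        # Simplified self-attention: score each position, then normalise.
        scores = [dot(self.attn_query, [math.tanh(x) for x in h]) for h in states]
        return softmax(scores)

    def forward(self, token_ids, lang):
        # Route the input through the embedding table of its own language.
        table = self.embeddings[lang]
        # A BiLSTM encoder would transform these states; it is omitted here.
        states = [table[i] for i in token_ids]
        weights = self.attention(states)
        dim = len(states[0])
        # Attention-weighted sum of the states gives one sentence vector.
        pooled = [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]
        logit = dot(self.out_weights, pooled) + self.out_bias
        probability = 1.0 / (1.0 + math.exp(-logit))
        return probability, weights


model = JointDualInputSketch({"hi": 50, "bn": 40}, dim=8)
prob_hi, attn_hi = model.forward([3, 7, 1], "hi")
prob_bn, attn_bn = model.forward([5, 2], "bn")
```

In a joint training loop under these assumptions, batches from both languages would update the shared attention and classifier parameters, while each embedding table would only receive updates from its own language.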
Related papers
- Multichannel Attention Networks with Ensembled Transfer Learning to Recognize Bangla Handwritten Charecter [1.5236380958983642]
The study employed a convolutional neural network (CNN) with ensemble transfer learning and a multichannel attention network.
We evaluated the proposed model using the CAMTERdb 3.1.2 data set and achieved 92% accuracy for the raw dataset and 98.00% for the preprocessed dataset.
arXiv Detail & Related papers (2024-08-20T15:51:01Z)
- A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding.
There is no publicly available NLI corpus for the Romanian language.
We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z)
- Share What You Already Know: Cross-Language-Script Transfer and Alignment for Sentiment Detection in Code-Mixed Data [0.0]
Code-switching entails mixing multiple languages. It is an increasingly occurring phenomenon in social media texts.
Pre-trained multilingual models primarily utilize the data in the native script of the language.
Using the native script for each language can generate better representations of the text owing to the pre-trained knowledge.
arXiv Detail & Related papers (2024-02-07T02:59:18Z)
- An exploratory experiment on Hindi, Bengali hate-speech detection and transfer learning using neural networks [0.0]
This work presents our approach to train a neural network to detect hate-speech texts in Hindi and Bengali.
We also explore how transfer learning can be applied to learning these languages, given that they have the same origin and are thus similar to some extent.
arXiv Detail & Related papers (2022-01-06T10:13:28Z)
- Utilizing Wordnets for Cognate Detection among Indian Languages [50.83320088758705]
We detect cognate word pairs among ten Indian languages with Hindi.
We use deep learning methodologies to predict whether a word pair is cognate or not.
We report improved performance of up to 26%.
arXiv Detail & Related papers (2021-12-30T16:46:28Z)
- Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages [50.82410844837726]
We demonstrate the use of cross-lingual word embeddings for detecting cognates among fourteen Indian languages.
We evaluate our methods to detect cognates on a challenging dataset of twelve Indian languages.
We observe an improvement of up to 18% points, in terms of F-score, for cognate detection.
arXiv Detail & Related papers (2021-12-16T11:17:58Z)
- Offensive Language and Hate Speech Detection with Deep Learning and Transfer Learning [1.77356577919977]
We propose an approach to automatically classify tweets into three classes: Hate, offensive and Neither.
We create a class module which contains main functionality including text classification, sentiment checking and text data augmentation.
arXiv Detail & Related papers (2021-08-06T20:59:47Z)
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource language to languages with low resources.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z)
- Anubhuti -- An annotated dataset for emotional analysis of Bengali short stories [2.3424047967193826]
Anubhuti is the first and largest text corpus for analyzing emotions expressed by writers of Bengali short stories.
We explain the data collection methods, the manual annotation process and the resulting high inter-annotator agreement.
We have verified the performance of our dataset with baseline Machine Learning and a Deep Learning model for emotion classification.
arXiv Detail & Related papers (2020-10-06T22:33:58Z)
- ALICE: Active Learning with Contrastive Natural Language Explanations [69.03658685761538]
We propose Active Learning with Contrastive Explanations (ALICE) to improve data efficiency in learning.
ALICE learns to first use active learning to select the most informative pairs of label classes to elicit contrastive natural language explanations.
It extracts knowledge from these explanations using a semantic parser.
arXiv Detail & Related papers (2020-09-22T01:02:07Z)