Transformer-based Text Classification on Unified Bangla Multi-class
Emotion Corpus
- URL: http://arxiv.org/abs/2210.06405v3
- Date: Tue, 13 Jun 2023 07:18:26 GMT
- Authors: Md Sakib Ullah Sourav, Huidong Wang, Mohammad Sultan Mahmud, Hua Zheng
- Abstract summary: We provide a Bangla emotion classifier for six classes: anger, disgust, fear, joy, sadness, and surprise.
The Unified Bangla Multi-class Emotion Corpus (UBMEC) is used to assess the performance of our models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this research, we propose a complete set of approaches for identifying and
extracting emotions from Bangla texts. We provide a Bangla emotion classifier
for six classes: anger, disgust, fear, joy, sadness, and surprise, from Bangla
words using transformer-based models, which have recently shown phenomenal
results, especially for high-resource languages. The Unified Bangla
Multi-class Emotion Corpus (UBMEC) is used to assess the performance of our
models. UBMEC is created by combining two previously released manually labeled
datasets of Bangla comments on six emotion classes with fresh manually labeled
Bangla comments created by us. The corpus dataset and code we used in this work
are publicly available.
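The six-way setup described in the abstract can be sketched as a minimal classification head: a softmax over six logits followed by an argmax into the label set. This is only an illustrative sketch; the logit values and helper names below are hypothetical, and the paper's actual classifiers are fine-tuned transformer models whose heads produce such logits.

```python
import math

# The six emotion classes targeted by the UBMEC classifier.
EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_emotion(logits):
    """Map a 6-dim logit vector (e.g. from a transformer head) to a label."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]

# Hypothetical logits for one Bangla comment; a real model would produce these.
label, confidence = predict_emotion([0.2, -1.3, 0.1, 3.4, 0.5, -0.7])
print(label)  # joy
```

In practice the logits would come from a pretrained encoder with a six-way classification layer fine-tuned on UBMEC; the mapping from logits to a label shown here stays the same regardless of which transformer backbone is used.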
Related papers
- MONOVAB : An Annotated Corpus for Bangla Multi-label Emotion Detection [0.0]
Sentiment Analysis (SA) and Emotion Recognition (ER) have become increasingly popular for the Bangla language.
However, the language is structurally complex, which makes accurate emotion extraction difficult.
This study demonstrates a thorough method for constructing an annotated corpus based on scraped data from Facebook.
arXiv Detail & Related papers (2023-09-27T14:10:57Z)
- Learning to Imagine: Visually-Augmented Natural Language Generation [73.65760028876943]
We propose a method to make pre-trained language models (PLMs) Learn to Imagine for Visually-augmented natural language gEneration.
We use a diffusion model to synthesize high-quality images conditioned on the input texts.
We conduct synthesis for each sentence rather than generate only one image for an entire paragraph.
arXiv Detail & Related papers (2023-05-26T13:59:45Z)
- BanglaNLG: Benchmarks and Resources for Evaluating Low-Resource Natural Language Generation in Bangla [21.47743471497797]
This work presents a benchmark for evaluating natural language generation models in Bangla.
We aggregate three challenging conditional text generation tasks under the BanglaNLG benchmark.
Using a clean corpus of 27.5 GB of Bangla data, we pretrain BanglaT5, a sequence-to-sequence Transformer model for Bangla.
BanglaT5 achieves state-of-the-art performance in all of these tasks, outperforming mT5 (base) by up to 5.4%.
arXiv Detail & Related papers (2022-05-23T06:54:56Z)
- Textless Speech Emotion Conversion using Decomposed and Discrete Representations [49.55101900501656]
We decompose speech into discrete and disentangled learned representations, consisting of content units, F0, speaker, and emotion.
First, we modify the speech content by translating the content units to a target emotion, and then predict the prosodic features based on these units.
Finally, the speech waveform is generated by feeding the predicted representations into a neural vocoder.
arXiv Detail & Related papers (2021-11-14T18:16:42Z)
- Fine-Grained Image Generation from Bangla Text Description using Attentional Generative Adversarial Network [0.0]
We propose Bangla Attentional Generative Adversarial Network (AttnGAN) that allows intensified, multi-stage processing for high-resolution Bangla text-to-image generation.
For the first time, a fine-grained image is generated from Bangla text using attentional GAN.
arXiv Detail & Related papers (2021-09-24T05:31:01Z)
- End-to-End Natural Language Understanding Pipeline for Bangla Conversational Agents [0.43012765978447565]
We propose a novel approach to build a business assistant which can communicate in Bangla and Bangla Transliteration in English with high confidence consistently.
We use Rasa Open Source Framework, fastText embeddings, Polyglot embeddings, Flask, and other systems as building blocks.
We present a pipeline for intent classification and entity extraction which achieves reasonable performance.
arXiv Detail & Related papers (2021-07-12T16:09:22Z)
- EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model [56.75775793011719]
We introduce and publicly release a Mandarin emotion speech dataset including 9,724 samples with audio files and its emotion human-labeled annotation.
Unlike those models which need additional reference audio as input, our model could predict emotion labels just from the input text and generate more expressive speech conditioned on the emotion embedding.
In the experiment phase, we first validate the effectiveness of our dataset by an emotion classification task. Then we train our model on the proposed dataset and conduct a series of subjective evaluations.
arXiv Detail & Related papers (2021-06-17T08:34:21Z)
- Sentiment Classification in Bangla Textual Content: A Comparative Study [4.2394281761764]
In this study, we explore several publicly available sentiment-labeled datasets and design classifiers using both classical and deep learning algorithms.
Our findings suggest that transformer-based models, which had not been explored earlier for Bangla, outperform all other models.
arXiv Detail & Related papers (2020-11-19T21:06:28Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this being integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
- LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for Multi-Granular Propaganda Span Identification [70.1903083747775]
This paper describes our submission for the task of Propaganda Span Identification in news articles.
We introduce a BERT-BiLSTM based span-level propaganda classification model that identifies which token spans within the sentence are indicative of propaganda.
arXiv Detail & Related papers (2020-08-11T16:14:47Z)
- PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation [92.7366819044397]
Self-supervised pre-training has emerged as a powerful technique for natural language understanding and generation.
This work presents PALM with a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus.
An extensive set of experiments shows that PALM achieves new state-of-the-art results on a variety of language generation benchmarks.
arXiv Detail & Related papers (2020-04-14T06:25:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.