LowResource at BLP-2023 Task 2: Leveraging BanglaBert for Low Resource
Sentiment Analysis of Bangla Language
- URL: http://arxiv.org/abs/2311.12735v1
- Date: Tue, 21 Nov 2023 17:21:15 GMT
- Title: LowResource at BLP-2023 Task 2: Leveraging BanglaBert for Low Resource
Sentiment Analysis of Bangla Language
- Authors: Aunabil Chakma and Masum Hasan
- Abstract summary: This paper describes the system of the LowResource Team for Task 2 of BLP-2023.
It involves conducting sentiment analysis on a dataset composed of public posts and comments from diverse social media platforms.
Our primary aim is to utilize BanglaBert, a BERT model pre-trained on a large Bangla corpus.
- Score: 0.5922488908114022
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper describes the system of the LowResource Team for Task 2 of
BLP-2023, which involves conducting sentiment analysis on a dataset composed of
public posts and comments from diverse social media platforms. Our primary aim
is to utilize BanglaBert, a BERT model pre-trained on a large Bangla corpus,
using various strategies including fine-tuning, dropping random tokens, and
using several external datasets. Our final model is an ensemble of the three
best BanglaBert variations. Our system achieved 3rd place overall on the test set
among the 30 participating teams with a score of 0.718. Additionally, we discuss
the promising systems that did not perform well, namely task-adaptive pretraining
and paraphrasing using BanglaT5. The training code and external datasets used for
our system are publicly available at
https://github.com/Aunabil4602/bnlp-workshop-task2-2023
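The exact training scripts live in the linked repository; as a rough illustration of the recipe the abstract outlines, the sketch below fine-tunes a BanglaBert checkpoint for three-way sentiment classification with a simple word-level random-dropping augmentation. The model id, drop probability, and hyperparameters are assumptions for illustration, not the authors' settings.

```python
# Minimal sketch (not the released code): fine-tune BanglaBert on 3-way Bangla
# sentiment data with random token (word) dropping as augmentation.
# Model id, drop rate, and hyperparameters below are assumed values.
import random

import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_ID = "csebuetnlp/banglabert"  # assumed public BanglaBert checkpoint id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=3)


def drop_random_tokens(text: str, drop_prob: float = 0.1) -> str:
    """Drop each whitespace-separated token with probability drop_prob."""
    kept = [w for w in text.split() if random.random() >= drop_prob]
    return " ".join(kept) if kept else text


class SentimentDataset(Dataset):
    """(text, label) pairs; applies token dropping only at training time."""

    def __init__(self, texts, labels, train=True):
        self.texts, self.labels, self.train = texts, labels, train

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = drop_random_tokens(self.texts[idx]) if self.train else self.texts[idx]
        enc = tokenizer(text, truncation=True, max_length=128,
                        padding="max_length", return_tensors="pt")
        item = {k: v.squeeze(0) for k, v in enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item


# Replace these placeholders with the shared-task posts and their
# 0/1/2 (negative/neutral/positive) labels.
train_texts, train_labels = ["example post"], [1]

args = TrainingArguments(output_dir="banglabert-sentiment", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
trainer = Trainer(model=model, args=args,
                  train_dataset=SentimentDataset(train_texts, train_labels))
trainer.train()
```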
Related papers
- SemEval-2024 Shared Task 6: SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes [48.83290963506378]
This paper presents the results of the SHROOM, a shared task focused on detecting hallucinations.
We observe a number of key trends in how this approach was tackled.
While a majority of the teams did outperform our proposed baseline system, the performances of top-scoring systems are still consistent with a random handling of the more challenging items.
arXiv Detail & Related papers (2024-03-12T15:06:22Z) - nlpBDpatriots at BLP-2023 Task 2: A Transfer Learning Approach to Bangla
Sentiment Analysis [7.3481279783709805]
In this paper, we discuss the nlpBDpatriots entry to the shared task on Sentiment Analysis of Bangla Social Media Posts.
The main objective of this task is to identify the polarity of social media content using a Bangla dataset annotated with positive, neutral, and negative labels.
Our best system ranked 12th among 30 teams that participated in the competition.
arXiv Detail & Related papers (2023-11-25T13:58:58Z) - RSM-NLP at BLP-2023 Task 2: Bangla Sentiment Analysis using Weighted and
Majority Voted Fine-Tuned Transformers [2.048226951354646]
This paper describes our submissions to the BLP Workshop shared task on Sentiment Analysis of Bangla Social Media Posts.
Our system scored 0.711 on the multiclass classification task and placed 10th among the participating teams on the leaderboard; a minimal hard-voting sketch in this spirit appears after this list.
arXiv Detail & Related papers (2023-10-22T10:55:56Z) - BanglaNLP at BLP-2023 Task 2: Benchmarking different Transformer Models
for Sentiment Analysis of Bangla Social Media Posts [0.46040036610482665]
This paper presents our submission to Task 2 (Sentiment Analysis of Bangla Social Media Posts) of the BLP Workshop.
Our quantitative results show that transfer learning substantially improves model performance in this low-resource language setting.
We obtain a micro-F1 of 67.02% on the test set and rank 21st on the shared-task leaderboard.
arXiv Detail & Related papers (2023-10-13T16:46:38Z) - BanglaNLG: Benchmarks and Resources for Evaluating Low-Resource Natural
Language Generation in Bangla [21.47743471497797]
This work presents a benchmark for evaluating natural language generation models in Bangla.
We aggregate three challenging conditional text generation tasks under the BanglaNLG benchmark.
Using a clean corpus of 27.5 GB of Bangla data, we pretrain BanglaT5, a sequence-to-sequence Transformer model for Bangla.
BanglaT5 achieves state-of-the-art performance in all of these tasks, outperforming mT5 (base) by up to 5.4%.
arXiv Detail & Related papers (2022-05-23T06:54:56Z) - Intent Classification Using Pre-Trained Embeddings For Low Resource
Languages [67.40810139354028]
Building Spoken Language Understanding systems that do not rely on language-specific Automatic Speech Recognition is an important yet under-explored problem in language processing.
We present a comparative study aimed at employing a pre-trained acoustic model to perform Spoken Language Understanding in low resource scenarios.
We perform experiments across three different languages: English, Sinhala, and Tamil each with different data sizes to simulate high, medium, and low resource scenarios.
arXiv Detail & Related papers (2021-10-18T13:06:59Z) - The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z) - A Review of Bangla Natural Language Processing Tasks and the Utility of
Transformer Models [2.5768647103950357]
We provide a review of Bangla NLP tasks, resources, and tools available to the research community.
We benchmark datasets collected from various platforms for nine NLP tasks using current state-of-the-art algorithms.
We report our results using both individual and consolidated datasets and provide data for future research.
arXiv Detail & Related papers (2021-07-08T13:49:46Z) - KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation [100.79870384880333]
We propose a knowledge-grounded pre-training (KGPT) to generate knowledge-enriched text.
We adopt three settings, namely fully-supervised, zero-shot, and few-shot, to evaluate its effectiveness.
Under the zero-shot setting, our model achieves over 30 ROUGE-L on WebNLG, while all other baselines fail.
arXiv Detail & Related papers (2020-10-05T19:59:05Z) - Investigating Pretrained Language Models for Graph-to-Text Generation [55.55151069694146]
Graph-to-text generation aims to generate fluent texts from graph-based data.
We present a study across three graph domains: meaning representations, Wikipedia knowledge graphs (KGs) and scientific KGs.
We show that the PLMs BART and T5 achieve new state-of-the-art results and that task-adaptive pretraining strategies improve their performance even further.
arXiv Detail & Related papers (2020-07-16T16:05:34Z) - Language Models are Few-Shot Learners [61.36677350504291]
We show that scaling up language models greatly improves task-agnostic, few-shot performance.
We train GPT-3, an autoregressive language model with 175 billion parameters, and test its performance in the few-shot setting.
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks.
arXiv Detail & Related papers (2020-05-28T17:29:03Z)
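Both the ensemble of the three best BanglaBert variations described in the abstract above and the majority-voted transformers of the RSM-NLP entry combine the hard predictions of several fine-tuned checkpoints. The sketch below is a minimal, hypothetical hard-voting implementation; the checkpoint paths and tie-breaking behaviour are placeholders, not either team's actual setup.

```python
# Hypothetical hard (majority) voting over several fine-tuned sentiment
# classifiers. Checkpoint paths are placeholders for locally saved models.
from collections import Counter

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CHECKPOINTS = ["ckpt-variant-1", "ckpt-variant-2", "ckpt-variant-3"]
models = [AutoModelForSequenceClassification.from_pretrained(p).eval()
          for p in CHECKPOINTS]
tokenizers = [AutoTokenizer.from_pretrained(p) for p in CHECKPOINTS]


@torch.no_grad()
def majority_vote(text: str) -> int:
    """Each model predicts a class id; the most frequent prediction wins.

    Among tied classes, the one predicted first (by the earliest model) wins.
    """
    votes = []
    for model, tok in zip(models, tokenizers):
        enc = tok(text, truncation=True, max_length=128, return_tensors="pt")
        votes.append(int(model(**enc).logits.argmax(dim=-1)))
    return Counter(votes).most_common(1)[0][0]
```

Weighted voting, as in the RSM-NLP title, would replace the raw vote count with per-model weights (for example, each model's validation F1).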