Related papers: A Unified BERT-CNN-BiLSTM Framework for Simultaneous Headline Classification and Sentiment Analysis of Bangla News

URL: http://arxiv.org/abs/2511.18618v1
Date: Sun, 23 Nov 2025 21:22:56 GMT
Title: A Unified BERT-CNN-BiLSTM Framework for Simultaneous Headline Classification and Sentiment Analysis of Bangla News
Authors: Mirza Raquib, Munazer Montasir Akash, Tawhid Ahmed, Saydul Akbar Murad, Farida Siddiqi Prity, Mohammad Amzad Hossain, Asif Pervez Polok, Nick Rahimi
Abstract summary: This research presents a state-of-the-art approach to Bangla news headline classification combined with sentiment analysis. We explore BAN-ABSA, a dataset of 9014 news headlines, which has not previously been used for simultaneous headline and sentiment categorization. The proposed BERT-CNN-BiLSTM model significantly outperforms all baseline models in classification tasks.
Score: 1.8737506366172099
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In our daily lives, newspapers are an essential information source that shapes how the public talks about present-day issues. However, effectively navigating the vast amount of news content from different newspapers and online news portals can be challenging. Newspaper headlines with sentiment analysis tell us what the news is about (e.g., politics, sports) and how the news makes us feel (positive, negative, neutral). This helps us quickly understand the emotional tone of the news. This research presents a state-of-the-art approach to Bangla news headline classification combined with sentiment analysis, applying Natural Language Processing (NLP) techniques, particularly the hybrid transfer learning model BERT-CNN-BiLSTM. We explore BAN-ABSA, a dataset of 9014 news headlines; this is the first time it has been used for simultaneous headline and sentiment categorization of Bengali newspapers. On this imbalanced dataset, we applied two experimental strategies: technique-1, where undersampling and oversampling are applied before splitting, and technique-2, where undersampling and oversampling are applied after splitting. In technique-1, oversampling provided the strongest performance for headline and sentiment classification, 78.57% and 73.43% respectively, while technique-2 delivered its highest results when trained directly on the original imbalanced dataset, 81.37% and 64.46% for headline and sentiment respectively. The proposed BERT-CNN-BiLSTM model significantly outperforms all baseline models in classification tasks and achieves new state-of-the-art results for Bangla news headline classification and sentiment analysis. These results demonstrate the importance of leveraging both the headline and sentiment datasets, and provide a strong baseline for Bangla text classification in low-resource settings.
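
The difference between the two resampling strategies named in the abstract can be illustrated with a small sketch. This is not the authors' code: the toy data, the function names (`random_oversample`, `split`, `technique_1`, `technique_2`) and the 80/20 split are all hypothetical, only random oversampling is shown, and it assumes that "after splitting" means resampling the training portion only.

```python
import random
from collections import Counter

def random_oversample(rows, rng):
    """Duplicate rows of smaller classes at random until all classes are equal in size."""
    by_label = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

def split(rows, test_percent, rng):
    """Shuffle a copy of the rows and cut it into (train, test)."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = len(shuffled) * test_percent // 100
    return shuffled[n_test:], shuffled[:n_test]

def make_toy_data():
    """Hypothetical imbalanced data: 90 'neutral' and 10 'positive' unique headlines."""
    rows = [(f"headline-{i}", "neutral") for i in range(90)]
    rows += [(f"headline-{i}", "positive") for i in range(90, 100)]
    return rows

def technique_1(rows, seed=0):
    """Resample first, then split: copies of one headline may reach both partitions."""
    rng = random.Random(seed)
    return split(random_oversample(rows, rng), 20, rng)

def technique_2(rows, seed=0):
    """Split first, then resample the training portion only (assumed reading)."""
    rng = random.Random(seed)
    train, test = split(rows, 20, rng)
    return random_oversample(train, rng), test

def shared_texts(train, test):
    """Headlines that occur in both partitions."""
    return {text for text, _ in train} & {text for text, _ in test}

if __name__ == "__main__":
    data = make_toy_data()
    for name, technique in (("technique-1", technique_1), ("technique-2", technique_2)):
        train, test = technique(data)
        print(name, Counter(label for _, label in train),
              "headlines in both partitions:", len(shared_texts(train, test)))
```

In this sketch, splitting before resampling keeps the test partition free of headlines seen in training, whereas resampling first will typically place copies of the same minority-class headline on both sides of the split. Whether that effect contributes to the scores reported in the abstract cannot be determined from this page.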

Related papers

From Perceptions To Evidence: Detecting AI-Generated Content In Turkish News Media With A Fine-Tuned Bert Classifier [0.0]
This study fine-tunes a Turkish-specific BERT model on a labeled dataset of 3,600 articles from three major Turkish outlets. It reveals consistent cross-source and temporally stable classification patterns, with mean prediction confidence exceeding 0.96. It is the first study to move beyond self-reported journalist perceptions toward empirical, data-driven measurement of AI usage in Turkish news media.
arXiv Detail & Related papers (2026-02-13T22:29:00Z)
Binary classification for perceived quality of headlines and links on worldwide news websites, 2018-2024 [41.94295877935867]
The proliferation of online news enables potential widespread publication of perceived low-quality news headlines/links. We evaluated twelve machine learning models on a binary, balanced dataset of 57,544,214 worldwide news website links/headings.
arXiv Detail & Related papers (2025-06-11T04:05:57Z)
TeClass: A Human-Annotated Relevance-based Headline Classification and Generation Dataset for Telugu [4.272315504476224]
Relevance-based headline classification can greatly aid the task of generating relevant headlines. We present TeClass, the first-ever human-annotated Telugu news headline classification dataset. The headlines generated by the models fine-tuned on highly relevant article-headline pairs showed about a 5-point increase in ROUGE-L scores.
arXiv Detail & Related papers (2024-04-17T13:07:56Z)
Prompt-and-Align: Prompt-Based Social Alignment for Few-Shot Fake News Detection [50.07850264495737]
"Prompt-and-Align" (P&A) is a novel prompt-based paradigm for few-shot fake news detection. We show that P&A sets a new state of the art in few-shot fake news detection performance by significant margins.
arXiv Detail & Related papers (2023-09-28T13:19:43Z)
Tackling Fake News in Bengali: Unraveling the Impact of Summarization vs. Augmentation on Pre-trained Language Models [0.07696728525672149]
We propose a methodology consisting of four distinct approaches to classify fake news articles in Bengali. Our approach includes translating English news articles and using augmentation techniques to curb the deficit of fake news articles. We show the effectiveness of summarization and augmentation in the case of Bengali fake news detection.
arXiv Detail & Related papers (2023-07-13T14:50:55Z)
Nothing Stands Alone: Relational Fake News Detection with Hypergraph Neural Networks [49.29141811578359]
We propose to leverage a hypergraph to represent group-wise interaction among news, while focusing on important news relations with its dual-level attention mechanism. Our approach yields remarkable performance and maintains the high performance even with a small subset of labeled news data.
arXiv Detail & Related papers (2022-12-24T00:19:32Z)
Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection. The hypothesis that cross-lingual evidence can serve as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
UrduFake@FIRE2020: Shared Track on Fake News Identification in Urdu [62.6928395368204]
This paper gives an overview of the first shared task at FIRE 2020 on fake news detection in the Urdu language. The goal is to identify fake news using a dataset composed of 900 annotated news articles for training and 400 news articles for testing. The dataset contains news in five domains: (i) Health, (ii) Sports, (iii) Showbiz, (iv) Technology, and (v) Business.
arXiv Detail & Related papers (2022-07-25T03:46:51Z)
Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2020 [62.6928395368204]
The task was posed as a binary classification task, in which the goal is to differentiate between real and fake news. We provided a dataset divided into 900 annotated news articles for training and 400 news articles for testing. 42 teams from 6 different countries (India, China, Egypt, Germany, Pakistan, and the UK) registered for the task.
arXiv Detail & Related papers (2022-07-25T03:41:32Z)
Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda. Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles. Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation by 3.62 - 7.69% F1 score on two public datasets.
arXiv Detail & Related papers (2022-03-10T14:24:19Z)
Cost-Sensitive BERT for Generalisable Sentence Classification with Imbalanced Data [5.08128537391027]
We show that BERT does not generalise well when the training and test data are sufficiently dissimilar. We show how to address this problem by providing a statistical measure of similarity between datasets and a method of incorporating cost-weighting into BERT. We achieve the second-highest score on sentence-level propaganda classification.
arXiv Detail & Related papers (2020-03-16T19:10:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
