Approaches for Improving the Performance of Fake News Detection in
Bangla: Imbalance Handling and Model Stacking
- URL: http://arxiv.org/abs/2203.11486v1
- Date: Tue, 22 Mar 2022 06:33:01 GMT
- Title: Approaches for Improving the Performance of Fake News Detection in
Bangla: Imbalance Handling and Model Stacking
- Authors: Md Muzakker Hossain, Zahin Awosaf, Md. Salman Hossan Prottoy, Abu
Saleh Muhammod Alvy, Md. Kishor Morol
- Abstract summary: Imbalanced datasets can lead to bias in the detection of fake news.
We present several strategies for resolving the imbalance issue for fake news detection in Bangla.
We also propose a technique for improving performance even when the dataset is imbalanced.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Imbalanced datasets can lead to bias in the detection of fake news.
In this work, we present several strategies for resolving the imbalance issue
for fake news detection in Bangla, with a comparative assessment of the
proposed methodologies. Additionally, we propose a technique for improving
performance even when the dataset remains imbalanced. We applied our proposed
approaches to BanFakeNews, a dataset developed for detecting fake news in
Bangla that comprises 50K instances but is significantly skewed, with 97% of
instances belonging to the majority class. We obtained a 93.1% F1-score using
data manipulation techniques such as SMOTE, and a 79.1% F1-score without data
manipulation, using approaches such as Stacked Generalization. Without these
techniques, the baseline models achieve an F1-score of 67.6%. We see this work
as an important step towards paving the way for fake news detection in Bangla:
by applying these strategies, the obstacles posed by an imbalanced dataset can
be removed and performance can be improved.
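
The abstract contrasts two strategies: rebalancing the training data (data manipulation, e.g. SMOTE) and keeping the skewed data but combining classifiers via stacked generalization. Below is a minimal, hypothetical sketch of both, assuming scikit-learn and imbalanced-learn with TF-IDF features over the BanFakeNews text; the feature setup, base models, and hyperparameters are illustrative and not the authors' exact configuration.

```python
# Minimal sketch of the two strategies described above (assumed setup:
# scikit-learn + imbalanced-learn, TF-IDF features; choices are illustrative).
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# texts: list of Bangla articles, labels: 0 = authentic, 1 = fake
# (hypothetical variables standing in for the loaded BanFakeNews data)
X = TfidfVectorizer(max_features=20000).fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, stratify=labels, test_size=0.2, random_state=42
)

# Strategy 1: data manipulation -- oversample the minority (fake) class with
# SMOTE, then train an ordinary classifier on the balanced training set.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)
smote_clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
print("SMOTE F1:", f1_score(y_test, smote_clf.predict(X_test)))

# Strategy 2: no data manipulation -- stacked generalization, where a
# meta-learner combines predictions of base learners trained on the skewed data.
stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", LinearSVC()),
        ("rf", RandomForestClassifier(n_estimators=200)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
print("Stacking F1:", f1_score(y_test, stack.predict(X_test)))
```

In the stacking variant the meta-learner is fit on out-of-fold predictions of the base models (cv=5), which is what allows it to outperform any single base classifier without altering the class distribution.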
Related papers
- Breaking the Fake News Barrier: Deep Learning Approaches in Bangla Language [0.0]
This paper presents an approach that utilizes a deep learning technique, particularly the Gated Recurrent Unit (GRU), to recognize fake news in the Bangla language.
The proposed work incorporates thorough data preprocessing, which includes lemmatization, tokenization, and addressing class imbalance by oversampling.
The performance of the model is evaluated with reliable metrics such as precision, recall, F1 score, and accuracy.
arXiv Detail & Related papers (2025-01-30T21:41:26Z) - SeMi: When Imbalanced Semi-Supervised Learning Meets Mining Hard Examples [54.760757107700755]
Semi-Supervised Learning (SSL) can leverage abundant unlabeled data to boost model performance.
The class-imbalanced data distribution in real-world scenarios poses great challenges to SSL, resulting in performance degradation.
We propose a method that enhances the performance of Imbalanced Semi-Supervised Learning by Mining Hard Examples (SeMi).
arXiv Detail & Related papers (2025-01-10T14:35:16Z) - Background Noise Reduction of Attention Map for Weakly Supervised Semantic Segmentation [0.0]
This paper focuses on addressing the issue of background noise in attention weights within the existing WSSS method based on Conformer, known as TransCAM.
The proposed method successfully reduces background noise, leading to improved accuracy of pseudo labels.
arXiv Detail & Related papers (2024-04-04T11:53:37Z) - A Channel-ensemble Approach: Unbiased and Low-variance Pseudo-labels is Critical for Semi-supervised Classification [61.473485511491795]
Semi-supervised learning (SSL) is a practical challenge in computer vision.
Pseudo-label (PL) methods, e.g., FixMatch and FreeMatch, achieve state-of-the-art (SOTA) performance in SSL.
We propose a lightweight channel-based ensemble method to consolidate multiple inferior PLs into the theoretically guaranteed unbiased and low-variance one.
arXiv Detail & Related papers (2024-03-27T09:49:37Z) - Tackling Fake News in Bengali: Unraveling the Impact of Summarization vs. Augmentation on Pre-trained Language Models [0.0]
We propose a methodology consisting of four distinct approaches to classify fake news articles in Bengali.
Our approach includes translating English news articles and using augmentation techniques to curb the deficit of fake news articles.
We show the effectiveness of summarization and augmentation in the case of Bengali fake news detection.
arXiv Detail & Related papers (2023-07-13T14:50:55Z) - FedVal: Different good or different bad in federated learning [9.558549875692808]
Federated learning (FL) systems are susceptible to attacks from malicious actors.
FL poses new challenges in addressing group bias, such as ensuring fair performance for different demographic groups.
Traditional methods used to address such biases require centralized access to the data, which FL systems do not have.
We present FedVal, a novel approach for both robustness and fairness that does not require any additional information from clients.
arXiv Detail & Related papers (2023-06-06T22:11:13Z) - Scale-Equivalent Distillation for Semi-Supervised Object Detection [57.59525453301374]
Recent Semi-Supervised Object Detection (SS-OD) methods are mainly based on self-training, generating hard pseudo-labels by a teacher model on unlabeled data as supervisory signals.
We analyze the challenges these methods meet with the empirical experiment results.
We introduce a novel approach, Scale-Equivalent Distillation (SED), which is a simple yet effective end-to-end knowledge distillation framework robust to large object size variance and class imbalance.
arXiv Detail & Related papers (2022-03-23T07:33:37Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples whose confidence exceeds the threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - To be Critical: Self-Calibrated Weakly Supervised Learning for Salient
Object Detection [95.21700830273221]
Weakly-supervised salient object detection (WSOD) aims to develop saliency models using image-level annotations.
We propose a self-calibrated training strategy by explicitly establishing a mutual calibration loop between pseudo labels and network predictions.
We prove that even a much smaller dataset with well-matched annotations can facilitate models to achieve better performance as well as generalizability.
arXiv Detail & Related papers (2021-09-04T02:45:22Z) - DEAP-FAKED: Knowledge Graph based Approach for Fake News Detection [0.04834203844100679]
We propose a knowleDgE grAPh FAKe nEws Detection framework for identifying Fake News.
Our approach combines NLP, used to encode the news content, with a GNN technique, used to encode the Knowledge Graph.
We evaluate our framework using two publicly available datasets containing articles from domains such as politics, business, technology, and healthcare.
arXiv Detail & Related papers (2021-07-04T07:09:59Z) - Continual Learning for Fake Audio Detection [62.54860236190694]
This paper proposes Detecting Fake Without Forgetting, a continual-learning-based method, to make the model learn new spoofing attacks incrementally.
Experiments are conducted on the ASVspoof 2019 dataset.
arXiv Detail & Related papers (2021-04-15T07:57:05Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.