Related papers: BanFakeNews: A Dataset for Detecting Fake News in Bangla

URL: http://arxiv.org/abs/2004.08789v1
Date: Sun, 19 Apr 2020 07:42:22 GMT
Title: BanFakeNews: A Dataset for Detecting Fake News in Bangla
Authors: Md Zobaer Hossain, Md Ashraful Rahman, Md Saiful Islam, Sudipta Kar
Abstract summary: We propose an annotated dataset of 50K news articles that can be used for building automated fake news detection systems. We develop a benchmark system with state-of-the-art NLP techniques to identify Bangla fake news.
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Given the damage that the rapid propagation of fake news can do in sectors such as politics and finance, automatic identification of fake news through linguistic analysis has drawn the attention of the research community. However, such methods are being developed largely for English, leaving low-resource languages out of focus. Yet the risks posed by fake and manipulative news are not confined to any one language. In this work, we propose an annotated dataset of ~50K news articles that can be used for building automated fake news detection systems for a low-resource language such as Bangla. Additionally, we provide an analysis of the dataset and develop a benchmark system with state-of-the-art NLP techniques to identify Bangla fake news. To create this system, we explore traditional linguistic features and neural-network-based methods. We expect this dataset to be a valuable resource for building technologies that curb the spread of fake news and to contribute to research on low-resource languages.

Related papers

From Scarcity to Capability: Empowering Fake News Detection in Low-Resource Languages with LLMs
BanFakeNews-2.0 is a robust dataset to enhance Bangla fake news detection. This version includes 11,700 additional, meticulously curated fake news articles validated from credible sources. In addition, we created a manually curated independent test set of 460 fake and 540 authentic news items.
arXiv (2025-01-16T15:24:41Z)
Detection of Human and Machine-Authored Fake News in Urdu
Social media has amplified the spread of fake news, and traditional fake news detection methods that rely on linguistic cues are becoming less effective. We propose a hierarchical detection strategy to improve detection accuracy and robustness.
arXiv (2024-10-25T12:42:07Z)
MMCFND: Multimodal Multilingual Caption-aware Fake News Detection for Low-resource Indic Languages
We introduce the Multimodal Multilingual dataset for Indic Fake News Detection (MMIFND). This meticulously curated dataset consists of 28,085 instances distributed across Hindi, Bengali, Marathi, Malayalam, Tamil, Gujarati and Punjabi. We propose the Multimodal Caption-aware framework for Fake News Detection (MMCFND).
arXiv (2024-10-14T11:59:33Z)
Adapting Fake News Detection to the Era of Large Language Models
We study the interplay between machine-generated (paraphrased) real news, machine-generated fake news, human-written fake news, and human-written real news. Our experiments reveal an interesting pattern: detectors trained exclusively on human-written articles can perform well at detecting machine-generated fake news, but not vice versa.
arXiv (2023-11-02T08:39:45Z)
fakenewsbr: A Fake News Detection Platform for Brazilian Portuguese
This paper presents a comprehensive study on detecting fake news in Brazilian Portuguese. We propose a machine-learning-based approach that leverages natural language processing techniques, including TF-IDF and Word2Vec. We develop a user-friendly web platform, fakenewsbr.com, to help users verify the veracity of news articles.
arXiv (2023-09-20T04:10:03Z)
Multiverse: Multilingual Evidence for Fake News Detection
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection. The hypothesis that cross-lingual evidence can serve as a feature for fake news detection is confirmed.
arXiv (2022-11-25T18:24:17Z)
UrduFake@FIRE2020: Shared Track on Fake News Identification in Urdu
This paper gives an overview of the first shared task at FIRE 2020 on fake news detection in the Urdu language. The goal is to identify fake news using a dataset composed of 900 annotated news articles for training and 400 news articles for testing. The dataset contains news in five domains: (i) Health, (ii) Sports, (iii) Showbiz, (iv) Technology, and (v) Business.
arXiv (2022-07-25T03:46:51Z)
Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2020
The task was posed as a binary classification problem, in which the goal is to distinguish real news from fake news. We provided a dataset divided into 900 annotated news articles for training and 400 news articles for testing. Forty-two teams from six countries (India, China, Egypt, Germany, Pakistan, and the UK) registered for the task.
arXiv (2022-07-25T03:41:32Z)
Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda. Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles. Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation by 3.62 to 7.69% in F1 score on two public datasets.
arXiv (2022-03-10T14:24:19Z)
User Preference-aware Fake News Detection
Existing fake news detection algorithms focus on mining news content for deceptive signals. We propose a new framework, UPFD, which simultaneously captures various signals from user preferences by joint content and graph modeling.
arXiv (2021-04-25T21:19:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.