BanFakeNews: A Dataset for Detecting Fake News in Bangla
- URL: http://arxiv.org/abs/2004.08789v1
- Date: Sun, 19 Apr 2020 07:42:22 GMT
- Title: BanFakeNews: A Dataset for Detecting Fake News in Bangla
- Authors: Md Zobaer Hossain, Md Ashraful Rahman, Md Saiful Islam, Sudipta Kar
- Abstract summary: We propose an annotated dataset of ~50K news articles that can be used for building automated fake news detection systems.
We develop a benchmark system with state-of-the-art NLP techniques to identify Bangla fake news.
- Score: 1.4170999534105675
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Observing the damages that can be done by the rapid propagation of fake news
in various sectors like politics and finance, automatic identification of fake
news using linguistic analysis has drawn the attention of the research
community. However, such methods are largely being developed for English, while
low-resource languages remain out of focus. But the risks spawned by fake
and manipulative news are not confined by languages. In this work, we propose
an annotated dataset of ~50K news articles that can be used for building automated
fake news detection systems for a low-resource language like Bangla. Additionally,
we provide an analysis of the dataset and develop a benchmark system with
state-of-the-art NLP techniques to identify Bangla fake news. To create this system,
we explore traditional linguistic features and neural-network-based methods. We
expect this dataset will be a valuable resource for building technologies to
prevent the spread of fake news and will contribute to research on low-resource
languages.
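To make the phrase "traditional linguistic features" concrete, the following is a minimal sketch of one common lexical baseline: character n-gram TF-IDF vectors fed to a nearest-centroid classifier. It is not the authors' benchmark system. The function names, the toy English headlines, and the `fake`/`real` labels are all hypothetical, chosen only so the example runs with the Python standard library.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased string."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def fit_idf(docs, n=3):
    """Smoothed inverse document frequency for every n-gram seen in docs."""
    df = Counter()
    for doc in docs:
        df.update(set(char_ngrams(doc, n)))
    total = len(docs)
    return {g: math.log((1 + total) / (1 + c)) + 1.0 for g, c in df.items()}


def tfidf(text, idf, n=3):
    """L2-normalised sparse TF-IDF vector; n-grams unseen in training are ignored."""
    counts = Counter(g for g in char_ngrams(text, n) if g in idf)
    vec = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {g: v / norm for g, v in vec.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(g, 0.0) for g, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(texts, labels, n=3):
    """Fit IDF weights and compute one mean TF-IDF vector per class."""
    idf = fit_idf(texts, n)
    sums, sizes = {}, Counter(labels)
    for text, label in zip(texts, labels):
        sums.setdefault(label, Counter()).update(tfidf(text, idf, n))
    centroids = {
        label: {g: v / sizes[label] for g, v in total.items()}
        for label, total in sums.items()
    }
    return idf, centroids


def predict(text, model, n=3):
    """Return the label whose centroid is most similar to the text."""
    idf, centroids = model
    vec = tfidf(text, idf, n)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy data, for illustration only (not drawn from BanFakeNews).
TRAIN_TEXTS = [
    "shocking miracle cure doctors hate",
    "you will not believe this shocking secret",
    "parliament passes the annual budget bill",
    "central bank keeps interest rate unchanged",
]
TRAIN_LABELS = ["fake", "fake", "real", "real"]

model = train(TRAIN_TEXTS, TRAIN_LABELS)
print(predict("shocking secret miracle", model))
print(predict("bank budget bill passes", model))
```

Character n-grams need no tokenizer or stemmer, which is one reason such features are often tried first for languages with few NLP tools. A real experiment would use the actual dataset, a held-out test split, and a stronger classifier.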
Related papers
- Detection of Human and Machine-Authored Fake News in Urdu [2.013675429941823]
Social media has amplified the spread of fake news.
Traditional fake news detection methods relying on linguistic cues become less effective.
We propose a hierarchical detection strategy to improve the accuracy and robustness.
arXiv Detail & Related papers (2024-10-25T12:42:07Z)
- MMCFND: Multimodal Multilingual Caption-aware Fake News Detection for Low-resource Indic Languages [0.4062349563818079]
We introduce the Multimodal Multilingual dataset for Indic Fake News Detection (MMIFND).
This meticulously curated dataset consists of 28,085 instances distributed across Hindi, Bengali, Marathi, Malayalam, Tamil, Gujarati and Punjabi.
We propose the Multimodal Caption-aware framework for Fake News Detection (MMCFND).
arXiv Detail & Related papers (2024-10-14T11:59:33Z)
- Adapting Fake News Detection to the Era of Large Language Models [48.5847914481222]
We study the interplay between machine-paraphrased real news, machine-generated fake news, human-written fake news, and human-written real news.
Our experiments reveal an interesting pattern that detectors trained exclusively on human-written articles can indeed perform well at detecting machine-generated fake news, but not vice versa.
arXiv Detail & Related papers (2023-11-02T08:39:45Z)
- fakenewsbr: A Fake News Detection Platform for Brazilian Portuguese [0.6775616141339018]
This paper presents a comprehensive study on detecting fake news in Brazilian Portuguese.
We propose a machine learning-based approach that leverages natural language processing techniques, including TF-IDF and Word2Vec.
We develop a user-friendly web platform, fakenewsbr.com, to facilitate the verification of news articles' veracity.
arXiv Detail & Related papers (2023-09-20T04:10:03Z)
- Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
The hypothesis that cross-lingual evidence can serve as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
- UrduFake@FIRE2020: Shared Track on Fake News Identification in Urdu [62.6928395368204]
This paper gives an overview of the first shared task at FIRE 2020 on fake news detection in the Urdu language.
The goal is to identify fake news using a dataset composed of 900 annotated news articles for training and 400 news articles for testing.
The dataset contains news in five domains: (i) Health, (ii) Sports, (iii) Showbiz, (iv) Technology, and (v) Business.
arXiv Detail & Related papers (2022-07-25T03:46:51Z)
- Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2020 [62.6928395368204]
The task was posed as binary classification, in which the goal is to differentiate between real and fake news.
We provided a dataset divided into 900 annotated news articles for training and 400 news articles for testing.
42 teams from 6 different countries (India, China, Egypt, Germany, Pakistan, and the UK) registered for the task.
arXiv Detail & Related papers (2022-07-25T03:41:32Z)
- Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda.
Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles.
Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation by 3.62 - 7.69% F1 score on two public datasets.
arXiv Detail & Related papers (2022-03-10T14:24:19Z)
- User Preference-aware Fake News Detection [61.86175081368782]
Existing fake news detection algorithms focus on mining news content for deceptive signals.
We propose a new framework, UPFD, which simultaneously captures various signals from user preferences by joint content and graph modeling.
arXiv Detail & Related papers (2021-04-25T21:19:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.