Written and spoken corpus of real and fake social media postings about
COVID-19
- URL: http://arxiv.org/abs/2310.04237v1
- Date: Fri, 6 Oct 2023 13:21:04 GMT
- Title: Written and spoken corpus of real and fake social media postings about
COVID-19
- Authors: Ng Bee Chin, Ng Zhi Ee Nicole, Kyla Kwan, Lee Yong Han Dylann, Liu Fang, Xu Hong
- Abstract summary: The data was analysed using the Linguistic Inquiry and Word Count (LIWC) software to detect patterns in linguistic data.
The results indicate a set of linguistic features that distinguish fake news from real news in both written and speech data.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study investigates the linguistic traits of fake news and real news.
There are two parts to this study: text data and speech data. The text data for
this study consisted of 6420 COVID-19 related tweets re-filtered from Patwa et
al. (2021). After cleaning, the dataset contained 3049 tweets, with 2161
labeled as 'real' and 888 as 'fake'. The speech data for this study was
collected from TikTok, focusing on COVID-19 related videos. Research assistants
fact-checked each video's content using credible sources and labeled them as
'Real', 'Fake', or 'Questionable', resulting in a dataset of 91 real entries
and 109 fake entries from 200 TikTok videos with a total word count of 53,710
words. The data was analysed using the Linguistic Inquiry and Word Count (LIWC)
software to detect patterns in linguistic data. The results indicate a set of
linguistic features that distinguish fake news from real news in both written
and speech data. This offers valuable insights into the role of language in
shaping trust, social media interactions, and the propagation of fake news.
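As a rough illustration of the kind of dictionary-based analysis the abstract describes, the sketch below computes the percentage of words in a text that fall into each word category and averages those rates over a group of texts. It is only a minimal sketch: LIWC is proprietary software, so the `TOY_CATEGORIES` lexicon, the function names, and the example sentence are hypothetical stand-ins and are not taken from the study or from LIWC.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only; NOT the LIWC dictionary.
TOY_CATEGORIES = {
    "negation": {"no", "not", "never"},
    "certainty": {"always", "definitely", "proven"},
}


def category_rates(text):
    """Return, per category, the percentage of tokens in `text` that match it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    counts = Counter()
    for token in tokens:
        for name, words in TOY_CATEGORIES.items():
            if token in words:
                counts[name] += 1
    return {
        name: (100.0 * counts[name] / total if total else 0.0)
        for name in TOY_CATEGORIES
    }


def mean_rates(texts):
    """Average the per-text category percentages over a list of texts."""
    if not texts:
        return {name: 0.0 for name in TOY_CATEGORIES}
    per_text = [category_rates(t) for t in texts]
    return {
        name: sum(r[name] for r in per_text) / len(per_text)
        for name in TOY_CATEGORIES
    }


if __name__ == "__main__":
    # Made-up example sentence, not drawn from the corpus.
    print(category_rates("Masks are not proven to never work"))
```

Calling `mean_rates` once on the texts labeled 'real' and once on those labeled 'fake' would give two sets of averages whose differences could then be tested for significance; the actual feature set and statistical procedure are those of the paper, not of this sketch.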
Related papers
- LTCR: Long-Text Chinese Rumor Detection Dataset [14.503426768310536]
A Long-Text Chinese Rumor dataset, named LTCR, is proposed.
The dataset consists of 1,729 and 500 pieces of real and fake news, respectively.
The average lengths of real and fake news are approximately 230 and 152 characters.
arXiv Detail & Related papers (2023-06-12T16:03:36Z)
- Models See Hallucinations: Evaluating the Factuality in Video Captioning [57.85548187177109]
We conduct a human evaluation of the factuality in video captioning and collect two annotated factuality datasets.
We find that 57.0% of the model-generated sentences have factual errors, indicating it is a severe problem in this field.
We propose a weakly-supervised, model-based factuality metric FactVC, which outperforms previous metrics on factuality evaluation of video captioning.
arXiv Detail & Related papers (2023-03-06T08:32:50Z)
- ASR2K: Speech Recognition for Around 2000 Languages without Audio [100.41158814934802]
We present a speech recognition pipeline that does not require any audio for the target language.
Our pipeline consists of three components: acoustic, pronunciation, and language models.
We build speech recognition for 1909 languages by combining it with Crubadan: a large endangered languages n-gram database.
arXiv Detail & Related papers (2022-09-06T22:48:29Z)
- UrduFake@FIRE2021: Shared Track on Fake News Identification in Urdu [55.41644538483948]
This study reports on the second shared task, UrduFake@FIRE2021, on fake news detection in the Urdu language.
The proposed systems were based on various count-based features and used different classifiers as well as neural network architectures.
The stochastic gradient descent (SGD) algorithm outperformed the other classifiers and achieved an F-score of 0.679.
arXiv Detail & Related papers (2022-07-11T19:15:04Z)
- Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda.
Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles.
Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation by 3.62-7.69% in F1 score on two public datasets.
arXiv Detail & Related papers (2022-03-10T14:24:19Z)
- Cross-lingual COVID-19 Fake News Detection [54.125563009333995]
We make the first attempt to detect COVID-19 misinformation in a low-resource language (Chinese) using only the fact-checked news in a high-resource language (English).
We propose a deep learning framework named CrossFake to jointly encode the cross-lingual news body texts and capture the news content.
Empirical results on our dataset demonstrate the effectiveness of CrossFake under the cross-lingual setting.
arXiv Detail & Related papers (2021-10-13T04:44:02Z)
- AraCOVID19-MFH: Arabic COVID-19 Multi-label Fake News and Hate Speech Detection Dataset [0.0]
"AraCOVID19-MFH" is a manually annotated multi-label Arabic COVID-19 fake news and hate speech detection dataset.
Our dataset contains 10,828 Arabic tweets annotated with 10 different labels.
It can also be used for hate speech detection, opinion/news classification, dialect identification, and many other tasks.
arXiv Detail & Related papers (2021-05-07T09:52:44Z)
- Half-Truth: A Partially Fake Audio Detection Dataset [60.08010668752466]
This paper develops a dataset for half-truth audio detection (HAD).
Partially fake audio in the HAD dataset involves only changing a few words in an utterance.
Using this dataset, we can not only detect fake utterances but also localize the manipulated regions in the speech.
arXiv Detail & Related papers (2021-04-08T08:57:13Z)
- Hostility Detection and Covid-19 Fake News Detection in Social Media [1.3499391168620467]
We build a model that makes use of an abusive language detector and features extracted via Hindi BERT and Hindi FastText models.
We also build models to identify fake news related to Covid-19 in English tweets.
arXiv Detail & Related papers (2021-01-15T03:24:36Z)
- Evaluating Deep Learning Approaches for Covid19 Fake News Detection [0.0]
We look at automated techniques for fake news detection from a data mining perspective.
We evaluate different supervised text classification algorithms on the Constraint@AAAI 2021 Covid-19 Fake news detection dataset.
We report the best accuracy of 98.41% on the Covid-19 Fake news detection dataset.
arXiv Detail & Related papers (2021-01-11T16:39:03Z)
- FakeCovid -- A Multilingual Cross-domain Fact Check News Dataset for COVID-19 [0.0]
We present a first multilingual cross-domain dataset of 5182 fact-checked news articles for COVID-19.
We have collected the fact-checked articles from 92 different fact-checking websites after obtaining references from Poynter and Snopes.
The dataset is in 40 languages from 105 countries.
arXiv Detail & Related papers (2020-06-19T19:48:00Z)