Analyzing the Impact of Fake News on the Anticipated Outcome of the 2024
Election Ahead of Time
- URL: http://arxiv.org/abs/2312.03750v2
- Date: Sat, 6 Jan 2024 17:29:12 GMT
- Title: Analyzing the Impact of Fake News on the Anticipated Outcome of the 2024
Election Ahead of Time
- Authors: Shaina Raza, Mizanur Rahman, Shardul Ghuge
- Abstract summary: Despite increasing awareness and research around fake news, there is still a significant need for datasets that specifically target racial slurs and biases within North American political speeches.
This study introduces a comprehensive dataset that illuminates these critical aspects of misinformation.
- Score: 7.1970442944315245
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite increasing awareness and research around fake news, there is still a
significant need for datasets that specifically target racial slurs and biases
within North American political speeches. This is particularly important in the
context of upcoming North American elections. This study introduces a
comprehensive dataset that illuminates these critical aspects of
misinformation. To develop this fake news dataset, we scraped and built a
corpus of 40,000 news articles about political discourses in North America. A
portion of this dataset (4000) was then carefully annotated, using a blend of
advanced language models and human verification methods. We have made both
these datasets openly available to the research community and have conducted
benchmarking on the annotated data to demonstrate its utility. We release the
best-performing language model along with data. We encourage researchers and
developers to make use of this dataset and contribute to this ongoing
initiative.
Related papers
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs)
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- EUvsDisinfo: A Dataset for Multilingual Detection of Pro-Kremlin Disinformation in News Articles [4.895830603263421]
This work introduces EUvsDisinfo, a multilingual dataset of disinformation articles originating from pro-Kremlin outlets.
It is sourced directly from the debunk articles written by experts leading the EUvsDisinfo project.
Our dataset is the largest resource to date in terms of the overall number of articles and distinct languages.
arXiv Detail & Related papers (2024-06-18T13:43:22Z) - Open the Data! Chuvash Datasets [50.59120569845975]
We introduce four comprehensive datasets for the Chuvash language.
These datasets include a monolingual dataset, a parallel dataset with Russian, a parallel dataset with English, and an audio dataset.
arXiv Detail & Related papers (2024-05-31T07:51:19Z) - A Survey on Data Selection for Language Models [148.300726396877]
Data selection methods aim to determine which data points to include in a training dataset.
Deep learning is mostly driven by empirical evidence, and experimentation on large-scale data is expensive.
Few organizations have the resources for extensive data selection research.
arXiv Detail & Related papers (2024-02-26T18:54:35Z) - SEPSIS: I Can Catch Your Lies -- A New Paradigm for Deception Detection [9.20397189600732]
This research explores the problem of deception through the lens of psychology.
We propose a novel framework for deception detection leveraging NLP techniques.
We present a novel multi-task learning pipeline that leverages the dataless merging of fine-tuned language models.
arXiv Detail & Related papers (2023-12-01T02:13:25Z) - When a Language Question Is at Stake. A Revisited Approach to Label
Sensitive Content [0.0]
This article revisits an approach to pseudo-labeling sensitive data, using the example of Ukrainian tweets covering the Russian-Ukrainian war.
We provide a fundamental statistical analysis of the obtained data, an evaluation of the models used for pseudo-labeling, and further guidelines on how researchers can leverage the corpus.
arXiv Detail & Related papers (2023-11-17T13:35:10Z) - Deepfake audio as a data augmentation technique for training automatic
speech to text transcription models [55.2480439325792]
We propose a framework that approaches data augmentation based on deepfake audio.
An English-language dataset produced by Indian speakers was selected, ensuring the presence of a single accent.
arXiv Detail & Related papers (2023-09-22T11:33:03Z) - NusaWrites: Constructing High-Quality Corpora for Underrepresented and
Extremely Low-Resource Languages [54.808217147579036]
We conduct a case study on Indonesian local languages.
We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.
Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content.
arXiv Detail & Related papers (2023-09-19T14:42:33Z) - Identifying Informational Sources in News Articles [109.70475599552523]
We build the largest and widest-ranging annotated dataset of informational sources used in news writing.
We introduce a novel task, source prediction, to study the compositionality of sources in news articles.
arXiv Detail & Related papers (2023-05-24T08:56:35Z) - Mitigation of Diachronic Bias in Fake News Detection Dataset [3.2800968305157205]
Most of the fake news datasets depend on a specific time period.
Detection models trained on such datasets have difficulty detecting novel fake news generated by political and social changes.
We propose masking methods using Wikidata to mitigate the influence of person names and validate whether they make fake news detection models robust.
arXiv Detail & Related papers (2021-08-28T08:25:29Z) - An open access NLP dataset for Arabic dialects : Data collection,
labeling, and model construction [0.8312466807725921]
We present an open dataset of social media content in several Arabic dialects.
This data was collected from the Twitter social network and consists of over 50K tweets in five national dialects.
We publish this data as open-access data to encourage innovation and further work in the field of NLP for Arabic dialects and social media.
arXiv Detail & Related papers (2021-02-07T01:39:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.