WhatsApp Explorer: A Data Donation Tool To Facilitate Research on WhatsApp
- URL: http://arxiv.org/abs/2404.01328v1
- Date: Fri, 29 Mar 2024 13:30:29 GMT
- Title: WhatsApp Explorer: A Data Donation Tool To Facilitate Research on WhatsApp
- Authors: Kiran Garimella, Simon Chauchard
- Abstract summary: This paper introduces WhatsApp Explorer, a tool designed to enable WhatsApp data collection on a large scale.
We discuss protocols for data collection, including potential sampling approaches, and explain why our tool (and adjoining protocol) arguably allow researchers to collect WhatsApp data in an ethical and legal manner, at scale.
- Score: 1.2507543279181124
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, reports and anecdotal evidence pointing at the role of WhatsApp in a variety of events, ranging from elections to collective violence, have emerged. While academic research should examine the validity of these claims, obtaining WhatsApp data for research is notably challenging, contrasting with the relative abundance of data from platforms like Facebook and Twitter, where user "information diets" have been extensively studied. This lack of data is particularly problematic since misinformation and hate speech are major concerns in the set of Global South countries in which WhatsApp dominates the market for messaging. To help make research on these questions, and more generally research on WhatsApp, possible, this paper introduces WhatsApp Explorer, a tool designed to enable WhatsApp data collection on a large scale. We discuss protocols for data collection, including potential sampling approaches, and explain why our tool (and adjoining protocol) arguably allow researchers to collect WhatsApp data in an ethical and legal manner, at scale.
Related papers
- WildChat: 1M ChatGPT Interaction Logs in the Wild [88.05964311416717]
WildChat is a corpus of 1 million user-ChatGPT conversations, which consists of over 2.5 million interaction turns.
In addition to timestamped chat transcripts, we enrich the dataset with demographic data, including state, country, and hashed IP addresses.
arXiv Detail & Related papers (2024-05-02T17:00:02Z)
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
- TGDataset: a Collection of Over One Hundred Thousand Telegram Channels [69.22187804798162]
This paper presents the TGDataset, a new dataset that includes 120,979 Telegram channels and over 400 million messages.
We analyze the languages spoken within our dataset and the topics covered by English channels.
In addition to the raw dataset, we released the scripts we used to analyze the dataset and the list of channels belonging to the network of a new conspiracy theory called Sabmyk.
arXiv Detail & Related papers (2023-03-09T15:42:38Z)
- ArgSciChat: A Dataset for Argumentative Dialogues on Scientific Papers [61.772582143035606]
We introduce a novel framework to collect dialogues between scientists as domain experts on scientific papers.
Our framework lets scientists present their scientific papers as groundings for dialogues and participate in dialogues on papers whose titles interest them.
We use our framework to collect a novel argumentative dialogue dataset, ArgSciChat. It consists of 498 messages collected from 41 dialogues on 20 scientific papers.
arXiv Detail & Related papers (2022-02-14T13:27:19Z)
- Tiplines to Combat Misinformation on Encrypted Platforms: A Case Study of the 2019 Indian Election on WhatsApp [5.342552155591148]
We analyze the usefulness of a crowd-sourced system on WhatsApp through which users can submit "tips" containing messages they want fact-checked.
We compare the tips sent to a WhatsApp tipline run during the 2019 Indian national elections with the messages circulating in large, public groups on WhatsApp.
We find that tiplines are a very useful lens into WhatsApp conversations.
arXiv Detail & Related papers (2021-06-08T23:08:47Z)
- Jettisoning Junk Messaging in the Era of End-to-End Encryption: A Case Study of WhatsApp [8.463390032361591]
We study junk messaging on a multilingual dataset of 2.6M messages sent to 5K public WhatsApp groups in India.
We find that nearly 1 in 10 messages is unwanted content sent by junk senders.
arXiv Detail & Related papers (2021-06-08T15:52:46Z)
- PoliWAM: An Exploration of a Large Scale Corpus of Political Discussions on WhatsApp Messenger [1.2301855531996841]
WhatsApp Messenger is one of the most popular channels for spreading information with a current reach of more than 180 countries and 2 billion people.
In the recent past, several countries have witnessed its effectiveness and influence in political and social campaigns.
We observe a high surge in information and propaganda flow during election campaigning.
arXiv Detail & Related papers (2020-10-26T00:35:57Z)
- Emerging App Issue Identification via Online Joint Sentiment-Topic Tracing [66.57888248681303]
We propose a novel emerging issue detection approach named MERIT.
Based on the AOBST model, we infer the topics negatively reflected in user reviews for one app version.
Experiments on popular apps from Google Play and Apple's App Store demonstrate the effectiveness of MERIT.
arXiv Detail & Related papers (2020-08-23T06:34:05Z)
- Can WhatsApp Benefit from Debunked Fact-Checked Stories to Reduce Misinformation? [3.116035935327534]
We observe that misinformation has been widely shared in public WhatsApp groups even after it had already been fact-checked by popular fact-checking agencies.
This represents a significant portion of misinformation spread in both Brazil and India in the groups analyzed.
We propose an architecture that could be implemented by WhatsApp to counter such misinformation.
arXiv Detail & Related papers (2020-06-03T18:28:57Z)
- From Standard Summarization to New Tasks and Beyond: Summarization with Manifold Information [77.89755281215079]
Text summarization is the research area aiming at creating a short and condensed version of the original document.
In real-world applications, most of the data is not in a plain text format.
This paper surveys these new summarization tasks and approaches in real-world applications.
arXiv Detail & Related papers (2020-05-10T14:59:36Z)
- A Dataset of Fact-Checked Images Shared on WhatsApp During the Brazilian and Indian Elections [4.512596331783666]
A notable form of abuse in WhatsApp relies on several manipulated images and memes containing all kinds of fake stories.
This paper opens a novel dataset to the research community containing fact-checked fake images shared through WhatsApp.
arXiv Detail & Related papers (2020-05-05T19:08:26Z)