Jettisoning Junk Messaging in the Era of End-to-End Encryption: A Case
Study of WhatsApp
- URL: http://arxiv.org/abs/2106.05184v3
- Date: Sat, 12 Feb 2022 18:46:12 GMT
- Authors: Pushkal Agarwal, Aravindh Raman, Damilola Ibosiola, Gareth Tyson,
Nishanth Sastry, Kiran Garimella
- Abstract summary: We study junk messaging on a multilingual dataset of 2.6M messages sent to 5K public WhatsApp groups in India.
We find that nearly 1 in 10 messages is unwanted content sent by junk senders.
- Score: 8.463390032361591
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: WhatsApp is a popular messaging app used by over a billion users around the
globe. Due to this popularity, understanding misbehavior on WhatsApp is an
important issue. The sending of unwanted junk messages by unknown contacts via
WhatsApp remains understudied by researchers, in part because of the end-to-end
encryption offered by the platform. We address this gap by studying junk
messaging on a multilingual dataset of 2.6M messages sent to 5K public WhatsApp
groups in India. We characterise both junk content and senders. We find that
nearly 1 in 10 messages is unwanted content sent by junk senders, and a number
of unique strategies are employed to reflect challenges faced on WhatsApp,
e.g., the need to change phone numbers regularly. We finally experiment with
on-device classification to automate the detection of junk, whilst respecting
end-to-end encryption.
Related papers
- The Medium is the Message: How Secure Messaging Apps Leak Sensitive Data to Push Notification Services [9.547428690220618]
This study investigated secure messaging apps' usage of Google's Firebase Cloud Messaging (FCM) service to send push notifications to Android devices.
We analyzed 21 popular secure messaging apps from the Google Play Store to determine what personal information these apps leak in the payload of push notifications sent via FCM.
None of the data we observed being leaked to FCM was specifically disclosed in those apps' privacy disclosures.
arXiv Detail & Related papers (2024-07-15T10:13:30Z)
- WildChat: 1M ChatGPT Interaction Logs in the Wild [88.05964311416717]
WildChat is a corpus of 1 million user-ChatGPT conversations, which consists of over 2.5 million interaction turns.
In addition to timestamped chat transcripts, we enrich the dataset with demographic data, including state, country, and hashed IP addresses.
arXiv Detail & Related papers (2024-05-02T17:00:02Z)
- WhatsApp Explorer: A Data Donation Tool To Facilitate Research on WhatsApp [1.2507543279181124]
This paper introduces WhatsApp Explorer, a tool designed to enable WhatsApp data collection on a large scale.
We discuss protocols for data collection, including potential sampling approaches, and explain why our tool (and adjoining protocol) arguably allow researchers to collect WhatsApp data in an ethical and legal manner, at scale.
arXiv Detail & Related papers (2024-03-29T13:30:29Z)
- TGDataset: a Collection of Over One Hundred Thousand Telegram Channels [69.22187804798162]
This paper presents the TGDataset, a new dataset that includes 120,979 Telegram channels and over 400 million messages.
We analyze the languages spoken within our dataset and the topics covered by English channels.
In addition to the raw dataset, we released the scripts we used to analyze the dataset and the list of channels belonging to the network of a new conspiracy theory called Sabmyk.
arXiv Detail & Related papers (2023-03-09T15:42:38Z)
- Analysis of Longitudinal Changes in Privacy Behavior of Android
Applications [79.71330613821037]
In this paper, we examine the trends in how Android apps have changed over time with respect to privacy.
We examine the adoption of HTTPS, whether apps scan the device for other installed apps, the use of permissions for privacy-sensitive data, and the use of unique identifiers.
We find that privacy-related behavior has improved with time as apps continue to receive updates, and that the third-party libraries used by apps are responsible for more issues with privacy.
arXiv Detail & Related papers (2021-12-28T16:21:31Z)
- Uncovering the Dark Side of Telegram: Fakes, Clones, Scams, and
Conspiracy Movements [67.39353554498636]
We perform a large-scale analysis of Telegram by collecting 35,382 different channels and over 130,000,000 messages.
We find some of the infamous activities also present on privacy-preserving services of the Dark Web, such as carding.
We propose a machine learning model that is able to identify fake channels with an accuracy of 86%.
arXiv Detail & Related papers (2021-11-26T14:53:31Z)
- Tiplines to Combat Misinformation on Encrypted Platforms: A Case Study
of the 2019 Indian Election on WhatsApp [5.342552155591148]
We analyze the usefulness of a crowd-sourced system on WhatsApp through which users can submit "tips" containing messages they want fact-checked.
We compare the tips sent to a WhatsApp tipline run during the 2019 Indian national elections with the messages circulating in large, public groups on WhatsApp.
We find that tiplines are a very useful lens into WhatsApp conversations.
arXiv Detail & Related papers (2021-06-08T23:08:47Z)
- A First Look at COVID-19 Messages on WhatsApp in Pakistan [6.336355456383468]
COVID-19 has prompted extensive online discussions, creating an 'infodemic' on social media platforms such as WhatsApp and Twitter.
We present the first analysis of COVID-19 discourse on public WhatsApp groups from Pakistan.
arXiv Detail & Related papers (2020-11-18T07:56:24Z)
- TextHide: Tackling Data Privacy in Language Understanding Tasks [54.11691303032022]
TextHide mitigates privacy risks without slowing down training or reducing accuracy.
It requires all participants to add a simple encryption step to prevent an eavesdropping attacker from recovering private text data.
We evaluate TextHide on the GLUE benchmark, and our experiments show that TextHide can effectively defend against attacks on shared gradients or representations.
arXiv Detail & Related papers (2020-10-12T22:22:15Z)
- Emerging App Issue Identification via Online Joint Sentiment-Topic
Tracing [66.57888248681303]
We propose a novel emerging issue detection approach named MERIT.
Based on the AOBST model, we infer the topics negatively reflected in user reviews for one app version.
Experiments on popular apps from Google Play and Apple's App Store demonstrate the effectiveness of MERIT.
arXiv Detail & Related papers (2020-08-23T06:34:05Z)
- Can WhatsApp Benefit from Debunked Fact-Checked Stories to Reduce
Misinformation? [3.116035935327534]
We observe that misinformation has been largely shared on WhatsApp public groups even after it had already been fact-checked by popular fact-checking agencies.
This represents a significant portion of misinformation spread in both Brazil and India in the groups analyzed.
We propose an architecture that could be implemented by WhatsApp to counter such misinformation.
arXiv Detail & Related papers (2020-06-03T18:28:57Z)