Do Android App Developers Accurately Report Collection of Privacy-Related Data?
- URL: http://arxiv.org/abs/2409.04167v1
- Date: Fri, 6 Sep 2024 10:05:45 GMT
- Title: Do Android App Developers Accurately Report Collection of Privacy-Related Data?
- Authors: Mugdha Khedkar, Ambuj Kumar Mondal, Eric Bodden
- Abstract summary: The European Union's General Data Protection Regulation (GDPR) requires vendors to faithfully disclose which data their apps collect.
Many Android apps use third-party code for which the same information is not readily available.
We first expose a multi-layered definition of privacy-related data to correctly report data collection in Android apps.
We then create a dataset of privacy-sensitive data classes that may be used as input by an Android app.
- Score: 5.863391019411233
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many Android applications collect data from users. The European Union's General Data Protection Regulation (GDPR) requires vendors to faithfully disclose which data their apps collect. This task is complicated because many apps use third-party code for which the same information is not readily available. Hence we ask: how accurately do current Android apps fulfill these requirements? In this work, we first expose a multi-layered definition of privacy-related data to correctly report data collection in Android apps. We further create a dataset of privacy-sensitive data classes that may be used as input by an Android app. This dataset takes into account data collected both through the user interface and system APIs. We manually examine the data safety sections of 70 Android apps to observe how data collection is reported, identifying instances of over- and under-reporting. Additionally, we develop a prototype to statically extract and label privacy-related data collected via app source code, user interfaces, and permissions. Comparing the prototype's results with the data safety sections of 20 apps reveals reporting discrepancies. Using the results from two Messaging and Social Media apps (Signal and Instagram), we discuss how app developers under-report and over-report data collection, respectively, and identify inaccurately reported data categories. Our results show that app developers struggle to accurately report data collection, either due to Google's abstract definition of collected data or insufficient existing tool support.
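The abstract describes a prototype that statically extracts privacy-related data an app collects (from source code, user interfaces, and permissions) and compares it against the app's declared data safety section. A minimal sketch of the permission-based part of that comparison, assuming a hypothetical permission-to-category mapping (not Google's official one, nor the authors' actual prototype):

```python
# Sketch: flag discrepancies between the permissions an Android app
# requests in its manifest and the data categories its Play-store data
# safety section declares. The permission-to-category mapping below is
# illustrative only.
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical mapping from dangerous permissions to data safety categories.
PERMISSION_TO_CATEGORY = {
    "android.permission.ACCESS_FINE_LOCATION": "Location",
    "android.permission.READ_CONTACTS": "Contacts",
    "android.permission.CAMERA": "Photos and videos",
    "android.permission.RECORD_AUDIO": "Audio files",
}

def permissions_from_manifest(manifest_xml: str) -> set[str]:
    """Extract <uses-permission> names from an AndroidManifest.xml string."""
    root = ET.fromstring(manifest_xml)
    return {
        elem.get(f"{ANDROID_NS}name")
        for elem in root.iter("uses-permission")
    }

def report_discrepancies(manifest_xml: str, declared: set[str]) -> dict:
    """Compare permission-implied data categories against declared ones."""
    implied = {
        PERMISSION_TO_CATEGORY[p]
        for p in permissions_from_manifest(manifest_xml)
        if p in PERMISSION_TO_CATEGORY
    }
    return {
        # Evidence of collection, but not declared -> under-reporting.
        "under_reported": implied - declared,
        # Declared, but no static evidence found -> possible over-reporting.
        "over_reported": declared - implied,
    }

manifest = """<manifest xmlns:android="http://schemas.android.com/apk/res/android">
  <uses-permission android:name="android.permission.ACCESS_FINE_LOCATION"/>
  <uses-permission android:name="android.permission.CAMERA"/>
</manifest>"""

print(report_discrepancies(manifest, declared={"Location", "Contacts"}))
```

A real analysis would also need the UI- and API-level extraction the paper describes; permissions alone cannot distinguish data that is merely accessible from data that is actually collected, which is one reason such comparisons only "reveal discrepancies" rather than prove misreporting.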
Related papers
- How Unique is Whose Web Browser? The role of demographics in browser fingerprinting among US users [50.699390248359265]
Browser fingerprinting can be used to identify and track users across the Web, even without cookies.
This technique and resulting privacy risks have been studied for over a decade.
We provide a first-of-its-kind dataset to enable further research.
arXiv Detail & Related papers (2024-10-09T14:51:58Z)
- Data Exposure from LLM Apps: An In-depth Investigation of OpenAI's GPTs [17.433387980578637]
This paper aims to bring transparency in data practices of LLM apps.
We study OpenAI's GPT app ecosystem.
We find that Actions collect expansive data about users, including sensitive information prohibited by OpenAI, such as passwords.
arXiv Detail & Related papers (2024-08-23T17:42:06Z)
- User Interaction Data in Apps: Comparing Policy Claims to Implementations [0.0]
We analyzed the top 100 apps across diverse categories using static analysis methods to evaluate the alignment between policy claims and implemented data collection techniques.
Our findings highlight the lack of transparency in data collection and the associated risk of re-identification, raising concerns about user privacy and trust.
arXiv Detail & Related papers (2023-12-05T12:11:11Z)
- Transparency in App Analytics: Analyzing the Collection of User Interaction Data [0.0]
We conducted an analysis of the top 20 analytic libraries for Android apps to identify common practices of interaction data collection.
We developed a standardized collection claim template for summarizing an app's data collection practices.
arXiv Detail & Related papers (2023-06-20T11:01:27Z)
- Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks [70.39633252935445]
Data contamination has become prevalent and challenging with the rise of models pretrained on large automatically-crawled corpora.
For closed models, the training data becomes a trade secret, and even for open models, it is not trivial to detect contamination.
We propose three strategies that can make a difference: (1) test data made public should be encrypted with a public key and licensed to disallow derivative distribution; (2) demand training exclusion controls from closed API holders, and protect your test data by refusing to evaluate without them; and (3) avoid data which appears with its solution on the internet, and release the web-page context of internet-derived data.
arXiv Detail & Related papers (2023-05-17T12:23:38Z)
- The Overview of Privacy Labels and their Compatibility with Privacy Policies [24.871967983289117]
Privacy nutrition labels provide a way to understand an app's key data practices without reading the long and hard-to-read privacy policies.
Apple and Google have implemented mandates requiring app developers to fill privacy nutrition labels highlighting their privacy practices.
arXiv Detail & Related papers (2023-03-14T20:10:28Z)
- Analysis of Longitudinal Changes in Privacy Behavior of Android Applications [79.71330613821037]
In this paper, we examine the trends in how Android apps have changed over time with respect to privacy.
We examine the adoption of HTTPS, whether apps scan the device for other installed apps, the use of permissions for privacy-sensitive data, and the use of unique identifiers.
We find that privacy-related behavior has improved with time as apps continue to receive updates, and that the third-party libraries used by apps are responsible for more issues with privacy.
arXiv Detail & Related papers (2021-12-28T16:21:31Z)
- The Problem of Zombie Datasets: A Framework For Deprecating Datasets [55.878249096379804]
We examine the public afterlives of several prominent datasets, including ImageNet, 80 Million Tiny Images, MS-Celeb-1M, Duke MTMC, Brainwash, and HRT Transgender.
We propose a dataset deprecation framework that includes considerations of risk, mitigation of impact, appeal mechanisms, timeline, post-deprecation protocol, and publication checks.
arXiv Detail & Related papers (2021-10-18T20:13:51Z)
- Mobile Sensing for Multipurpose Applications in Transportation [0.0]
State Departments of Transportation struggle to collect consistent data for analyzing and resolving transportation problems in a timely manner.
Recent advancements in the sensors integrated into smartphones have resulted in a more affordable method of data collection.
The developed app was evaluated by collecting data on the i70W highway connecting Columbia, Missouri, and Kansas City, Missouri.
arXiv Detail & Related papers (2021-06-20T17:56:12Z)
- Mind the GAP: Security & Privacy Risks of Contact Tracing Apps [75.7995398006171]
Google and Apple have jointly provided an API for exposure notification in order to implement decentralized contact tracing apps using Bluetooth Low Energy.
We demonstrate that in real-world scenarios the GAP design is vulnerable to (i) profiling and possibly de-anonymizing persons, and (ii) relay-based wormhole attacks that basically can generate fake contacts.
arXiv Detail & Related papers (2020-06-10T16:05:05Z)
- Learning with Weak Supervision for Email Intent Detection [56.71599262462638]
We propose to leverage user actions as a source of weak supervision to detect intents in emails.
We develop an end-to-end robust deep neural network model for email intent identification.
arXiv Detail & Related papers (2020-05-26T23:41:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.