Challenges in Android Data Disclosure: An Empirical Study
- URL: http://arxiv.org/abs/2601.20459v1
- Date: Wed, 28 Jan 2026 10:33:38 GMT
- Title: Challenges in Android Data Disclosure: An Empirical Study
- Authors: Mugdha Khedkar, Michael Schlichtig, Mohamed Soliman, Eric Bodden,
- Abstract summary: This paper employs an empirical approach to understand developers' experience with Google Play Store's Data Safety Section (DSS) form. We first survey 41 Android developers to understand how they categorize privacy-related data into DSS categories. We complement the survey with an analysis of 172 online developer discussions, capturing the perspectives of 642 additional developers.
- Score: 7.011407021531348
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current legal frameworks require Android developers to accurately report the data their apps collect. However, large codebases can make this reporting challenging. This paper employs an empirical approach to understand developers' experience with Google Play Store's Data Safety Section (DSS) form. We first survey 41 Android developers to understand how they categorize privacy-related data into DSS categories and how confident they feel when completing the DSS form. To gain a broader and more detailed view of the challenges developers encounter during the process, we complement the survey with an analysis of 172 online developer discussions, capturing the perspectives of 642 additional developers. Together, these two data sources represent insights from 683 developers. Our findings reveal that developers often manually classify the privacy-related data their apps collect into the data categories defined by Google (or, in some cases, omit classification entirely) and rely heavily on existing online resources when completing the form. Moreover, developers are generally confident in recognizing the data their apps collect, yet they lack confidence in translating this knowledge into DSS-compliant disclosures. Key challenges include difficulty identifying privacy-relevant data to complete the form, limited understanding of the form, and concerns about app rejection due to discrepancies with Google's privacy requirements. These results underscore the need for clearer guidance and more accessible tooling to support developers in meeting privacy-aware reporting obligations.
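The abstract notes that developers often sort privacy-related data into Google's DSS categories by hand. As a rough illustration of what that manual step involves, here is a minimal sketch that maps a few requested Android permissions to DSS categories and data types. The table `PERMISSION_TO_DSS` and the function `classify_permissions` are hypothetical, not taken from the paper, and the permission-to-category pairs are illustrative assumptions that should be checked against Google's current DSS data-type list. A permission only shows that an app *can* access data. The DSS treats data as collected when it is transmitted off the device, and third-party SDKs may collect data the developer never requested directly. A real disclosure therefore still needs manual review.

```python
from collections import defaultdict

# Hypothetical, deliberately incomplete lookup table:
# Android permission -> (DSS category, DSS data type).
# These pairs are illustrative assumptions, not an official Google mapping.
PERMISSION_TO_DSS = {
    "android.permission.ACCESS_FINE_LOCATION": ("Location", "Precise location"),
    "android.permission.ACCESS_COARSE_LOCATION": ("Location", "Approximate location"),
    "android.permission.READ_CONTACTS": ("Contacts", "Contacts"),
    "android.permission.READ_CALENDAR": ("Calendar", "Calendar events"),
    "android.permission.RECORD_AUDIO": ("Audio", "Voice or sound recordings"),
    "android.permission.READ_SMS": ("Messages", "SMS or MMS"),
}


def classify_permissions(permissions):
    """Group requested permissions by DSS category.

    Returns a tuple (classified, unmapped):
    - classified: dict mapping each DSS category to a sorted list of data types
    - unmapped: permissions with no entry in the table, left for manual review
    """
    classified = defaultdict(set)
    unmapped = []
    for perm in permissions:
        entry = PERMISSION_TO_DSS.get(perm)
        if entry is None:
            # e.g. INTERNET says nothing by itself about which data is collected
            unmapped.append(perm)
            continue
        category, data_type = entry
        classified[category].add(data_type)
    # Convert sets to sorted lists so the result is deterministic
    return {cat: sorted(types) for cat, types in classified.items()}, unmapped


if __name__ == "__main__":
    requested = [
        "android.permission.ACCESS_FINE_LOCATION",
        "android.permission.ACCESS_COARSE_LOCATION",
        "android.permission.INTERNET",
    ]
    classified, unmapped = classify_permissions(requested)
    print(classified)  # {'Location': ['Approximate location', 'Precise location']}
    print(unmapped)    # ['android.permission.INTERNET']
```

Unmapped permissions are returned separately rather than silently dropped. This reflects the point that some privacy-relevant data cannot be classified mechanically and still requires the developer's judgment.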
Related papers
- How to DP-fy Your Data: A Practical Guide to Generating Synthetic Data With Differential Privacy [52.00934156883483]
Differential Privacy (DP) is a framework for reasoning about and limiting information leakage. Differentially private synthetic data refers to synthetic data that preserves the overall trends of source data.
(2025-12-02T21:14:39Z)
- DRBench: A Realistic Benchmark for Enterprise Deep Research [81.49694432639406]
DRBench is a benchmark for evaluating AI agents on complex, open-ended deep research tasks in enterprise settings. We release 15 deep research tasks across 10 domains, such as Sales, Cybersecurity, and Compliance.
(2025-09-30T18:47:20Z)
- WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents [57.203515352080295]
We introduce WebExplorer: a systematic data generation approach using model-based exploration and iterative, long-to-short query evolution. Our model supports 128K context length and up to 100 tool-calling turns, enabling long-horizon problem solving. As an 8B-sized model, WebExplorer-8B is able to effectively search over an average of 16 turns after RL training.
(2025-09-08T10:07:03Z)
- Evaluating Language Model Reasoning about Confidential Information [95.64687778185703]
We study whether language models exhibit contextual robustness, or the capability to adhere to context-dependent safety specifications. We develop a benchmark (PasswordEval) that measures whether language models can correctly determine when a user request is authorized. We find that current open- and closed-source models struggle with this seemingly simple task, and that, perhaps surprisingly, reasoning capabilities do not generally improve performance.
(2025-08-27T15:39:46Z)
- MAGPIE: A dataset for Multi-AGent contextual PrIvacy Evaluation [54.410825977390274]
Existing benchmarks for evaluating contextual privacy in LLM agents primarily assess single-turn, low-complexity tasks. We first present a benchmark, MAGPIE, comprising 158 real-life high-stakes scenarios across 15 domains. We then evaluate the current state-of-the-art LLMs on their understanding of contextually private data and their ability to collaborate without violating user privacy.
(2025-06-25T18:04:25Z)
- Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation [60.81109086640437]
We propose a novel framework called Federated Retrieval-Augmented Generation (FedE4RAG). FedE4RAG facilitates collaborative training of client-side RAG retrieval models. We apply homomorphic encryption within federated learning to safeguard model parameters.
(2025-04-27T04:26:02Z)
- Visualizing Privacy-Relevant Data Flows in Android Applications [5.367301239087641]
SliceViz is a tool that analyzes an Android app by slicing all privacy-relevant data sources detected in source code on the back-end. We conducted a user study with 12 participants demonstrating that SliceViz effectively aids developers in identifying privacy-relevant properties in Android apps.
(2025-03-20T18:47:02Z)
- Assessing Privacy Compliance of Android Third-Party SDKs [16.975384208528972]
Third-party Software Development Kits (SDKs) are widely adopted in Android app development. This convenience raises substantial concerns about unauthorized access to users' privacy-sensitive information. Our study offers a targeted analysis of user privacy protection among Android third-party SDKs.
(2024-09-16T15:44:43Z)
- Do Android App Developers Accurately Report Collection of Privacy-Related Data? [5.863391019411233]
The European Union's General Data Protection Regulation requires vendors to faithfully disclose which data their apps collect.
Many Android apps use third-party code for which the same information is not readily available.
We first expose a multi-layered definition of privacy-related data to correctly report data collection in Android apps.
We then create a dataset of privacy-sensitive data classes that may be used as input by an Android app.
(2024-09-06T10:05:45Z)
- Unpacking Privacy Labels: A Measurement and Developer Perspective on Google's Data Safety Section [23.183167991569352]
We present a comprehensive analysis of Google's Data Safety Section (DSS) using both quantitative and qualitative methods.
We find that there are internal inconsistencies within the reported practices.
Next, we conduct a longitudinal study of DSS to explore how the reported practices evolve over time.
(2023-06-13T20:01:08Z)
- On the Privacy of Mental Health Apps: An Empirical Investigation and its Implications for Apps Development [14.113922276394588]
This paper reports an empirical study aimed at systematically identifying and understanding data privacy incorporated in mental health apps.
We analyzed 27 top-ranked mental health apps from Google Play Store.
The findings reveal important data privacy issues such as unnecessary permissions, insecure cryptography implementations, and leaks of personal data and credentials in logs and web requests.
(2022-01-22T09:23:56Z)