Which Code Statements Implement Privacy Behaviors in Android Applications?
- URL: http://arxiv.org/abs/2503.02091v1
- Date: Mon, 03 Mar 2025 22:20:01 GMT
- Title: Which Code Statements Implement Privacy Behaviors in Android Applications?
- Authors: Chia-Yi Su, Aakash Bansal, Vijayanta Jain, Sepideh Ghanavati, Sai Teja Peddinti, Collin McMillan
- Abstract summary: A "privacy behavior" in software is an action where the software uses personal information for a service or a feature, such as a website using location to provide content relevant to a user. We propose an approach to automatically detect privacy-relevant statements by fine-tuning three large language models with the data from the study.
- Score: 5.723067425160506
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A "privacy behavior" in software is an action where the software uses personal information for a service or a feature, such as a website using location to provide content relevant to a user. Programmers are required by regulations or application stores to provide privacy notices and labels describing these privacy behaviors. Although many tools and research prototypes have been developed to help programmers generate these notices by analyzing the source code, these approaches are often fairly coarse-grained (i.e., at the level of whole methods or files, rather than at the statement level). But this is not necessarily how privacy behaviors exist in code. Privacy behaviors are embedded in specific statements in code. Current literature does not examine what statements programmers see as most important, how consistent these views are, or how to detect them. In this paper, we conduct an empirical study to examine which statements programmers view as most-related to privacy behaviors. We find that expression statements that make function calls are most associated with privacy behaviors, while the type of privacy label has little effect on the attributes of the selected statements. We then propose an approach to automatically detect these privacy-relevant statements by fine-tuning three large language models with the data from the study. We observe that the agreement between our approach and participants is comparable to or higher than an agreement between two participants. Our study and detection approach can help programmers understand which statements in code affect privacy in mobile applications.
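To make the statement-level framing concrete, the sketch below shows a hypothetical Android method in Java (class and method names are illustrative, not drawn from the paper's study data). Only one statement in it, the call to getLastKnownLocation(...), uses personal information; the surrounding declarations and control flow do not, which is why statement-level labeling is finer-grained than labeling whole methods or files.

```java
import android.annotation.SuppressLint;
import android.content.Context;
import android.location.Location;
import android.location.LocationManager;

// Hypothetical class; names are illustrative and not taken from the paper's study data.
public class NearbyContentLoader {

    private final LocationManager locationManager;

    public NearbyContentLoader(Context context) {
        this.locationManager =
                (LocationManager) context.getSystemService(Context.LOCATION_SERVICE);
    }

    // Assumes ACCESS_FINE_LOCATION has already been granted at runtime.
    @SuppressLint("MissingPermission")
    public String loadNearbyContent() {
        // Privacy-relevant statement: the getLastKnownLocation(...) call reads
        // the user's location, i.e., it uses personal information.
        Location last = locationManager.getLastKnownLocation(LocationManager.GPS_PROVIDER);

        // The remaining statements are ordinary control flow and string handling;
        // a method- or file-level analysis would flag all of them together, while
        // a statement-level view points at the single call above.
        if (last == null) {
            return "default-content";
        }
        return "content-near:" + last.getLatitude() + "," + last.getLongitude();
    }
}
```

The paper's finding that expression statements making function calls are most associated with privacy behaviors matches this pattern: the privacy behavior lives in a single API call rather than in the method as a whole.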
Related papers
- Exploring User Privacy Awareness on GitHub: An Empirical Study [5.822284390235265]
GitHub provides developers with a practical way to distribute source code and collaborate on common projects.
To enhance account security and privacy, GitHub allows its users to manage access permissions, review audit logs, and enable two-factor authentication.
Despite the endless effort, the platform still faces various issues related to the privacy of its users.
arXiv Detail & Related papers (2024-09-06T06:41:46Z)
- PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories.
We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds.
State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z)
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
- Finding Privacy-relevant Source Code [0.0]
We introduce the concept of privacy-relevant methods - specific methods in code that are directly involved in the processing of personal data.
We then present an automated approach to assist in code review by identifying and categorizing these privacy-relevant methods in source code.
For our evaluation, we examined 100 open-source applications and found that our approach identifies fewer than 5% of the methods as privacy-relevant for personal data processing.
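For contrast with the statement-level view in the main paper, the hypothetical Java method below sketches what a method-level flag covers: the whole method would be marked privacy-relevant because it queries the user's contact names, even though only a few of its statements touch personal data (identifiers are illustrative and not taken from that paper's 100-application evaluation).

```java
import android.content.ContentResolver;
import android.database.Cursor;
import android.provider.ContactsContract;

import java.util.ArrayList;
import java.util.List;

// Hypothetical class; identifiers are illustrative only.
public class ContactExporter {

    private final ContentResolver resolver;

    public ContactExporter(ContentResolver resolver) {
        this.resolver = resolver;
    }

    // Requires the READ_CONTACTS permission to have been granted.
    // A method-level approach would flag this entire method as privacy-relevant
    // because it queries and returns the user's contact names (personal data),
    // even though only a few of its statements actually touch that data.
    public List<String> readContactNames() {
        List<String> names = new ArrayList<>();
        try (Cursor cursor = resolver.query(
                ContactsContract.Contacts.CONTENT_URI,
                new String[] {ContactsContract.Contacts.DISPLAY_NAME},
                null, null, null)) {
            if (cursor == null) {
                return names;
            }
            int nameColumn =
                    cursor.getColumnIndexOrThrow(ContactsContract.Contacts.DISPLAY_NAME);
            while (cursor.moveToNext()) {
                names.add(cursor.getString(nameColumn));
            }
        }
        return names;
    }
}
```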
arXiv Detail & Related papers (2024-01-14T15:38:29Z)
- Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable AI models reveal private information in contexts that humans would not, 39% and 57% of the time for GPT-4 and ChatGPT, respectively.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- Towards Fine-Grained Localization of Privacy Behaviors [5.74186288696419]
PriGen uses static analysis to identify Android applications' code segments that process sensitive information.
We present the initial evaluation of our translation task for 300,000 code segments.
arXiv Detail & Related papers (2023-05-24T16:32:14Z)
- PriGen: Towards Automated Translation of Android Applications' Code to Privacy Captions [4.2534846356464815]
PriGen uses static analysis to identify Android applications' code segments which process sensitive information.
We present the initial evaluation of our translation task for approximately 300,000 code segments.
arXiv Detail & Related papers (2023-05-11T01:14:28Z)
- Tight Auditing of Differentially Private Machine Learning [77.38590306275877]
For private machine learning, existing auditing mechanisms give tight privacy estimates only under implausible worst-case assumptions (e.g., adversarially crafted datasets).
We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets.
arXiv Detail & Related papers (2023-02-15T21:40:33Z)
- SPAct: Self-supervised Privacy Preservation for Action Recognition [73.79886509500409]
Existing approaches for mitigating privacy leakage in action recognition require privacy labels along with the action labels from the video dataset.
Recent developments in self-supervised learning (SSL) have unleashed the untapped potential of unlabeled data.
We present a novel training framework which removes privacy information from input video in a self-supervised manner without requiring privacy labels.
arXiv Detail & Related papers (2022-03-29T02:56:40Z)