Finding Privacy-relevant Source Code
- URL: http://arxiv.org/abs/2401.07316v1
- Date: Sun, 14 Jan 2024 15:38:29 GMT
- Title: Finding Privacy-relevant Source Code
- Authors: Feiyang Tang and Bjarte M. Østvold
- Abstract summary: We introduce the concept of privacy-relevant methods - specific methods in code that are directly involved in the processing of personal data.
We then present an automated approach to assist in code review by identifying and categorizing these privacy-relevant methods in source code.
For our evaluation, we examined 100 open-source applications and found that our approach identifies fewer than 5% of the methods as privacy-relevant for personal data processing.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Privacy code review is a critical process that enables developers and legal
experts to ensure compliance with data protection regulations. However, the
task is challenging due to resource constraints. To address this, we introduce
the concept of privacy-relevant methods - specific methods in code that are
directly involved in the processing of personal data. We then present an
automated approach to assist in code review by identifying and categorizing
these privacy-relevant methods in source code.
Using static analysis, we identify a set of methods based on their
occurrences in 50 commonly used libraries. We then rank these methods according
to their frequency of invocation with actual personal data in the top 30 GitHub
applications. The highest-ranked methods are the ones we designate as
privacy-relevant in practice. For our evaluation, we examined 100 open-source
applications and found that our approach identifies fewer than 5% of the
methods as privacy-relevant for personal data processing. This reduces the time
required for code reviews. Case studies on Signal Desktop and Cal.com further
validate the effectiveness of our approach in aiding code reviewers to produce
enhanced reports that facilitate compliance with privacy regulations.
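As a rough illustration of the ranking step described in the abstract (not the paper's actual static-analysis tool), the Python sketch below counts how often a hand-picked set of candidate library methods is invoked in a code base and keeps the most frequently invoked ones as "privacy-relevant in practice". All method names, file paths, and thresholds here are hypothetical.

```python
# Hypothetical sketch of the ranking idea from the abstract:
# count how often candidate library methods are invoked in a code base
# and keep the most frequently used ones as "privacy-relevant in practice".
# Method names and paths are illustrative, not taken from the paper's tool.
import re
from collections import Counter
from pathlib import Path

# Example method names one might collect from commonly used libraries.
CANDIDATE_METHODS = {"getEmail", "getPhoneNumber", "sendMessage", "logEvent"}

def count_invocations(src_root: str) -> Counter:
    """Count textual occurrences of candidate method calls in .java files."""
    counts = Counter()
    call_pattern = re.compile(r"\.(\w+)\s*\(")
    for path in Path(src_root).rglob("*.java"):
        text = path.read_text(errors="ignore")
        for name in call_pattern.findall(text):
            if name in CANDIDATE_METHODS:
                counts[name] += 1
    return counts

def rank_privacy_relevant(src_root: str, top_k: int = 10) -> list[tuple[str, int]]:
    """Return the top-k most frequently invoked candidate methods."""
    return count_invocations(src_root).most_common(top_k)

if __name__ == "__main__":
    for method, freq in rank_privacy_relevant("path/to/app"):
        print(f"{method}: {freq} invocations")
```

The paper's approach additionally checks whether the invocations actually receive personal data; a plain frequency count like this only approximates that ranking.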
Related papers
- FT-PrivacyScore: Personalized Privacy Scoring Service for Machine Learning Participation [4.772368796656325]
In practice, controlled data access remains a mainstream method for protecting data privacy in many industrial and research environments.
We developed the demo prototype FT-PrivacyScore to show that it's possible to efficiently and quantitatively estimate the privacy risk of participating in a model fine-tuning task.
arXiv Detail & Related papers (2024-10-30T02:41:26Z) - PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories.
We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds.
State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z) - Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z) - Helping Code Reviewer Prioritize: Pinpointing Personal Data and its Processing [0.9238700679836852]
We have designed two specialized views to help code reviewers in prioritizing their work related to personal data.
Our approach, evaluated on four open-source GitHub applications, demonstrated a precision rate of 0.87 in identifying personal data flows.
This solution is designed to make privacy-related analysis tasks, such as preparing the Record of Processing Activities (ROPA), more efficient, saving time and effort for code reviewers.
arXiv Detail & Related papers (2023-06-20T12:30:46Z) - Tight Auditing of Differentially Private Machine Learning [77.38590306275877]
For private machine learning, existing auditing mechanisms are not tight: they only give tight estimates under implausible worst-case assumptions.
We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets.
arXiv Detail & Related papers (2023-02-15T21:40:33Z) - Mining User Privacy Concern Topics from App Reviews [10.776958968245589]
An increasing number of users are voicing their privacy concerns through app reviews on App stores.
The main challenge in mining privacy concerns from user reviews is that reviews voicing such concerns are buried under a large number of reviews on more generic themes and noisy content.
In this work, we propose a novel automated approach to overcome that challenge.
arXiv Detail & Related papers (2022-12-19T08:07:27Z) - A General Framework for Auditing Differentially Private Machine Learning [27.99806936918949]
We present a framework to statistically audit the privacy guarantee conferred by a differentially private machine learner in practice.
Our work develops a general methodology to empirically evaluate the privacy of differentially private machine learning implementations (a minimal sketch of this kind of empirical estimate appears after this list).
arXiv Detail & Related papers (2022-10-16T21:34:18Z) - No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy" [75.98836424725437]
New methods designed to preserve data privacy require careful scrutiny.
Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked.
arXiv Detail & Related papers (2022-09-29T17:50:23Z) - Algorithms with More Granular Differential Privacy Guarantees [65.3684804101664]
We consider partial differential privacy (DP), which allows quantifying the privacy guarantee on a per-attribute basis.
In this work, we study several basic data analysis and learning tasks, and design algorithms whose per-attribute privacy parameter is smaller than the best possible privacy parameter for the entire record of a person.
arXiv Detail & Related papers (2022-09-08T22:43:50Z) - How to keep text private? A systematic review of deep learning methods for privacy-preserving natural language processing [0.38073142980732994]
The article systematically reviews over sixty methods for privacy-preserving NLP published between 2016 and 2020.
We introduce a novel taxonomy for classifying the existing methods into three categories: data safeguarding methods, trusted methods, and verification methods.
We discuss open challenges in privacy-preserving NLP regarding data traceability, computation overhead, dataset size, and the prevalence of human biases in embeddings.
arXiv Detail & Related papers (2022-05-20T11:29:44Z) - Debugging Differential Privacy: A Case Study for Privacy Auditing [60.87570714269048]
We show that auditing can also be used to find flaws in (purportedly) differentially private schemes.
In this case study, we audit a recent open source implementation of a differentially private deep learning algorithm and find, with 99.99999999% confidence, that the implementation does not satisfy the claimed differential privacy guarantee.
arXiv Detail & Related papers (2022-02-24T17:31:08Z)
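Several of the entries above (A General Framework for Auditing Differentially Private Machine Learning; Debugging Differential Privacy) turn membership-inference attack outcomes into a statistical estimate of the privacy parameter. The sketch below shows the standard form of that conversion under assumed attack error rates; it is a generic illustration, not the specific procedure of either paper, and all numbers are made up.

```python
# Generic sketch: derive an empirical lower bound on epsilon from the
# false-positive and false-negative rates of a membership-inference attack.
# For an (epsilon, delta)-DP mechanism, a distinguishing attack must satisfy
#   1 - FNR <= exp(epsilon) * FPR + delta   (and symmetrically with FPR/FNR swapped),
# so observed rates imply  epsilon >= ln((1 - delta - FNR) / FPR).
# The attack counts below are made-up numbers for illustration only.
import math

def empirical_epsilon_lower_bound(fpr: float, fnr: float, delta: float) -> float:
    """Lower bound on epsilon implied by observed attack error rates."""
    candidates = []
    for a, b in ((fpr, fnr), (fnr, fpr)):  # the bound holds in both directions
        if a > 0 and (1.0 - delta - b) > 0:
            candidates.append(math.log((1.0 - delta - b) / a))
    return max(candidates, default=0.0)

if __name__ == "__main__":
    # Hypothetical audit: the attack succeeds far more often than a
    # mechanism with the claimed epsilon would allow.
    fpr, fnr = 0.05, 0.10            # observed error rates of the attack
    claimed_epsilon, delta = 1.0, 1e-5
    eps_lb = empirical_epsilon_lower_bound(fpr, fnr, delta)
    print(f"empirical epsilon lower bound: {eps_lb:.2f} (claimed {claimed_epsilon})")
```

If the resulting lower bound exceeds the claimed epsilon, as in this made-up example, the implementation cannot satisfy the claimed guarantee; that is the kind of violation reported in the Debugging Differential Privacy case study.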
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.