Helping Code Reviewer Prioritize: Pinpointing Personal Data and its
Processing
- URL: http://arxiv.org/abs/2306.11495v1
- Date: Tue, 20 Jun 2023 12:30:46 GMT
- Title: Helping Code Reviewer Prioritize: Pinpointing Personal Data and its
Processing
- Authors: Feiyang Tang, Bjarte M. Østvold, Magiel Bruntink
- Abstract summary: We have designed two specialized views to help code reviewers prioritize their work related to personal data.
Our approach, evaluated on four open-source GitHub applications, demonstrated a precision rate of 0.87 in identifying personal data flows.
This solution, designed to augment the efficiency of privacy-related analysis tasks such as the Record of Processing Activities (ROPA), aims to conserve resources, thereby saving time and enhancing productivity for code reviewers.
- Score: 0.9238700679836852
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Ensuring compliance with the General Data Protection Regulation (GDPR) is a
crucial aspect of software development. This task, due to its time-consuming
nature and requirement for specialized knowledge, is often deferred or
delegated to specialized code reviewers. These reviewers, particularly when
external to the development organization, may lack detailed knowledge of the
software under review, necessitating the prioritization of their resources.
To address this, we have designed two specialized views of a codebase to help
code reviewers prioritize their work related to personal data: one view
displays the types of personal data representation, while the other provides an
abstract depiction of personal data processing, complemented by an optional
detailed exploration of specific code snippets. Leveraging static analysis, our
method identifies personal data-related code segments, thereby expediting the
review process. Our approach, evaluated on four open-source GitHub
applications, demonstrated a precision rate of 0.87 in identifying personal
data flows. Additionally, we fact-checked the privacy statements of 15 Android
applications. This solution, designed to augment the efficiency of GDPR-related
privacy analysis tasks such as the Record of Processing Activities (ROPA), aims
to conserve resources, thereby saving time and enhancing productivity for code
reviewers.
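
The abstract describes static analysis that pinpoints personal-data-related code, but includes no code itself. As a rough illustration only (not the authors' tool), the Python sketch below flags identifiers that match an assumed, hand-made list of personal-data keywords and reports, per function, which of them each function touches; this loosely mirrors the "personal data representation" and "personal data processing" views described above.

```python
# Minimal illustrative sketch, NOT the tool from the paper: flag identifiers
# that look like personal data and report the functions that reference them.
import ast

# Hypothetical keyword list; a real analysis would use a richer taxonomy.
PERSONAL_DATA_KEYWORDS = {"email", "phone", "address", "birthdate", "ssn", "name"}

def looks_personal(identifier: str) -> bool:
    """Heuristic: does the identifier mention a personal-data keyword?"""
    lowered = identifier.lower()
    return any(keyword in lowered for keyword in PERSONAL_DATA_KEYWORDS)

def personal_data_report(source: str) -> dict:
    """Map each function name to the personal-data identifiers it references."""
    tree = ast.parse(source)
    report = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            hits = sorted(
                {n.id for n in ast.walk(node)
                 if isinstance(n, ast.Name) and looks_personal(n.id)}
            )
            if hits:
                report[node.name] = hits
    return report

if __name__ == "__main__":
    example = """
def register_user(email, phone):
    store(email)
    notify(phone)

def render_footer():
    return "ok"
"""
    print(personal_data_report(example))
    # {'register_user': ['email', 'phone']}
```

A real analysis would trace data flows across methods rather than match identifier names, but the sketch conveys the shape of a per-function personal-data view.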
Related papers
- An Empirical Study of Sensitive Information in Logs [12.980238412281471]
The presence of sensitive information in software logs poses significant privacy concerns.
This study offers a comprehensive analysis of privacy in software logs from multiple perspectives.
Our findings shed light on various perspectives of log privacy and reveal industry challenges.
arXiv Detail & Related papers (2024-09-17T16:12:23Z)
- Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z)
- Step-Back Profiling: Distilling User History for Personalized Scientific Writing [50.481041470669766]
Large language models (LLMs) excel at a variety of natural language processing tasks, yet they struggle to generate personalized content for individuals.
We introduce STEP-BACK PROFILING to personalize LLMs by distilling user history into concise profiles.
Our approach outperforms the baselines by up to 3.6 points on the general personalization benchmark.
arXiv Detail & Related papers (2024-06-20T12:58:26Z)
- Provable Privacy with Non-Private Pre-Processing [56.770023668379615]
We propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms.
Our framework establishes upper bounds on the overall privacy guarantees by utilising two new technical notions.
arXiv Detail & Related papers (2024-03-19T17:54:49Z)
- Finding Privacy-relevant Source Code [0.0]
We introduce the concept of privacy-relevant methods - specific methods in code that are directly involved in the processing of personal data.
We then present an automated approach to assist in code review by identifying and categorizing these privacy-relevant methods in source code.
For our evaluation, we examined 100 open-source applications and found that our approach identifies fewer than 5% of the methods as privacy-relevant for personal data processing.
arXiv Detail & Related papers (2024-01-14T15:38:29Z)
- FedDMF: Privacy-Preserving User Attribute Prediction using Deep Matrix Factorization [1.9181612035055007]
We propose a novel algorithm for predicting user attributes without requiring user matching.
Our approach involves training deep matrix factorization models on different clients and sharing only attribute item vectors.
This allows us to predict user attributes without sharing the user vectors themselves.
arXiv Detail & Related papers (2023-12-24T06:49:00Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- Scalable Discovery and Continuous Inventory of Personal Data at Rest in Cloud Native Systems [0.0]
Cloud native systems process large amounts of personal data through numerous and possibly multi-paradigmatic data stores.
From a privacy engineering perspective, a core challenge is to keep track of all exact locations where personal data is being stored.
We present Teiresias, comprising i) a workflow pattern for scalable discovery of personal data at rest, and ii) a cloud native system architecture and open source prototype implementation of said workflow pattern.
arXiv Detail & Related papers (2022-09-09T10:45:34Z)
- Task-aware Privacy Preservation for Multi-dimensional Data [4.138783926370621]
Local differential privacy (LDP) is a state-of-the-art technique for privacy preservation.
In the future, LDP can be adopted to anonymize richer user data attributes.
We show how to significantly improve the ultimate task performance for multi-dimensional user data by considering a task-aware privacy preservation problem (a generic LDP sketch follows this list).
arXiv Detail & Related papers (2021-10-05T20:03:53Z)
- Partial sensitivity analysis in differential privacy [58.730520380312676]
We investigate the impact of each input feature on the individual's privacy loss.
We experimentally evaluate our approach on queries over private databases.
We also explore our findings in the context of neural network training on synthetic data.
arXiv Detail & Related papers (2021-09-22T08:29:16Z)
- TIPRDC: Task-Independent Privacy-Respecting Data Crowdsourcing Framework for Deep Learning with Anonymized Intermediate Representations [49.20701800683092]
We present TIPRDC, a task-independent privacy-respecting data crowdsourcing framework with anonymized intermediate representation.
The goal of this framework is to learn a feature extractor that can hide the privacy information from the intermediate representations, while maximally retaining the original information embedded in the raw data for the data collector to accomplish unknown learning tasks.
arXiv Detail & Related papers (2020-05-23T06:21:26Z)
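
For the task-aware LDP entry above, the standard randomized-response mechanism gives a concrete feel for what local differential privacy means. The sketch below is a generic textbook illustration with an assumed privacy parameter epsilon; it is not the task-aware method of that paper.

```python
# Generic epsilon-LDP randomized response for a single binary attribute.
# Illustrative only; not the task-aware mechanism from the cited paper.
import math
import random

def randomized_response(true_bit: int, epsilon: float) -> int:
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return true_bit if random.random() < p_truth else 1 - true_bit

def debias_mean(reports: list, epsilon: float) -> float:
    """Unbiased estimate of the population mean from the noisy reports."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    # E[observed] = p * true_mean + (1 - p) * (1 - true_mean); invert that.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

if __name__ == "__main__":
    random.seed(0)
    true_bits = [1] * 300 + [0] * 700          # true mean = 0.3
    eps = 1.0
    noisy = [randomized_response(b, eps) for b in true_bits]
    print(round(debias_mean(noisy, eps), 3))   # close to 0.3, up to sampling noise
```

Each user randomizes locally before reporting, so the collector never sees the true bit, yet the aggregate mean can still be estimated after debiasing.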
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.