Helping Code Reviewer Prioritize: Pinpointing Personal Data and its
Processing
- URL: http://arxiv.org/abs/2306.11495v1
- Date: Tue, 20 Jun 2023 12:30:46 GMT
- Title: Helping Code Reviewer Prioritize: Pinpointing Personal Data and its
Processing
- Authors: Feiyang Tang, Bjarte M. Østvold, Magiel Bruntink
- Abstract summary: We have designed two specialized views to help code reviewers prioritize their work related to personal data.
Our approach, evaluated on four open-source GitHub applications, demonstrated a precision rate of 0.87 in identifying personal data flows.
This solution, designed to augment the efficiency of privacy-related analysis tasks such as the Record of Processing Activities (ROPA), aims to conserve resources, thereby saving time and enhancing productivity for code reviewers.
- Score: 0.9238700679836852
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Ensuring compliance with the General Data Protection Regulation (GDPR) is a
crucial aspect of software development. This task, due to its time-consuming
nature and requirement for specialized knowledge, is often deferred or
delegated to specialized code reviewers. These reviewers, particularly when
external to the development organization, may lack detailed knowledge of the
software under review, necessitating the prioritization of their resources.
To address this, we have designed two specialized views of a codebase to help
code reviewers prioritize their work related to personal data: one view
displays the types of personal data representation, while the other provides an
abstract depiction of personal data processing, complemented by an optional
detailed exploration of specific code snippets. Leveraging static analysis, our
method identifies personal data-related code segments, thereby expediting the
review process. Our approach, evaluated on four open-source GitHub
applications, demonstrated a precision rate of 0.87 in identifying personal
data flows. Additionally, we fact-checked the privacy statements of 15 Android
applications. This solution, designed to augment the efficiency of GDPR-related
privacy analysis tasks such as the Record of Processing Activities (ROPA), aims
to conserve resources, thereby saving time and enhancing productivity for code
reviewers.
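
The abstract describes static analysis that pinpoints personal-data-related code, but includes no code itself. As a rough illustration only (not the authors' tool), the Python sketch below flags identifiers that match an assumed, hand-made list of personal-data keywords and reports, per function, which of them each function touches; this loosely mirrors the "personal data representation" and "personal data processing" views described above.

```python
# Minimal illustrative sketch, NOT the tool from the paper: flag identifiers
# that look like personal data and report the functions that reference them.
import ast

# Hypothetical keyword list; a real analysis would use a richer taxonomy.
PERSONAL_DATA_KEYWORDS = {"email", "phone", "address", "birthdate", "ssn", "name"}

def looks_personal(identifier: str) -> bool:
    """Heuristic: does the identifier mention a personal-data keyword?"""
    lowered = identifier.lower()
    return any(keyword in lowered for keyword in PERSONAL_DATA_KEYWORDS)

def personal_data_report(source: str) -> dict:
    """Map each function name to the personal-data identifiers it references."""
    tree = ast.parse(source)
    report = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            hits = sorted(
                {n.id for n in ast.walk(node)
                 if isinstance(n, ast.Name) and looks_personal(n.id)}
            )
            if hits:
                report[node.name] = hits
    return report

if __name__ == "__main__":
    example = """
def register_user(email, phone):
    store(email)
    notify(phone)

def render_footer():
    return "ok"
"""
    print(personal_data_report(example))
    # {'register_user': ['email', 'phone']}
```

A real analysis would trace data flows across methods rather than match identifier names, but the sketch conveys the shape of a per-function personal-data view.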
Related papers
- An Empirical Study of Sensitive Information in Logs [12.980238412281471]
The presence of sensitive information in software logs poses significant privacy concerns.
This study offers a comprehensive analysis of privacy in software logs from multiple perspectives.
Our findings shed light on various perspectives of log privacy and reveal industry challenges.
arXiv Detail & Related papers (2024-09-17T16:12:23Z)
- Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z)
- Step-Back Profiling: Distilling User History for Personalized Scientific Writing [50.481041470669766]
Large language models (LLMs) excel at a variety of natural language processing tasks, yet they struggle to generate personalized content for individuals.
We introduce STEP-BACK PROFILING to personalize LLMs by distilling user history into concise profiles.
Our approach outperforms the baselines by up to 3.6 points on the general personalization benchmark.
arXiv Detail & Related papers (2024-06-20T12:58:26Z)
- Provable Privacy with Non-Private Pre-Processing [56.770023668379615]
We propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms.
Our framework establishes upper bounds on the overall privacy guarantees by utilising two new technical notions.
arXiv Detail & Related papers (2024-03-19T17:54:49Z)
- Finding Privacy-relevant Source Code [0.0]
We introduce the concept of privacy-relevant methods - specific methods in code that are directly involved in the processing of personal data.
We then present an automated approach to assist in code review by identifying and categorizing these privacy-relevant methods in source code.
For our evaluation, we examined 100 open-source applications and found that our approach identifies fewer than 5% of the methods as privacy-relevant for personal data processing.
arXiv Detail & Related papers (2024-01-14T15:38:29Z)
- FedDMF: Privacy-Preserving User Attribute Prediction using Deep Matrix Factorization [1.9181612035055007]
We propose a novel algorithm for predicting user attributes without requiring user matching.
Our approach involves training deep matrix factorization models on different clients and sharing only attribute item vectors.
This allows us to predict user attributes without sharing the user vectors themselves.
arXiv Detail & Related papers (2023-12-24T06:49:00Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- Scalable Discovery and Continuous Inventory of Personal Data at Rest in Cloud Native Systems [0.0]
Cloud native systems process large amounts of personal data through numerous and possibly multi-paradigmatic data stores.
From a privacy engineering perspective, a core challenge is to keep track of all exact locations where personal data is being stored.
We present Teiresias, comprising i) a workflow pattern for scalable discovery of personal data at rest, and ii) a cloud native system architecture and open source prototype implementation of said workflow pattern.
arXiv Detail & Related papers (2022-09-09T10:45:34Z)
- Task-aware Privacy Preservation for Multi-dimensional Data [4.138783926370621]
Local differential privacy (LDP) is a state-of-the-art technique for privacy preservation.
In the future, LDP can be adopted to anonymize richer user data attributes.
We show how to significantly improve the ultimate task performance for multi-dimensional user data by considering a task-aware privacy preservation problem (a generic LDP sketch follows this list).
arXiv Detail & Related papers (2021-10-05T20:03:53Z)
- Partial sensitivity analysis in differential privacy [58.730520380312676]
We investigate the impact of each input feature on the individual's privacy loss.
We experimentally evaluate our approach on queries over private databases.
We also explore our findings in the context of neural network training on synthetic data.
arXiv Detail & Related papers (2021-09-22T08:29:16Z)
- TIPRDC: Task-Independent Privacy-Respecting Data Crowdsourcing Framework for Deep Learning with Anonymized Intermediate Representations [49.20701800683092]
We present TIPRDC, a task-independent privacy-respecting data crowdsourcing framework with anonymized intermediate representation.
The goal of this framework is to learn a feature extractor that can hide the privacy information from the intermediate representations, while maximally retaining the original information embedded in the raw data for the data collector to accomplish unknown learning tasks.
arXiv Detail & Related papers (2020-05-23T06:21:26Z)
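
For the task-aware LDP entry above, the standard randomized-response mechanism gives a concrete feel for what local differential privacy means. The sketch below is a generic textbook illustration with an assumed privacy parameter epsilon; it is not the task-aware method of that paper.

```python
# Generic epsilon-LDP randomized response for a single binary attribute.
# Illustrative only; not the task-aware mechanism from the cited paper.
import math
import random

def randomized_response(true_bit: int, epsilon: float) -> int:
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return true_bit if random.random() < p_truth else 1 - true_bit

def debias_mean(reports: list, epsilon: float) -> float:
    """Unbiased estimate of the population mean from the noisy reports."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    # E[observed] = p * true_mean + (1 - p) * (1 - true_mean); invert that.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

if __name__ == "__main__":
    random.seed(0)
    true_bits = [1] * 300 + [0] * 700          # true mean = 0.3
    eps = 1.0
    noisy = [randomized_response(b, eps) for b in true_bits]
    print(round(debias_mean(noisy, eps), 3))   # close to 0.3, up to sampling noise
```

Each user randomizes locally before reporting, so the collector never sees the true bit, yet the aggregate mean can still be estimated after debiasing.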
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.