An Exploratory Mixed-Methods Study on General Data Protection Regulation (GDPR) Compliance in Open-Source Software
- URL: http://arxiv.org/abs/2406.14724v1
- Date: Thu, 20 Jun 2024 20:38:33 GMT
- Title: An Exploratory Mixed-Methods Study on General Data Protection Regulation (GDPR) Compliance in Open-Source Software
- Authors: Lucas Franke, Huayu Liang, Sahar Farzanehpour, Aaron Brantly, James C. Davis, Chris Brown,
- Abstract summary: European Union's General Data Protection Regulation require software developers to meet privacy requirements interacting with users' data.
Prior research describes impact of such laws on development, but only when commercial software.
- Score: 4.2610816955137
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Background: Governments worldwide are considering data privacy regulations. These laws, e.g. the European Union's General Data Protection Regulation (GDPR), require software developers to meet privacy-related requirements when interacting with users' data. Prior research describes the impact of such laws on software development, but only for commercial software. Open-source software is commonly integrated into regulated software, and thus must be engineered or adapted for compliance. We do not know how such laws impact open-source software development. Aims: To understand how data privacy laws affect open-source software development. We studied the European Union's GDPR, the most prominent such law. We investigated how GDPR compliance activities influence OSS developer activity (RQ1), how OSS developers perceive fulfilling GDPR requirements (RQ2), the most challenging GDPR requirements to implement (RQ3), and how OSS developers assess GDPR compliance (RQ4). Method: We distributed an online survey to explore perceptions of GDPR implementations from open-source developers (N=56). We further conducted a repository mining study to analyze development metrics on pull requests (N=31462) submitted to open-source GitHub repositories. Results: GDPR policies complicate open-source development processes and introduce challenges for developers, primarily regarding the management of users' data, implementation costs and time, and assessments of compliance. Moreover, we observed negative perceptions of GDPR from open-source developers and significant increases in development activity, in particular metrics related to coding and reviewing activity, on GitHub pull requests related to GDPR compliance. Conclusions: Our findings motivate policy-related resources and automated tools to support data privacy regulation implementation and compliance efforts in open-source software.
Related papers
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [70.72097493954067]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems.
While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited.
We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z) - Towards an Enforceable GDPR Specification [49.1574468325115]
Privacy by Design (PbD) is prescribed by modern privacy regulations such as the EU's.
One emerging technique to realize PbD is enforcement (RE)
We present a set of requirements and an iterative methodology for creating formal specifications of legal provisions.
arXiv Detail & Related papers (2024-02-27T09:38:51Z) - A First Look at the General Data Protection Regulation (GDPR) in
Open-Source Software [4.844017045823075]
This poster describes work on regulated data protection in opensource software.
We surveyed open-source developers to understand their experiences.
We call for improved policy-related compliance resources.
arXiv Detail & Related papers (2024-01-26T03:49:13Z) - A Multi-solution Study on GDPR AI-enabled Completeness Checking of DPAs [3.1002416427168304]
General Data Protection Regulation (DPA) requires a data processing agreement (DPA) which regulates processing and ensures personal data remains protected.
Checking completeness of DPA according to prerequisite provisions is therefore an essential to ensure that requirements are complete.
We propose an automation strategy to address the completeness checking of DPAs against stipulated provisions.
arXiv Detail & Related papers (2023-11-23T10:05:52Z) - Outsourcing Training without Uploading Data via Efficient Collaborative
Open-Source Sampling [49.87637449243698]
Traditional outsourcing requires uploading device data to the cloud server.
We propose to leverage widely available open-source data, which is a massive dataset collected from public and heterogeneous sources.
We develop a novel strategy called Efficient Collaborative Open-source Sampling (ECOS) to construct a proximal proxy dataset from open-source data for cloud training.
arXiv Detail & Related papers (2022-10-23T00:12:18Z) - NL2GDPR: Automatically Develop GDPR Compliant Android Application
Features from Natural Language [28.51179772165298]
NL2 is an information extraction tool developed by Baidu Cognitive Computing Lab.
It generates privacycentric information and generating privacy policies.
It can achieve 92.9% identification of policies related to personal storage process, data process, and types respectively.
arXiv Detail & Related papers (2022-08-29T04:16:50Z) - Distributed Machine Learning and the Semblance of Trust [66.1227776348216]
Federated Learning (FL) allows the data owner to maintain data governance and perform model training locally without having to share their data.
FL and related techniques are often described as privacy-preserving.
We explain why this term is not appropriate and outline the risks associated with over-reliance on protocols that were not designed with formal definitions of privacy in mind.
arXiv Detail & Related papers (2021-12-21T08:44:05Z) - DriverGym: Democratising Reinforcement Learning for Autonomous Driving [75.91049219123899]
We propose DriverGym, an open-source environment for developing reinforcement learning algorithms for autonomous driving.
DriverGym provides access to more than 1000 hours of expert logged data and also supports reactive and data-driven agent behavior.
The performance of an RL policy can be easily validated on real-world data using our extensive and flexible closed-loop evaluation protocol.
arXiv Detail & Related papers (2021-11-12T11:47:08Z) - Compliance Generation for Privacy Documents under GDPR: A Roadmap for
Implementing Automation and Machine Learning [2.1485350418225244]
Privatech project focuses on corporations and law firms as agents of compliance.
Data processors must implement accountability measures to assess and document compliance.
We provide a roadmap for compliance assessment and generation by identifying compliance issues.
arXiv Detail & Related papers (2020-12-23T14:46:51Z) - Why are Developers Struggling to Put GDPR into Practice when Developing
Privacy-Preserving Software Systems? [3.04585143845864]
General Data Protection Law provides guidelines for developers on how to protect user data.
Previous research has attempted to investigate what hinders developers from embedding privacy into software systems.
This paper investigates the issues that hinder software developers from implementing software applications taking law on-board.
arXiv Detail & Related papers (2020-08-07T04:34:08Z) - GDPR: When the Right to Access Personal Data Becomes a Threat [63.732639864601914]
We examine more than 300 data controllers performing for each of them a request to access personal data.
We find that 50.4% of the data controllers that handled the request, have flaws in the procedure of identifying the users.
With the undesired and surprising result that, in its present deployment, has actually decreased the privacy of the users of web services.
arXiv Detail & Related papers (2020-05-04T22:01:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.