Analyzing developer discussions on EU and US privacy legislation compliance in GitHub repositories
- URL: http://arxiv.org/abs/2512.10618v1
- Date: Thu, 11 Dec 2025 13:16:20 GMT
- Title: Analyzing developer discussions on EU and US privacy legislation compliance in GitHub repositories
- Authors: Georgia M. Kapitsaki, Maria Papoutsoglou, Christoph Treude, Ioanna Theophilou,
- Abstract summary: The EU General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) have forced the community to focus on users' data privacy. Despite the vast amount of developer issues available in GitHub repositories, there is a lack of empirical evidence on the issues developers of Open Source Software discuss to comply with privacy legislation. We devised 24 discussion categories placed in six clusters: features/bugs, consent-related, documentation, data storing/sharing, adaptability, and general compliance.
- Score: 12.041470749136488
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Context: Privacy legislation has impacted the way software systems are developed, prompting practitioners to update their implementations. Specifically, the EU General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) have forced the community to focus on users' data privacy. Despite the vast amount of data on developer issues available in GitHub repositories, there is a lack of empirical evidence on the issues developers of Open Source Software discuss to comply with privacy legislation. Method: In this work, we examine such discussions by mining and analyzing 32,820 issues from GitHub repositories. We partially analyzed the dataset automatically to identify law user rights and principles indicated, and manually analyzed a sample of 1,186 issues based on the type of concern addressed. Results: We devised 24 discussion categories placed in six clusters: features/bugs, consent-related, documentation, data storing/sharing, adaptability, and general compliance. Our results show that developers mainly focus on specific user rights from the legislation (right to erasure, right to opt-out, right to access), addressing other rights less frequently, while most discussions concern user consent, user rights functionality, bugs and cookies management. Conclusion: The created taxonomy can help practitioners understand which issues are discussed for law compliance, so that they ensure they address them first in their systems. In addition, the educational community can reshape curricula to better educate future engineers on the privacy law concerns raised, and the research community can identify gaps and areas for improvement to support and accelerate data privacy law compliance.
Related papers
- Examining Software Developers' Needs for Privacy Enforcing Techniques: A survey [2.879036956042182]
Data privacy legislation has rendered data privacy law compliance a requirement of all software systems. As data compliance is tightly coupled with legal knowledge, it is not always easy to perform such integrations in software systems. Emerging developer needs that can assist in privacy law compliance have not been examined.
arXiv Detail & Related papers (2025-12-15T13:20:14Z) - "I need to learn better searching tactics for privacy policy laws.'' Investigating Software Developers' Behavior When Using Sources on Privacy Issues [8.662963983664223]
Our study highlights major shortcomings in existing support for privacy-related development tasks. Based on our findings, we discuss the need for more accessible, understandable, and actionable privacy resources for developers.
arXiv Detail & Related papers (2025-11-11T09:58:06Z) - Evolution of repositories and privacy laws: commit activities in the GDPR and CCPA era [2.1331883629523634]
We analyzed 37,213 commits from 12,391 repositories since 2016, and manually analyzed 594 commits from a dataset of the 70 most popular repositories. We observe that most commits were made in the year the law came into effect and that privacy-relevant terms appear in the commit messages. The study showed that more educational activities on privacy user rights are needed, as well as tools for privacy recommendations.
arXiv Detail & Related papers (2025-05-28T11:10:58Z) - Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning [53.92712851223158]
We formulate safety and privacy issues into contextualized compliance problems following the Contextual Integrity (CI) theory. Under the CI framework, we align our model with three critical regulatory standards: GDPR, EU AI Act, and HIPAA. We employ reinforcement learning (RL) with a rule-based reward to incentivize contextual reasoning capabilities while enhancing compliance with safety and privacy norms.
arXiv Detail & Related papers (2025-05-20T16:40:09Z) - Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z) - Exploring User Privacy Awareness on GitHub: An Empirical Study [5.822284390235265]
GitHub provides developers with a practical way to distribute source code and collaborate on common projects.
To enhance account security and privacy, GitHub allows its users to manage access permissions, review audit logs, and enable two-factor authentication.
Despite the endless effort, the platform still faces various issues related to the privacy of its users.
arXiv Detail & Related papers (2024-09-06T06:41:46Z) - SoK: The Gap Between Data Rights Ideals and Reality [42.769107967436945]
Do rights-based privacy laws effectively empower individuals over their data? This paper scrutinizes these approaches by reviewing empirical studies, news articles, and blog posts.
arXiv Detail & Related papers (2023-12-03T21:52:51Z) - The whos, whats, and whys of issues related to personal data and data protection in open-source projects on GitHub [6.733786687734259]
We use inductive coding to analyze 652 issues from Open Source GitHub projects. We observed a significant increase in reporting when data protection regulations came into effect. All in all, our findings indicate that data protection regulations effectively start discussions about privacy in the software development community.
arXiv Detail & Related papers (2023-04-13T09:42:03Z) - Having your Privacy Cake and Eating it Too: Platform-supported Auditing of Social Media Algorithms for Public Interest [70.02478301291264]
Social media platforms curate access to information and opportunities, and so play a critical role in shaping public discourse.
Prior studies have used black-box methods to show that these algorithms can lead to biased or discriminatory outcomes.
We propose a new method for platform-supported auditing that can meet the goals of the proposed legislation.
arXiv Detail & Related papers (2022-07-18T17:32:35Z) - Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset [46.156169284961045]
We offer an approach to filtering grounded in law, which has directly addressed the tradeoffs in filtering material.
First, we gather and make available the Pile of Law, a 256GB dataset of open-source English-language legal and administrative data.
Second, we distill the legal norms that governments have developed to constrain the inclusion of toxic or private content into actionable lessons.
Third, we show how the Pile of Law offers researchers the opportunity to learn such filtering rules directly from the data.
arXiv Detail & Related papers (2022-07-01T06:25:15Z) - Distributed Machine Learning and the Semblance of Trust [66.1227776348216]
Federated Learning (FL) allows the data owner to maintain data governance and perform model training locally without having to share their data.
FL and related techniques are often described as privacy-preserving.
We explain why this term is not appropriate and outline the risks associated with over-reliance on protocols that were not designed with formal definitions of privacy in mind.
arXiv Detail & Related papers (2021-12-21T08:44:05Z) - Second layer data governance for permissioned blockchains: the privacy management challenge [58.720142291102135]
In pandemic situations, such as the COVID-19 and Ebola outbreaks, sharing health data is crucial to avoid mass infection and decrease the number of deaths.
In this sense, permissioned blockchain technology emerges to empower users to exercise their rights, providing data ownership, transparency, and security through an immutable, unified, and distributed database ruled by smart contracts.
arXiv Detail & Related papers (2020-10-22T13:19:38Z)