Evolution of repositories and privacy laws: commit activities in the GDPR and CCPA era
- URL: http://arxiv.org/abs/2505.22234v1
- Date: Wed, 28 May 2025 11:10:58 GMT
- Title: Evolution of repositories and privacy laws: commit activities in the GDPR and CCPA era
- Authors: Georgia M. Kapitsaki, Maria Papoutsoglou
- Abstract summary: We analyzed 37,213 commits from 12,391 repositories since 2016, and we manually analyzed 594 commits from the 70 most popular repositories in the dataset. We observe that most commits were made in the year the law came into effect and that privacy-relevant terms appear in the commit messages. The study showed that more educational activities on privacy user rights are needed, as well as tools for privacy recommendations.
- Score: 2.1331883629523634
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Free and open source software has gained a lot of momentum in industry and in the research community. Recent advances in privacy legislation, including the EU General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), have forced the community to pay special attention to users' data privacy. The main aim of this work is to examine software repositories that are acting on privacy laws. We collected commit data from GitHub repositories to identify indications of activity related to the main data privacy laws (GDPR, CCPA, CPRA, UK DPA) in recent years. Via an automated process, we analyzed 37,213 commits from 12,391 repositories since 2016, and we manually analyzed 594 commits from the 70 most popular repositories in the dataset. We observe that most commits were made in the year the law came into effect and that privacy-relevant terms appear in the commit messages, whereas references to specific data privacy user rights are scarce. The study showed that more educational activities on data privacy user rights are needed, as well as tools for privacy recommendations, and that verifying actual compliance via source code execution is a useful direction for software engineering researchers.
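The automated step described in the abstract can be pictured as keyword matching over commit messages. The following is a minimal Python sketch under stated assumptions: the regular expressions in the hypothetical `LAW_PATTERNS` and `RIGHT_PATTERNS` tables, the `Commit` record, and the sample commits are illustrative only, not the authors' actual search terms or pipeline. It counts law-related commits per (law, year) and computes the share of those commits that name a specific user right, the two quantities the findings above refer to.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical keyword patterns for the four laws named in the abstract;
# the authors' actual search terms may differ.
LAW_PATTERNS = {
    "GDPR": re.compile(r"\bGDPR\b|General Data Protection Regulation", re.IGNORECASE),
    "CCPA": re.compile(r"\bCCPA\b|California Consumer Privacy Act", re.IGNORECASE),
    "CPRA": re.compile(r"\bCPRA\b|California Privacy Rights Act", re.IGNORECASE),
    "UK DPA": re.compile(r"\bUK\s+DPA\b|UK Data Protection Act", re.IGNORECASE),
}

# Hypothetical patterns for a few specific data privacy user rights.
RIGHT_PATTERNS = {
    "erasure": re.compile(r"right to (?:erasure|be forgotten)", re.IGNORECASE),
    "access": re.compile(r"right of access|subject access request", re.IGNORECASE),
    "portability": re.compile(r"data portability", re.IGNORECASE),
    "opt-out": re.compile(r"opt[- ]?out|do not sell", re.IGNORECASE),
}


@dataclass(frozen=True)
class Commit:
    repo: str
    year: int
    message: str


def laws_mentioned(message: str) -> set[str]:
    """Return the laws whose pattern occurs in a commit message."""
    return {law for law, pat in LAW_PATTERNS.items() if pat.search(message)}


def rights_mentioned(message: str) -> set[str]:
    """Return the user rights whose pattern occurs in a commit message."""
    return {right for right, pat in RIGHT_PATTERNS.items() if pat.search(message)}


def commits_per_law_and_year(commits: list[Commit]) -> Counter:
    """Count matching commits per (law, year); a commit naming two laws counts for both."""
    counts: Counter = Counter()
    for c in commits:
        for law in laws_mentioned(c.message):
            counts[(law, c.year)] += 1
    return counts


def share_naming_a_right(commits: list[Commit]) -> float:
    """Fraction of law-related commits that also name a specific user right."""
    law_related = [c for c in commits if laws_mentioned(c.message)]
    if not law_related:
        return 0.0
    with_right = sum(1 for c in law_related if rights_mentioned(c.message))
    return with_right / len(law_related)


# Made-up sample commits, for illustration only.
SAMPLE_COMMITS = [
    Commit("org/web-app", 2018, "Add GDPR cookie consent banner"),
    Commit("org/web-app", 2020, "CCPA: add 'Do Not Sell' opt-out link"),
    Commit("org/api", 2018, "Implement right to erasure endpoint (GDPR)"),
    Commit("org/api", 2019, "Fix typo in README"),
]

if __name__ == "__main__":
    for (law, year), n in sorted(commits_per_law_and_year(SAMPLE_COMMITS).items()):
        print(law, year, n)
    print(f"share naming a right: {share_naming_a_right(SAMPLE_COMMITS):.2f}")
```

Keyword matching of this kind produces false positives (an "opt-out" flag unrelated to privacy, for instance), which is where a manual pass like the one over 594 commits described above adds value.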
Related papers
- Differentially Private Synthetic Data Release for Topics API Outputs [63.79476766779742]
We focus on one Privacy-Preserving Ads API: the Topics API, part of Google Chrome's Privacy Sandbox. We generate a differentially private dataset that closely matches the re-identification risk properties of the real Topics API data. We hope this will enable external researchers to analyze the API in depth and replicate prior and future work on a realistic large-scale dataset.
Date: 2025-06-30T13:46:57Z
- MAGPIE: A dataset for Multi-AGent contextual PrIvacy Evaluation [54.410825977390274]
Existing benchmarks for evaluating contextual privacy in LLM agents primarily assess single-turn, low-complexity tasks. We first present MAGPIE, a benchmark comprising 158 real-life, high-stakes scenarios across 15 domains. We then evaluate current state-of-the-art LLMs on their understanding of contextually private data and their ability to collaborate without violating user privacy.
Date: 2025-06-25T18:04:25Z
- Interactive GDPR-Compliant Privacy Policy Generation for Software Applications [6.189770781546807]
To use software applications, users are sometimes requested to provide their personal information.
As privacy has become a significant concern, many protection regulations exist worldwide.
We propose an approach that generates comprehensive and compliant privacy policies.
Date: 2024-10-04T01:22:16Z
- A Large-Scale Privacy Assessment of Android Third-Party SDKs [17.245330733308375]
Third-party Software Development Kits (SDKs) are widely adopted in Android app development.
This convenience raises substantial concerns about unauthorized access to users' privacy-sensitive information.
Our study offers a targeted analysis of user privacy protection among Android third-party SDKs.
Date: 2024-09-16T15:44:43Z
- Exploring User Privacy Awareness on GitHub: An Empirical Study [5.822284390235265]
GitHub provides developers with a practical way to distribute source code and collaborate on common projects.
To enhance account security and privacy, GitHub allows its users to manage access permissions, review audit logs, and enable two-factor authentication.
Despite these efforts, the platform still faces various issues related to its users' privacy.
Date: 2024-09-06T06:41:46Z
- PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories. We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds. State-of-the-art LMs, such as GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, respectively, even when prompted with privacy-enhancing instructions.
Date: 2024-08-29T17:58:38Z
- A BERT-based Empirical Study of Privacy Policies' Compliance with GDPR [9.676166100354282]
This study aims to address the challenge of analyzing whether privacy policies for 5G networks comply with GDPR.
We manually collected privacy policies from almost 70 different mobile network operators (MNOs) and used an automated BERT-based model for classification.
In addition, we present the first empirical evidence on the readability of privacy policies for 5G networks; our approach incorporates various established readability metrics.
Date: 2024-07-09T11:47:52Z
- Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
Date: 2024-07-04T08:29:27Z
- Privacy Explanations - A Means to End-User Trust [64.7066037969487]
We looked into how explainability might help to tackle this problem.
We created privacy explanations that aim to clarify to end users why, and for what purposes, specific data is required.
Our findings reveal that privacy explanations can be an important step towards increasing trust in software systems.
Date: 2022-10-18T09:30:37Z
- Second layer data governance for permissioned blockchains: the privacy management challenge [58.720142291102135]
In pandemic situations, such as the COVID-19 and Ebola outbreaks, sharing health data is crucial to avoid mass infection and reduce the number of deaths.
In this context, permissioned blockchain technology emerges as a way to empower users to exercise their rights by providing data ownership, transparency, and security through an immutable, unified, and distributed database governed by smart contracts.
Date: 2020-10-22T13:19:38Z
- Privacy Policies over Time: Curation and Analysis of a Million-Document Dataset [6.060757543617328]
We develop a crawler that discovers, downloads, and extracts archived privacy policies from the Internet Archive's Wayback Machine.
We curated a dataset of 1,071,488 English language privacy policies, spanning over two decades and over 130,000 distinct websites.
Our data indicate that self-regulation for first-party websites has stagnated, while self-regulation for third parties has increased but is dominated by online advertising trade associations.
Date: 2020-08-20T19:00:37Z