More Data Types More Problems: A Temporal Analysis of Complexity,
Stability, and Sensitivity in Privacy Policies
- URL: http://arxiv.org/abs/2302.08936v1
- Date: Fri, 17 Feb 2023 15:21:24 GMT
- Authors: Juniper Lovato, Philip Mueller, Parisa Suchdev, Peter S. Dodds
- Abstract summary: Data brokers and data processors are part of a multi-billion-dollar industry that profits from collecting, buying, and selling consumer data.
Yet there is little transparency in the data collection industry, which makes it difficult to understand what types of data are being collected, used, and sold.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Collecting personally identifiable information (PII) on data subjects has
become big business. Data brokers and data processors are part of a
multi-billion-dollar industry that profits from collecting, buying, and selling
consumer data. Yet there is little transparency in the data collection industry,
which makes it difficult to understand what types of data are being collected,
used, and sold, and thus the risk to individual data subjects. In this study,
we examine a large textual dataset of privacy policies from 1997-2019 in order
to investigate the data collection activities of data brokers and data
processors. We also develop an original lexicon of PII-related terms
representing PII data types curated from legislative texts. This mesoscale
analysis examines privacy policies on the word, topic, and network levels to
understand how their stability, complexity, and sensitivity change over
time. We find that (1) privacy legislation correlates with
changes in stability and turbulence of PII data types in privacy policies; (2)
the complexity of privacy policies decreases over time and becomes more
regularized; (3) sensitivity rises over time and shows spikes that are
correlated with events when new privacy legislation is introduced.
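As a rough illustration of the word-level, lexicon-based analysis the abstract describes, the following minimal sketch counts PII-related terms in dated policy texts and reports how many distinct PII data types appear per year. It is not the authors' pipeline: the four-term lexicon, the toy policy texts, and the function names `count_pii_terms`, `pii_counts_by_year`, and `distinct_types_by_year` are all hypothetical, chosen only to show the general idea.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon; the paper's lexicon is curated from legislative texts.
PII_LEXICON = ["email address", "ip address", "geolocation", "biometric"]

# Toy stand-ins for dated privacy policies: (year, text). Not real data.
POLICIES = [
    (1999, "We collect your email address when you register."),
    (2018, "We collect your email address, IP address, and geolocation."),
    (2018, "Biometric data and geolocation may be shared with partners."),
]


def count_pii_terms(text, lexicon=PII_LEXICON):
    """Count case-insensitive whole-phrase matches for each lexicon term."""
    counts = Counter()
    for term in lexicon:
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if hits:
            counts[term] = hits
    return counts


def pii_counts_by_year(policies):
    """Aggregate lexicon term counts over all policies of each year."""
    by_year = {}
    for year, text in policies:
        by_year.setdefault(year, Counter()).update(count_pii_terms(text))
    return by_year


def distinct_types_by_year(policies):
    """Number of distinct lexicon terms observed in each year."""
    counts = pii_counts_by_year(policies)
    return {year: len(c) for year, c in sorted(counts.items())}


if __name__ == "__main__":
    print(distinct_types_by_year(POLICIES))
```

A per-year count of distinct terms is only one possible proxy for how the mix of PII data types shifts over time; the topic-level and network-level analyses mentioned in the abstract would need different tooling.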
Related papers
- PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories.
We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds.
State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z)
- A Summary of Privacy-Preserving Data Publishing in the Local Setting [0.6749750044497732]
Statistical Disclosure Control aims to minimize the risk of exposing confidential information by de-identifying it.
We outline the current privacy-preserving techniques employed in microdata de-identification, delve into privacy measures tailored for various disclosure scenarios, and assess metrics for information loss and predictive performance.
arXiv Detail & Related papers (2023-12-19T04:23:23Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- Security and Privacy on Generative Data in AIGC: A Survey [17.456578314457612]
We review the security and privacy on generative data in AIGC.
We reveal the successful experiences of state-of-the-art countermeasures in terms of the foundational properties of privacy, controllability, authenticity, and compliance.
arXiv Detail & Related papers (2023-09-18T02:35:24Z)
- Visualising Personal Data Flows: Insights from a Case Study of Booking.com [8.485751288361616]
This paper reports our work on taking Booking.com as a case study to visualise personal data flows extracted from their privacy policy.
By showcasing how the company shares its consumers' personal data, we raise questions and extend discussions on the challenges and limitations of using privacy policies to inform online users about the true scale and the landscape of personal data flows.
arXiv Detail & Related papers (2023-04-19T12:17:46Z)
- Certified Data Removal in Sum-Product Networks [78.27542864367821]
Deleting the collected data is often insufficient to guarantee data privacy.
UnlearnSPN is an algorithm that removes the influence of single data points from a trained sum-product network.
arXiv Detail & Related papers (2022-10-04T08:22:37Z)
- DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization [58.155151571362914]
We propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases.
Splitting attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling helps obtain a reasonable privacy budget.
We also extend our DP2-Pub mechanism to the scenario with a semi-honest server which satisfies local differential privacy.
arXiv Detail & Related papers (2022-08-24T17:52:43Z)
- Decision Making with Differential Privacy under a Fairness Lens [65.16089054531395]
The U.S. Census Bureau releases data sets and statistics about groups of individuals that are used as input to a number of critical decision processes.
To conform to privacy and confidentiality requirements, these agencies are often required to release privacy-preserving versions of the data.
This paper studies the release of differentially private data sets and analyzes their impact on some critical resource allocation tasks under a fairness perspective.
arXiv Detail & Related papers (2021-05-16T21:04:19Z)
- Protecting Privacy and Transforming COVID-19 Case Surveillance Datasets for Public Use [0.4462475518267084]
CDC has collected person-level, de-identified data from jurisdictions and currently has over 8 million records.
Data elements were included based on usefulness, public request, and privacy implications.
Specific field values were suppressed to reduce risk of reidentification and exposure of confidential information.
arXiv Detail & Related papers (2021-01-13T14:24:20Z)
- Second layer data governance for permissioned blockchains: the privacy management challenge [58.720142291102135]
In pandemic situations, such as the COVID-19 and Ebola outbreaks, sharing health data is crucial to avoid mass infection and decrease the number of deaths.
In this sense, permissioned blockchain technology emerges to empower users to exercise their rights, providing data ownership, transparency, and security through an immutable, unified, and distributed database ruled by smart contracts.
arXiv Detail & Related papers (2020-10-22T13:19:38Z)
- Privacy Policies over Time: Curation and Analysis of a Million-Document Dataset [6.060757543617328]
We develop a crawler that discovers, downloads, and extracts archived privacy policies from the Internet Archive's Wayback Machine.
We curated a dataset of 1,071,488 English language privacy policies, spanning over two decades and over 130,000 distinct websites.
Our data indicate that self-regulation for first-party websites has stagnated, while self-regulation for third parties has increased but is dominated by online advertising trade associations.
arXiv Detail & Related papers (2020-08-20T19:00:37Z)