Digital trace data collection through data donation
- URL: http://arxiv.org/abs/2011.09851v1
- Date: Fri, 13 Nov 2020 11:19:25 GMT
- Title: Digital trace data collection through data donation
- Authors: Laura Boeschoten and Jef Ausloos and Judith Moeller and Theo Araujo
and Daniel L. Oberski
- Abstract summary: Article 15 of the EU's 2018 General Data Protection Regulation (GDPR) mandates that individuals have electronic access to a copy of their personal data.
All major digital platforms now comply with this law by providing users with "data download packages" (DDPs).
Through voluntary donation of DDPs, data collected by public and private entities during the course of citizens' digital life can be obtained and analyzed, with consent, to answer social-scientific questions.
We provide a blueprint for digital trace data collection using DDPs, and devise a "total error framework" for such projects.
- Score: 0.4499833362998487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A potentially powerful method of social-scientific data collection and
investigation has been created by an unexpected institution: the law. Article
15 of the EU's 2018 General Data Protection Regulation (GDPR) mandates that
individuals have electronic access to a copy of their personal data, and all
major digital platforms now comply with this law by providing users with "data
download packages" (DDPs). Through voluntary donation of DDPs, all data
collected by public and private entities during the course of citizens' digital
life can be obtained and analyzed to answer social-scientific questions - with
consent. Thus, consented DDPs open the way for vast new research opportunities.
However, while this entirely new method of data collection will undoubtedly
gain popularity in the coming years, it also comes with its own questions of
representativeness and measurement quality, which are often evaluated
systematically by means of an error framework. Therefore, in this paper we
provide a blueprint for digital trace data collection using DDPs, and devise a
"total error framework" for such projects. Our error framework for digital
trace data collection through data donation is intended to facilitate high
quality social-scientific investigations using DDPs while critically reflecting
on its unique methodological challenges and sources of error. In addition, we
provide a quality control checklist to guide researchers in leveraging the vast
opportunities afforded by this new mode of investigation.
Related papers
- How to Drill Into Silos: Creating a Free-to-Use Dataset of Data Subject Access Packages [0.0]
European Union's General Data Protection Regulation strengthened data subjects' right to access personal data.
Subjects' possibilities for actually using controller-provided subject access request packages (SARPs) are severely limited so far.
This dataset is publicly provided and shall, in the future, serve as a starting point for researching and comparing novel approaches for practically viable use of SARPs.
arXiv Detail & Related papers (2024-07-05T12:39:51Z)
- A Survey on Data Selection for Language Models [148.300726396877]
Data selection methods aim to determine which data points to include in a training dataset.
Deep learning is mostly driven by empirical evidence and experimentation on large-scale data is expensive.
Few organizations have the resources for extensive data selection research.
arXiv Detail & Related papers (2024-02-26T18:54:35Z)
- CaPS: Collaborative and Private Synthetic Data Generation from Distributed Sources [5.898893619901382]
We propose a framework for the collaborative and private generation of synthetic data from distributed data holders.
We replace the trusted aggregator with secure multi-party computation protocols and provide output privacy via differential privacy (DP).
We demonstrate the applicability and scalability of our approach for the state-of-the-art select-measure-generate algorithms MWEM+PGM and AIM.
arXiv Detail & Related papers (2024-02-13T17:26:32Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- Releasing survey microdata with exact cluster locations and additional privacy safeguards [77.34726150561087]
We propose an alternative microdata dissemination strategy that leverages the utility of the original microdata with additional privacy safeguards.
Our strategy reduces the respondents' re-identification risk for any number of disclosed attributes by 60-80% even under re-identification attempts.
arXiv Detail & Related papers (2022-05-24T19:37:11Z)
- Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and scatteredness of available information (sparsity).
arXiv Detail & Related papers (2022-02-03T17:25:46Z)
- Yes-Yes-Yes: Donation-based Peer Reviewing Data Collection for ACL Rolling Review and Beyond [58.71736531356398]
We present an in-depth discussion of peer reviewing data, outline the ethical and legal desiderata for peer reviewing data collection, and propose the first continuous, donation-based data collection workflow.
We report on the ongoing implementation of this workflow at the ACL Rolling Review and deliver the first insights obtained with the newly collected data.
arXiv Detail & Related papers (2022-01-27T11:02:43Z)
- Protecting Privacy and Transforming COVID-19 Case Surveillance Datasets for Public Use [0.4462475518267084]
CDC has collected person-level, de-identified data from jurisdictions and currently has over 8 million records.
Data elements were included based on the usefulness, public request, and privacy implications.
Specific field values were suppressed to reduce risk of reidentification and exposure of confidential information.
arXiv Detail & Related papers (2021-01-13T14:24:20Z)
- Data Mining with Big Data in Intrusion Detection Systems: A Systematic Literature Review [68.15472610671748]
Cloud computing has become a powerful and indispensable technology for complex, high performance and scalable computation.
The rapid rate and volume of data creation have begun to pose significant challenges for data management and security.
The design and deployment of intrusion detection systems (IDS) in the big data setting has, therefore, become a topic of importance.
arXiv Detail & Related papers (2020-05-23T20:57:12Z)
- PrivGen: Preserving Privacy of Sequences Through Data Generation [14.579475552088688]
Sequential data can serve as a basis for research that will lead to improved processes.
Access and use of such data is usually limited or not permitted at all due to concerns about violating user privacy.
We propose PrivGen, an innovative method for generating data that maintains patterns and characteristics of the source data.
arXiv Detail & Related papers (2020-02-23T05:43:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.