Privacy-Preserving Data Sharing in Agriculture: Enforcing Policy Rules
for Secure and Confidential Data Synthesis
- URL: http://arxiv.org/abs/2311.15460v1
- Date: Mon, 27 Nov 2023 00:12:47 GMT
- Title: Privacy-Preserving Data Sharing in Agriculture: Enforcing Policy Rules
for Secure and Confidential Data Synthesis
- Authors: Anantaa Kotal, Lavanya Elluri, Deepti Gupta, Varun Mandalapu and
Anupam Joshi
- Abstract summary: The use of Big Data in farming requires the collection and analysis of data from various sources such as sensors, satellites, and farmer surveys.
There is significant concern regarding the security of this data as well as the privacy of the participants.
Deep learning-based synthetic data generation has been proposed for privacy-preserving data sharing.
We propose a novel framework for enforcing privacy policy rules in privacy-preserving data generation algorithms.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Big Data empowers the farming community with the information needed to
optimize resource usage, increase productivity, and enhance the sustainability
of agricultural practices. The use of Big Data in farming requires the
collection and analysis of data from various sources such as sensors,
satellites, and farmer surveys. While Big Data can provide the farming
community with valuable insights and improve efficiency, there is significant
concern regarding the security of this data as well as the privacy of the
participants. Privacy regulations, such as the EU GDPR, the EU Code of Conduct
on agricultural data sharing by contractual agreement, and the proposed EU AI
law, have been created to address the issue of data privacy and provide
specific guidelines on when and how data can be shared between organizations.
To make confidential agricultural data widely available for Big Data analysis
without violating the privacy of the data subjects, we consider
privacy-preserving methods of data sharing in agriculture. Deep learning-based
synthetic data generation has been proposed for privacy-preserving data
sharing. However, there is a lack of compliance with documented data privacy
policies in such privacy-preserving efforts. In this study, we propose a novel
framework for enforcing privacy policy rules in privacy-preserving data
generation algorithms. We explore several available agricultural codes of
conduct, extract knowledge related to the privacy constraints in data, and use
the extracted knowledge to define privacy bounds in a privacy-preserving
generative model. We use our framework to generate synthetic agricultural data
and present experimental results that demonstrate the utility of the synthetic
dataset in downstream tasks. We also show that our framework can evade
potential threats and secure data based on applicable regulatory policy rules.
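The abstract does not include an implementation, but the core idea of turning extracted policy rules into privacy bounds for a generative model can be sketched roughly as follows. This is an illustrative assumption, not the authors' code: the PolicyRule class, the column names, and the epsilon values are hypothetical, and a simple per-column Gaussian sampler with Laplace-noised statistics stands in for the deep generative model described in the paper.
```python
# Minimal sketch (assumed, not the paper's implementation): policy rules derived
# from an agricultural code of conduct constrain what a synthetic-data generator
# may release and how much noise it must add.
import numpy as np
from dataclasses import dataclass

@dataclass
class PolicyRule:
    column: str
    action: str           # "suppress" (never release) or "perturb" (release with DP noise)
    epsilon: float = 1.0  # per-column privacy budget when action == "perturb"

def generate_synthetic(data: dict, rules: list, n_samples: int, rng=None):
    """Fit per-column Gaussians on noised statistics and sample synthetic rows."""
    if rng is None:
        rng = np.random.default_rng(0)
    by_column = {r.column: r for r in rules}
    synthetic = {}
    for col, values in data.items():
        rule = by_column.get(col)
        if rule and rule.action == "suppress":
            continue  # policy forbids releasing this attribute at all
        values = np.asarray(values, dtype=float)
        mean, std = values.mean(), values.std()
        if rule and rule.action == "perturb":
            # Laplace mechanism: noise scale = sensitivity / epsilon (sensitivity assumed 1.0)
            mean += rng.laplace(scale=1.0 / rule.epsilon)
            std += abs(rng.laplace(scale=1.0 / rule.epsilon))
        synthetic[col] = rng.normal(mean, std, size=n_samples)
    return synthetic

# Hypothetical example: farm identifiers must never be shared; yield may be shared with noise.
farm_data = {"farm_id": [101, 102, 103],
             "yield_t_per_ha": [6.2, 5.8, 7.1],
             "rainfall_mm": [430, 510, 480]}
rules = [PolicyRule("farm_id", "suppress"),
         PolicyRule("yield_t_per_ha", "perturb", epsilon=0.5)]

synth = generate_synthetic(farm_data, rules, n_samples=5)
print(sorted(synth))                          # farm_id is withheld by policy
print(np.round(synth["yield_t_per_ha"], 2))   # noised, policy-compliant samples
```
In the paper's setting the privacy bounds would instead constrain a deep generative model (e.g., its noise scale or loss), but the mapping from policy rule to enforcement step is the same.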
Related papers
- Evaluating Differentially Private Synthetic Data Generation in High-Stakes Domains [9.123834467375532]
We explore the feasibility of using synthetic data generated from differentially private language models in place of real data to facilitate the development of NLP in high-stakes domains.
Our results show that prior simplistic evaluations have failed to highlight utility, privacy, and fairness issues in the synthetic data.
arXiv Detail & Related papers (2024-10-10T19:31:02Z)
- Privacy-Preserving Data Linkage Across Private and Public Datasets for Collaborative Agriculture Research [1.6000462052866455]
Digital agriculture raises privacy concerns such as adverse pricing, price discrimination, higher insurance costs, and manipulation of resources.
This study introduces a privacy-preserving framework that addresses these risks while allowing secure data sharing for digital agriculture.
Our framework enables comprehensive data analysis while protecting privacy.
arXiv Detail & Related papers (2024-09-09T21:07:13Z)
- Privacy-Preserving Collaborative Genomic Research: A Real-Life Deployment and Vision [2.7968600664591983]
This paper presents a privacy-preserving framework for genomic research, developed in collaboration with Lynx.MD.
The framework addresses critical cybersecurity and privacy challenges, enabling the privacy-preserving sharing and analysis of genomic data.
Implementing the framework within Lynx.MD involves encoding genomic data into binary formats and applying noise through controlled perturbation techniques.
arXiv Detail & Related papers (2024-07-12T05:43:13Z)
- Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data [51.41288763521186]
Retrieval-augmented generation (RAG) enhances the outputs of language models by integrating relevant information retrieved from external knowledge sources.
RAG systems may face severe privacy risks when retrieving private data.
We propose using synthetic data as a privacy-preserving alternative for the retrieval data.
arXiv Detail & Related papers (2024-06-20T22:53:09Z)
- Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
arXiv Detail & Related papers (2023-12-10T07:38:56Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- Synthetic Data: Methods, Use Cases, and Risks [11.413309528464632]
A possible alternative gaining momentum in both the research community and industry is to share synthetic data instead.
We provide a gentle introduction to synthetic data and discuss its use cases, the privacy challenges that are still unaddressed, and its inherent limitations as an effective privacy-enhancing technology.
arXiv Detail & Related papers (2023-03-01T16:35:33Z)
- Distributed Machine Learning and the Semblance of Trust [66.1227776348216]
Federated Learning (FL) allows the data owner to maintain data governance and perform model training locally without having to share their data.
FL and related techniques are often described as privacy-preserving.
We explain why this term is not appropriate and outline the risks associated with over-reliance on protocols that were not designed with formal definitions of privacy in mind.
arXiv Detail & Related papers (2021-12-21T08:44:05Z)
- Protecting Privacy and Transforming COVID-19 Case Surveillance Datasets for Public Use [0.4462475518267084]
The CDC has collected person-level, de-identified data from jurisdictions and currently holds over 8 million records.
Data elements were included based on usefulness, public requests, and privacy implications.
Specific field values were suppressed to reduce risk of reidentification and exposure of confidential information.
arXiv Detail & Related papers (2021-01-13T14:24:20Z)
- Second layer data governance for permissioned blockchains: the privacy management challenge [58.720142291102135]
In pandemic situations, such as the COVID-19 and Ebola outbreaks, sharing health data is crucial to curb mass infection and reduce the number of deaths.
In this sense, permissioned blockchain technology emerges to empower users to exercise their rights, providing data ownership, transparency, and security through an immutable, unified, and distributed database governed by smart contracts.
arXiv Detail & Related papers (2020-10-22T13:19:38Z)
- Beyond privacy regulations: an ethical approach to data usage in transportation [64.86110095869176]
We describe how Federated Machine Learning can be applied to the transportation sector.
We see Federated Learning as a method that enables us to process privacy-sensitive data while respecting customers' privacy.
arXiv Detail & Related papers (2020-04-01T15:10:12Z)