Related papers: Honest Computing: Achieving demonstrable data lineage and provenance for driving data and process-sensitive policies

Honest Computing: Achieving demonstrable data lineage and provenance for driving data and process-sensitive policies

URL: http://arxiv.org/abs/2407.14390v1
Date: Fri, 19 Jul 2024 15:13:42 GMT
Title: Honest Computing: Achieving demonstrable data lineage and provenance for driving data and process-sensitive policies
Authors: Florian Guitton, Axel Oehmichen, Étienne Bossé, Yike Guo,
Abstract summary: Data is susceptible to undue disclosures, leaks, losses, manipulation, or fabrication. We introduce the concept of Honest Computing as the practice and approach that emphasizes transparency, integrity, and ethical behaviour. This foundational layer approach can help define new standards for appropriate data custody and processing.
Score: 8.67097489372345
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Data is the foundation of any scientific, industrial or commercial process. Its journey typically flows from collection to transport, storage, management and processing. While best practices and regulations guide data management and protection, recent events have underscored its vulnerability. Academic research and commercial data handling have been marred by scandals, revealing the brittleness of data management. Data, despite its importance, is susceptible to undue disclosures, leaks, losses, manipulation, or fabrication. These incidents often occur without visibility or accountability, necessitating a systematic structure for safe, honest, and auditable data management. In this paper, we introduce the concept of Honest Computing as the practice and approach that emphasizes transparency, integrity, and ethical behaviour within the realm of computing and technology. It ensures that computer systems and software operate honestly and reliably without hidden agendas, biases, or unethical practices. It enables privacy and confidentiality of data and code by design and by default. We also introduce a reference framework to achieve demonstrable data lineage and provenance, contrasting it with Secure Computing, a related but differently-orientated form of computing. At its core, Honest Computing leverages Trustless Computing, Confidential Computing, Distributed Computing, Cryptography and AAA security concepts. Honest Computing opens new ways of creating technology-based processes and workflows which permit the migration of regulatory frameworks for data protection from principle-based approaches to rule-based ones. Addressing use cases in many fields, from AI model protection and ethical layering to digital currency formation for finance and banking, trading, and healthcare, this foundational layer approach can help define new standards for appropriate data custody and processing.

Related papers

DATABench: Evaluating Dataset Auditing in Deep Learning from an Adversarial Perspective [59.66984417026933]
We introduce a novel taxonomy, classifying existing methods based on their reliance on internal features (IF) (inherent to the data) versus external features (EF) (artificially introduced for auditing)<n>We formulate two primary attack types: evasion attacks, designed to conceal the use of a dataset, and forgery attacks, intending to falsely implicate an unused dataset.<n>Building on the understanding of existing methods and attack objectives, we further propose systematic attack strategies: decoupling, removal, and detection for evasion; adversarial example-based methods for forgery.<n>Our benchmark, DATABench, comprises 17 evasion attacks, 5 forgery attacks, and 9
arXiv Detail & Related papers (2025-07-08T03:07:15Z)
Rethinking Data Protection in the (Generative) Artificial Intelligence Era [115.71019708491386]
We propose a four-level taxonomy that captures the diverse protection needs arising in modern (generative) AI models and systems.<n>Our framework offers a structured understanding of the trade-offs between data utility and control, spanning the entire AI pipeline.
arXiv Detail & Related papers (2025-07-03T02:45:51Z)
Secure Computation and Trustless Data Intermediaries in Data Spaces [0.44998333629984877]
This paper explores the integration of advanced cryptographic techniques for secure computation in data spaces. We exploit the introduced secure methods, i.e. Secure Multi-Party Computation (MPC) and Fully Homomorphic Encryption (FHE) We present solutions through real-world use cases, including air traffic management, manufacturing, and secondary data use.
arXiv Detail & Related papers (2024-10-21T19:10:53Z)
Human-Data Interaction Framework: A Comprehensive Model for a Future Driven by Data and Humans [0.0]
The Human-Data Interaction (HDI) framework has become an essential approach to tackling the challenges and ethical issues associated with data governance and utilization in the modern digital world. This paper outlines the fundamental steps required for organizations to seamlessly integrate HDI principles.
arXiv Detail & Related papers (2024-07-30T17:57:09Z)
Auditing and Generating Synthetic Data with Controllable Trust Trade-offs [54.262044436203965]
We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models. It focuses on preventing bias and discrimination, ensures fidelity to the source data, assesses utility, robustness, and privacy preservation. We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases.
arXiv Detail & Related papers (2023-04-21T09:03:18Z)
Distributed Machine Learning and the Semblance of Trust [66.1227776348216]
Federated Learning (FL) allows the data owner to maintain data governance and perform model training locally without having to share their data. FL and related techniques are often described as privacy-preserving. We explain why this term is not appropriate and outline the risks associated with over-reliance on protocols that were not designed with formal definitions of privacy in mind.
arXiv Detail & Related papers (2021-12-21T08:44:05Z)
AI Assurance using Causal Inference: Application to Public Policy [0.0]
Most AI approaches can only be represented as "black boxes" and suffer from the lack of transparency. It is crucial not only to develop effective and robust AI systems, but to make sure their internal processes are explainable and fair.
arXiv Detail & Related papers (2021-12-01T16:03:06Z)
Representative & Fair Synthetic Data [68.8204255655161]
We present a framework to incorporate fairness constraints into the self-supervised learning process. We generate a representative as well as fair version of the UCI Adult census data set. We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
arXiv Detail & Related papers (2021-04-07T09:19:46Z)
Trustworthy Transparency by Design [57.67333075002697]
We propose a transparency framework for software design, incorporating research on user trust and experience. Our framework enables developing software that incorporates transparency in its design.
arXiv Detail & Related papers (2021-03-19T12:34:01Z)
Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses [150.64470864162556]
This work systematically categorizes and discusses a wide range of dataset vulnerabilities and exploits. In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop their unified taxonomy.
arXiv Detail & Related papers (2020-12-18T22:38:47Z)
Towards Compliant Data Management Systems for Healthcare ML [6.057289837472806]
We review how data flows within machine learning projects in healthcare from source to storage to use in training algorithms and beyond. Our objective is to design tools to detect and track sensitive data across machines and users across the life cycle of a project. We build a prototype of the solution that demonstrates the difficulties in this domain.
arXiv Detail & Related papers (2020-11-15T15:27:51Z)
Privacy Preservation in Federated Learning: An insightful survey from the GDPR Perspective [10.901568085406753]
Article is dedicated to surveying on the state-of-the-art privacy techniques, which can be employed in Federated learning. Recent research has demonstrated that retaining data and on computation in FL is not enough for privacy-guarantee. This is because ML model parameters exchanged between parties in an FL system, which can be exploited in some privacy attacks.
arXiv Detail & Related papers (2020-11-10T21:41:25Z)
Trustworthy AI [75.99046162669997]
Brittleness to minor adversarial changes in the input data, ability to explain the decisions, address the bias in their training data, are some of the most prominent limitations. We propose the tutorial on Trustworthy AI to address six critical issues in enhancing user and public trust in AI systems.
arXiv Detail & Related papers (2020-11-02T20:04:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.