Providing Assurance and Scrutability on Shared Data and Machine Learning
Models with Verifiable Credentials
- URL: http://arxiv.org/abs/2105.06370v1
- Date: Thu, 13 May 2021 15:58:05 GMT
- Title: Providing Assurance and Scrutability on Shared Data and Machine Learning
Models with Verifiable Credentials
- Authors: Iain Barclay, Alun Preece, Ian Taylor, Swapna K. Radha, Jarek
Nabrzyski
- Abstract summary: Practitioners rely on AI developers to have used relevant, trustworthy data.
Scientists can issue signed credentials attesting to qualities of their data resources.
The BOM provides a traceable record of the supply chain for an AI system.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adopting shared data resources requires scientists to place trust in the
originators of the data. When shared data is later used in the development of
artificial intelligence (AI) systems or machine learning (ML) models, the trust
lineage extends to the users of the system, typically practitioners in fields
such as healthcare and finance. Practitioners rely on AI developers to have
used relevant, trustworthy data, but may have limited insight and recourse.
This paper introduces a software architecture and implementation of a system
based on design patterns from the field of self-sovereign identity. Scientists
can issue signed credentials attesting to qualities of their data resources.
Data contributions to ML models are recorded in a bill of materials (BOM),
which is stored with the model as a verifiable credential. The BOM provides a
traceable record of the supply chain for an AI system, which facilitates
on-going scrutiny of the qualities of the contributing components. The verified
BOM, and its linkage to certified data qualities, is used in the AI Scrutineer,
a web-based tool designed to offer practitioners insight into ML model
constituents and highlight any problems with adopted datasets, should they be
found to have biased data or be otherwise discredited.
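
As a rough illustration of the flow the abstract describes, the sketch below shows a scientist signing a credential attesting to qualities of a dataset, a developer recording that credential in a model's bill of materials (BOM), and a practitioner-side check verifying the chain. The field names (DatasetQualityCredential, bomEntries), the placeholder identifiers, and the use of raw Ed25519 signatures over canonical JSON are illustrative assumptions, not the paper's schema; the actual system builds on self-sovereign identity tooling and W3C Verifiable Credentials rather than this hand-rolled signing.

```python
# Minimal sketch of the credential / BOM flow described in the abstract.
# All field names and identifiers are hypothetical, not the paper's schema.
import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey


def canonical(payload: dict) -> bytes:
    """Serialize a payload deterministically so signatures are reproducible."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


# 1. Data originator issues a signed credential about their dataset.
scientist_key = Ed25519PrivateKey.generate()
data_credential = {
    "type": "DatasetQualityCredential",
    "issuer": "did:example:scientist-1",          # placeholder identifier
    "subject": {
        "datasetId": "dataset-042",               # placeholder dataset name
        "dataQuality": {"provenanceDocumented": True, "biasAudit": "passed"},
    },
}
data_sig = scientist_key.sign(canonical(data_credential))

# 2. Model developer records the credential in the model's BOM and signs the
#    BOM itself, so it can travel with the model as a verifiable record.
developer_key = Ed25519PrivateKey.generate()
bom = {
    "type": "ModelBillOfMaterials",
    "model": "clinical-risk-model-v1",            # placeholder model name
    "bomEntries": [
        {"credential": data_credential, "signature": data_sig.hex()},
    ],
}
bom_sig = developer_key.sign(canonical(bom))


# 3. A practitioner-side check, in the spirit of the AI Scrutineer: verify the
#    BOM and each contributing data credential before trusting the lineage.
def scrutinize(bom: dict, bom_sig: bytes, developer_pub, issuer_pubs: dict) -> list[str]:
    problems = []
    try:
        developer_pub.verify(bom_sig, canonical(bom))
    except InvalidSignature:
        problems.append("BOM signature invalid")
    for entry in bom["bomEntries"]:
        cred = entry["credential"]
        issuer_pub = issuer_pubs[cred["issuer"]]
        try:
            issuer_pub.verify(bytes.fromhex(entry["signature"]), canonical(cred))
        except InvalidSignature:
            problems.append(f"credential for {cred['subject']['datasetId']} is invalid")
    return problems


issues = scrutinize(
    bom, bom_sig,
    developer_key.public_key(),
    {"did:example:scientist-1": scientist_key.public_key()},
)
print(issues or "supply chain verified")
```

In the paper's architecture, this verification role is played by the AI Scrutineer, which re-checks the BOM's linked credentials over time so that a dataset later found to be biased or otherwise discredited can be flagged to practitioners.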
Related papers
- AI Data Readiness Inspector (AIDRIN) for Quantitative Assessment of Data Readiness for AI [0.8553254686016967]
"Garbage in Garbage Out" is a universally agreed quote by computer scientists from various domains, including Artificial Intelligence (AI)
There are no standard methods or frameworks for assessing the "readiness" of data for AI.
AIDRIN is a framework covering a broad range of readiness dimensions available in the literature.
arXiv Detail & Related papers (2024-06-27T15:26:39Z) - The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z) - Collect, Measure, Repeat: Reliability Factors for Responsible AI Data
Collection [8.12993269922936]
We argue that data collection for AI should be performed in a responsible manner.
We propose a Responsible AI (RAI) methodology designed to guide the data collection with a set of metrics.
arXiv Detail & Related papers (2023-08-22T18:01:27Z) - Model-Driven Engineering Method to Support the Formalization of Machine
Learning using SysML [0.0]
This work introduces a method supporting the collaborative definition of machine learning tasks by leveraging model-based engineering.
The method supports the identification and integration of various data sources, the required definition of semantic connections between data attributes, and the definition of data processing steps.
arXiv Detail & Related papers (2023-07-10T11:33:46Z) - Auditing and Generating Synthetic Data with Controllable Trust Trade-offs [54.262044436203965]
We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models.
It focuses on preventing bias and discrimination, ensuring fidelity to the source data, and assessing utility, robustness, and privacy preservation.
We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases.
arXiv Detail & Related papers (2023-04-21T09:03:18Z) - Privacy-Preserving Machine Learning for Collaborative Data Sharing via
Auto-encoder Latent Space Embeddings [57.45332961252628]
Privacy-preserving machine learning in data-sharing processes is an ever-critical task.
This paper presents an innovative framework that uses Representation Learning via autoencoders to generate privacy-preserving embedded data.
arXiv Detail & Related papers (2022-11-10T17:36:58Z) - A Survey of Machine Unlearning [56.017968863854186]
Recent regulations now require that, on request, private information about a user must be removed from computer systems.
ML models often 'remember' the old data.
Recent works on machine unlearning have not been able to completely solve the problem.
arXiv Detail & Related papers (2022-09-06T08:51:53Z) - A framework for fostering transparency in shared artificial intelligence
models by increasing visibility of contributions [0.6850683267295249]
This paper presents a novel method for deriving a quantifiable metric capable of ranking the overall transparency of the process pipelines used to generate AI systems.
The methodology for calculating the metric, and the type of criteria that could be used to make judgements on the visibility of contributions to systems are evaluated.
arXiv Detail & Related papers (2021-03-05T11:28:50Z) - Decentralized Federated Learning Preserves Model and Data Privacy [77.454688257702]
We propose a fully decentralized approach, which allows knowledge to be shared between trained models.
Students are trained on the output of their teachers via synthetically generated input data.
The results show that a student model trained on its teachers' output reaches F1-scores comparable to those of the teacher.
arXiv Detail & Related papers (2021-02-01T14:38:54Z) - Trustworthy AI [75.99046162669997]
Brittleness to minor adversarial changes in the input data, limited ability to explain decisions, and bias in training data are among the most prominent limitations.
We propose a tutorial on Trustworthy AI addressing six critical issues in enhancing user and public trust in AI systems.
arXiv Detail & Related papers (2020-11-02T20:04:18Z) - A Distributed Trust Framework for Privacy-Preserving Machine Learning [4.282091426377838]
This paper outlines a distributed infrastructure which is used to facilitate peer-to-peer trust between distributed agents.
We detail a proof of concept using Hyperledger Aries, Decentralised Identifiers (DIDs) and Verifiable Credentials (VCs).
arXiv Detail & Related papers (2020-06-03T18:06:13Z)