Achieving Transparency in Distributed Machine Learning with Explainable
Data Collaboration
- URL: http://arxiv.org/abs/2212.03373v1
- Date: Tue, 6 Dec 2022 23:53:41 GMT
- Authors: Anna Bogdanova, Akira Imakura, Tetsuya Sakurai, Tomoya Fujii, Teppei
Sakamoto, Hiroyuki Abe
- Abstract summary: A parallel trend has been to train machine learning models in collaboration with other data holders without accessing their data.
This paper presents an Explainable Data Collaboration Framework based on a model-agnostic additive feature attribution algorithm.
- Score: 5.994347858883343
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transparency of Machine Learning models used for decision support in various
industries becomes essential for ensuring their ethical use. To that end,
feature attribution methods such as SHAP (SHapley Additive exPlanations) are
widely used to explain the predictions of black-box machine learning models to
customers and developers. However, a parallel trend has been to train machine
learning models in collaboration with other data holders without accessing
their data. Such models, trained over horizontally or vertically partitioned
data, present a challenge for explainable AI because the explaining party may
have a biased view of background data or a partial view of the feature space.
As a result, explanations obtained from different participants of distributed
machine learning might not be consistent with one another, undermining trust in
the product. This paper presents an Explainable Data Collaboration Framework
based on a model-agnostic additive feature attribution algorithm (KernelSHAP)
and Data Collaboration method of privacy-preserving distributed machine
learning. In particular, we present three algorithms for different scenarios of
explainability in Data Collaboration and verify their consistency with
experiments on open-access datasets. Our results demonstrated a significant (by
at least a factor of 1.75) decrease in feature attribution discrepancies among
the users of distributed machine learning.
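The framework above builds on additive feature attribution in the Shapley sense. As a minimal, self-contained illustration of that idea (not the paper's algorithms, and not the KernelSHAP regression approximation), the sketch below computes exact Shapley values by enumerating coalitions, replacing absent features with a background mean; all function and variable names are hypothetical:

```python
import itertools
import math
import numpy as np

def shapley_values(predict, x, background):
    """Exact Shapley values for one instance by enumerating all coalitions.
    Features absent from a coalition are imputed with the background mean.
    Exponential in the number of features; for illustration only."""
    d = x.shape[0]
    base = background.mean(axis=0)
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            for S in itertools.combinations(others, k):
                # Shapley weight for a coalition of size k
                w = math.factorial(k) * math.factorial(d - k - 1) / math.factorial(d)
                with_i = base.copy()
                with_i[list(S) + [i]] = x[list(S) + [i]]
                without = base.copy()
                without[list(S)] = x[list(S)]
                phi[i] += w * (predict(with_i) - predict(without))
    return phi
```

For a linear model the attributions recover the feature contributions exactly, and they always sum to the difference between the prediction and the background prediction; a biased or partial background (as in the distributed setting the abstract describes) shifts these values, which is the discrepancy the framework targets.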
Related papers
- Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing the influence of a small "forget set" of training data from a pre-trained machine learning model -- has recently attracted interest.
Recent research shows that existing machine unlearning techniques do not hold up under such challenging evaluation settings.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - Robust Computer Vision in an Ever-Changing World: A Survey of Techniques
for Tackling Distribution Shifts [20.17397328893533]
AI applications are becoming increasingly visible to the general public.
There is a notable gap between the theoretical assumptions researchers make about computer vision models and the reality those models face when deployed in the real world.
One of the critical reasons for this gap is a challenging problem known as distribution shift.
arXiv Detail & Related papers (2023-12-03T23:40:12Z) - On the Joint Interaction of Models, Data, and Features [82.60073661644435]
We introduce a new tool, the interaction tensor, for empirically analyzing the interaction between data and model through features.
Based on these observations, we propose a conceptual framework for feature learning.
Under this framework, the expected accuracy for a single hypothesis and agreement for a pair of hypotheses can both be derived in closed-form.
arXiv Detail & Related papers (2023-06-07T21:35:26Z) - Striving for data-model efficiency: Identifying data externalities on
group performance [75.17591306911015]
Building trustworthy, effective, and responsible machine learning systems hinges on understanding how differences in training data and modeling decisions interact to impact predictive performance.
We focus on a particular type of data-model inefficiency, in which adding training data from some sources can actually lower performance evaluated on key sub-groups of the population.
Our results indicate that data-efficiency is a key component of both accurate and trustworthy machine learning.
arXiv Detail & Related papers (2022-11-11T16:48:27Z) - Privacy-Preserving Machine Learning for Collaborative Data Sharing via
Auto-encoder Latent Space Embeddings [57.45332961252628]
Privacy-preserving machine learning in data-sharing processes is an ever-critical task.
This paper presents an innovative framework that uses Representation Learning via autoencoders to generate privacy-preserving embedded data.
arXiv Detail & Related papers (2022-11-10T17:36:58Z) - Partitioned Variational Inference: A Framework for Probabilistic
Federated Learning [45.9225420256808]
We introduce partitioned variational inference (PVI), a framework for performing VI in the federated setting.
We develop new supporting theory for PVI, demonstrating a number of properties that make it an attractive choice for practitioners.
arXiv Detail & Related papers (2022-02-24T18:15:30Z) - Non-IID data and Continual Learning processes in Federated Learning: A
long road ahead [58.720142291102135]
Federated Learning is a novel framework that allows multiple devices or institutions to train a machine learning model collaboratively while keeping their data private.
In this work, we formally classify data statistical heterogeneity and review the most remarkable learning strategies that are able to face it.
At the same time, we introduce approaches from other machine learning frameworks, such as Continual Learning, that also deal with data heterogeneity and could be easily adapted to the Federated Learning settings.
arXiv Detail & Related papers (2021-11-26T09:57:11Z) - A survey on datasets for fairness-aware machine learning [6.962333053044713]
A large variety of fairness-aware machine learning solutions have been proposed.
In this paper, we overview real-world datasets used for fairness-aware machine learning.
For a deeper understanding of bias and fairness in the datasets, we investigate their relationships using exploratory analysis.
arXiv Detail & Related papers (2021-10-01T16:54:04Z) - Federated Learning System without Model Sharing through Integration of
Dimensional Reduced Data Representations [6.9485501711137525]
We explore an alternative federated learning system that enables integration of dimensionality reduced representations of distributed data prior to a supervised learning task.
We compare the performance of this approach on image classification tasks to three alternative frameworks: centralized machine learning, individual machine learning, and Federated Averaging.
Our results show that our approach can achieve similar accuracy as Federated Averaging and performs better than Federated Averaging in a small-user setting.
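The summary above describes integrating dimensionality-reduced representations of distributed data without sharing models. A toy sketch of that idea (assumed details, not the authors' implementation: linear PCA-style reductions, a public anchor dataset, and a least-squares alignment to a common space) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: two parties with private data, one shared anchor dataset.
d, r = 6, 3
anchor = rng.normal(size=(20, d))                 # public anchor data
parties = [rng.normal(size=(30, d)) for _ in range(2)]

collab_inputs, anchor_reps = [], []
for X in parties:
    # Each party builds a private linear reduction from its own data (PCA-style).
    _, _, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
    P = Vt[:r].T                                  # d x r projection, kept private
    collab_inputs.append(X @ P)                   # shared: reduced local data
    anchor_reps.append(anchor @ P)                # shared: reduced anchor data

# The analyst aligns each party's representation to the first party's
# via least squares on the common anchor, then pools the aligned data.
target = anchor_reps[0]
aligned = []
for Z, A in zip(collab_inputs, anchor_reps):
    G, *_ = np.linalg.lstsq(A, target, rcond=None)  # r x r alignment map
    aligned.append(Z @ G)
unified = np.vstack(aligned)   # pooled representation for supervised learning
```

Only the reduced representations and the anchor images leave each party, so neither raw data nor model parameters are shared, which is the property the summary highlights.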
arXiv Detail & Related papers (2020-11-13T08:12:00Z) - Accelerating Federated Learning in Heterogeneous Data and Computational
Environments [0.7106986689736825]
We introduce a novel distributed validation weighting scheme (DVW), which evaluates the performance of a learner in the federation against a distributed validation set.
We empirically show that DVW results in better performance compared to established methods, such as FedAvg.
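The DVW idea of weighting each learner by its performance on a distributed validation set can be sketched as a simple accuracy-weighted parameter average (a simplified illustration, not the paper's exact scheme; the function name is hypothetical):

```python
import numpy as np

def dvw_aggregate(param_list, val_accuracies):
    """Aggregate learners' parameter vectors, weighting each learner by its
    normalized accuracy on a distributed validation set (simplified DVW sketch).
    Contrast with FedAvg, which typically weights by local sample counts."""
    acc = np.asarray(val_accuracies, dtype=float)
    w = acc / acc.sum()                       # normalize to a convex combination
    return sum(wi * np.asarray(p) for wi, p in zip(w, param_list))
```

A learner that validates poorly thus contributes proportionally less to the federated model than under uniform or sample-count weighting.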
arXiv Detail & Related papers (2020-08-25T21:28:38Z) - Learning while Respecting Privacy and Robustness to Distributional
Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.