Achieving Transparency in Distributed Machine Learning with Explainable
Data Collaboration
- URL: http://arxiv.org/abs/2212.03373v1
- Date: Tue, 6 Dec 2022 23:53:41 GMT
- Authors: Anna Bogdanova, Akira Imakura, Tetsuya Sakurai, Tomoya Fujii, Teppei
Sakamoto, Hiroyuki Abe
- Abstract summary: A parallel trend has been to train machine learning models in collaboration with other data holders without accessing their data.
This paper presents an Explainable Data Collaboration Framework based on a model-agnostic additive feature attribution algorithm.
- Score: 5.994347858883343
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transparency of Machine Learning models used for decision support in various
industries becomes essential for ensuring their ethical use. To that end,
feature attribution methods such as SHAP (SHapley Additive exPlanations) are
widely used to explain the predictions of black-box machine learning models to
customers and developers. However, a parallel trend has been to train machine
learning models in collaboration with other data holders without accessing
their data. Such models, trained over horizontally or vertically partitioned
data, present a challenge for explainable AI because the explaining party may
have a biased view of background data or a partial view of the feature space.
As a result, explanations obtained from different participants of distributed
machine learning might not be consistent with one another, undermining trust in
the product. This paper presents an Explainable Data Collaboration Framework
based on a model-agnostic additive feature attribution algorithm (KernelSHAP)
and Data Collaboration method of privacy-preserving distributed machine
learning. In particular, we present three algorithms for different scenarios of
explainability in Data Collaboration and verify their consistency with
experiments on open-access datasets. Our results demonstrated a significant (by
at least a factor of 1.75) decrease in feature attribution discrepancies among
the users of distributed machine learning.
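The framework above builds on additive feature attribution in the Shapley sense. As a minimal, self-contained illustration of that idea (not the paper's algorithms, and not the KernelSHAP regression approximation), the sketch below computes exact Shapley values by enumerating coalitions, replacing absent features with a background mean; all function and variable names are hypothetical:

```python
import itertools
import math
import numpy as np

def shapley_values(predict, x, background):
    """Exact Shapley values for one instance by enumerating all coalitions.
    Features absent from a coalition are imputed with the background mean.
    Exponential in the number of features; for illustration only."""
    d = x.shape[0]
    base = background.mean(axis=0)
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            for S in itertools.combinations(others, k):
                # Shapley weight for a coalition of size k
                w = math.factorial(k) * math.factorial(d - k - 1) / math.factorial(d)
                with_i = base.copy()
                with_i[list(S) + [i]] = x[list(S) + [i]]
                without = base.copy()
                without[list(S)] = x[list(S)]
                phi[i] += w * (predict(with_i) - predict(without))
    return phi
```

For a linear model the attributions recover the feature contributions exactly, and they always sum to the difference between the prediction and the background prediction; a biased or partial background (as in the distributed setting the abstract describes) shifts these values, which is the discrepancy the framework targets.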
Related papers
- Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing the influence of a small "forget set" of training data from a pre-trained machine learning model -- has recently attracted interest.
Recent research shows that existing machine unlearning techniques do not hold up under such challenging evaluation settings.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - Robust Computer Vision in an Ever-Changing World: A Survey of Techniques
for Tackling Distribution Shifts [20.17397328893533]
AI applications are becoming increasingly visible to the general public.
There is a notable gap between the theoretical assumptions researchers make about computer vision models and the reality those models face when deployed in the real world.
One of the critical reasons for this gap is a challenging problem known as distribution shift.
arXiv Detail & Related papers (2023-12-03T23:40:12Z) - On the Joint Interaction of Models, Data, and Features [82.60073661644435]
We introduce a new tool, the interaction tensor, for empirically analyzing the interaction between data and model through features.
Based on these observations, we propose a conceptual framework for feature learning.
Under this framework, the expected accuracy for a single hypothesis and agreement for a pair of hypotheses can both be derived in closed-form.
arXiv Detail & Related papers (2023-06-07T21:35:26Z) - Striving for data-model efficiency: Identifying data externalities on
group performance [75.17591306911015]
Building trustworthy, effective, and responsible machine learning systems hinges on understanding how differences in training data and modeling decisions interact to impact predictive performance.
We focus on a particular type of data-model inefficiency, in which adding training data from some sources can actually lower performance evaluated on key sub-groups of the population.
Our results indicate that data-efficiency is a key component of both accurate and trustworthy machine learning.
arXiv Detail & Related papers (2022-11-11T16:48:27Z) - Privacy-Preserving Machine Learning for Collaborative Data Sharing via
Auto-encoder Latent Space Embeddings [57.45332961252628]
Privacy-preserving machine learning in data-sharing processes is an ever-critical task.
This paper presents an innovative framework that uses Representation Learning via autoencoders to generate privacy-preserving embedded data.
arXiv Detail & Related papers (2022-11-10T17:36:58Z) - Partitioned Variational Inference: A Framework for Probabilistic
Federated Learning [45.9225420256808]
We introduce partitioned variational inference (PVI), a framework for performing VI in the federated setting.
We develop new supporting theory for PVI, demonstrating a number of properties that make it an attractive choice for practitioners.
arXiv Detail & Related papers (2022-02-24T18:15:30Z) - Non-IID data and Continual Learning processes in Federated Learning: A
long road ahead [58.720142291102135]
Federated Learning is a novel framework that allows multiple devices or institutions to train a machine learning model collaboratively while keeping their data private.
In this work, we formally classify data statistical heterogeneity and review the most remarkable learning strategies that are able to face it.
At the same time, we introduce approaches from other machine learning frameworks, such as Continual Learning, that also deal with data heterogeneity and could be easily adapted to the Federated Learning settings.
arXiv Detail & Related papers (2021-11-26T09:57:11Z) - A survey on datasets for fairness-aware machine learning [6.962333053044713]
A large variety of fairness-aware machine learning solutions have been proposed.
In this paper, we overview real-world datasets used for fairness-aware machine learning.
For a deeper understanding of bias and fairness in the datasets, we investigate their relationships using exploratory analysis.
arXiv Detail & Related papers (2021-10-01T16:54:04Z) - Federated Learning System without Model Sharing through Integration of
Dimensional Reduced Data Representations [6.9485501711137525]
We explore an alternative federated learning system that enables integration of dimensionality reduced representations of distributed data prior to a supervised learning task.
We compare the performance of this approach on image classification tasks to three alternative frameworks: centralized machine learning, individual machine learning, and Federated Averaging.
Our results show that our approach can achieve similar accuracy as Federated Averaging and performs better than Federated Averaging in a small-user setting.
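The summary above describes integrating dimensionality-reduced representations of distributed data without sharing models. A toy sketch of that idea (assumed details, not the authors' implementation: linear PCA-style reductions, a public anchor dataset, and a least-squares alignment to a common space) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: two parties with private data, one shared anchor dataset.
d, r = 6, 3
anchor = rng.normal(size=(20, d))                 # public anchor data
parties = [rng.normal(size=(30, d)) for _ in range(2)]

collab_inputs, anchor_reps = [], []
for X in parties:
    # Each party builds a private linear reduction from its own data (PCA-style).
    _, _, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
    P = Vt[:r].T                                  # d x r projection, kept private
    collab_inputs.append(X @ P)                   # shared: reduced local data
    anchor_reps.append(anchor @ P)                # shared: reduced anchor data

# The analyst aligns each party's representation to the first party's
# via least squares on the common anchor, then pools the aligned data.
target = anchor_reps[0]
aligned = []
for Z, A in zip(collab_inputs, anchor_reps):
    G, *_ = np.linalg.lstsq(A, target, rcond=None)  # r x r alignment map
    aligned.append(Z @ G)
unified = np.vstack(aligned)   # pooled representation for supervised learning
```

Only the reduced representations and the anchor images leave each party, so neither raw data nor model parameters are shared, which is the property the summary highlights.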
arXiv Detail & Related papers (2020-11-13T08:12:00Z) - Accelerating Federated Learning in Heterogeneous Data and Computational
Environments [0.7106986689736825]
We introduce a novel distributed validation weighting scheme (DVW), which evaluates the performance of a learner in the federation against a distributed validation set.
We empirically show that DVW results in better performance compared to established methods, such as FedAvg.
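The DVW idea of weighting each learner by its performance on a distributed validation set can be sketched as a simple accuracy-weighted parameter average (a simplified illustration, not the paper's exact scheme; the function name is hypothetical):

```python
import numpy as np

def dvw_aggregate(param_list, val_accuracies):
    """Aggregate learners' parameter vectors, weighting each learner by its
    normalized accuracy on a distributed validation set (simplified DVW sketch).
    Contrast with FedAvg, which typically weights by local sample counts."""
    acc = np.asarray(val_accuracies, dtype=float)
    w = acc / acc.sum()                       # normalize to a convex combination
    return sum(wi * np.asarray(p) for wi, p in zip(w, param_list))
```

A learner that validates poorly thus contributes proportionally less to the federated model than under uniform or sample-count weighting.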
arXiv Detail & Related papers (2020-08-25T21:28:38Z) - Learning while Respecting Privacy and Robustness to Distributional
Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.