Non-readily identifiable data collaboration analysis for multiple datasets including personal information
- URL: http://arxiv.org/abs/2208.14611v1
- Date: Wed, 31 Aug 2022 03:19:17 GMT
- Title: Non-readily identifiable data collaboration analysis for multiple datasets including personal information
- Authors: Akira Imakura, Tetsuya Sakurai, Yukihiko Okada, Tomoya Fujii, Teppei Sakamoto, Hiroyuki Abe
- Abstract summary: Data confidentiality and cross-institutional communication are critical for medical datasets.
In this study, the identifiability of data collaboration analysis is investigated.
The proposed method keeps the shared data non-readily identifiable while maintaining high recognition performance.
- Score: 7.315551060433141
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-source data fusion, in which multiple data sources are jointly analyzed
to obtain improved information, has attracted considerable research attention. For
datasets held by multiple medical institutions, data confidentiality and
cross-institutional communication are critical. In such cases, data
collaboration (DC) analysis, which shares dimensionality-reduced intermediate
representations without iterative cross-institutional communication, may be
appropriate. The identifiability of the shared data is essential when analyzing
data that include personal information. In this study, the identifiability of
DC analysis is investigated. The results reveal that, in supervised learning,
the shared intermediate representations are readily identifiable with respect
to the original data. This study then proposes a non-readily identifiable DC
analysis that shares only non-readily identifiable data, for multiple medical
datasets including personal information. The proposed method addresses
identifiability concerns using a random sample permutation, the concept of
interpretable DC analysis, and functions that cannot be reconstructed. In
numerical experiments on medical datasets, the proposed method keeps the shared
data non-readily identifiable while maintaining the high recognition performance
of conventional DC analysis. For a hospital dataset, the proposed method improves
recognition performance by nine percentage points over local analysis that uses
only the local dataset.
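For readers unfamiliar with DC analysis, the pipeline described in the abstract can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: each party uses a private random linear projection `F` as its dimensionality reduction, all parties share a common anchor dataset, and the analyst aligns the parties through an SVD of the stacked anchor representations followed by least-squares fits. The row shuffle before sharing is only one simple reading of "random sample permutation"; the paper's exact procedure, and its non-reconstructible functions, may differ. The function name `dc_analysis` and its parameters are hypothetical.

```python
import numpy as np

def dc_analysis(parties, anchor, dim_tilde, dim_hat, rng):
    """Minimal data collaboration (DC) sketch (hypothetical helper).

    parties: list of (X_i, y_i) local datasets, X_i of shape (n_i, m)
    anchor:  shared anchor data of shape (r, m)
    Returns collaboration representations Z and the matching labels y.
    """
    shared = []
    for X, y in parties:
        # Assumption: a private random projection stands in for whatever
        # dimensionality reduction each party actually uses.
        F = rng.standard_normal((X.shape[1], dim_tilde))
        # Intermediate representations of the local data and the anchor data.
        X_tilde, A_tilde = X @ F, anchor @ F
        # Assumption: shuffle rows together with their labels before sharing,
        # so the shared row order no longer follows the local record order.
        perm = rng.permutation(X.shape[0])
        shared.append((X_tilde[perm], y[perm], A_tilde))

    # Analyst side: SVD of the horizontally stacked anchor representations.
    A_all = np.hstack([A for _, _, A in shared])
    U, _, _ = np.linalg.svd(A_all, full_matrices=False)
    target = U[:, :dim_hat]

    Z_list, y_list = [], []
    for X_tilde, y, A_tilde in shared:
        # G_i minimizes ||target - A_tilde @ G_i||_F (least squares).
        G, *_ = np.linalg.lstsq(A_tilde, target, rcond=None)
        Z_list.append(X_tilde @ G)
        y_list.append(y)
    return np.vstack(Z_list), np.concatenate(y_list)

rng = np.random.default_rng(0)
anchor = rng.uniform(-1, 1, size=(50, 10))
parties = [(rng.standard_normal((30, 10)), rng.integers(0, 2, 30)) for _ in range(3)]
Z, y = dc_analysis(parties, anchor, dim_tilde=6, dim_hat=4, rng=rng)
print(Z.shape, y.shape)  # (90, 4) (90,)
```

With `dim_tilde` smaller than the number of features, each `F` has a nontrivial null space, so a party's rows cannot be recovered from `X @ F` alone; a classifier can then be trained on the returned `Z` and `y`.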
Related papers
- Source-Free Collaborative Domain Adaptation via Multi-Perspective Feature Enrichment for Functional MRI Analysis [55.03872260158717]
Resting-state functional MRI (rs-fMRI) is increasingly employed in multi-site research to aid neurological disorder analysis.
Many methods have been proposed to reduce fMRI heterogeneity between source and target domains.
But acquiring source data is challenging due to privacy concerns and/or data storage burdens in multi-site studies.
We design a source-free collaborative domain adaptation framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible.
(arXiv, 2023-08-24T01:30:18Z)
- Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
(arXiv, 2023-07-31T11:28:24Z)
- Leveraging text data for causal inference using electronic health records [1.4182510510164876]
This paper presents a unified framework for leveraging text data to support causal inference with electronic health data.
We show how incorporating text data in a traditional matching analysis can help strengthen the validity of an estimated treatment effect.
We believe these methods have the potential to expand the scope of secondary analysis of clinical data to domains where structured EHR data is limited.
(arXiv, 2023-06-09T16:06:02Z)
- Understanding metric-related pitfalls in image analysis validation [59.15220116166561]
This work provides the first comprehensive common point of access to information on pitfalls related to validation metrics in image analysis.
Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy.
(arXiv, 2023-02-03T14:57:40Z)
- Distributed sequential federated learning [0.0]
We develop a data-driven method for efficiently and effectively aggregating valuable information by analyzing local data.
We use numerical studies of simulated data and an application to COVID-19 data collected from 32 hospitals in Mexico.
(arXiv, 2023-01-31T21:20:45Z)
- Another Use of SMOTE for Interpretable Data Collaboration Analysis [8.143750358586072]
Data collaboration (DC) analysis has been developed for privacy-preserving integrated analysis across multiple institutions.
This study proposes an anchor data construction technique to improve the recognition performance without increasing the risk of data leakage.
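The entry above mentions anchor-data construction with SMOTE. The core SMOTE step, interpolating between a sample and one of its k nearest neighbours, can be sketched as below. This shows only generic SMOTE-style interpolation; how the cited paper applies it to anchor data, and its privacy analysis, are not reproduced here, and `smote_like_anchor` is a hypothetical name.

```python
import numpy as np

def smote_like_anchor(X, n_new, k, rng):
    """Generate n_new synthetic rows by interpolating between a random
    sample and one of its k nearest neighbours (generic SMOTE step).
    Hypothetical helper; requires k < X.shape[0]."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances (O(n^2 m) memory, fine for a sketch).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Exclude each point from its own neighbour list.
    np.fill_diagonal(d2, np.inf)
    neighbours = np.argsort(d2, axis=1)[:, :k]
    # Pick a base sample and one of its k neighbours for every new row.
    base = rng.integers(0, n, size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    # New row lies on the segment between the base sample and the neighbour.
    gap = rng.uniform(0.0, 1.0, size=(n_new, 1))
    return X[base] + gap * (X[pick] - X[base])

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
A = smote_like_anchor(X, n_new=50, k=5, rng=rng)
print(A.shape)  # (50, 3)
```

Because every synthetic row is a convex combination of two real rows, it stays inside the range spanned by the local data.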
(arXiv, 2022-08-26T06:39:13Z)
- Data-SUITE: Data-centric identification of in-distribution incongruous examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data.
We empirically validate Data-SUITE's performance and coverage guarantees.
(arXiv, 2022-02-17T18:58:31Z)
- Accuracy and Privacy Evaluations of Collaborative Data Analysis [4.987315310656657]
A collaborative data analysis that shares dimensionality-reduced representations of data has been proposed as a non-model-sharing type of federated learning.
This paper evaluates the accuracy and privacy of this novel framework.
(arXiv, 2021-01-27T00:38:47Z)
- Interpretable collaborative data analysis on distributed data [9.434133337939498]
This paper proposes an interpretable non-model-sharing collaborative data analysis method as a federated learning system.
By centralizing intermediate representations, which are constructed individually by each party, the proposed method obtains an interpretable model.
Numerical experiments indicate that the proposed method achieves better recognition performance for artificial and real-world problems than individual analysis.
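For linear maps, the interpretability idea in this entry reduces to one matrix product: if a party's collaboration representation is Z = X F G and a linear model with weights w is fitted on Z, then Z w = X (F G w), so F G w gives coefficients on that party's original features. The sketch below checks this identity numerically; the names and dimensions are hypothetical, and the cited paper's interpretable method may involve more than this linear case.

```python
import numpy as np

def to_original_coefficients(F, G, w):
    """Express a linear model w, fitted on collaboration representations
    Z = X @ F @ G, as coefficients on the party's original features.
    Hypothetical helper for the linear case only."""
    # Associativity: (X @ F @ G) @ w == X @ (F @ G @ w)
    return F @ G @ w

rng = np.random.default_rng(1)
m, d_tilde, d_hat, n = 8, 5, 3, 20          # hypothetical sizes
X = rng.standard_normal((n, m))              # a party's local data
F = rng.standard_normal((m, d_tilde))        # the party's private reduction
G = rng.standard_normal((d_tilde, d_hat))    # analyst's alignment map
w = rng.standard_normal(d_hat)               # model weights on the collaboration space

beta = to_original_coefficients(F, G, w)
print(np.allclose((X @ F @ G) @ w, X @ beta))  # True
```

In a DC setting, the analyst could send `G @ w` to the party, which multiplies it by its own `F` locally, so `F` never has to leave the party.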
(arXiv, 2020-11-09T13:59:32Z)
- Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
(arXiv, 2020-07-07T21:04:55Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging prediction consistency for a given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
(arXiv, 2020-05-15T06:57:54Z)