Learning Realistic Patterns from Unrealistic Stimuli: Generalization and
Data Anonymization
- URL: http://arxiv.org/abs/2009.10007v2
- Date: Thu, 9 Dec 2021 11:56:11 GMT
- Title: Learning Realistic Patterns from Unrealistic Stimuli: Generalization and
Data Anonymization
- Authors: Konstantinos Nikolaidis, Stein Kristiansen, Thomas Plagemann, Vera
Goebel, Knut Liestøl, Mohan Kankanhalli, Gunn Marit Traaen, Britt Øverland,
Harriet Akre, Lars Aakerøy, Sigurd Steinshamn
- Abstract summary: This work investigates a simple yet unconventional approach for anonymized data synthesis to enable third parties to benefit from such private data.
We use sleep monitoring data from both an open and a large closed clinical study and evaluate whether (1) end-users can create and successfully use customized classification models for sleep apnea detection, and (2) the identity of participants in the study is protected.
- Score: 0.5091527753265949
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Good training data is a prerequisite to develop useful ML applications.
However, in many domains existing data sets cannot be shared due to privacy
regulations (e.g., from medical studies). This work investigates a simple yet
unconventional approach for anonymized data synthesis to enable third parties
to benefit from such private data. We explore the feasibility of learning
implicitly from unrealistic, task-relevant stimuli, which are synthesized by
exciting the neurons of a trained deep neural network (DNN). As such, neuronal
excitation serves as a pseudo-generative model. The stimuli data is used to
train new classification models. Furthermore, we extend this framework to
inhibit representations that are associated with specific individuals. We use
sleep monitoring data from both an open and a large closed clinical study and
evaluate whether (1) end-users can create and successfully use customized
classification models for sleep apnea detection, and (2) the identity of
participants in the study is protected. Extensive comparative empirical
investigation shows that different algorithms trained on the stimuli are able
to generalize successfully on the same task as the original model. However,
architectural and algorithmic similarity between the new and original models
plays an important role in performance. For similar architectures, the
performance is close to that of using the true data (e.g., an accuracy
difference of 0.56% and a Kappa coefficient difference of 0.03-0.04). Further
experiments show that the stimuli can, to a large extent, successfully
anonymize participants of the
clinical studies.
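The paper does not include code, but the core mechanism the abstract describes, exciting the output neurons of a trained DNN by gradient ascent on the input so that the network acts as a pseudo-generative model, can be sketched roughly as follows. The PyTorch framing and all names (synthesize_stimulus, trained_dnn) are illustrative assumptions, not the authors' implementation.

```python
import torch

def synthesize_stimulus(model, target_class, input_shape, steps=200, lr=0.1):
    """Gradient-ascent 'neuronal excitation': optimize a random input so
    that the trained model's output neuron for target_class is maximally
    activated. The result is task-relevant but unrealistic."""
    model.eval()
    x = torch.randn(1, *input_shape, requires_grad=True)  # random seed input
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        # Ascend the target neuron's activation by descending its negation.
        loss = -model(x)[0, target_class]
        loss.backward()
        optimizer.step()
    return x.detach()

# Hypothetical usage: build an anonymized training set of (stimulus, label)
# pairs and train a fresh classifier on it instead of on the private data.
# stimuli = [(synthesize_stimulus(trained_dnn, c, (1, 3000)), c) for c in (0, 1)]
```

Inputs produced this way are the "unrealistic, task-relevant stimuli" of the abstract: they excite the right neurons without resembling any participant's recording.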
Related papers
- Latent Variable Sequence Identification for Cognitive Models with Neural Bayes Estimation [7.7227297059345466]
We present an approach that extends neural Bayes estimation to learn a direct mapping between experimental data and the targeted latent variable space.
Our work underscores that combining recurrent neural networks and simulation-based inference to identify latent variable sequences can enable researchers to access a wider class of cognitive models.
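As an illustration of the general recipe (none of this is from the paper): a recurrent network is trained on simulated (data, latents) pairs so that it maps observed sequences directly to latent-variable estimates. A minimal PyTorch-style sketch:

```python
import torch.nn as nn

class NeuralBayesEstimator(nn.Module):
    """Hypothetical sketch: an RNN that maps an observed data sequence
    directly to estimates of the latent variables that generated it,
    trained on (data, latents) pairs simulated from the cognitive model."""
    def __init__(self, obs_dim, latent_dim, hidden_dim=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, latent_dim)

    def forward(self, x):            # x: (batch, time, obs_dim)
        h, _ = self.rnn(x)           # hidden state at every time step
        return self.head(h)          # latent estimate per time step
```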
arXiv Detail & Related papers (2024-06-20T21:13:39Z)
- Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
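For illustration, a minimal one-step-ahead predictor in the feedforward (NARX-style) family the survey discusses; all names and sizes are assumptions:

```python
import torch
import torch.nn as nn

class NARXPredictor(nn.Module):
    """Hypothetical sketch of a feedforward identification model: predict
    the next output y[t] from windows of past inputs u and past outputs y."""
    def __init__(self, n_past_u, n_past_y, hidden_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_past_u + n_past_y, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, past_u, past_y):
        # Regressor vector: concatenated windows of past inputs and outputs.
        return self.net(torch.cat([past_u, past_y], dim=-1))
```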
arXiv Detail & Related papers (2023-01-30T12:38:31Z)
- Improving the Level of Autism Discrimination through GraphRNN Link Prediction [8.103074928419527]
This paper learns the edge distribution of real brain networks through GraphRNN and uses the learned model to generate synthetic brain-network samples.
The experimental results show that the combination of original and synthetic data greatly improves the discrimination of the neural network.
arXiv Detail & Related papers (2022-02-19T06:50:32Z)
- Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process.
Our method significantly reduces the required number of interactions compared with random intervention targeting.
We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z)
- The Causal Neural Connection: Expressiveness, Learnability, and Inference [125.57815987218756]
An object called structural causal model (SCM) represents a collection of mechanisms and sources of random variation of the system under investigation.
In this paper, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020) still holds for neural models.
We introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences.
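For context, the NCM builds on the standard notion of a structural causal model; one common formalization (our notation, a sketch rather than a quotation from the paper) is:

```latex
% A structural causal model M is a tuple of exogenous variables U,
% endogenous variables V, mechanisms F, and a distribution over U.
\mathcal{M} = \langle \mathbf{U}, \mathbf{V}, \mathcal{F}, P(\mathbf{U}) \rangle,
\qquad
V_i \leftarrow f_i\big(\mathrm{Pa}(V_i), U_i\big), \quad f_i \in \mathcal{F}.
```

In an NCM, roughly speaking, each mechanism f_i is parameterized by a neural network, which is what makes the expressiveness and learnability questions in the title non-trivial.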
arXiv Detail & Related papers (2021-07-02T01:55:18Z)
- Handling Data Heterogeneity with Generative Replay in Collaborative Learning for Medical Imaging [21.53220262343254]
We present a novel generative replay strategy to address the challenge of data heterogeneity in collaborative learning methods.
A primary model learns the desired task, and an auxiliary "generative replay model" either synthesizes images that closely resemble the input images or helps extract latent variables.
The generative replay strategy is flexible: it can either be incorporated into existing collaborative learning methods to improve their ability to handle data heterogeneity across institutions, or be used as a novel, standalone collaborative learning framework (termed FedReplay) to reduce communication cost.
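A minimal sketch of one such training step, assuming a hypothetical replay_generator that synthesizes images resembling the local inputs (PyTorch-style, not the authors' code):

```python
import torch

def replay_train_step(primary_model, replay_generator, batch, optimizer, loss_fn):
    """Illustrative step: augment the institution's local batch with
    replayed synthetic samples so the primary model sees a less
    heterogeneous data distribution across institutions."""
    images, labels = batch
    with torch.no_grad():
        # Assumed generator interface: synthesize images resembling the inputs.
        replayed = replay_generator(images)
    inputs = torch.cat([images, replayed])
    targets = torch.cat([labels, labels])  # replayed samples keep their labels
    optimizer.zero_grad()
    loss = loss_fn(primary_model(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```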
arXiv Detail & Related papers (2021-06-24T17:39:55Z)
- No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm, Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
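A hypothetical sketch of the calibration step implied by the summary: sample virtual feature vectors from per-class Gaussians and refit only the classifier head. Here class_stats and fit_classifier are assumed interfaces, not the paper's API.

```python
import numpy as np

def ccvr_calibrate(fit_classifier, class_stats, n_per_class=100):
    """Illustrative CCVR-style calibration: draw virtual feature vectors
    from an approximated per-class Gaussian and refit only the classifier
    head on them (no raw client data needs to be shared)."""
    feats, labels = [], []
    for c, (mean, cov) in class_stats.items():  # class -> (mean, covariance)
        feats.append(np.random.multivariate_normal(mean, cov, n_per_class))
        labels.append(np.full(n_per_class, c))
    return fit_classifier(np.vstack(feats), np.concatenate(labels))
```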
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
- Learning identifiable and interpretable latent models of high-dimensional neural activity using pi-VAE [10.529943544385585]
We propose a method that integrates key ingredients from latent models and traditional neural encoding models.
Our method, pi-VAE, is inspired by recent progress on identifiable variational auto-encoders.
We validate pi-VAE using synthetic data, and apply it to analyze neurophysiological datasets from rat hippocampus and macaque motor cortex.
arXiv Detail & Related papers (2020-11-09T22:00:38Z)
- Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
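The generative model stated in the summary can be written compactly as follows (our notation; the paper's exact noise placement may differ in detail):

```latex
% Subject i's data x_i: a subject-specific mixing matrix A_i applied to
% shared independent sources s, plus subject-specific noise n_i.
\mathbf{x}_i = A_i \mathbf{s} + \mathbf{n}_i, \qquad i = 1, \dots, m,
\qquad s_1, \dots, s_k \ \text{mutually independent.}
```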
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To keep the enlarged dataset tractable, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- Do Saliency Models Detect Odd-One-Out Targets? New Datasets and Evaluations [15.374430656911498]
We investigate singleton detection, which can be thought of as a canonical example of salience.
We show that nearly all saliency algorithms do not adequately respond to singleton targets in synthetic and natural images.
arXiv Detail & Related papers (2020-05-13T20:59:53Z)