Privacy-Preserving and Lossless Distributed Estimation of
High-Dimensional Generalized Additive Mixed Models
- URL: http://arxiv.org/abs/2210.07723v1
- Date: Fri, 14 Oct 2022 11:41:18 GMT
- Title: Privacy-Preserving and Lossless Distributed Estimation of
High-Dimensional Generalized Additive Mixed Models
- Authors: Daniel Schalk, Bernd Bischl, David Rügamer
- Abstract summary: We propose an algorithm for distributed, privacy-preserving, and lossless estimation of generalized additive mixed models (GAMM) using component-wise gradient boosting (CWB).
Our adaptation of CWB preserves all the important properties of the original algorithm, such as unbiased feature selection and the ability to fit models in high-dimensional feature spaces.
We also showcase the efficacy of our algorithm on a distributed heart disease data set and compare it with state-of-the-art methods.
- Score: 0.9023847175654603
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Various privacy-preserving frameworks that respect the individual's privacy
in the analysis of data have been developed in recent years. However, available
model classes such as simple statistics or generalized linear models lack the
flexibility required for a good approximation of the underlying data-generating
process in practice. In this paper, we propose an algorithm for a distributed,
privacy-preserving, and lossless estimation of generalized additive mixed
models (GAMM) using component-wise gradient boosting (CWB). Making use of CWB
allows us to reframe the GAMM estimation as a distributed fitting of base
learners using the $L_2$-loss. In order to account for the heterogeneity of
different data location sites, we propose a distributed version of a row-wise
tensor product that allows the computation of site-specific (smooth) effects.
Our adaptation of CWB preserves all the important properties of the original
algorithm, such as unbiased feature selection and the ability to fit
models in high-dimensional feature spaces, and yields model estimates
equivalent to those of CWB on pooled data. In addition to a derivation of the equivalence of
both algorithms, we also showcase the efficacy of our algorithm on a
distributed heart disease data set and compare it with state-of-the-art
methods.
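The core idea of reframing GAMM estimation as repeated $L_2$-loss fits of base learners can be illustrated with a minimal single-site sketch of component-wise gradient boosting; this is a hypothetical toy implementation (univariate linear base learners only), not the authors' distributed algorithm with site-specific tensor-product effects.

```python
import numpy as np

def cwb_l2(X, y, n_iter=300, lr=0.1):
    """Minimal component-wise gradient boosting with the L2 loss.

    Each base learner is a univariate least-squares fit on one feature.
    Per iteration, only the learner that best fits the current
    pseudo-residuals (which, for the L2 loss, are ordinary residuals)
    is added to the ensemble, which yields the selection behaviour
    CWB is known for.
    """
    n, p = X.shape
    pred = np.full(n, y.mean())           # initialize with the offset
    coefs = np.zeros(p)                   # accumulated slopes per feature
    for _ in range(n_iter):
        resid = y - pred                  # L2 pseudo-residuals
        best_j, best_sse, best_fit = None, np.inf, None
        for j in range(p):
            x = X[:, j]
            # univariate least squares: resid ~ a + b * x
            b = np.cov(x, resid, bias=True)[0, 1] / (x.var() + 1e-12)
            a = resid.mean() - b * x.mean()
            fit = a + b * x
            sse = ((resid - fit) ** 2).sum()
            if sse < best_sse:
                best_j, best_sse, best_fit = j, sse, (b, fit)
        b, fit = best_fit
        coefs[best_j] += lr * b           # shrunken update of one learner
        pred += lr * fit
    return pred, coefs

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 2] - 2 * X[:, 7] + rng.normal(scale=0.1, size=200)
pred, coefs = cwb_l2(X, y)
print(np.argsort(-np.abs(coefs))[:2])    # the two informative features
```

In the paper's distributed setting, each such base-learner fit is computed from aggregated per-site statistics rather than pooled raw data, which is what makes the procedure both privacy-preserving and lossless.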
Related papers
- Influence Functions for Scalable Data Attribution in Diffusion Models [52.92223039302037]
Diffusion models have led to significant advancements in generative modelling.
Yet their widespread adoption poses challenges regarding data attribution and interpretability.
In this paper, we aim to help address such challenges by developing an influence functions framework.
arXiv Detail & Related papers (2024-10-17T17:59:02Z) - Bayesian Estimation and Tuning-Free Rank Detection for Probability Mass Function Tensors [17.640500920466984]
This paper presents a novel framework for estimating the joint PMF and automatically inferring its rank from observed data.
We derive a deterministic solution based on variational inference (VI) to approximate the posterior distributions of various model parameters. Additionally, we develop a scalable version of the VI-based approach by leveraging stochastic variational inference (SVI).
Experiments involving both synthetic data and real movie recommendation data illustrate the advantages of our VI and SVI-based methods in terms of estimation accuracy, automatic rank detection, and computational efficiency.
arXiv Detail & Related papers (2024-10-08T20:07:49Z) - Personalized Federated Learning under Mixture of Distributions [98.25444470990107]
We propose a novel approach to Personalized Federated Learning (PFL), which utilizes Gaussian mixture models (GMM) to fit the input data distributions across diverse clients.
FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification.
Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.
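The mixture modelling at the heart of such an approach can be sketched with a minimal EM fit of a one-dimensional Gaussian mixture on one client's data; this is a hypothetical toy stand-in for illustration, not the paper's federated FedGMM algorithm.

```python
import numpy as np

def fit_gmm_em(x, k=2, n_iter=50):
    """Minimal EM for a 1-D Gaussian mixture with k components.

    Deterministic quantile initialization keeps the toy example
    reproducible; a federated variant would instead aggregate the
    E/M statistics across clients.
    """
    mu = np.quantile(x, np.linspace(0.25, 0.75, k))   # spread-out init
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each point
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
                  / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted parameter updates
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return pi, mu, var

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
pi, mu, var = fit_gmm_em(x, k=2)
print(np.sort(mu))                       # recovered component means
```

The fitted mixture density also supplies the uncertainty quantification the summary mentions: a new sample's (low) likelihood under the mixture flags it as novel.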
arXiv Detail & Related papers (2023-05-01T20:04:46Z) - An Empirical Analysis of Fairness Notions under Differential Privacy [3.3748750222488657]
We show how different fairness notions, belonging to distinct classes of statistical fairness criteria, are impacted when one selects a model architecture suitable for DP-SGD.
These findings challenge the understanding that differential privacy will necessarily exacerbate unfairness in deep learning models trained on biased datasets.
arXiv Detail & Related papers (2023-02-06T16:29:50Z) - Validation Diagnostics for SBI algorithms based on Normalizing Flows [55.41644538483948]
This work proposes easy to interpret validation diagnostics for multi-dimensional conditional (posterior) density estimators based on NF.
It also offers theoretical guarantees based on results of local consistency.
This work should help the design of better specified models or drive the development of novel SBI-algorithms.
arXiv Detail & Related papers (2022-11-17T15:48:06Z) - The effectiveness of factorization and similarity blending [0.0]
Collaborative Filtering (CF) is a technique that leverages past user preference data to identify behavioural patterns and exploit them to predict custom recommendations.
We show that blending factorization-based and similarity-based approaches can lead to a significant error decrease (-9.4%) on stand-alone models.
We propose a novel extension of a similarity model, SCSR, which consistently reduces the complexity of the original algorithm.
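Blending a factorization-based and a similarity-based predictor can be as simple as a convex combination of their rating predictions; the following is a hypothetical toy sketch (truncated SVD plus item-item cosine similarity on synthetic data), not the paper's specific models.

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy setup: low-rank ground-truth ratings plus noise, partially observed
U = rng.normal(size=(30, 3))
V = rng.normal(size=(20, 3))
R = U @ V.T + rng.normal(scale=0.5, size=(30, 20))
mask = rng.random(R.shape) < 0.7          # observed entries

# Factorization-based prediction: rank-3 truncated SVD of the mean-filled matrix
filled = np.where(mask, R, R[mask].mean())
u, s, vt = np.linalg.svd(filled, full_matrices=False)
pred_mf = u[:, :3] * s[:3] @ vt[:3]

# Similarity-based prediction: item-item cosine similarity weighting
unit = filled / np.linalg.norm(filled, axis=0, keepdims=True)
sim = unit.T @ unit                       # (items x items) cosine similarity
pred_sim = filled @ sim / np.abs(sim).sum(axis=0)

# Blend: convex combination; the weight would be tuned on a validation split
alpha = 0.8
pred = alpha * pred_mf + (1 - alpha) * pred_sim

rmse = lambda p: float(np.sqrt(np.mean((p - R)[~mask] ** 2)))
print(rmse(pred_mf), rmse(pred_sim), rmse(pred))
```

By the triangle inequality, the blended error on held-out entries is never worse than the worse of the two constituent predictors; the gains reported in the paper come from the two error patterns being complementary.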
arXiv Detail & Related papers (2022-09-16T13:11:27Z) - Spike-and-Slab Generalized Additive Models and Scalable Algorithms for
High-Dimensional Data [0.0]
We propose hierarchical generalized additive models (GAMs) to accommodate high-dimensional data.
We consider a smoothing penalty for proper shrinkage of curves and separation of smoothing functions into linear and nonlinear spaces.
Two deterministic algorithms, EM-Coordinate Descent and EM-Iterative Weighted Least Squares, are developed for different utilities.
arXiv Detail & Related papers (2021-10-27T14:11:13Z) - Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, the divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
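The autoregressive parameterization described above can be written out explicitly; the notation below is assumed from the abstract's description, with $x_{<i}$ denoting the preceding coordinates.

```latex
% Autoregressive factorization of the joint log-density, and the
% univariate conditional scores that AR-CSM parameterizes directly:
\log p(x) = \sum_{i=1}^{D} \log p(x_i \mid x_{<i}),
\qquad
s_i(x_{\le i}) = \frac{\partial}{\partial x_i} \log p(x_i \mid x_{<i})
```

Because each score $s_i$ is a derivative of a one-dimensional conditional, score matching can be applied per dimension, which is what avoids the normalizing constant and the expensive sampling mentioned in the summary.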
arXiv Detail & Related papers (2020-10-24T07:01:24Z) - Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
arXiv Detail & Related papers (2020-06-24T17:53:53Z) - Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information (including all content) and is not responsible for any consequences of its use.