Bayesian Federated Inference for regression models with heterogeneous
multi-center populations
- URL: http://arxiv.org/abs/2402.02898v1
- Date: Mon, 5 Feb 2024 11:10:27 GMT
- Title: Bayesian Federated Inference for regression models with heterogeneous
multi-center populations
- Authors: Marianne A Jonker, Hassan Pazira, Anthony CC Coolen
- Abstract summary: In a regression model, the sample size must be large enough relative to the number of possible predictors.
Pooling data from different data sets collected in different (medical) centers would alleviate this problem, but is often not feasible due to privacy regulations or logistical problems.
An alternative route would be to analyze the local data in the centers separately and combine the statistical inference results with the Bayesian Federated Inference (BFI) methodology.
The aim of this approach is to compute, from the inference results in the separate centers, what would have been found if the statistical analysis had been performed on the combined data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To estimate the parameters of a regression model accurately, the sample
size must be large enough relative to the number of possible predictors for the
model. In practice, sufficient data is often lacking, which can lead to
overfitting of the model and, as a consequence, unreliable predictions of the
outcomes of new patients. Pooling data from different data sets collected in
different (medical) centers would alleviate this problem, but is often not
feasible due to privacy regulations or logistical problems. An alternative route
would be to analyze the local data in the centers separately and combine the
statistical inference results with the Bayesian Federated Inference (BFI)
methodology. The aim of this approach is to compute, from the inference results
in the separate centers, what would have been found if the statistical analysis
had been performed on the combined data. We explain the methodology under
homogeneity and heterogeneity across the populations in the separate centers,
and give real-life examples for better understanding. The proposed methodology
shows excellent performance. An R-package that performs all the calculations has
been developed and is illustrated in this paper. The mathematical details are
given in the Appendix.
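For intuition, the combination step described above can be sketched under a Gaussian (Laplace) approximation of each center's posterior: each center reports only its posterior mode and curvature (precision) matrix, and the central combination sums the precisions while subtracting the prior contribution counted once per extra center. The sketch below is illustrative, not the R-package's API; names such as `bfi_combine` are assumptions. For Bayesian linear regression with known noise variance the Gaussian approximation is exact, so the combined result here coincides with the analysis on the pooled data.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_fit(X, y, sigma2, L0):
    """One center's Bayesian linear regression: posterior mode and precision."""
    A = X.T @ X / sigma2 + L0                      # local posterior precision (curvature)
    theta = np.linalg.solve(A, X.T @ y / sigma2)   # local posterior mode
    return theta, A

def bfi_combine(fits, L0):
    """Combine local (mode, precision) pairs without sharing any raw data.

    The shared prior precision L0 enters each of the K local fits, so
    (K - 1) copies of it are subtracted to count the prior only once.
    """
    K = len(fits)
    A = sum(A_k for _, A_k in fits) - (K - 1) * L0
    theta = np.linalg.solve(A, sum(A_k @ t_k for t_k, A_k in fits))
    return theta, A

# toy data split over three "centers"
p, sigma2 = 4, 1.0
L0 = 0.1 * np.eye(p)                               # shared Gaussian prior precision
theta_true = rng.normal(size=p)
centers = []
for n in (30, 50, 40):
    X = rng.normal(size=(n, p))
    y = X @ theta_true + rng.normal(scale=np.sqrt(sigma2), size=n)
    centers.append((X, y))

fits = [local_fit(X, y, sigma2, L0) for X, y in centers]
theta_bfi, A_bfi = bfi_combine(fits, L0)

# reference: the same analysis on the pooled data
X_all = np.vstack([X for X, _ in centers])
y_all = np.concatenate([y for _, y in centers])
theta_pool, A_pool = local_fit(X_all, y_all, sigma2, L0)

print(np.allclose(theta_bfi, theta_pool), np.allclose(A_bfi, A_pool))  # → True True
```

For generalized linear models the local posteriors are only approximately Gaussian, so the combined estimate approximates, rather than reproduces, the pooled analysis; the quality of that approximation is what the paper evaluates.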
Related papers
- Bayesian Federated Inference for Survival Models [0.0]
In cancer research, overall survival and progression free survival are often analyzed with the Cox model.
Merging data sets from different medical centers may help, but this is not always possible due to strict privacy legislation and logistical difficulties.
Recently, the Bayesian Federated Inference (BFI) strategy for generalized linear models was proposed.
arXiv Detail & Related papers (2024-04-26T15:05:26Z) - Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z) - PQMass: Probabilistic Assessment of the Quality of Generative Models
using Probability Mass Estimation [8.527898482146103]
We propose a comprehensive sample-based method for assessing the quality of generative models.
The proposed approach enables the estimation of the probability that two sets of samples are drawn from the same distribution.
arXiv Detail & Related papers (2024-02-06T19:39:26Z) - Deep Ensembles Meets Quantile Regression: Uncertainty-aware Imputation
for Time Series [49.992908221544624]
Time series data often exhibit numerous missing values; filling them in is the time series imputation task.
Previous deep learning methods have been shown to be effective for time series imputation.
We propose a non-generative time series imputation method that produces accurate imputations with inherent uncertainty.
arXiv Detail & Related papers (2023-12-03T05:52:30Z) - Conformal inference for regression on Riemannian Manifolds [49.7719149179179]
We investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides in a manifold, and the covariate, denoted by $X$, lies in Euclidean space.
We prove the almost sure convergence of the empirical version of these regions on the manifold to their population counterparts.
arXiv Detail & Related papers (2023-10-12T10:56:25Z) - Logistic Regression Equivalence: A Framework for Comparing Logistic
Regression Models Across Populations [4.518012967046983]
We argue that equivalence testing for a prespecified tolerance level on population differences incentivizes accuracy in the inference.
For diagnosis data, we show examples for equivalent and non-equivalent models.
arXiv Detail & Related papers (2023-03-23T15:12:52Z) - Learning to Bound Counterfactual Inference in Structural Causal Models
from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z) - Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z) - A similarity-based Bayesian mixture-of-experts model [0.5156484100374058]
We present a new non-parametric mixture-of-experts model for multivariate regression problems.
Using a conditionally specified model, predictions for out-of-sample inputs are based on similarities to each observed data point.
Posterior inference is performed on the parameters of the mixture as well as the distance metric.
arXiv Detail & Related papers (2020-12-03T18:08:30Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.