Hybrid Feature- and Similarity-Based Models for Prediction and
Interpretation using Large-Scale Observational Data
- URL: http://arxiv.org/abs/2204.06076v1
- Date: Tue, 12 Apr 2022 20:37:03 GMT
- Authors: Jacqueline K. Kueper, Jennifer Rayner, Daniel J. Lizotte
- Abstract summary: We propose a hybrid feature- and similarity-based model for supervised learning.
The proposed hybrid model is fit by convex optimization with a sparsity-inducing penalty on the kernel portion.
We compared our models to solely feature- and similarity-based approaches using synthetic data and using EHR data to predict risk of loneliness or social isolation.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Introduction: Large-scale electronic health record (EHR) datasets often
include simple informative features like patient age and complex data like care
history that are not easily represented as individual features. Such complex
data have the potential to both improve the quality of risk assessment and to
enable a better understanding of causal factors leading to those risks. We
propose a hybrid feature- and similarity-based model for supervised learning
that combines feature and kernel learning approaches to take advantage of rich
but heterogeneous observational data sources to create interpretable models for
prediction and for investigation of causal relationships. Methods: The proposed
hybrid model is fit by convex optimization with a sparsity-inducing penalty on
the kernel portion. Feature and kernel coefficients can be fit sequentially or
simultaneously. We compared our models to solely feature- and similarity-based
approaches using synthetic data and using EHR data from a primary health care
organization to predict risk of loneliness or social isolation. We also present
a new strategy for kernel construction that is suited to high-dimensional
indicator-coded EHR data. Results: The hybrid models had comparable or better
predictive performance than the feature- and kernel-based approaches in both
the synthetic and clinical case studies. The inherent interpretability of the
hybrid model is used to explore client characteristics stratified by kernel
coefficient direction in the clinical case study; we use simple examples to
discuss opportunities and cautions of the two hybrid model forms when causal
interpretations are desired. Conclusion: Hybrid feature- and similarity-based
models provide an opportunity to capture complex, high-dimensional data within
an additive model structure that supports improved prediction and
interpretation relative to simple models and opaque complex models.
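As a rough illustration of the kind of model the abstract describes, the sketch below fits an additive model y ≈ b + Xβ + Kα in which only the kernel coefficients α carry an L1 (sparsity-inducing) penalty, with feature and kernel coefficients fit simultaneously. It is a minimal sketch under stated assumptions, not the authors' implementation: the squared-error loss, the RBF kernel, the proximal-gradient (ISTA) solver, and all function and parameter names (`rbf_kernel`, `fit_hybrid`, `lam`, `gamma`) are choices made here for illustration; the abstract does not specify them.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.5):
    """Similarity between rows of A and rows of B (RBF kernel; an assumed choice)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(D, y, w, n_unpenalized, lam):
    """(1 / 2n) * ||D w - y||^2 + lam * ||alpha||_1, where alpha = w[n_unpenalized:]."""
    resid = D @ w - y
    return 0.5 * resid @ resid / len(y) + lam * np.abs(w[n_unpenalized:]).sum()


def fit_hybrid(X, K, y, lam=0.05, n_iter=3000):
    """Jointly fit intercept, feature coefficients, and L1-penalized kernel coefficients.

    Assumption: squared-error loss solved by proximal gradient descent (ISTA).
    """
    n, p = X.shape
    D = np.hstack([np.ones((n, 1)), X, K])      # design: [intercept | features | kernel]
    step = n / np.linalg.norm(D, 2) ** 2        # 1 / Lipschitz constant of the gradient
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        w = w - step * (D.T @ (D @ w - y) / n)  # gradient step on the smooth loss
        w[1 + p:] = soft_threshold(w[1 + p:], step * lam)  # penalize kernel part only
    return w[0], w[1:1 + p], w[1 + p:]


# Hypothetical synthetic data: X holds simple features, Z holds "complex" data
# that enters the model only through pairwise similarities.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
Z = rng.normal(size=(80, 5))
y = X @ np.array([1.0, -2.0, 0.5]) + np.sin(Z[:, 0]) + 0.1 * rng.normal(size=80)

K = rbf_kernel(Z, Z)
b, beta, alpha = fit_hybrid(X, K, y)
print("feature coefficients:", np.round(beta, 2))
print("nonzero kernel coefficients:", int(np.count_nonzero(alpha)), "of", alpha.size)
```

Because the penalty applies only to α, the feature coefficients β remain directly readable, while the nonzero entries of α single out the reference observations whose similarity contributes to the prediction. A sequential variant would first fit β alone and then fit α to the residuals; for a binary outcome such as the loneliness or social isolation risk mentioned above, a logistic loss would be the more natural assumption.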
Related papers
- zGAN: An Outlier-focused Generative Adversarial Network For Realistic Synthetic Data Generation [0.0]
"Black swans" have posed a challenge to performance of classical machine learning models.
This article provides an overview of the zGAN model architecture developed for the purpose of generating synthetic data with outlier characteristics.
It shows promising results on realistic synthetic data generation, as well as uplift capabilities vis-a-vis model performance.
arXiv Detail & Related papers (2024-10-28T07:55:11Z)
- SPIN: SE(3)-Invariant Physics Informed Network for Binding Affinity Prediction [3.406882192023597]
Accurate prediction of protein-ligand binding affinity is crucial for drug development.
Traditional methods often fail to accurately model the complex's spatial information.
We propose SPIN, a model that incorporates various inductive biases applicable to this task.
arXiv Detail & Related papers (2024-07-10T08:40:07Z)
- Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z)
- A Federated Learning-based Industrial Health Prognostics for
Heterogeneous Edge Devices using Matched Feature Extraction [16.337207503536384]
We propose a pioneering FL-based health prognostic model with a feature similarity-matched parameter aggregation algorithm.
We show that the proposed method yields accuracy improvements as high as 44.5% and 39.3% for state-of-health estimation and remaining useful life estimation.
arXiv Detail & Related papers (2023-05-13T07:20:31Z)
- Less is More: Mitigate Spurious Correlations for Open-Domain Dialogue
Response Generation Models by Causal Discovery [52.95935278819512]
We conduct the first study on spurious correlations for open-domain response generation models based on a corpus CGDIALOG curated in our work.
Inspired by causal discovery algorithms, we propose a novel model-agnostic method for training and inference of response generation model.
arXiv Detail & Related papers (2023-03-02T06:33:48Z)
- Factor-Augmented Regularized Model for Hazard Regression [1.8021287677546953]
We propose a new model, Factor-Augmented Regularized Model for Hazard Regression (FarmHazard), to perform model selection in high-dimensional data.
We prove model selection consistency and estimation consistency under mild conditions.
We also develop a factor-augmented variable screening procedure to deal with strong correlations in ultra-high dimensional problems.
arXiv Detail & Related papers (2022-10-03T16:35:33Z)
- De-Biasing Generative Models using Counterfactual Methods [0.0]
We propose a new decoder-based framework named the Causal Counterfactual Generative Model (CCGM).
Our proposed method combines a causal latent space VAE model with specific modification to emphasize causal fidelity.
We explore how better disentanglement of causal learning and encoding/decoding generates higher causal intervention quality.
arXiv Detail & Related papers (2022-07-04T16:53:20Z)
- On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery has proposed to factorize the data generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
- Bayesian Sparse Factor Analysis with Kernelized Observations [67.60224656603823]
Multi-view problems can be faced with latent variable models.
High-dimensionality and non-linear issues are traditionally handled by kernel methods.
We propose merging both approaches into a single model.
arXiv Detail & Related papers (2020-06-01T14:25:38Z)