Community-Based Hierarchical Positive-Unlabeled (PU) Model Fusion for
Chronic Disease Prediction
- URL: http://arxiv.org/abs/2309.03386v1
- Date: Wed, 6 Sep 2023 22:16:58 GMT
- Title: Community-Based Hierarchical Positive-Unlabeled (PU) Model Fusion for
Chronic Disease Prediction
- Authors: Yang Wu, Xurui Li, Xuhong Zhang, Yangyang Kang, Changlong Sun and
Xiaozhong Liu
- Abstract summary: We present a novel Positive-Unlabeled Learning Tree (PUtree) algorithm.
PUtree is designed to take into account communities such as different age or income brackets, in tasks of chronic disease prediction.
We demonstrate the superior performance of PUtree as well as its variants on two benchmarks and a new diabetes-prediction dataset.
- Score: 35.76481037888834
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Positive-Unlabeled (PU) Learning is a challenge presented by binary
classification problems where there is an abundance of unlabeled data along
with a small number of positive data instances, which can be used to address
the chronic disease screening problem. State-of-the-art PU learning methods have
resulted in the development of various risk estimators, yet they neglect the
differences among distinct populations. To address this issue, we present a
novel Positive-Unlabeled Learning Tree (PUtree) algorithm. PUtree is designed
to take into account communities such as different age or income brackets, in
tasks of chronic disease prediction. We propose a novel approach for binary
decision-making, which hierarchically builds community-based PU models and then
aggregates their deliverables. Our method can explicate each PU model on the
tree for the optimized non-leaf PU node splitting. Furthermore, a mask-recovery
data augmentation strategy enables sufficient training of the model in
individual communities. Additionally, the proposed approach includes an
adversarial PU risk estimator to capture hierarchical PU-relationships, and a
model fusion network that integrates data from each tree path, resulting in
robust binary classification results. We demonstrate the superior performance
of PUtree as well as its variants on two benchmarks and a new
diabetes-prediction dataset.
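The abstract refers to PU risk estimators without giving one. As a rough illustration only, the sketch below computes a standard non-negative PU risk with a sigmoid surrogate loss. It is not the adversarial estimator proposed for PUtree, and the function names, the choice of loss, and the assumption that the class prior is known are all illustrative choices that do not come from the paper.

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid surrogate loss; small when the sign of score agrees with label (+1 or -1)."""
    return 1.0 / (1.0 + math.exp(label * score))


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk estimate (hypothetical helper, not the PUtree estimator).

    pos_scores: classifier scores on labeled positive examples
    unl_scores: classifier scores on unlabeled examples
    prior:      assumed fraction of positives in the unlabeled data
    """
    # Risk of positives when they are treated as positive.
    risk_pos_as_pos = mean([sigmoid_loss(s, +1) for s in pos_scores])
    # Risk of positives when they are treated as negative.
    risk_pos_as_neg = mean([sigmoid_loss(s, -1) for s in pos_scores])
    # Risk of unlabeled examples when they are treated as negative.
    risk_unl_as_neg = mean([sigmoid_loss(s, -1) for s in unl_scores])
    # The negative-class risk is estimated by subtracting the positive
    # contribution from the unlabeled risk; clamp it at zero so that the
    # estimate cannot become negative on finite samples.
    negative_part = risk_unl_as_neg - prior * risk_pos_as_neg
    return prior * risk_pos_as_pos + max(0.0, negative_part)


if __name__ == "__main__":
    print(nn_pu_risk([2.0, 1.5, 0.3], [-1.0, 0.2, -2.5, 1.1], prior=0.3))
```

In a community-based setting such as the one the abstract describes, a function of this kind would presumably be evaluated separately on each subgroup's positive and unlabeled samples, but how PUtree actually does this is not specified in the text above.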
Related papers
- Integrating Genomics into Multimodal EHR Foundation Models [56.31910745104141]
This paper introduces an innovative EHR foundation model that integrates Polygenic Risk Scores (PRS) as a foundational data modality.
The framework aims to learn complex relationships between clinical data and genetic predispositions.
This approach is pivotal for unlocking new insights into disease prediction, proactive health management, risk stratification, and personalized treatment strategies.
arXiv Detail & Related papers (2025-10-24T15:56:40Z) - Enhanced Survival Trees [5.176259250675077]
We introduce a new survival tree method for censored failure time data that incorporates three key advancements over traditional approaches.
First, we develop a more computationally efficient splitting procedure that effectively mitigates the end-cut preference problem.
Second, we present a novel framework for determining tree structures through fused regularization.
Third, we address inference by constructing valid confidence intervals for median survival times within the subgroups identified by the final tree.
arXiv Detail & Related papers (2025-09-23T00:54:45Z) - Model-free algorithms for fast node clustering in SBM type graphs and application to social role inference in animals [26.41190755089919]
We propose a novel family of model-free algorithms for node clustering and parameter inference in graphs generated from the Stochastic Block Model (SBM).
We benchmark our methods against state-of-the-art techniques, demonstrating significantly faster computation times with a lower order of estimation error.
We validate the practical relevance of our algorithms by applying them to empirical network data from behavioral ecology.
arXiv Detail & Related papers (2025-09-19T13:57:17Z) - FedGA-Tree: Federated Decision Tree using Genetic Algorithm [11.955062839855334]
We introduce Genetic Algorithm to facilitate the construction of personalized decision trees.
Our method surpasses decision trees trained solely on local data and a benchmark algorithm.
arXiv Detail & Related papers (2025-06-09T19:39:22Z) - Learning Decision Trees as Amortized Structure Inference [59.65621207449269]
We propose a hybrid amortized structure inference approach to learn predictive decision tree ensembles given data.
We show that our approach, DT-GFN, outperforms state-of-the-art decision tree and deep learning methods on standard classification benchmarks.
arXiv Detail & Related papers (2025-03-10T07:05:07Z) - Convergence Behavior of an Adversarial Weak Supervision Method [10.409652277630133]
Weak Supervision is a paradigm subsuming subareas of machine learning.
By using labeled data to train modern machine learning methods, the cost of acquiring large amounts of hand labeled data can be ameliorated.
Approaches to combining the rules-of-thumb fall into two camps, reflecting different ideologies of statistical estimation.
arXiv Detail & Related papers (2024-05-25T02:33:17Z) - Causal Discovery with Generalized Linear Models through Peeling
Algorithms [7.859708910171316]
This article presents a novel method for causal discovery with generalized structural equation models.
It provides statistical guarantees for accurately discovering parent-child relationships via the peeling algorithms.
It also demonstrates an application to Alzheimer's disease.
arXiv Detail & Related papers (2023-10-25T15:12:24Z) - Provably Efficient UCB-type Algorithms For Learning Predictive State
Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z) - Energy-based Out-of-Distribution Detection for Graph Neural Networks [76.0242218180483]
We propose a simple, powerful and efficient OOD detection model for GNN-based learning on graphs, which we call GNNSafe.
GNNSafe achieves up to 17.0% AUROC improvement over state-of-the-art methods and could serve as a simple yet strong baseline in such an under-developed area.
arXiv Detail & Related papers (2023-02-06T16:38:43Z) - GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP,
and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z) - Reinforcement Learning with Heterogeneous Data: Estimation and Inference [84.72174994749305]
We introduce the K-Heterogeneous Markov Decision Process (K-Hetero MDP) to address sequential decision problems with population heterogeneity.
We propose the Auto-Clustered Policy Evaluation (ACPE) for estimating the value of a given policy, and the Auto-Clustered Policy Iteration (ACPI) for estimating the optimal policy in a given policy class.
We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
arXiv Detail & Related papers (2022-01-31T20:58:47Z) - Lung Cancer Risk Estimation with Incomplete Data: A Joint Missing
Imputation Perspective [5.64530854079352]
We address imputation of missing data by modeling the joint distribution of multi-modal data.
Motivated by partial bidirectional generative adversarial net (PBiGAN), we propose a new Conditional PBiGAN (C-PBiGAN) method.
C-PBiGAN achieves significant improvements in lung cancer risk estimation compared with representative imputation methods.
arXiv Detail & Related papers (2021-07-25T20:15:16Z) - Group Testing with a Graph Infection Spread Model [61.48558770435175]
Infection spreads via connections between individuals and this results in a probabilistic cluster formation structure as well as a non-i.i.d. infection status for individuals.
We propose a class of two-step sampled group testing algorithms where we exploit the known probabilistic infection spread model.
Our results imply that, by exploiting information on the connections of individuals, group testing can be used to reduce the number of required tests significantly even when infection rate is high.
arXiv Detail & Related papers (2021-01-14T18:51:32Z) - Amortized Probabilistic Detection of Communities in Graphs [39.56798207634738]
We propose a simple framework for amortized community detection.
We combine the expressive power of GNNs with recent methods for amortized clustering.
We evaluate several models from our framework on synthetic and real datasets.
arXiv Detail & Related papers (2020-10-29T16:18:48Z) - Fragmentation Coagulation Based Mixed Membership Stochastic Blockmodel [17.35449041036449]
The Mixed-Membership Stochastic Blockmodel (MMSB) has been proposed as one of the state-of-the-art Bayesian methods suitable for learning the complex hidden structure underlying network data.
Our model performs entity-based clustering to capture the community information for entities and linkage-based clustering to derive the group information for links simultaneously.
By integrating the community structure with the group compatibility matrix we derive a generalized version of MMSB.
arXiv Detail & Related papers (2020-01-17T22:02:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.