Seeing Unseen: Discover Novel Biomedical Concepts via
Geometry-Constrained Probabilistic Modeling
- URL: http://arxiv.org/abs/2403.01053v2
- Date: Tue, 5 Mar 2024 07:36:04 GMT
- Authors: Jianan Fan, Dongnan Liu, Hang Chang, Heng Huang, Mei Chen, and Weidong
Cai
- Abstract summary: We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
- Score: 53.7117640028211
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning holds tremendous promise for transforming the fundamental
practice of scientific discovery by virtue of its data-driven nature. With the
ever-increasing stream of research data collection, it would be appealing to
autonomously explore patterns and insights from observational data for
discovering novel classes of phenotypes and concepts. However, in the
biomedical domain, there are several challenges inherently presented in the
cumulated data which hamper the progress of novel class discovery. The
non-i.i.d. data distribution accompanied by the severe imbalance among
different groups of classes essentially leads to ambiguous and biased semantic
representations. In this work, we present a geometry-constrained probabilistic
modeling treatment to resolve the identified issues. First, we propose to
parameterize the approximated posterior of instance embedding as a marginal von
Mises-Fisher distribution to account for the interference of distributional
latent bias. Then, we incorporate a suite of critical geometric properties to
impose proper constraints on the layout of constructed embedding space, which
in turn minimizes the uncontrollable risk for unknown class learning and
structuring. Furthermore, a spectral graph-theoretic method is devised to
estimate the number of potential novel classes. It inherits two intriguing
merits compared to existent approaches, namely high computational efficiency
and flexibility for taxonomy-adaptive estimation. Extensive experiments across
various biomedical scenarios substantiate the effectiveness and general
applicability of our method.
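The abstract does not specify how the spectral estimate of the class count is computed. As a rough illustration of the general idea only, the sketch below uses the standard eigengap heuristic on a normalized graph Laplacian; it is not the authors' estimator. The function name `estimate_num_clusters`, the Gaussian affinity, and the parameters `sigma` and `max_k` are all assumptions introduced for this example.

```python
import numpy as np


def estimate_num_clusters(X, sigma=1.0, max_k=10):
    """Eigengap heuristic (generic sketch, NOT the paper's method).

    X      : (n, d) array of embeddings (hypothetical input).
    sigma  : bandwidth of the Gaussian affinity (assumed choice).
    max_k  : largest number of clusters to consider.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances, clipped at zero for stability.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # Gaussian affinity graph without self-loops.
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # Eigenvalues in ascending order; k near-zero eigenvalues suggest
    # k well-separated groups, so pick k at the largest gap.
    eigvals = np.linalg.eigvalsh(L)
    k_max = min(max_k, n - 1)
    gaps = np.diff(eigvals[: k_max + 1])
    return int(np.argmax(gaps)) + 1


if __name__ == "__main__":
    # Toy data: three tight, well-separated blobs in 2-D.
    rng = np.random.default_rng(0)
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    X_demo = np.vstack([c + 0.2 * rng.standard_normal((30, 2)) for c in centers])
    print(estimate_num_clusters(X_demo))
```

The heuristic needs a single eigendecomposition of the affinity graph, so no clustering has to be rerun for each candidate class count. On real embeddings the result depends strongly on the affinity bandwidth, so `sigma` would need tuning.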
Related papers
- Causal Representation Learning from Multimodal Biological Observations [57.00712157758845]
We aim to develop flexible identification conditions for multimodal data.
We establish identifiability guarantees for each latent component, extending the subspace identification results from prior work.
Our key theoretical ingredient is the structural sparsity of the causal connections among distinct modalities.
arXiv Detail & Related papers (2024-11-10T16:40:27Z)
- From Uncertainty to Clarity: Uncertainty-Guided Class-Incremental Learning for Limited Biomedical Samples via Semantic Expansion [0.0]
We propose a class-incremental learning method under limited samples in the biomedical field.
Our method achieves optimal performance, surpassing state-of-the-art methods by as much as 53.54% in accuracy.
arXiv Detail & Related papers (2024-09-12T05:22:45Z)
- Heterogeneous Transfer Learning for Building High-Dimensional Generalized Linear Models with Disparate Datasets [0.0]
We describe a transfer learning approach for building high-dimensional generalized linear models.
We use data from a main study with detailed information on all predictors and an external, potentially much larger, study that has a more limited set of predictors.
arXiv Detail & Related papers (2023-12-20T06:11:59Z)
- TriSig: Assessing the statistical significance of triclusters [2.064612766965483]
This work proposes a statistical frame to assess the probability of patterns in tensor data to deviate from null expectations.
A comprehensive discussion on binomial testing for false positive discoveries is entailed.
Results are gathered from the application of state-of-the-art triclustering algorithms over distinct real-world case studies in biochemical and biotechnological domains.
arXiv Detail & Related papers (2023-06-01T13:08:54Z)
- Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can quickly simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks, and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process.
Our method significantly reduces the required number of interactions compared with random intervention targeting.
We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z)
- A Deep Variational Approach to Clustering Survival Data [5.871238645229228]
We introduce a novel probabilistic approach to cluster survival data in a variational deep clustering setting.
Our proposed method employs a deep generative model to uncover the underlying distribution of both the explanatory variables and the potentially censored survival times.
arXiv Detail & Related papers (2021-06-10T14:10:25Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.