Related papers: BaGGLS: A Bayesian Shrinkage Framework for Interpretable Modeling of Interactions in High-Dimensional Biological Data

URL: http://arxiv.org/abs/2511.15330v1
Date: Wed, 19 Nov 2025 10:48:30 GMT
Title: BaGGLS: A Bayesian Shrinkage Framework for Interpretable Modeling of Interactions in High-Dimensional Biological Data
Authors: Marta S. Lemanczyk, Lucas Kock, Johanna Schlimme, Nadja Klein, Bernhard Y. Renard
Abstract summary: BaGGLS is a flexible and interpretable probabilistic binary regression model for high-dimensional biological inference involving feature interactions. We show that BaGGLS is a promising approach for uncovering biologically relevant interaction patterns, with potential applicability across a range of high-dimensional tasks in computational biology.
Score: 2.849014311160882
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Biological data sets are often high-dimensional, noisy, and governed by complex interactions among sparse signals. This poses major challenges for interpretability and reliable feature selection. Tasks such as identifying motif interactions in genomics exemplify these difficulties, as only a small subset of biologically relevant features (e.g., motifs) are typically active, and their effects are often non-linear and context-dependent. While statistical approaches often result in more interpretable models, deep learning models have proven effective in modeling complex interactions and prediction accuracy, yet their black-box nature limits interpretability. We introduce BaGGLS, a flexible and interpretable probabilistic binary regression model designed for high-dimensional biological inference involving feature interactions. BaGGLS incorporates a Bayesian group global-local shrinkage prior, aligned with the group structure introduced by interaction terms. This prior encourages sparsity while retaining interpretability, helping to isolate meaningful signals and suppress noise. To enable scalable inference, we employ a partially factorized variational approximation that captures posterior skewness and supports efficient learning even in large feature spaces. In extensive simulations, we can show that BaGGLS outperforms the other methods with regard to interaction detection and is many times faster than MCMC sampling under the horseshoe prior. We also demonstrate the usefulness of BaGGLS in the context of interaction discovery from motif scanner outputs and noisy attribution scores from deep learning models. This shows that BaGGLS is a promising approach for uncovering biologically relevant interaction patterns, with potential applicability across a range of high-dimensional tasks in computational biology.

Related papers

Information-theoretic Quantification of High-order Feature Effects in Classification Problems [0.19791587637442676]
We present an information-theoretic extension of the High-order interactions for Feature importance (Hi-Fi) method. Our framework decomposes feature contributions into unique, synergistic, and redundant components. Results indicate that the proposed estimator accurately recovers theoretical and expected findings.
arXiv Detail & Related papers (2025-07-06T11:50:30Z)
Bayesian Hybrid Machine Learning of Gallstone Risk [0.0]
Gallstone disease is a complex, multifactorial condition with significant global health burdens. We propose a hybrid machine learning framework that integrates robust variable selection with advanced interaction detection. This proposed framework not only enhances prediction but also yields actionable insights, offering a valuable support tool for medical research and decision-making.
arXiv Detail & Related papers (2025-06-17T14:19:02Z)
GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters. Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks. It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
Poisson Hierarchical Indian Buffet Processes-With Indications for Microbiome Species Sampling Models [11.64027121881932]
We introduce a new class of species sampling models designed to address the challenges of complex, sparse count data. Our work provides a broadly applicable methodology for hierarchical count modeling in genetics, commerce, and text analysis.
arXiv Detail & Related papers (2025-02-04T01:27:16Z)
Causal Representation Learning from Multimodal Biomedical Observations [57.00712157758845]
We develop flexible identification conditions for multimodal data and principled methods to facilitate the understanding of biomedical datasets. Key theoretical contribution is the structural sparsity of causal connections between modalities. Results on a real-world human phenotype dataset are consistent with established biomedical research.
arXiv Detail & Related papers (2024-11-10T16:40:27Z)
Latent Variable Sequence Identification for Cognitive Models with Neural Network Estimators [7.7227297059345466]
We present an approach that extends neural Bayes estimation to learn a direct mapping between experimental data and the targeted latent variable space. Our work underscores that combining recurrent neural networks and simulation-based inference to identify latent variable sequences can enable researchers to access a wider class of cognitive models.
arXiv Detail & Related papers (2024-06-20T21:13:39Z)
Cognitive Evolutionary Learning to Select Feature Interactions for Recommender Systems [59.117526206317116]
We show that CELL can adaptively evolve into different models for different tasks and data. Experiments on four real-world datasets demonstrate that CELL significantly outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-05-29T02:35:23Z)
Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues. We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space. A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution. We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
MoReL: Multi-omics Relational Learning [26.484803417186384]
We propose a novel deep Bayesian generative model to efficiently infer a multi-partite graph encoding molecular interactions across heterogeneous views. With such an optimal transport regularization in the deep Bayesian generative model, it not only allows incorporating view-specific side information, but also increases the model flexibility with the distribution-based regularization.
arXiv Detail & Related papers (2022-03-15T02:50:07Z)
Towards Interaction Detection Using Topological Analysis on Neural Networks [55.74562391439507]
In neural networks, any interacting features must follow a strongly weighted connection to common hidden units. We propose a new measure for quantifying interaction strength, based upon the well-received theory of persistent homology. A Persistence Interaction Detection (PID) algorithm is developed to efficiently detect interactions.
arXiv Detail & Related papers (2020-10-25T02:15:24Z)
A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference. Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.