No Foundations without Foundations -- Why semi-mechanistic models are essential for regulatory biology
- URL: http://arxiv.org/abs/2501.19178v1
- Date: Fri, 31 Jan 2025 14:43:16 GMT
- Title: No Foundations without Foundations -- Why semi-mechanistic models are essential for regulatory biology
- Authors: Luka Kovačević, Thomas Gaudelet, James Opzoomer, Hagen Triendl, John Whittaker, Caroline Uhler, Lindsay Edwards, Jake P. Taylor-King
- Abstract summary: We argue that genuine "foundation models" of regulatory biology will remain out of reach unless guided by frameworks that integrate mechanistic insight with principled experimental design.
We present one such ground-up, semi-mechanistic framework that unifies perturbation-based experimental designs.
- Score: 5.925258390690544
- License:
- Abstract: Despite substantial efforts, deep learning has not yet delivered a transformative impact on elucidating regulatory biology, particularly in the realm of predicting gene expression profiles. Here, we argue that genuine "foundation models" of regulatory biology will remain out of reach unless guided by frameworks that integrate mechanistic insight with principled experimental design. We present one such ground-up, semi-mechanistic framework that unifies perturbation-based experimental designs across both in vitro and in vivo CRISPR screens, accounting for differentiating and non-differentiating cellular systems. By revealing previously unrecognised assumptions in published machine learning methods, our approach clarifies links with popular techniques such as variational autoencoders and structural causal models. In practice, this framework suggests a modified loss function that we demonstrate can improve predictive performance, and further suggests an error analysis that informs batching strategies. Ultimately, since cellular regulation emerges from innumerable interactions amongst largely uncharted molecular components, we contend that systems-level understanding cannot be achieved through structural biology alone. Instead, we argue that real progress will require a first-principles perspective on how experiments capture biological phenomena, how data are generated, and how these processes can be reflected in more faithful modelling architectures.
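As a rough illustration of the kind of latent-variable formulation the abstract alludes to, the sketch below sets up a toy VAE-style model of expression profiles in which a perturbation acts as a shift in latent space (an SCM-flavoured assumption) and the loss is the usual reconstruction-plus-KL objective. The architecture, the latent-shift assumption, and the `beta` weighting are placeholders for exposition, not the modified loss or framework proposed in the paper.

```python
# Illustrative sketch only: a VAE-style model of gene-expression profiles under
# perturbation. The specific architecture and loss are assumptions for exposition,
# not the semi-mechanistic framework proposed in the paper.
import torch
import torch.nn as nn

class PerturbationVAE(nn.Module):
    def __init__(self, n_genes: int, n_perturbations: int, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_genes, 128), nn.ReLU())
        self.to_mu = nn.Linear(128, latent_dim)
        self.to_logvar = nn.Linear(128, latent_dim)
        # Assumption: a perturbation acts as an additive shift in latent space.
        self.pert_effect = nn.Embedding(n_perturbations, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, n_genes))

    def forward(self, x, pert_idx):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        z_pert = z + self.pert_effect(pert_idx)
        return self.decoder(z_pert), mu, logvar

def loss_fn(x_hat, x, mu, logvar, beta: float = 1.0):
    # Standard ELBO-style objective: reconstruction error plus weighted KL term.
    recon = ((x_hat - x) ** 2).sum(dim=-1).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
    return recon + beta * kl
```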
Related papers
- GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters.
The model adheres to the central dogma of molecular biology, accurately generating protein-coding sequences.
It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of promoter sequences.
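A minimal sketch of prompt-responsive sequence generation: an autoregressive sampler over nucleotides that extends a promoter prompt token by token. The `next_token_probs` placeholder stands in for a trained genomic language model; it is an assumption for illustration, not GENERator's interface.

```python
# Toy autoregressive nucleotide sampler; the uniform next-token distribution is a
# placeholder for a trained genomic language model.
import numpy as np

VOCAB = np.array(list("ACGT"))

def next_token_probs(context: str) -> np.ndarray:
    # Placeholder: a real model would condition on the context sequence.
    return np.full(4, 0.25)

def generate(prompt: str, n_new: int, rng=np.random.default_rng(0)) -> str:
    seq = prompt
    for _ in range(n_new):
        seq += rng.choice(VOCAB, p=next_token_probs(seq))
    return seq

# Extend a (hypothetical) promoter prompt by 50 bp.
print(generate("TATAAAAGGC", 50))
```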
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
- NeuralPLexer3: Accurate Biomolecular Complex Structure Prediction with Flow Models [6.75152379258166]
We present NeuralPLexer3, a flow-based generative model that achieves state-of-the-art prediction accuracy on key biomolecular interaction types.
Examined through newly developed benchmarking strategies, NeuralPLexer3 excels in vital areas that are crucial to structure-based drug design.
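A generic sketch of a flow-based generative training step (conditional flow matching over 3D coordinates), shown only to illustrate the class of model; the network, the linear probability path, and the coordinate dimensionality are assumptions, not NeuralPLexer3's architecture or objective.

```python
# Minimal flow-matching training step over 3D coordinates (generic technique sketch).
import torch
import torch.nn as nn

vector_field = nn.Sequential(nn.Linear(3 + 1, 64), nn.SiLU(), nn.Linear(64, 3))

def flow_matching_loss(x1):                      # x1: target coordinates, shape (B, 3)
    x0 = torch.randn_like(x1)                    # noise sample
    t = torch.rand(x1.size(0), 1)                # random interpolation times
    xt = (1 - t) * x0 + t * x1                   # linear probability path
    target_v = x1 - x0                           # ground-truth velocity along the path
    pred_v = vector_field(torch.cat([xt, t], dim=-1))
    return ((pred_v - target_v) ** 2).mean()
```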
arXiv Detail & Related papers (2024-12-14T08:28:45Z)
- Causal Representation Learning from Multimodal Biological Observations [57.00712157758845]
We aim to develop flexible identification conditions for multimodal data.
We establish identifiability guarantees for each latent component, extending the subspace identification results from prior work.
Our key theoretical ingredient is the structural sparsity of the causal connections among distinct modalities.
arXiv Detail & Related papers (2024-11-10T16:40:27Z)
- Cross-Entropy Is All You Need To Invert the Data Generating Process [29.94396019742267]
Empirical phenomena suggest that supervised models can learn interpretable factors of variation in a linear fashion.
Recent advances in self-supervised learning have shown that these methods can recover latent structures by inverting the data generating process.
We prove that even in standard classification tasks, models learn representations of ground-truth factors of variation up to a linear transformation.
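A small sketch of how "recovery up to a linear transformation" can be probed empirically: regress ground-truth factors on a learned representation with least squares and inspect the fit. The synthetic data and the R² criterion are illustrative assumptions, not the paper's proof or experimental protocol.

```python
# Linear-recoverability probe on synthetic data: if representations H are a linear
# mixing of ground-truth factors Z, a least-squares map recovers Z almost exactly.
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 5))                  # ground-truth factors of variation
H = Z @ rng.normal(size=(5, 16))                # stand-in for learned representations

coef, *_ = np.linalg.lstsq(H, Z, rcond=None)    # best linear map from H back to Z
residual = Z - H @ coef
r2 = 1 - residual.var() / Z.var()
print(f"linear recoverability R^2 = {r2:.3f}")  # ~1.0 when H linearly encodes Z
```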
arXiv Detail & Related papers (2024-10-29T09:03:57Z)
- Weakly Supervised Set-Consistency Learning Improves Morphological Profiling of Single-Cell Images [0.6491172192043603]
We propose a set-level consistency learning algorithm, Set-DINO, to improve learned representations of perturbation effects in single-cell images.
We conduct experiments on a large-scale Optical Pooled Screening dataset with more than 5000 genetic perturbations.
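A toy sketch of a set-level consistency objective: embeddings of cells that share a genetic perturbation are pulled toward their set mean. This is a generic illustration of set consistency, not the Set-DINO algorithm itself.

```python
# Generic set-consistency loss over single-cell embeddings grouped by perturbation.
import torch

def set_consistency_loss(embeddings, pert_labels):
    perts = pert_labels.unique()
    loss = 0.0
    for p in perts:
        e = embeddings[pert_labels == p]                       # cells sharing perturbation p
        loss = loss + ((e - e.mean(dim=0, keepdim=True)) ** 2).mean()
    return loss / len(perts)
```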
arXiv Detail & Related papers (2024-06-08T00:53:30Z)
- Network bottlenecks and task structure control the evolution of interpretable learning rules in a foraging agent [0.0]
We study meta-learning via evolutionary optimization of simple reward-modulated plasticity rules in embodied agents.
We show that unconstrained meta-learning leads to the emergence of diverse plasticity rules.
Our findings indicate that the meta-learning of plasticity rules is very sensitive to various parameters, with this sensitivity possibly reflected in the learning rules found in biological networks.
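A sketch of the kind of reward-modulated Hebbian rule whose coefficients an evolution strategy might optimise; the rule's form and the mutation operator are illustrative assumptions, not the rules analysed in the paper.

```python
# Reward-modulated Hebbian plasticity rule with evolvable coefficients theta = (a, b, c).
import numpy as np

def plasticity_step(w, pre, post, reward, theta):
    a, b, c = theta
    return w + reward * (a * np.outer(post, pre)   # Hebbian co-activity term
                         + b * pre                 # presynaptic term
                         + c)                      # constant drift

def mutate(theta, rng, sigma=0.1):
    # One mutation of a toy evolution strategy over the rule coefficients.
    return theta + sigma * rng.normal(size=theta.shape)
```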
arXiv Detail & Related papers (2024-03-20T14:57:02Z)
- Neural Echos: Depthwise Convolutional Filters Replicate Biological Receptive Fields [56.69755544814834]
We present evidence suggesting that depthwise convolutional kernels are effectively replicating the biological receptive fields observed in the mammalian retina.
We propose a scheme that draws inspiration from the biological receptive fields.
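A brief sketch of the analogy: a depthwise convolution (one filter per channel) initialised with a centre-surround difference-of-Gaussians kernel, the textbook model of a retinal receptive field. The kernel parameters are assumptions chosen for illustration, not the paper's analysis pipeline.

```python
# Depthwise convolution initialised with a difference-of-Gaussians (centre-surround) kernel.
import torch
import torch.nn as nn

def dog_kernel(size=7, sigma_c=1.0, sigma_s=2.0):
    ax = torch.arange(size) - size // 2
    yy, xx = torch.meshgrid(ax, ax, indexing="ij")
    r2 = (xx ** 2 + yy ** 2).float()
    centre = torch.exp(-r2 / (2 * sigma_c ** 2))
    surround = torch.exp(-r2 / (2 * sigma_s ** 2))
    return centre / centre.sum() - surround / surround.sum()

channels = 16
conv = nn.Conv2d(channels, channels, kernel_size=7, padding=3,
                 groups=channels, bias=False)          # groups=channels -> depthwise
with torch.no_grad():
    conv.weight.copy_(dog_kernel().expand(channels, 1, 7, 7))
```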
arXiv Detail & Related papers (2024-01-18T18:06:22Z)
- Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges.
We first present the model that underlies most current causal approaches to single-cell biology.
We then identify open problems in the application of causal approaches to single-cell data.
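A minimal sketch, under generic assumptions, of the structural causal model typically invoked for perturbation data: latent regulators follow an acyclic linear SCM, a CRISPR knockout is a hard intervention on one of them, and expression is a noisy readout. Dimensions and mechanisms are placeholders.

```python
# Toy structural causal model for perturbation data with hard interventions.
import numpy as np

rng = np.random.default_rng(0)
B = np.triu(rng.normal(size=(5, 5)), k=1)       # acyclic effects among latent regulators
readout = rng.normal(size=(5, 20))              # latent-to-expression readout

def sample_expression(knockout=None, n=100):
    z = np.zeros((n, 5))
    for j in range(5):                          # ancestral sampling along the DAG
        z[:, j] = z @ B[:, j] + rng.normal(size=n)
        if knockout == j:
            z[:, j] = 0.0                       # hard intervention: regulator j silenced
    return z @ readout + 0.1 * rng.normal(size=(n, 20))
```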
arXiv Detail & Related papers (2023-10-23T13:35:24Z)
- Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
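A toy sketch of post-hoc feature importance for a treatment-effect model: fit a simple T-learner, then permute each feature and measure how much the predicted effects move. The synthetic data-generating process and the T-learner choice are assumptions for illustration, not the paper's benchmark.

```python
# T-learner CATE estimate plus permutation-based feature importance on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))
t = rng.integers(0, 2, size=2000)
y = X[:, 0] * t + X[:, 1] + rng.normal(scale=0.1, size=2000)   # effect driven by feature 0

m1 = RandomForestRegressor().fit(X[t == 1], y[t == 1])
m0 = RandomForestRegressor().fit(X[t == 0], y[t == 0])
cate = m1.predict(X) - m0.predict(X)

for j in range(X.shape[1]):                     # permutation importance on the CATE
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    shift = np.abs((m1.predict(Xp) - m0.predict(Xp)) - cate).mean()
    print(f"feature {j}: importance {shift:.3f}")
```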
arXiv Detail & Related papers (2022-06-16T17:59:05Z)
- Discovering Latent Causal Variables via Mechanism Sparsity: A New Principle for Nonlinear ICA [81.4991350761909]
Independent component analysis (ICA) refers to an ensemble of methods which formalize this goal and provide estimation procedures for practical application.
We show that the latent variables can be recovered up to a permutation if one regularizes the latent mechanisms to be sparse.
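A compact sketch of the mechanism-sparsity idea: learn a latent transition model while an L1 penalty on the dependency graph keeps each mechanism's set of parents small. Purely illustrative of the regularisation principle, not the paper's estimator or identifiability argument.

```python
# Latent transition model with an L1-sparsified dependency graph among latents.
import torch
import torch.nn as nn

dim = 10
G = nn.Parameter(torch.randn(dim, dim) * 0.1)     # learnable graph over latent variables
f = nn.Linear(dim, dim)                           # simple transition mechanism

def transition_loss(z_t, z_next, lam=1e-2):
    pred = f(z_t @ G)                             # parents selected (softly) by G
    return ((pred - z_next) ** 2).mean() + lam * G.abs().sum()
```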
arXiv Detail & Related papers (2021-07-21T14:22:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.