GFlowNets for AI-Driven Scientific Discovery
- URL: http://arxiv.org/abs/2302.00615v2
- Date: Tue, 27 Jun 2023 12:10:38 GMT
- Title: GFlowNets for AI-Driven Scientific Discovery
- Authors: Moksh Jain, Tristan Deleu, Jason Hartford, Cheng-Hao Liu, Alex Hernandez-Garcia, Yoshua Bengio
- Abstract summary: We present a new probabilistic machine learning framework called GFlowNets.
GFlowNets can be applied in the modeling, hypotheses generation and experimental design stages of the experimental science loop.
We argue that GFlowNets can become a valuable tool for AI-driven scientific discovery.
- Score: 74.27219800878304
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tackling the most pressing problems for humanity, such as the climate crisis
and the threat of global pandemics, requires accelerating the pace of
scientific discovery. While science has traditionally relied on trial and error
and even serendipity to a large extent, the last few decades have seen a surge
of data-driven scientific discoveries. However, in order to truly leverage
large-scale data sets and high-throughput experimental setups, machine learning
methods will need to be further improved and better integrated in the
scientific discovery pipeline. A key challenge for current machine learning
methods in this context is the efficient exploration of very large search
spaces, which requires techniques for estimating reducible (epistemic)
uncertainty and generating sets of diverse and informative experiments to
perform. This motivated a new probabilistic machine learning framework called
GFlowNets, which can be applied in the modeling, hypotheses generation and
experimental design stages of the experimental science loop. GFlowNets learn to
sample from a distribution given indirectly by a reward function corresponding
to an unnormalized probability, which enables sampling diverse, high-reward
candidates. GFlowNets can also be used to form efficient and amortized Bayesian
posterior estimators for causal models conditioned on the already acquired
experimental data. Having such posterior models can then provide estimators of
epistemic uncertainty and information gain that can drive an experimental
design policy. Altogether, here we will argue that GFlowNets can become a
valuable tool for AI-driven scientific discovery, especially in scenarios of
very large candidate spaces where we have access to cheap but inaccurate
measurements or to expensive but accurate measurements. This is a common
setting in the context of drug and material discovery, which we use as examples
throughout the paper.
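The abstract's central claim, that a GFlowNet samples objects with probability proportional to an unnormalized reward, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' method: the objects are bit strings of length 3 built one bit at a time, the reward function is invented, and the flows are computed exactly by enumeration instead of being learned by a neural network as in an actual GFlowNet.

```python
from itertools import product

LENGTH = 3  # hypothetical toy setting: objects are bit strings of length 3


def reward(x: str) -> float:
    """Hypothetical unnormalized reward: favors strings with more 1s."""
    return 1.0 + 2.0 * x.count("1")


def state_flow(prefix: str) -> float:
    """Total reward of all complete strings reachable from this prefix."""
    if len(prefix) == LENGTH:
        return reward(prefix)
    return state_flow(prefix + "0") + state_flow(prefix + "1")


def forward_policy(prefix: str) -> dict:
    """Probability of appending each bit: child flow divided by parent flow."""
    total = state_flow(prefix)
    return {bit: state_flow(prefix + bit) / total for bit in "01"}


def terminal_probability(x: str) -> float:
    """Probability that following the forward policy from '' builds x."""
    prob = 1.0
    for i, bit in enumerate(x):
        prob *= forward_policy(x[:i])[bit]
    return prob


# The flow through the empty string equals the normalizing constant Z.
Z = state_flow("")

# Sampling probability of every complete string under the forward policy.
sampling_probs = {
    "".join(bits): terminal_probability("".join(bits))
    for bits in product("01", repeat=LENGTH)
}

for x, p in sampling_probs.items():
    print(x, round(p, 4), round(reward(x) / Z, 4))
```

In this toy case each string is sampled with probability reward(x) / Z, so every string with nonzero reward keeps some probability mass instead of the sampler collapsing onto the single best one. A trained GFlowNet approximates such flows with a learned model, which is what makes very large candidate spaces tractable.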
Related papers
- Reliable edge machine learning hardware for scientific applications [34.87898436984149]
Extreme data rate scientific experiments create massive amounts of data that require efficient ML edge processing.
We discuss approaches to developing and validating reliable algorithms at the scientific edge under such strict latency, resource, power, and area requirements.
arXiv Detail & Related papers (2024-06-27T20:45:08Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Multi-Fidelity Active Learning with GFlowNets [65.91555804996203]
We propose a multi-fidelity active learning algorithm with GFlowNets as a sampler, to efficiently discover diverse, high-scoring candidates.
Our evaluation on molecular discovery tasks shows that multi-fidelity active learning with GFlowNets can discover high-scoring candidates at a fraction of the budget of its single-fidelity counterpart.
arXiv Detail & Related papers (2023-06-20T17:43:42Z)
- Machine learning enabled experimental design and parameter estimation for ultrafast spin dynamics [54.172707311728885]
We introduce a methodology that combines machine learning with Bayesian optimal experimental design (BOED).
Our method employs a neural network model for large-scale spin dynamics simulations for precise distribution and utility calculations in BOED.
Our numerical benchmarks demonstrate the superior performance of our method in guiding XPFS experiments, predicting model parameters, and yielding more informative measurements within limited experimental time.
arXiv Detail & Related papers (2023-06-03T06:19:20Z)
- Testing Causality in Scientific Modelling Software [0.26388783516590225]
Causal Testing Framework is a framework that uses Causal Inference techniques to establish causal effects from existing data.
We present three case studies covering real-world scientific models, demonstrating how the Causal Testing Framework can infer metamorphic test outcomes.
arXiv Detail & Related papers (2022-09-01T10:57:54Z)
- Active Exploration via Experiment Design in Markov Chains [86.41407938210193]
A key challenge in science and engineering is to design experiments to learn about some unknown quantity of interest.
We propose an algorithm that efficiently selects policies whose measurement allocation converges to the optimal one.
In addition to our theoretical analysis, we showcase our framework on applications in ecological surveillance and pharmacology.
arXiv Detail & Related papers (2022-06-29T00:04:40Z)
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel machine learning architecture that allows us to infuse a deep neural network with human-powered abstraction at the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
- Complete CVDL Methodology for Investigating Hydrodynamic Instabilities [0.49873153106566565]
In fluid dynamics, one of the most important research fields is hydrodynamic instabilities and their evolution in different flow regimes.
Currently, three main methods are used for understanding such phenomena: analytical models, experiments and simulations.
We claim and demonstrate that a major portion of this research effort could and should be analysed using recent breakthrough advancements in the field of Computer Vision with Deep Learning (CVDL, or Deep Computer-Vision).
Specifically, we focus in this research on one of the most representative instabilities, the Rayleigh-Taylor one, simulate its behaviour and create an open-sourced state-of-the
arXiv Detail & Related papers (2020-04-03T13:52:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.