Zatom-1: A Multimodal Flow Foundation Model for 3D Molecules and Materials
- URL: http://arxiv.org/abs/2602.22251v2
- Date: Wed, 04 Mar 2026 23:58:58 GMT
- Title: Zatom-1: A Multimodal Flow Foundation Model for 3D Molecules and Materials
- Authors: Alex Morehead, Miruna Cretu, Antonia Panescu, Rishabh Anand, Maurice Weiler, Tynan Perez, Samuel Blau, Steven Farrell, Wahid Bhimji, Anubhav Jain, Hrushikesh Sahasrabuddhe, Pietro Lio, Tommi Jaakkola, Rafael Gomez-Bombarelli, Rex Ying, N. Benjamin Erichson, Michael W. Mahoney
- Abstract summary: General-purpose 3D chemical modeling encompasses molecules and materials, requiring both generative and predictive capabilities. We introduce Zatom-1, the first end-to-end, fully open-source foundation model that unifies generative and predictive learning of 3D molecules and materials.
- Score: 51.342983349686556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: General-purpose 3D chemical modeling encompasses molecules and materials, requiring both generative and predictive capabilities. However, most existing AI approaches are optimized for a single domain (molecules or materials) and a single task (generation or prediction), which limits representation sharing and transfer. We introduce Zatom-1, the first end-to-end, fully open-source foundation model that unifies generative and predictive learning of 3D molecules and materials. Zatom-1 is a Transformer trained with a multimodal flow matching objective that jointly models discrete atom types and continuous 3D geometries. This approach supports scalable pretraining with predictable gains as model capacity increases, while enabling fast and stable sampling. We use joint generative pretraining as a universal initialization for downstream multi-task prediction of properties, energies, and forces. Empirically, Zatom-1 matches or outperforms specialized baselines on both generative and predictive benchmarks, while reducing the generative inference time by more than an order of magnitude. Our experiments demonstrate positive predictive transfer between chemical domains from joint generative pretraining: modeling materials during pretraining improves molecular property prediction accuracy.
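The flow matching objective described in the abstract can be illustrated with a minimal sketch for the continuous-coordinate modality. The linear interpolation path, the toy "models", and all variable names below are illustrative assumptions, not Zatom-1's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_loss(model, x0, x1, t):
    """Conditional flow matching loss for continuous 3D coordinates.

    x0: prior (noise) sample, x1: data sample, t: time in [0, 1].
    The linear path x_t = (1 - t) * x0 + t * x1 has conditional
    velocity u_t = x1 - x0, which the model regresses with an MSE loss.
    """
    xt = (1.0 - t) * x0 + t * x1          # point on the probability path
    target = x1 - x0                      # conditional velocity field
    pred = model(xt, t)                   # model's predicted velocity
    return np.mean((pred - target) ** 2)  # regression objective

# Toy check: a zero predictor incurs positive loss, while an oracle that
# returns the exact conditional velocity drives the loss to zero.
x0 = rng.standard_normal((8, 3))  # 8 atoms, Gaussian prior coordinates
x1 = rng.standard_normal((8, 3))  # 8 atoms, "data" coordinates
t = 0.3

zero_model = lambda xt, t: np.zeros_like(xt)
oracle = lambda xt, t: x1 - x0

loss_zero = flow_matching_loss(zero_model, x0, x1, t)
loss_oracle = flow_matching_loss(oracle, x0, x1, t)
print(loss_oracle == 0.0 and loss_zero > 0.0)  # → True
```

At sampling time, the learned velocity field is integrated from the prior to the data distribution with a small number of ODE steps, which is one reason flow-based samplers can be much faster than diffusion samplers.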
Related papers
- Foundation Models for Discovery and Exploration in Chemical Space [57.97784111110166]
MIST is a family of molecular foundation models trained on large unlabeled datasets. We demonstrate the ability of these models to solve real-world problems across chemical space.
arXiv Detail & Related papers (2025-10-20T17:56:01Z)
- FlowMol3: Flow Matching for 3D De Novo Small-Molecule Generation [0.0]
FlowMol3 is an open-source, multi-modal flow matching model that advances the state of the art for all-atom, small-molecule generation. Our results highlight simple, transferable strategies for improving the stability and quality of diffusion- and flow-based molecular generative models.
arXiv Detail & Related papers (2025-08-18T05:13:27Z)
- All-atom Diffusion Transformers: Unified generative modelling of molecules and materials [11.180029648567658]
All-atom Diffusion Transformer (ADiT) is a unified latent diffusion framework for jointly generating both periodic materials and non-periodic molecular systems. ADiT generates realistic and valid molecules as well as materials, obtaining state-of-the-art results on par with molecule- and crystal-specific models.
arXiv Detail & Related papers (2025-03-05T23:35:44Z)
- Conditional Synthesis of 3D Molecules with Time Correction Sampler [58.0834973489875]
Time-Aware Conditional Synthesis (TACS) is a novel approach to conditional generation on diffusion models.
It integrates adaptively controlled plug-and-play "online" guidance into a diffusion model, driving samples toward the desired properties.
arXiv Detail & Related papers (2024-11-01T12:59:25Z)
- Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold [83.18058549195855]
We argue that multiple processes in natural sciences have to be represented as vector fields on the Wasserstein manifold of probability densities. In particular, this is crucial for personalized medicine, where the development of diseases and their respective treatment response depend on the microenvironment of cells specific to each patient. We propose Meta Flow Matching (MFM), a practical approach to integrate along these vector fields on the Wasserstein manifold by amortizing the flow model over the initial populations.
arXiv Detail & Related papers (2024-08-26T20:05:31Z)
- Molecule Design by Latent Prompt Transformer [76.2112075557233]
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task.
We propose a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.
arXiv Detail & Related papers (2024-02-27T03:33:23Z)
- 3D Equivariant Diffusion for Target-Aware Molecule Generation and Affinity Prediction [9.67574543046801]
Including 3D structures during targeted drug design yields superior performance compared to target-free models.
We develop a 3D equivariant diffusion model to solve the above challenges.
Our model could generate molecules with more realistic 3D structures and better affinities towards the protein targets, and improve binding affinity ranking and prediction without retraining.
arXiv Detail & Related papers (2023-03-06T23:01:43Z)
- A Score-based Geometric Model for Molecular Dynamics Simulations [33.158796937777886]
We propose a novel model called ScoreMD to estimate the gradient of the log density of molecular conformations.
With multiple architectural improvements, ScoreMD outperforms state-of-the-art baselines on MD17 and isomers of C7O2H10. This research provides new insights into accelerating materials and drug discovery.
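The quantity such a score-based model estimates, the gradient of the log density, can be made concrete with a distribution whose score is known in closed form. The Gaussian example and step size below are illustrative assumptions, not ScoreMD's actual architecture or sampler:

```python
import numpy as np

def gaussian_score(x, mu, sigma):
    """Analytic score of an isotropic Gaussian.

    For p(x) = N(x; mu, sigma^2 I), the score is
    grad_x log p(x) = -(x - mu) / sigma^2.
    A score model approximates this quantity for the (intractable)
    distribution of molecular conformations.
    """
    return -(x - mu) / sigma**2

# Following the score with small gradient-ascent steps on log p moves a
# conformation guess toward the density mode (here, mu).
mu = np.array([0.5, -1.0, 2.0])
x = np.zeros(3)
for _ in range(200):
    x = x + 0.1 * gaussian_score(x, mu, sigma=1.0)

print(np.allclose(x, mu, atol=1e-6))  # → True
```

In practice the score is only available through the learned network, and sampling adds noise at each step (e.g. Langevin dynamics) rather than ascending deterministically as in this toy loop.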
arXiv Detail & Related papers (2022-04-19T05:13:46Z)
- BIGDML: Towards Exact Machine Learning Force Fields for Materials [55.944221055171276]
Machine-learning force fields (MLFF) should be accurate, computationally and data efficient, and applicable to molecules, materials, and interfaces thereof.
Here, we introduce the Bravais-Inspired Gradient-Domain Machine Learning approach and demonstrate its ability to construct reliable force fields using a training set with just 10-200 atoms.
arXiv Detail & Related papers (2021-06-08T10:14:57Z)
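A machine-learning force field must keep its forces consistent with an underlying energy surface, i.e. F = -dE/dr. A minimal consistency check with a toy harmonic bond illustrates the relation; the functional form and constants are illustrative assumptions, not BIGDML's actual model:

```python
def energy(r, k=1.0, r0=1.5):
    """Toy harmonic bond energy E(r) = 0.5 * k * (r - r0)^2."""
    return 0.5 * k * (r - r0) ** 2

def force(r, k=1.0, r0=1.5):
    """Analytic force F = -dE/dr = -k * (r - r0).

    Gradient-domain approaches fit this quantity directly rather than
    differentiating a separately learned energy.
    """
    return -k * (r - r0)

# Consistency check: the analytic force must match a central finite
# difference of the energy, -(E(r+h) - E(r-h)) / (2h).
r, h = 1.8, 1e-6
fd_force = -(energy(r + h) - energy(r - h)) / (2 * h)
print(abs(force(r) - fd_force) < 1e-6)  # → True
```

The same check is a useful sanity test for any learned force field: forces that are not the negative gradient of a single energy do not conserve energy in molecular dynamics.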
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.