The Open MatSci ML Toolkit: A Flexible Framework for Machine Learning in
Materials Science
- URL: http://arxiv.org/abs/2210.17484v1
- Date: Mon, 31 Oct 2022 17:11:36 GMT
- Authors: Santiago Miret, Kin Long Kelvin Lee, Carmelo Gonzales, Marcel Nassar,
Matthew Spellings
- Abstract summary: The Open MatSci ML Toolkit is a flexible, self-contained, and scalable Python-based framework for applying deep learning models and methods to scientific data.
By publishing and sharing this toolkit with the research community via open-source release, we hope to lower the entry barrier for new machine learning researchers and practitioners who want to get started with the OpenCatalyst dataset.
- Score: 3.577720074630756
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the Open MatSci ML Toolkit: a flexible, self-contained, and
scalable Python-based framework to apply deep learning models and methods on
scientific data with a specific focus on materials science and the OpenCatalyst
Dataset. Our toolkit provides: 1. A scalable machine learning workflow for
materials science leveraging PyTorch Lightning, which enables seamless scaling
across different computation capabilities (laptop, server, cluster) and
hardware platforms (CPU, GPU, XPU). 2. Deep Graph Library (DGL) support for
rapid graph neural network prototyping and development. By publishing and
sharing this toolkit with the research community via open-source release, we
hope to: 1. Lower the entry barrier for new machine learning researchers and
practitioners that want to get started with the OpenCatalyst dataset, which
presently comprises the largest computational materials science dataset. 2.
Enable the scientific community to apply advanced machine learning tools to
high-impact scientific challenges, such as modeling of materials behavior for
clean energy applications. We demonstrate the capabilities of our framework by
enabling three new equivariant neural network models for multiple OpenCatalyst
tasks and arrive at promising results for compute scaling and model
performance.
Related papers
- NNsight and NDIF: Democratizing Access to Foundation Model Internals [48.27939917017487]
NNsight is an open-source Python package with a simple, flexible API that can express interventions on any PyTorch model by building graphs.
NDIF is a collaborative research platform providing researchers access to foundation-scale LLMs via the NNsight API.
(2024-07-18T17:59:01Z)
- VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models [78.76009461738299]
We present an open-source toolkit for evaluating large multi-modality models based on PyTorch.
VLMEvalKit implements over 70 different large multi-modality models, including both proprietary APIs and open-source models.
We host OpenVLM Leaderboard to track the progress of multi-modality learning research.
(2024-07-16T13:06:15Z)
- Improving Molecular Modeling with Geometric GNNs: an Empirical Study [56.52346265722167]
This paper focuses on the impact of (1) different canonicalization methods, (2) graph creation strategies, and (3) auxiliary tasks on performance, scalability, and symmetry enforcement.
Our findings and insights aim to guide researchers in selecting optimal modeling components for molecular modeling tasks.
(2024-07-11T09:04:12Z)
- M$^2$Hub: Unlocking the Potential of Machine Learning for Materials Discovery [26.099381363351668]
We introduce M$^2$Hub, a toolkit for advancing machine learning in materials discovery.
M$^2$Hub will enable easy access to materials discovery tasks, datasets, machine learning methods, evaluations, and benchmark results.
(2023-06-14T23:06:36Z)
- Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
(2022-09-05T16:48:34Z)
- Flashlight: Enabling Innovation in Tools for Machine Learning [50.63188263773778]
We introduce Flashlight, an open-source library built to spur innovation in machine learning tools and systems.
We see Flashlight as a tool enabling research that can benefit widely used libraries downstream and bring machine learning and systems researchers closer together.
(2022-01-29T01:03:29Z)
- DGL-LifeSci: An Open-Source Toolkit for Deep Learning on Graphs in Life Science [5.3825788156200565]
We present DGL-LifeSci, an open-source package for deep learning on graphs in life science.
DGL-LifeSci is a Python toolkit based on RDKit, PyTorch, and Deep Graph Library.
It allows GNN-based modeling on custom datasets for molecular property prediction, reaction prediction and molecule generation.
(2021-06-27T13:27:47Z)
- AutoGL: A Library for Automated Graph Learning [67.63587865669372]
We present Automated Graph Learning (AutoGL), the first dedicated library for automated machine learning on graphs.
AutoGL is open-source, easy to use, and easy to extend.
We also present AutoGL-light, a lightweight version of AutoGL to facilitate customizing pipelines and enriching applications.
(2021-04-11T10:49:23Z)
- Gradient-Based Training and Pruning of Radial Basis Function Networks with an Application in Materials Physics [0.24792948967354234]
We propose a gradient-based technique for training radial basis function networks with an efficient and scalable open-source implementation.
We derive novel closed-form optimization criteria for pruning the models for continuous as well as binary data.
(2020-04-06T11:32:37Z)
- On the impact of selected modern deep-learning techniques to the performance and celerity of classification models in an experimental high-energy physics use case [0.0]
Deep learning techniques are tested in the context of a classification problem encountered in the domain of high-energy physics.
The advantages are evaluated in terms of both performance metrics and the time required to train and apply the resulting models.
A new wrapper library for PyTorch called LUMIN is presented, which incorporates all of the techniques studied.
(2020-02-03T12:29:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.