The Open Catalyst 2020 (OC20) Dataset and Community Challenges
- URL: http://arxiv.org/abs/2010.09990v5
- Date: Fri, 24 Sep 2021 14:09:17 GMT
- Title: The Open Catalyst 2020 (OC20) Dataset and Community Challenges
- Authors: Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril,
Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb
Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon,
Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
- Abstract summary: Catalyst discovery and optimization is key to solving many societal and energy challenges.
It remains an open challenge to build models that can generalize across both elemental compositions of surfaces and adsorbates.
- Score: 36.556154866045894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Catalyst discovery and optimization is key to solving many societal and
energy challenges including solar fuels synthesis, long-term energy storage,
and renewable fertilizer production. Despite considerable effort by the
catalysis community to apply machine learning models to the computational
catalyst discovery process, it remains an open challenge to build models that
can generalize across both elemental compositions of surfaces and adsorbate
identity/configurations, perhaps because datasets have been smaller in
catalysis than related fields. To address this we developed the OC20 dataset,
consisting of 1,281,040 Density Functional Theory (DFT) relaxations
(~264,890,000 single point evaluations) across a wide swath of materials,
surfaces, and adsorbates (nitrogen, carbon, and oxygen chemistries). We
supplemented this dataset with randomly perturbed structures, short timescale
molecular dynamics, and electronic structure analyses. The dataset comprises
three central tasks indicative of day-to-day catalyst modeling and comes with
pre-defined train/validation/test splits to facilitate direct comparisons with
future model development efforts. We applied three state-of-the-art graph
neural network models (CGCNN, SchNet, Dimenet++) to each of these tasks as
baseline demonstrations for the community to build on. In almost every task, no
upper limit on model size was identified, suggesting that even larger models
are likely to improve on initial results. The dataset and baseline models are
both provided as open resources, as well as a public leader board to encourage
community contributions to solve these important tasks.
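The abstract above describes structure-to-energy/forces tasks evaluated with per-atom force errors. As a minimal illustration (not the official OCP evaluation code), the force mean absolute error commonly reported for such tasks can be sketched with plain NumPy; the function name and toy values here are illustrative assumptions:

```python
# Minimal sketch (not the official OCP code): the per-atom force MAE
# used to compare models on OC20-style structure-to-forces tasks.
import numpy as np

def force_mae(pred_forces, true_forces):
    """Mean absolute error over all atoms and Cartesian components.

    pred_forces, true_forces: arrays of shape (n_atoms, 3),
    typically in eV/Angstrom for DFT-derived forces.
    """
    pred = np.asarray(pred_forces, dtype=float)
    true = np.asarray(true_forces, dtype=float)
    return float(np.mean(np.abs(pred - true)))

# Toy example with three atoms (values are made up):
pred = [[0.10, 0.00, 0.00], [0.00, 0.20, 0.00], [0.00, 0.00, 0.30]]
true = [[0.00, 0.00, 0.00], [0.00, 0.00, 0.00], [0.00, 0.00, 0.00]]
print(force_mae(pred, true))  # averages |error| over all 9 components
```

Averaging over every Cartesian component of every atom is what makes the metric comparable across structures with different atom counts.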
Related papers
- A Foundation Model for the Solar Dynamics Observatory [2.63089646549647]
SDO-FM is a foundation model using data from NASA's Solar Dynamics Observatory (SDO) spacecraft.
This paper marks release of our pretrained models and embedding datasets, available to the community on Hugging Face and sdofm.org.
arXiv Detail & Related papers (2024-10-03T14:36:32Z)
- Efficient Materials Informatics between Rockets and Electrons [0.0]
This dissertation focuses on the design of functionally graded materials (FGMs) incorporating ultra-high temperature refractory high entropy alloys (RHEAs)
At the atomistic level, a data ecosystem optimized for machine learning (ML) from over 4.5 million relaxed structures, called MPDD, is used to inform experimental observations and improve thermodynamic models.
The resulting multi-level discovery infrastructure is highly generalizable as it focuses on encoding problems to solve them easily rather than looking for an existing solution.
arXiv Detail & Related papers (2024-07-05T17:03:26Z)
- Lightweight Geometric Deep Learning for Molecular Modelling in Catalyst Discovery [0.0]
The Open Catalyst Project aims to apply advances in graph neural networks (GNNs) to accelerate progress in catalyst discovery.
By implementing robust design patterns such as geometric and symmetric message passing, we trained a GNN model that reached an MAE of 0.0748 in predicting the per-atom forces of adsorbate-surface interactions.
arXiv Detail & Related papers (2024-04-05T17:13:51Z)
- Foundation Models for Generalist Geospatial Artificial Intelligence [3.7002058945990415]
This paper introduces a first-of-its-kind framework for the efficient pre-training and fine-tuning of foundation models on extensive data.
We have utilized this framework to create Prithvi, a transformer-based foundation model pre-trained on more than 1TB of multispectral satellite imagery.
arXiv Detail & Related papers (2023-10-28T10:19:55Z)
- Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models [69.76066070227452]
*Data Synthesis* is a promising way to train a small model with very little labeled data.
We propose *Synthesis Step by Step* (**S3**), a data synthesis framework that shrinks this distribution gap.
Our approach improves the performance of a small model by reducing the gap between the synthetic dataset and the real data.
arXiv Detail & Related papers (2023-10-20T17:14:25Z)
- PhAST: Physics-Aware, Scalable, and Task-specific GNNs for Accelerated Catalyst Design [102.9593507372373]
Catalyst materials play a crucial role in the electrochemical reactions involved in industrial processes.
Machine learning holds the potential to efficiently model materials properties from large amounts of data.
We propose task-specific innovations applicable to most architectures, enhancing both computational efficiency and accuracy.
arXiv Detail & Related papers (2022-11-22T05:24:30Z)
- The Open Catalyst 2022 (OC22) Dataset and Challenges for Oxide Electrocatalysis [9.9765107020148]
A general machine learning potential that spans the chemical space of oxide materials is still out of reach.
The Open Catalyst 2022 (OC22) dataset consists of 62,521 Density Functional Theory (DFT) relaxations across a range of oxide materials.
We study whether combining datasets leads to better results, even if they contain different materials or adsorbates.
arXiv Detail & Related papers (2022-06-17T17:54:10Z)
- Learning Large-scale Subsurface Simulations with a Hybrid Graph Network Simulator [57.57321628587564]
We introduce Hybrid Graph Network Simulator (HGNS) for learning reservoir simulations of 3D subsurface fluid flows.
HGNS consists of a subsurface graph neural network (SGNN) to model the evolution of fluid flows, and a 3D-U-Net to model the evolution of pressure.
Using an industry-standard subsurface flow dataset (SPE-10) with 1.1 million cells, we demonstrate that HGNS reduces inference time by up to 18x compared to standard subsurface simulators.
arXiv Detail & Related papers (2022-06-15T17:29:57Z)
- Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z)
- Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.