M$^2$Hub: Unlocking the Potential of Machine Learning for Materials
Discovery
- URL: http://arxiv.org/abs/2307.05378v1
- Date: Wed, 14 Jun 2023 23:06:36 GMT
- Title: M$^2$Hub: Unlocking the Potential of Machine Learning for Materials
Discovery
- Authors: Yuanqi Du, Yingheng Wang, Yining Huang, Jianan Canal Li, Yanqiao Zhu,
Tian Xie, Chenru Duan, John M. Gregoire, Carla P. Gomes
- Abstract summary: We introduce M$^2$Hub, a toolkit for advancing machine learning in materials discovery.
M$^2$Hub will enable easy access to materials discovery tasks, datasets, machine learning methods, evaluations, and benchmark results.
- Score: 26.099381363351668
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce M$^2$Hub, a toolkit for advancing machine learning in materials
discovery. Machine learning has achieved remarkable progress in modeling
molecular structures, especially biomolecules for drug discovery. However, the
development of machine learning approaches for modeling materials structures
lags behind, which is partly due to the lack of an integrated platform that
enables access to diverse tasks for materials discovery. To bridge this gap,
M$^2$Hub will enable easy access to materials discovery tasks, datasets,
machine learning methods, evaluations, and benchmark results that cover the
entire workflow. Specifically, the first release of M$^2$Hub focuses on three
key stages in materials discovery: virtual screening, inverse design, and
molecular simulation, including 9 datasets that cover 6 types of materials
with 56 tasks across 8 types of material properties. We further provide 2
synthetic datasets for the purpose of generative tasks on materials. In
addition to random data splits, we provide 3 other data partitions that
reflect real-world materials discovery scenarios. State-of-the-art machine
learning methods (including those that are suitable for materials structures but
have never been compared in the literature) are benchmarked on representative tasks. Our
code and library are publicly available at https://github.com/yuanqidu/M2Hub.
Related papers
- MatExpert: Decomposing Materials Discovery by Mimicking Human Experts [26.364419690908992]
MatExpert is a novel framework that leverages Large Language Models and contrastive learning to accelerate the discovery and design of new solid-state materials.
Inspired by the workflow of human materials design experts, our approach integrates three key stages: retrieval, transition, and generation.
MatExpert represents a meaningful advancement in computational material discovery using language-based generative models.
arXiv Detail & Related papers (2024-10-26T00:44:54Z)
- Any-point Trajectory Modeling for Policy Learning [64.23861308947852]
We introduce Any-point Trajectory Modeling (ATM) to predict future trajectories of arbitrary points within a video frame.
ATM outperforms strong video pre-training baselines by 80% on average.
We show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology.
arXiv Detail & Related papers (2023-12-28T23:34:43Z)
- Agent-based Learning of Materials Datasets from Scientific Literature [0.0]
We develop a chemist AI agent, powered by large language models (LLMs), to create structured datasets from natural language text.
Our chemist AI agent, Eunomia, can plan and execute actions by leveraging the existing knowledge from decades of scientific research articles.
arXiv Detail & Related papers (2023-12-18T20:29:58Z)
- Multimodal Learning for Materials [7.167520424757711]
We introduce Multimodal Learning for Materials (MultiMat), which enables self-supervised multi-modality training of foundation models for materials.
We demonstrate our framework's potential using data from the Materials Project database on multiple axes.
arXiv Detail & Related papers (2023-11-30T18:35:29Z)
- MatSciML: A Broad, Multi-Task Benchmark for Solid-State Materials Modeling [7.142619575624596]
MatSci ML is a benchmark for modeling MATerials SCIence using Machine Learning (MatSci ML) methods.
MatSci ML provides a diverse set of materials systems and properties data for model training and evaluation.
In the multi-dataset learning setting, MatSci ML enables researchers to combine observations from multiple datasets to perform joint prediction of common properties.
arXiv Detail & Related papers (2023-09-12T03:08:37Z)
- The Open MatSci ML Toolkit: A Flexible Framework for Machine Learning in Materials Science [3.577720074630756]
The Open MatSci ML Toolkit is a flexible, self-contained, and scalable Python-based framework to apply deep learning models and methods on scientific data.
By publishing and sharing this toolkit with the research community via open-source release, we hope to lower the entry barrier for new machine learning researchers and practitioners who want to get started with the OpenCatalyst dataset.
arXiv Detail & Related papers (2022-10-31T17:11:36Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- Kubric: A scalable dataset generator [73.78485189435729]
Kubric is a Python framework that interfaces with PyBullet and Blender to generate photo-realistic scenes, with rich annotations, and seamlessly scales to large jobs distributed over thousands of machines.
We demonstrate the effectiveness of Kubric by presenting a series of 13 different generated datasets for tasks ranging from studying 3D NeRF models to optical flow estimation.
arXiv Detail & Related papers (2022-03-07T18:13:59Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic Grasping via Physics-based Metaverse Synthesis [78.26022688167133]
We present a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis.
The proposed dataset contains 100,000 images and 25 different object types.
We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance.
arXiv Detail & Related papers (2021-12-29T17:23:24Z)
- Synthetic Data: Opening the data floodgates to enable faster, more directed development of machine learning methods [96.92041573661407]
Many ground-breaking advancements in machine learning can be attributed to the availability of a large volume of rich data.
Many large-scale datasets are highly sensitive, such as healthcare data, and are not widely available to the machine learning community.
Generating synthetic data with privacy guarantees provides one such solution.
arXiv Detail & Related papers (2020-12-08T17:26:10Z)
- Fed-Sim: Federated Simulation for Medical Imaging [131.56325440976207]
We introduce a physics-driven generative approach that consists of two learnable neural modules.
We show that our data synthesis framework improves the downstream segmentation performance on several datasets.
arXiv Detail & Related papers (2020-09-01T19:17:46Z)