Related papers: M$^2$Hub: Unlocking the Potential of Machine Learning for Materials Discovery

M$^2$Hub: Unlocking the Potential of Machine Learning for Materials Discovery

URL: http://arxiv.org/abs/2307.05378v1
Date: Wed, 14 Jun 2023 23:06:36 GMT
Title: M$^2$Hub: Unlocking the Potential of Machine Learning for Materials Discovery
Authors: Yuanqi Du, Yingheng Wang, Yining Huang, Jianan Canal Li, Yanqiao Zhu, Tian Xie, Chenru Duan, John M. Gregoire, Carla P. Gomes
Abstract summary: We introduce M$2$Hub, a toolkit for advancing machine learning in materials discovery. M$2$Hub will enable easy access to materials discovery tasks, datasets, machine learning methods, evaluations, and benchmark results.
Score: 26.099381363351668
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce M$^2$Hub, a toolkit for advancing machine learning in materials discovery. Machine learning has achieved remarkable progress in modeling molecular structures, especially biomolecules for drug discovery. However, the development of machine learning approaches for modeling materials structures lag behind, which is partly due to the lack of an integrated platform that enables access to diverse tasks for materials discovery. To bridge this gap, M$^2$Hub will enable easy access to materials discovery tasks, datasets, machine learning methods, evaluations, and benchmark results that cover the entire workflow. Specifically, the first release of M$^2$Hub focuses on three key stages in materials discovery: virtual screening, inverse design, and molecular simulation, including 9 datasets that covers 6 types of materials with 56 tasks across 8 types of material properties. We further provide 2 synthetic datasets for the purpose of generative tasks on materials. In addition to random data splits, we also provide 3 additional data partitions to reflect the real-world materials discovery scenarios. State-of-the-art machine learning methods (including those are suitable for materials structures but never compared in the literature) are benchmarked on representative tasks. Our codes and library are publicly available at https://github.com/yuanqidu/M2Hub.

Related papers

Mic-hackathon 2024: Hackathon on Machine Learning for Electron and Scanning Probe Microscopy [54.24356756795849]
Microscopy is a primary source of information on materials structure and functionality at nanometer and atomic scales.<n>The adoption of Data Management Plans (DMPs) by major funding agencies promotes preservation and access.<n> deriving insights remains difficult due to the lack of standardized code ecosystems, benchmarks, and integration strategies.
arXiv Detail & Related papers (2025-06-10T03:54:36Z)
Materials Generation in the Era of Artificial Intelligence: A Comprehensive Survey [54.40267149907223]
Materials are the foundation of modern society, underpinning advancements in energy, electronics, healthcare, transportation, and infrastructure.<n>The ability to discover and design new materials with tailored properties is critical to solving some of the most pressing global challenges.<n>Data-driven generative models provide a powerful tool for materials design by directly create novel materials that satisfy predefined property requirements.
arXiv Detail & Related papers (2025-05-22T08:33:21Z)
MatTools: Benchmarking Large Language Models for Materials Science Tools [5.876786336423598]
MatTools is built on two complementary components: a materials simulation tool question-answer benchmark and a real-world tool-usage benchmark.<n>The QA benchmark comprises 69, QA225 pairs that assess the ability of an LLM to understand materials science tools.<n>The real-world benchmark contains 49 tasks (138 subtasks) requiring the generation of functional Python code for materials property calculations.
arXiv Detail & Related papers (2025-05-16T04:43:05Z)
DARWIN 1.5: Large Language Models as Materials Science Adapted Learners [46.7259033847682]
We propose DARWIN 1.5, the largest open-source large language model tailored for materials science. DARWIN eliminates the need for task-specific descriptors and enables a flexible, unified approach to material property prediction and discovery. Our approach integrates 6M material domain papers and 21 experimental datasets from 49,256 materials across modalities while enabling cross-task knowledge transfer.
arXiv Detail & Related papers (2024-12-16T16:51:27Z)
Foundation Model for Composite Materials and Microstructural Analysis [49.1574468325115]
We present a foundation model specifically designed for composite materials. Our model is pre-trained on a dataset of short-fiber composites to learn robust latent features. During transfer learning, the MMAE accurately predicts homogenized stiffness, with an R2 score reaching as high as 0.959 and consistently exceeding 0.91, even when trained on limited data.
arXiv Detail & Related papers (2024-11-10T19:06:25Z)
MatExpert: Decomposing Materials Discovery by Mimicking Human Experts [26.364419690908992]
MatExpert is a novel framework that leverages Large Language Models and contrastive learning to accelerate the discovery and design of new solid-state materials. Inspired by the workflow of human materials design experts, our approach integrates three key stages: retrieval, transition, and generation. MatExpert represents a meaningful advancement in computational material discovery using langauge-based generative models.
arXiv Detail & Related papers (2024-10-26T00:44:54Z)
Any-point Trajectory Modeling for Policy Learning [64.23861308947852]
We introduce Any-point Trajectory Modeling (ATM) to predict future trajectories of arbitrary points within a video frame. ATM outperforms strong video pre-training baselines by 80% on average. We show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology.
arXiv Detail & Related papers (2023-12-28T23:34:43Z)
Agent-based Learning of Materials Datasets from Scientific Literature [0.0]
We develop a chemist AI agent, powered by large language models (LLMs), to create structured datasets from natural language text. Our chemist AI agent, Eunomia, can plan and execute actions by leveraging the existing knowledge from decades of scientific research articles.
arXiv Detail & Related papers (2023-12-18T20:29:58Z)
Multimodal Learning for Materials [7.167520424757711]
We introduce Multimodal Learning for Materials (MultiMat), which enables self-supervised multi-modality training of foundation models for materials. We demonstrate our framework's potential using data from the Materials Project database on multiple axes.
arXiv Detail & Related papers (2023-11-30T18:35:29Z)
MatSciML: A Broad, Multi-Task Benchmark for Solid-State Materials Modeling [7.142619575624596]
MatSci ML is a benchmark for modeling MATerials SCIence using Machine Learning (MatSci ML) methods. MatSci ML provides a diverse set of materials systems and properties data for model training and evaluation. In the multi-dataset learning setting, MatSci ML enables researchers to combine observations from multiple datasets to perform joint prediction of common properties.
arXiv Detail & Related papers (2023-09-12T03:08:37Z)
The Open MatSci ML Toolkit: A Flexible Framework for Machine Learning in Materials Science [3.577720074630756]
The Open MatSci ML Toolkit is a flexible, self-contained, and scalable Python-based framework to apply deep learning models and methods on scientific data. By publishing and sharing this toolkit with the research community via open-source release, we hope to:. Lower the entry barrier for new machine learning researchers and practitioners that want to get started with the OpenCatalyst dataset.
arXiv Detail & Related papers (2022-10-31T17:11:36Z)
Kubric: A scalable dataset generator [73.78485189435729]
Kubric is a Python framework that interfaces with PyBullet and Blender to generate photo-realistic scenes, with rich annotations, and seamlessly scales to large jobs distributed over thousands of machines. We demonstrate the effectiveness of Kubric by presenting a series of 13 different generated datasets for tasks ranging from studying 3D NeRF models to optical flow estimation.
arXiv Detail & Related papers (2022-03-07T18:13:59Z)
MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic Grasping via Physics-based Metaverse Synthesis [78.26022688167133]
We present a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis. The proposed dataset contains 100,000 images and 25 different object types. We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance.
arXiv Detail & Related papers (2021-12-29T17:23:24Z)
Synthetic Data: Opening the data floodgates to enable faster, more directed development of machine learning methods [96.92041573661407]
Many ground-breaking advancements in machine learning can be attributed to the availability of a large volume of rich data. Many large-scale datasets are highly sensitive, such as healthcare data, and are not widely available to the machine learning community. Generating synthetic data with privacy guarantees provides one such solution.
arXiv Detail & Related papers (2020-12-08T17:26:10Z)
Fed-Sim: Federated Simulation for Medical Imaging [131.56325440976207]
We introduce a physics-driven generative approach that consists of two learnable neural modules. We show that our data synthesis framework improves the downstream segmentation performance on several datasets.
arXiv Detail & Related papers (2020-09-01T19:17:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.