A large-scale nanocrystal database with aligned synthesis and properties enabling generative inverse design
- URL: http://arxiv.org/abs/2601.02424v1
- Date: Sun, 04 Jan 2026 07:27:40 GMT
- Title: A large-scale nanocrystal database with aligned synthesis and properties enabling generative inverse design
- Authors: Kai Gu, Yingping Liang, Senliang Peng, Aotian Guo, Haizheng Zhong, Ying Fu,
- Abstract summary: We present the construction of a large-scale, aligned Nanocrystal Synthesis-Property database. Our work bridges the gap between unstructured literature and data-driven synthesis. We also establish a powerful human-AI collaborative paradigm for accelerating nanocrystal discovery.
- Score: 13.264257933986677
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The synthesis of nanocrystals has been highly dependent on trial-and-error, due to the complex correlation between synthesis parameters and physicochemical properties. Although deep learning offers a potential methodology to achieve generative inverse design, it is still hindered by the scarcity of high-quality datasets that align nanocrystal synthesis routes with their properties. Here, we present the construction of a large-scale, aligned Nanocrystal Synthesis-Property (NSP) database and demonstrate its capability for generative inverse design. To extract structured synthesis routes and their corresponding product properties from literature, we develop NanoExtractor, a large language model (LLM) enhanced by well-designed augmentation strategies. NanoExtractor is validated against human experts, achieving a weighted average score of 88% on the test set, significantly outperforming chemistry-specialized (3%) and general-purpose LLMs (38%). The resulting NSP database contains nearly 160,000 aligned entries and serves as training data for our NanoDesigner, an LLM for inverse synthesis design. The generative capability of NanoDesigner is validated through the successful design of viable synthesis routes for both well-established PbSe nanocrystals and rarely reported MgF2 nanocrystals. Notably, the model recommends a counter-intuitive, non-stoichiometric precursor ratio (1:1) for MgF2 nanocrystals, which is experimentally confirmed as critical for suppressing byproducts. Our work bridges the gap between unstructured literature and data-driven synthesis, and also establishes a powerful human-AI collaborative paradigm for accelerating nanocrystal discovery.
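The abstract mentions two quantities that a small sketch can make concrete: a weighted average score over extracted fields, and a precursor ratio that departs from stoichiometry. The Python below is only an illustration under stated assumptions. The record layout (`NSPEntry`), the field names, and the weights are hypothetical and are not taken from the paper, which does not describe its schema or weighting scheme here. The only chemistry assumed is that stoichiometric MgF2 has an Mg:F ratio of 1:2.

```python
from dataclasses import dataclass


@dataclass
class NSPEntry:
    """Hypothetical aligned record: a synthesis route paired with its product."""
    product: str
    precursor_moles: dict[str, float]  # element -> relative molar amount


def precursor_ratio(entry: NSPEntry, numerator: str, denominator: str) -> float:
    """Molar ratio between two precursor elements in one entry."""
    return entry.precursor_moles[numerator] / entry.precursor_moles[denominator]


def weighted_average_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-field scores; the weights here are assumptions."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# The 1:1 precursor ratio reported for MgF2 in the abstract.
recommended = NSPEntry(product="MgF2", precursor_moles={"Mg": 1.0, "F": 1.0})
stoichiometric_mg_to_f = 1.0 / 2.0  # Mg:F in the MgF2 formula unit

is_non_stoichiometric = precursor_ratio(recommended, "Mg", "F") != stoichiometric_mg_to_f

# Illustrative field names, scores, and weights only; not results from the paper.
example_score = weighted_average_score(
    scores={"precursors": 1.0, "properties": 0.5},
    weights={"precursors": 3.0, "properties": 1.0},
)
```

With these illustrative inputs, `is_non_stoichiometric` is `True` and `example_score` is `0.875`. A real evaluation would use the field definitions and weights specified by the authors.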
Related papers
- Predictive Inorganic Synthesis based on Machine Learning using Small Data sets: a case study of size-controlled Cu Nanoparticles [0.0]
Copper nanoparticles (Cu NPs) have broad applicability, yet their synthesis is sensitive to subtle changes in reaction parameters. This study explores machine learning to predict the size of Cu NPs from microwave-assisted polyol synthesis using a small data set of 25 syntheses performed in-house.
arXiv Detail & Related papers (2025-12-18T13:53:08Z)
- OXtal: An All-Atom Diffusion Model for Organic Crystal Structure Prediction [63.318434943975255]
We introduce OXtal, a large-scale 100M-parameter all-atom diffusion model that learns the conditional joint distribution over intramolecular conformations and periodic packing. By leveraging a large dataset of 600K experimentally validated crystal structures, OXtal achieves orders-of-magnitude improvement over prior ab initio machine learning CSP methods. OXtal attains over 80% packing similarity rate, demonstrating its ability to model both thermodynamic and kinetic regularities of molecular crystallization.
arXiv Detail & Related papers (2025-12-07T20:46:30Z)
- Rethinking Molecule Synthesizability with Chain-of-Reaction [47.744071119775676]
We introduce ReaSyn, a generative framework for synthesizable projection. We propose a novel perspective that views synthetic pathways as akin to reasoning paths in large language models (LLMs). With the CoR notation, ReaSyn can get dense supervision in every reaction step to explicitly learn chemical reaction rules.
arXiv Detail & Related papers (2025-09-19T15:29:57Z)
- Autonomous nanoparticle synthesis by design [32.63291717930695]
We introduce an autonomous approach explicitly targeting synthesis of atomic-scale structures. Our method autonomously designs synthesis protocols by matching real-time experimental total scattering (TS) and pair distribution function (PDF) data. We demonstrate this capability at a synchrotron, successfully synthesising two structurally distinct gold NPs.
arXiv Detail & Related papers (2025-05-19T13:19:30Z)
- Deep Learning Models for Colloidal Nanocrystal Synthesis [9.520435535546497]
Colloidal synthesis of nanocrystals usually includes complex chemical reactions and multi-step crystallization processes. Here, we developed a deep learning-based nanocrystal synthesis model that correlates synthetic parameters with the final size and shape of target nanocrystals.
arXiv Detail & Related papers (2024-12-14T14:18:59Z)
- BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction [65.93303145891628]
BatGPT-Chem is a large language model with 15 billion parameters, tailored for enhanced retrosynthesis prediction.
Our model captures a broad spectrum of chemical knowledge, enabling precise prediction of reaction conditions.
This development empowers chemists to adeptly address novel compounds, potentially expediting the innovation cycle in drug manufacturing and materials science.
arXiv Detail & Related papers (2024-08-19T05:17:40Z)
- Bespoke Nanoparticle Synthesis and Chemical Knowledge Discovery Via Autonomous Experimentations [6.544041907979552]
We report an autonomous experimentation platform developed for the bespoke design of nanoparticles (NPs) with targeted optical properties.
This platform operates in a closed-loop manner between a batch synthesis module of NPs and a UV-Vis spectroscopy module, based on the feedback of the AI optimization modeling.
In addition to the outstanding material developmental efficiency, the analysis of synthetic variables further reveals a novel chemistry involving the effects of citrate in Ag NP synthesis.
arXiv Detail & Related papers (2023-09-01T09:15:04Z)
- Extracting Structured Seed-Mediated Gold Nanorod Growth Procedures from Literature with GPT-3 [52.59930033705221]
We present a dataset of 11,644 entities extracted from 1,137 papers, resulting in 268 papers with at least one complete seed-mediated gold nanorod growth procedure and outcome for a total of 332 complete procedures.
arXiv Detail & Related papers (2023-04-26T22:21:33Z)
- Machine-Learning-Optimized Perovskite Nanoplatelet Synthesis [55.41644538483948]
We develop an algorithm to improve the quality of CsPbBr3 nanoplatelets (NPLs) using only 200 total syntheses.
The algorithm can predict the resulting PL emission maxima of the NPL dispersions based on the precursor ratios.
arXiv Detail & Related papers (2022-10-18T11:54:11Z)
- Predictive Synthesis of Quantum Materials by Probabilistic Reinforcement Learning [1.4680035572775534]
We use reinforcement learning to predict optimal synthesis schedules for a prototypical quantum material, semiconducting monolayer MoS$_2$.
The model can be extended to predict profiles for synthesis of complex structures including multi-phase heterostructures.
arXiv Detail & Related papers (2020-09-14T20:50:45Z)
- Confidence-guided Lesion Mask-based Simultaneous Synthesis of Anatomic and Molecular MR Images in Patients with Post-treatment Malignant Gliomas [65.64363834322333]
Confidence Guided SAMR (CG-SAMR) synthesizes data from lesion information to multi-modal anatomic sequences.
A module guides the synthesis based on a confidence measure of the intermediate results.
Experiments on real clinical data demonstrate that the proposed model can perform better than state-of-the-art synthesis methods.
arXiv Detail & Related papers (2020-08-06T20:20:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.