Predictive Inorganic Synthesis based on Machine Learning using Small Data sets: a case study of size-controlled Cu Nanoparticles
- URL: http://arxiv.org/abs/2512.16545v1
- Date: Thu, 18 Dec 2025 13:53:08 GMT
- Title: Predictive Inorganic Synthesis based on Machine Learning using Small Data sets: a case study of size-controlled Cu Nanoparticles
- Authors: Brent Motmans, Digvijay Ghogare, Thijs G. I. van Wijk, An Hardy, Danny E. P. Vanpoucke,
- Abstract summary: Copper nanoparticles (Cu NPs) have a broad applicability, yet their synthesis is sensitive to subtle changes in reaction parameters.<n>This study explores Machine Learning to predict the size of Cu NPs from microwave-assisted polyol synthesis using a small data set of 25 in-house performed syntheses.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Copper nanoparticles (Cu NPs) have a broad applicability, yet their synthesis is sensitive to subtle changes in reaction parameters. This sensitivity, combined with the time- and resource-intensive nature of experimental optimization, poses a major challenge in achieving reproducible and size-controlled synthesis. While Machine Learning (ML) shows promise in materials research, its application is often limited by scarcity of large high-quality experimental data sets. This study explores ML to predict the size of Cu NPs from microwave-assisted polyol synthesis using a small data set of 25 in-house performed syntheses. Latin Hypercube Sampling is used to efficiently cover the parameter space while creating the experimental data set. Ensemble regression models, built with the AMADEUS framework, successfully predict particle sizes with high accuracy ($R^2 = 0.74$), outperforming classical statistical approaches ($R^2 = 0.60$). Overall, this study highlights that, for lab-scale synthesis optimization, high-quality small datasets combined with classical, interpretable ML models outperform traditional statistical methods and are fully sufficient for quantitative synthesis prediction. This approach provides a sustainable and experimentally realistic pathway toward data-driven inorganic synthesis design.
Related papers
- Scaling Laws of Synthetic Data for Language Models [125.41600201811417]
We introduce SynthLLM, a scalable framework that transforms pre-training corpora into diverse, high-quality synthetic datasets.<n>Our approach achieves this by automatically extracting and recombining high-level concepts across multiple documents using a graph algorithm.
arXiv Detail & Related papers (2025-03-25T11:07:12Z) - Predicting and Accelerating Nanomaterials Synthesis Using Machine Learning Featurization [0.0]
We automate and generalize feature extraction of reflection high-energy electron diffraction data with machine learning.
We establish quantitatively predictive relationships in small sets (10) of expert-labeled data, saving significant time on subsequently grown samples.
These predictions provide guidance to avoid doomed trials, reduce follow-on characterization, and improve control resolution for materials synthesis.
arXiv Detail & Related papers (2024-09-12T14:03:55Z) - Synthetic Oversampling: Theory and A Practical Approach Using LLMs to Address Data Imbalance [16.047084318753377]
Imbalanced classification and spurious correlation are common challenges in data science and machine learning.<n>Recent advances have proposed leveraging the flexibility and generative capabilities of large language models to generate synthetic samples.<n>This article develops novel theoretical foundations to systematically study the roles of synthetic samples in addressing imbalanced classification and spurious correlation.
arXiv Detail & Related papers (2024-06-05T21:24:26Z) - Retrosynthesis prediction enhanced by in-silico reaction data
augmentation [66.5643280109899]
We present RetroWISE, a framework that employs a base model inferred from real paired data to perform in-silico reaction generation and augmentation.
On three benchmark datasets, RetroWISE achieves the best overall performance against state-of-the-art models.
arXiv Detail & Related papers (2024-01-31T07:40:37Z) - Determination of droplet size from wide-angle light scattering image
data using convolutional neural networks [0.0]
We introduce a fully automatic machine learning-based approach that employs convolutional neural networks (CNNs) in order to streamline the droplet sizing process.
We consider WALS data from an ethanol spray flame process at various heights above the burner surface (HABs)
The models are trained and cross-validated on a large dataset comprising nearly 35000 WALS images.
arXiv Detail & Related papers (2023-11-03T18:05:47Z) - Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large
Language Models by Extrapolating Errors from Small Models [69.76066070227452]
*Data Synthesis* is a promising way to train a small model with very little labeled data.
We propose *Synthesis Step by Step* (**S3**), a data synthesis framework that shrinks this distribution gap.
Our approach improves the performance of a small model by reducing the gap between the synthetic dataset and the real data.
arXiv Detail & Related papers (2023-10-20T17:14:25Z) - Bespoke Nanoparticle Synthesis and Chemical Knowledge Discovery Via
Autonomous Experimentations [6.544041907979552]
We report an autonomous experimentation platform developed for the bespoke design of nanoparticles (NPs) with targeted optical properties.
This platform operates in a closed-loop manner between a batch synthesis module of NPs and a UV- Vis spectroscopy module, based on the feedback of the AI optimization modeling.
In addition to the outstanding material developmental efficiency, the analysis of synthetic variables further reveals a novel chemistry involving the effects of citrate in Ag NP synthesis.
arXiv Detail & Related papers (2023-09-01T09:15:04Z) - Machine learning enabled experimental design and parameter estimation
for ultrafast spin dynamics [54.172707311728885]
We introduce a methodology that combines machine learning with Bayesian optimal experimental design (BOED)
Our method employs a neural network model for large-scale spin dynamics simulations for precise distribution and utility calculations in BOED.
Our numerical benchmarks demonstrate the superior performance of our method in guiding XPFS experiments, predicting model parameters, and yielding more informative measurements within limited experimental time.
arXiv Detail & Related papers (2023-06-03T06:19:20Z) - Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z) - ContraNeRF: Generalizable Neural Radiance Fields for Synthetic-to-real
Novel View Synthesis via Contrastive Learning [102.46382882098847]
We first investigate the effects of synthetic data in synthetic-to-real novel view synthesis.
We propose to introduce geometry-aware contrastive learning to learn multi-view consistent features with geometric constraints.
Our method can render images with higher quality and better fine-grained details, outperforming existing generalizable novel view synthesis methods in terms of PSNR, SSIM, and LPIPS.
arXiv Detail & Related papers (2023-03-20T12:06:14Z) - Synthetic data enable experiments in atomistic machine learning [0.0]
We demonstrate the use of a large dataset labelled with per-atom energies from an existing ML potential model.
The cheapness of this process, compared to the quantum-mechanical ground truth, allows us to generate millions of datapoints.
We show that learning synthetic data labels can be a useful pre-training task for subsequent fine-tuning on small datasets.
arXiv Detail & Related papers (2022-11-29T18:17:24Z) - Predictive Synthesis of Quantum Materials by Probabilistic Reinforcement
Learning [1.4680035572775534]
We use reinforcement learning to predict optimal synthesis schedules for a prototypical quantum material, semiconducting monolayer MoS$_2$.
The model can be extended to predict profiles for synthesis of complex structures including multi-phase heterostructures.
arXiv Detail & Related papers (2020-09-14T20:50:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.