Related papers: Re-experiment Smart: a Novel Method to Enhance Data-driven Prediction of Mechanical Properties of Epoxy Polymers

URL: http://arxiv.org/abs/2506.01994v1
Date: Mon, 19 May 2025 04:42:18 GMT
Title: Re-experiment Smart: a Novel Method to Enhance Data-driven Prediction of Mechanical Properties of Epoxy Polymers
Authors: Wanshan Cui, Yejin Jeong, Inwook Song, Gyuri Kim, Minsang Kwon, Donghun Lee
Abstract summary: We propose a novel approach to enhance dataset quality efficiently by integrating multi-algorithm outlier detection with selective re-experimentation of unreliable outlier cases. Our method reliably reduces prediction error (RMSE) and significantly improves accuracy with minimal additional experimental work, requiring only about 5% of the dataset to be re-measured.
Score: 2.1389836877212347
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Accurate prediction of polymer material properties through data-driven approaches greatly accelerates novel material development by reducing redundant experiments and trial-and-error processes. However, inevitable outliers in empirical measurements can severely skew machine learning results, leading to erroneous prediction models and suboptimal material designs. To address this limitation, we propose a novel approach to enhance dataset quality efficiently by integrating multi-algorithm outlier detection with selective re-experimentation of unreliable outlier cases. To validate the empirical effectiveness of the approach, we systematically construct a new dataset containing 701 measurements of three key mechanical properties: glass transition temperature ($T_g$), tan $\delta$ peak, and crosslinking density ($v_{c}$). To demonstrate its general applicability, we report the performance improvements across multiple machine learning models, including Elastic Net, SVR, Random Forest, and TPOT, to predict the three key properties. Our method reliably reduces prediction error (RMSE) and significantly improves accuracy with minimal additional experimental work, requiring only about 5% of the dataset to be re-measured. These findings highlight the importance of data quality enhancement in achieving reliable machine learning applications in polymer science and present a scalable strategy for improving predictive reliability in materials science.
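
The selection step described in the abstract (several outlier detectors, then re-measuring only a small share of samples) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not name the detection algorithms, so the three detectors used here (z-score, IQR fences, and median absolute deviation), the vote threshold `min_votes`, and the budget `max_fraction` are all assumptions, as is the function name `select_for_remeasurement`.

```python
import statistics


def zscore_flags(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [False] * len(values)
    return [abs(v - mean) / std > threshold for v in values]


def iqr_flags(values, k=1.5):
    """Flag values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < low or v > high for v in values]


def mad_flags(values, threshold=3.5):
    """Flag values whose modified z-score (based on the MAD) exceeds `threshold`."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


def select_for_remeasurement(values, min_votes=2, max_fraction=0.05):
    """Return sorted indices of samples to re-measure (hypothetical helper).

    A sample is a candidate when at least `min_votes` detectors flag it.
    At most `max_fraction` of the dataset (and at least one slot) is
    returned, preferring samples with more votes and, on ties, samples
    farther from the median.
    """
    detectors = (zscore_flags, iqr_flags, mad_flags)
    per_detector = [detector(values) for detector in detectors]
    votes = [sum(flags) for flags in zip(*per_detector)]
    med = statistics.median(values)
    candidates = [i for i, n_votes in enumerate(votes) if n_votes >= min_votes]
    candidates.sort(key=lambda i: (-votes[i], -abs(values[i] - med)))
    budget = max(1, int(max_fraction * len(values)))
    return sorted(candidates[:budget])


if __name__ == "__main__":
    # Synthetic measurements (not data from the paper): 39 plausible values
    # and one suspicious reading at the end.
    measurements = [100.0 + (i % 5) for i in range(39)] + [250.0]
    print(select_for_remeasurement(measurements))
```

Requiring agreement between detectors keeps a single over-sensitive rule from consuming the re-measurement budget, which is one plausible reading of "multi-algorithm" in the abstract. In the workflow the abstract describes, the selected samples would then be re-measured and the prediction models retrained on the corrected dataset.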

Related papers

Uncertainty-Aware Machine-Learning Framework for Predicting Dislocation Plasticity and Stress-Strain Response in FCC Alloys [9.066691897904875]
Machine learning has significantly advanced the understanding and application of structural materials. This study presents a comprehensive methodology utilizing a mixed density network (MDN) model. The incorporation of statistical parameters of those predicted distributions into a dislocation-mediated plasticity model allows for accurate stress-strain predictions.
arXiv Detail & Related papers (2025-06-25T21:18:14Z)
Leveraging Large Language Models to Address Data Scarcity in Machine Learning: Applications in Graphene Synthesis [0.0]
Machine learning in materials science faces challenges due to limited experimental data. We propose strategies that utilize large language models (LLMs) to enhance machine learning performance.
arXiv Detail & Related papers (2025-03-06T16:04:01Z)
Data-driven tool wear prediction in milling, based on a process-integrated single-sensor approach [1.6574413179773764]
This study explores data-driven methods, in particular deep learning, for tool wear prediction. It investigates the transferability of predictive models using minimal training data, validated across two processes. The ConvNeXt model performs exceptionally well, achieving 99.1% accuracy in identifying tool wear.
arXiv Detail & Related papers (2024-12-27T23:10:32Z)
Transfer Learning for Deep Learning-based Prediction of Lattice Thermal Conductivity [0.0]
We study the impact of transfer learning on the precision and generalizability of a deep learning model (ParAIsite). We show that a much greater improvement is obtained when first fine-tuning it on large datasets of low-quality approximations of lattice thermal conductivity (LTC). The promising results pave the way towards a greater ability to explore large databases in search of low thermal conductivity materials.
arXiv Detail & Related papers (2024-11-27T11:57:58Z)
An Investigation on Machine Learning Predictive Accuracy Improvement and Uncertainty Reduction using VAE-based Data Augmentation [2.517043342442487]
Deep generative learning uses certain ML models to learn the underlying distribution of existing data and generate synthetic samples that resemble the real data. In this study, our objective is to evaluate the effectiveness of data augmentation using variational autoencoder (VAE)-based deep generative models. We investigated whether the data augmentation leads to improved accuracy in the predictions of a deep neural network (DNN) model trained using the augmented data.
arXiv Detail & Related papers (2024-10-24T18:15:48Z)
Learning Long-Horizon Predictions for Quadrotor Dynamics [48.08477275522024]
We study the key design choices for efficiently learning long-horizon prediction dynamics for quadrotors. We show that sequential modeling techniques showcase their advantage in minimizing compounding errors compared to other types of solutions. We propose a novel decoupled dynamics learning approach, which further simplifies the learning process while also enhancing the approach modularity.
arXiv Detail & Related papers (2024-07-17T19:06:47Z)
Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options. The present investigation, spearheaded by the iDPP@CLEF 2024 challenge, focuses on utilizing sensor-derived data obtained through an app.
arXiv Detail & Related papers (2024-07-10T19:17:23Z)
Retrosynthesis prediction enhanced by in-silico reaction data augmentation [66.5643280109899]
We present RetroWISE, a framework that employs a base model inferred from real paired data to perform in-silico reaction generation and augmentation. On three benchmark datasets, RetroWISE achieves the best overall performance against state-of-the-art models.
arXiv Detail & Related papers (2024-01-31T07:40:37Z)
The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation. We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare. Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z)
MetaRF: Differentiable Random Forest for Reaction Yield Prediction with a Few Trails [58.47364143304643]
In this paper, we focus on the reaction yield prediction problem. We first put forth MetaRF, an attention-based differentiable random forest model specially designed for few-shot yield prediction. To improve the few-shot learning performance, we further introduce a dimension-reduction based sampling method.
arXiv Detail & Related papers (2022-08-22T06:40:13Z)
ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data. The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.