Data augmentation for machine learning of chemical process flowsheets
- URL: http://arxiv.org/abs/2302.03379v1
- Date: Tue, 7 Feb 2023 10:35:24 GMT
- Title: Data augmentation for machine learning of chemical process flowsheets
- Authors: Lukas Schulze Balhorn, Edwin Hirtreiter, Lynn Luderer, Artur M. Schweidtmann
- Abstract summary: We show that the proposed data augmentation improves the performance of artificial intelligence-based process design models.
In our case study, flowsheet data augmentation improved the prediction uncertainty of the flowsheet autocompletion model by 14.7%.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial intelligence has great potential for accelerating the design and
engineering of chemical processes. Recently, we have shown that
transformer-based language models can learn to auto-complete chemical process
flowsheets using the SFILES 2.0 string notation. Also, we showed that language
translation models can be used to translate Process Flow Diagrams (PFDs) into
Process and Instrumentation Diagrams (P&IDs). However, artificial intelligence
methods require big data and flowsheet data is currently limited. To mitigate
this challenge of limited data, we propose a new data augmentation methodology
for flowsheet data that is represented in the SFILES 2.0 notation. We show that
the proposed data augmentation improves the performance of artificial
intelligence-based process design models. In our case study flowsheet data
augmentation improved the prediction uncertainty of the flowsheet
autocompletion model by 14.7%. In the future, our flowsheet data augmentation
can be used for other machine learning algorithms on chemical process
flowsheets that are based on SFILES notation.
Related papers
- Graph-to-SFILES: Control structure prediction from process topologies using generative artificial intelligence [0.0]
Control structure design is an important but tedious step in P&ID development.
Generative artificial intelligence (AI) promises to reduce P&ID development time by supporting engineers.
We propose the Graph-to-SFILES model, a generative AI method to predict control structures from flowsheet topologies.
arXiv Detail & Related papers (2024-11-30T15:30:11Z)
- Recent Advances on Machine Learning for Computational Fluid Dynamics: A Survey [51.87875066383221]
This paper introduces fundamental concepts, traditional methods, and benchmark datasets, then examines the various roles Machine Learning plays in improving CFD.
We highlight real-world applications of ML for CFD in critical scientific and engineering disciplines, including aerodynamics, combustion, atmosphere & ocean science, biology fluid, plasma, symbolic regression, and reduced order modeling.
We draw the conclusion that ML is poised to significantly transform CFD research by enhancing simulation accuracy, reducing computational time, and enabling more complex analyses of fluid dynamics.
arXiv Detail & Related papers (2024-08-22T07:33:11Z)
- An Autonomous Large Language Model Agent for Chemical Literature Data Mining [60.85177362167166]
We introduce an end-to-end AI agent framework capable of high-fidelity extraction from extensive chemical literature.
Our framework's efficacy is evaluated using accuracy, recall, and F1 score of reaction condition data.
arXiv Detail & Related papers (2024-02-20T13:21:46Z)
- Toward autocorrection of chemical process flowsheets using large language models [0.0]
We propose a novel generative AI methodology for identifying errors in flowsheets and suggesting corrections to the user.
The input to the model is a potentially erroneous flowsheet, and the output is a set of suggestions for a corrected flowsheet.
The model achieves a top-1 accuracy of 80% and a top-5 accuracy of 84% on an independent test dataset of synthetically generated flowsheets.
arXiv Detail & Related papers (2023-12-05T16:39:41Z)
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
- Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z)
- Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z)
- Learning from flowsheets: A generative transformer model for autocompletion of flowsheets [0.0]
We represent flowsheets as strings using the text-based SFILES 2.0 notation.
We learn the grammatical structure of the SFILES 2.0 language and common patterns in flowsheets using a transformer-based language model.
arXiv Detail & Related papers (2022-08-01T13:43:58Z)
- Surrogate Modelling for Injection Molding Processes using Machine Learning [0.23090185577016442]
Injection molding is one of the most popular manufacturing methods for the modeling of complex plastic objects.
We propose a baseline for a data processing pipeline that includes the extraction of data from Moldflow simulation projects.
We evaluate machine learning models for fill time and deflection distribution prediction and provide baseline values of MSE and RMSE metrics.
arXiv Detail & Related papers (2021-07-30T12:13:52Z)
- Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results, but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.