GPT-4 as an interface between researchers and computational software:
improving usability and reproducibility
- URL: http://arxiv.org/abs/2310.11458v1
- Date: Wed, 4 Oct 2023 14:25:39 GMT
- Title: GPT-4 as an interface between researchers and computational software:
improving usability and reproducibility
- Authors: Juan C. Verduzco, Ethan Holbrook, and Alejandro Strachan
- Abstract summary: We focus on a widely used software package for molecular dynamics simulations.
We quantify the usefulness of input files generated by GPT-4 from task descriptions in English.
We find that GPT-4 can generate correct and ready-to-use input files for relatively simple tasks.
In addition, GPT-4's description of computational tasks from input files can be tuned from a detailed set of step-by-step instructions to a summary description appropriate for publications.
- Score: 44.99833362998488
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are playing an increasingly important role in
science and engineering. For example, their ability to parse and understand
human and computer languages makes them powerful interpreters, and their use
in applications like code generation is well-documented. We explore the ability
of the GPT-4 LLM to ameliorate two major challenges in computational materials
science: i) the high barriers for adoption of scientific software associated
with the use of custom input languages, and ii) the poor reproducibility of
published results due to insufficient details in the description of simulation
methods. We focus on a widely used software package for molecular dynamics simulations,
the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), and
quantify the usefulness of input files generated by GPT-4 from task
descriptions in English and its ability to generate detailed descriptions of
computational tasks from input files. We find that GPT-4 can generate correct
and ready-to-use input files for relatively simple tasks and useful starting
points for more complex, multi-step simulations. In addition, GPT-4's
description of computational tasks from input files can be tuned from a
detailed set of step-by-step instructions to a summary description appropriate
for publications. Our results show that GPT-4 can reduce the number of routine
tasks performed by researchers, accelerate the training of new users, and
enhance reproducibility.
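To make the first use case concrete, here is a minimal sketch of the workflow the abstract describes: an English task description is sent to the OpenAI chat API and the returned LAMMPS input script is saved to disk. The model name, prompt wording, task, and file name are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch: ask GPT-4 to translate an English task description into
# a LAMMPS input script. The model name, prompt wording, task, and file
# name are illustrative assumptions, not the authors' exact setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

task = (
    "Write a complete LAMMPS input script that melts a Lennard-Jones fcc "
    "crystal: reduced units, 4000 atoms, NVT ensemble at T* = 2.0, "
    "timestep 0.005, 10000 steps, thermodynamic output every 100 steps."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You are an expert LAMMPS user. Reply with only the "
                    "input script, no commentary."},
        {"role": "user", "content": task},
    ],
)

script = response.choices[0].message.content
with open("in.lj_melt", "w") as f:
    f.write(script)
# Per the abstract, outputs are ready-to-use only for relatively simple
# tasks, so the script should still be inspected before production runs.
```

The reverse direction reported in the abstract (describing an existing input file at a chosen level of detail) would be the same call with the script placed in the user message and the desired verbosity requested in the system prompt.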
Related papers
- In Context Learning and Reasoning for Symbolic Regression with Large Language Models [0.0]
Large Language Models (LLMs) are transformer-based machine learning models.
We show how GPT-4 can perform symbolic regression on equations from datasets.
This approach does not outperform established SR programs when the target equations are more complex (a minimal prompting sketch follows this entry).
arXiv Detail & Related papers (2024-10-22T21:50:52Z)
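As a rough illustration of the in-context approach summarized above, the sketch below places tabulated (x, y) pairs in the prompt and asks the model for a closed-form expression. The data, prompt wording, and model name are assumptions, not the paper's exact protocol.

```python
# Minimal sketch: in-context symbolic regression with an LLM. Tabulated
# (x, y) pairs go into the prompt and the model is asked for a
# closed-form expression. Data, prompt, and model name are illustrative
# assumptions, not the paper's exact protocol.
import numpy as np
from openai import OpenAI

client = OpenAI()

x = np.linspace(0.1, 5.0, 20)
y = 2.0 * np.sin(x) + x**2  # hidden target: y = 2 sin(x) + x^2

pairs = "\n".join(f"x={xi:.3f}, y={yi:.3f}" for xi, yi in zip(x, y))
prompt = (
    "Propose a simple closed-form expression y = f(x) that fits these "
    "points. Reply with the expression only.\n" + pairs
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
# Any candidate expression should be verified numerically against the
# data before being trusted.
```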
- JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models [110.45794710162241]
Existing work either collects large-scale math-related texts for pre-training, or relies on stronger LLMs to synthesize massive math problems.
We propose an efficient alternative that trains a small LLM for math problem synthesis to generate sufficient high-quality pre-training data.
We leverage it to synthesize 6 million math problems for pre-training our JiuZhang3.0 model, which requires only 9.3k GPT-4 API invocations and pre-training on 4.6B data.
arXiv Detail & Related papers (2024-05-23T09:43:19Z)
- Feedback-Generation for Programming Exercises With GPT-4 [0.0]
This paper explores the quality of GPT-4 Turbo's generated output for prompts containing both the programming task specification and a student's submission as input.
The output was qualitatively analyzed regarding correctness, personalization, fault localization, and other features identified in the material (a prompt-assembly sketch follows this entry).
arXiv Detail & Related papers (2024-03-07T12:37:52Z)
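Below is a minimal sketch of the prompt construction this entry describes, combining a task specification and a student submission. The file paths, prompt wording, and model identifier are assumptions rather than the paper's exact setup.

```python
# Minimal sketch: generate formative feedback on a student's program by
# prompting an LLM with both the task specification and the submission,
# as in the entry above. Paths, prompt wording, and the model name are
# illustrative assumptions, not the paper's exact setup.
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

task_spec = Path("exercise_spec.md").read_text()      # assumed file
submission = Path("student_solution.py").read_text()  # assumed file

prompt = (
    "You are a programming tutor. Given the exercise specification and a "
    "student's submission, identify faults (with their locations), judge "
    "correctness, and give personalized hints without revealing a full "
    "solution.\n\n"
    "Specification:\n" + task_spec + "\n\n"
    "Submission:\n" + submission
)

response = client.chat.completions.create(
    model="gpt-4-turbo",  # assumed alias for GPT-4 Turbo
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```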
- Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator [63.762209407570715]
Genixer is a comprehensive data generation pipeline consisting of four key steps.
Training LLaVA1.5 on a synthetic VQA-like dataset enhances performance on 10 out of 12 multimodal benchmarks.
MLLMs trained with task-specific datasets can surpass GPT-4V in generating complex instruction tuning data.
arXiv Detail & Related papers (2023-12-11T09:44:41Z)
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
- Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? [49.688233418425995]
Struc-Bench is a comprehensive benchmark featuring prominent Large Language Models (LLMs).
We propose two innovative metrics, P-Score (Prompting Score) and H-Score (Heuristical Score).
Our experiments show that applying our structure-aware fine-tuning to LLaMA-7B leads to substantial performance gains.
arXiv Detail & Related papers (2023-09-16T11:31:58Z)
- Exploring and Characterizing Large Language Models For Embedded System Development and Debugging [10.967443876391611]
Large language models (LLMs) have shown remarkable abilities to generate code; however, their ability to develop software for embedded systems has not been studied.
We develop an open source framework to evaluate leading LLMs to assess their capabilities and limitations for embedded system development.
We leverage this finding to study how human programmers interact with these tools, and develop a human-AI based software engineering workflow for building embedded systems.
arXiv Detail & Related papers (2023-07-07T20:14:22Z)
- Visual Instruction Tuning [79.70923292053097]
We present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data (a rough data-generation sketch follows this entry).
By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant.
When fine-tuned on Science QA, the synergy of LLaVA and GPT-4 achieves a new state-of-the-art accuracy of 92.53%.
arXiv Detail & Related papers (2023-04-17T17:59:25Z)
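As a rough sketch of the language-only data-generation idea above: textual image annotations (captions and bounding boxes) are given to GPT-4, which drafts an instruction-following conversation. The annotation format, prompt, and model name are assumptions based on the abstract, not the released LLaVA pipeline.

```python
# Minimal sketch: use a language-only LLM to draft multimodal
# instruction-following data from textual image annotations (captions
# and bounding boxes), in the spirit of the entry above. Annotations,
# prompt, and model name are illustrative assumptions, not the
# released LLaVA pipeline.
import json
from openai import OpenAI

client = OpenAI()

image_annotations = {
    "captions": ["A man rides a horse along a beach at sunset."],
    "boxes": [
        {"label": "man", "bbox": [120, 40, 210, 300]},
        {"label": "horse", "bbox": [80, 100, 320, 380]},
    ],
}

prompt = (
    "Below are textual annotations of an image (captions and object "
    "bounding boxes). Write a short question-and-answer exchange between "
    "a user looking at the image and an assistant, as JSON with "
    "'question' and 'answer' fields.\n" + json.dumps(image_annotations)
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```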
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.