Automated Creation and Human-assisted Curation of Computable Scientific
Models from Code and Text
- URL: http://arxiv.org/abs/2202.13739v1
- Date: Fri, 28 Jan 2022 17:31:38 GMT
- Title: Automated Creation and Human-assisted Curation of Computable Scientific
Models from Code and Text
- Authors: Varish Mulwad, Andrew Crapo, Vijay S. Kumar, James Jobin, Alfredo
Gabaldon, Nurali Virani, Sharad Dixit, Narendra Joshi
- Abstract summary: Domain experts cannot gain a complete understanding of the implementation of a scientific model if they are not familiar with the code.
We develop a system for the automated creation and human-assisted curation of scientific models.
We present experimental results obtained using a dataset of code and associated text derived from NASA's Hypersonic Aerodynamics website.
- Score: 2.3746609573239756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scientific models hold the key to better understanding and predicting the
behavior of complex systems. The most comprehensive manifestation of a
scientific model, including crucial assumptions and parameters that underpin
its usability, is usually embedded in associated source code and documentation,
which may employ a variety of (potentially outdated) programming practices and
languages. Domain experts cannot gain a complete understanding of the
implementation of a scientific model if they are not familiar with the code.
Furthermore, rapid research and development iterations make it challenging to
keep up with constantly evolving scientific model codebases. To address these
challenges, we develop a system for the automated creation and human-assisted
curation of a knowledge graph of computable scientific models that analyzes a
model's code in the context of any associated inline comments and external
documentation. Our system uses knowledge-driven as well as data-driven
approaches to identify and extract relevant concepts from code and equations
from textual documents to semantically annotate models using domain
terminology. These models are converted into executable Python functions and
then can further be composed into complex workflows to answer different forms
of domain-driven questions. We present experimental results obtained using a
dataset of code and associated text derived from NASA's Hypersonic Aerodynamics
website.
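The abstract's last step, converting models into executable Python functions and composing them into workflows, can be illustrated with a minimal sketch. It is not the authors' system: the `Model` class, the `answer` helper, the variable names, and the two aerodynamics formulas chosen here are all hypothetical and only show the general idea of chaining functions by matching named inputs to outputs.

```python
import inspect
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

GAMMA_AIR = 1.4   # ratio of specific heats for air (assumed constant)
R_AIR = 287.05    # specific gas constant for air, J/(kg*K)


def compute_speed_of_sound(static_temperature: float) -> float:
    """Speed of sound in a calorically perfect gas, in m/s."""
    return math.sqrt(GAMMA_AIR * R_AIR * static_temperature)


def compute_mach_number(velocity: float, speed_of_sound: float) -> float:
    """Mach number as the ratio of flow velocity to speed of sound."""
    return velocity / speed_of_sound


@dataclass(frozen=True)
class Model:
    """Hypothetical wrapper: a function plus the name of the quantity it yields."""
    output: str
    func: Callable[..., float]

    @property
    def inputs(self) -> Tuple[str, ...]:
        # Parameter names double as the names of the required quantities.
        return tuple(inspect.signature(self.func).parameters)


MODELS: List[Model] = [
    Model("speed_of_sound", compute_speed_of_sound),
    Model("mach_number", compute_mach_number),
]


def answer(goal: str, known: Dict[str, float], models: List[Model]) -> float:
    """Apply any model whose inputs are available until the goal is computed."""
    values = dict(known)
    progress = True
    while goal not in values and progress:
        progress = False
        for model in models:
            ready = all(name in values for name in model.inputs)
            if model.output not in values and ready:
                args = {name: values[name] for name in model.inputs}
                values[model.output] = model.func(**args)
                progress = True
    if goal not in values:
        raise ValueError(f"cannot compute {goal!r} from the given inputs")
    return values[goal]


if __name__ == "__main__":
    query = {"static_temperature": 288.15, "velocity": 680.0}
    print(answer("mach_number", query, MODELS))
```

Asking for `mach_number` from a temperature and a velocity makes the helper first derive `speed_of_sound` and then feed it into the second function, which is the kind of composition the abstract describes at a much larger scale.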
Related papers
- The Future of Scientific Publishing: Automated Article Generation [0.0]
This study introduces a novel software tool leveraging large language model (LLM) prompts, designed to automate the generation of academic articles from Python code.
Python served as a foundational proof of concept; however, the underlying methodology and framework exhibit adaptability across various GitHub repositories.
The development was achieved without reliance on advanced language model agents, ensuring high fidelity in the automated generation of coherent and comprehensive academic content.
arXiv Detail & Related papers (2024-04-11T16:47:02Z)
- Generative retrieval-augmented ontologic graph and multi-agent strategies for interpretive large language model-based materials design [0.0]
Transformer neural networks show promising capabilities, in particular for uses in materials analysis, design and manufacturing.
Here we explore the use of large language models (LLMs) as a tool that can support engineering analysis of materials.
arXiv Detail & Related papers (2023-10-30T20:31:50Z)
- Large Language Models for Scientific Synthesis, Inference and Explanation [56.41963802804953]
We show how large language models can perform scientific synthesis, inference, and explanation.
We show that the large language model can augment this "knowledge" by synthesizing from the scientific literature.
This approach has the further advantage that the large language model can explain the machine learning system's predictions.
arXiv Detail & Related papers (2023-10-12T02:17:59Z)
- Generative Models as a Complex Systems Science: How can we make sense of large language model behavior? [75.79305790453654]
Coaxing out desired behavior from pretrained models, while avoiding undesirable ones, has redefined NLP.
We argue for a systematic effort to decompose language model behavior into categories that explain cross-task performance.
arXiv Detail & Related papers (2023-07-31T22:58:41Z)
- Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents [111.15288256221764]
The Grounded Decoding project aims to solve complex, long-horizon tasks in a robotic setting by leveraging the knowledge of both a language model and grounded models.
We frame this as a problem similar to probabilistic filtering: decode a sequence that both has high probability under the language model and high probability under a set of grounded model objectives.
We demonstrate how such grounded models can be obtained across three simulation and real-world domains, and that the proposed decoding strategy is able to solve complex, long-horizon tasks in a robotic setting by leveraging the knowledge of both models.
arXiv Detail & Related papers (2023-03-01T22:58:50Z)
- Constructing Effective Machine Learning Models for the Sciences: A Multidisciplinary Perspective [77.53142165205281]
We show how flexible non-linear solutions will not always improve upon manually adding transforms and interactions between variables to linear regression models.
We discuss how to recognize this before constructing a data-driven model and how such analysis can help us move to intrinsically interpretable regression models.
arXiv Detail & Related papers (2022-11-21T17:48:44Z)
- An Overview on Controllable Text Generation via Variational Auto-Encoders [15.97186478109836]
Recent advances in neural-based generative modeling have reignited the hopes of having computer systems capable of conversing with humans.
Latent variable models (LVM) such as variational auto-encoders (VAEs) are designed to characterize the distributional pattern of textual data.
This overview gives an introduction to existing generation schemes, problems associated with text variational auto-encoders, and a review of several applications about the controllable generation.
arXiv Detail & Related papers (2022-11-15T07:36:11Z)
- Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z)
- Leveraging the structure of dynamical systems for data-driven modeling [111.45324708884813]
We consider the impact of the training set and its structure on the quality of the long-term prediction.
We show how an informed design of the training set, based on invariants of the system and the structure of the underlying attractor, significantly improves the resulting models.
arXiv Detail & Related papers (2021-12-15T20:09:20Z)
- PHOTONAI -- A Python API for Rapid Machine Learning Model Development [2.414341608751139]
PHOTONAI is a high-level Python API designed to simplify and accelerate machine learning model development.
It functions as a unifying framework allowing the user to easily access and combine algorithms from different toolboxes into custom algorithm sequences.
arXiv Detail & Related papers (2020-02-13T10:33:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.