Related papers: Automated Creation and Human-assisted Curation of Computable Scientific Models from Code and Text

URL: http://arxiv.org/abs/2202.13739v1
Date: Fri, 28 Jan 2022 17:31:38 GMT
Title: Automated Creation and Human-assisted Curation of Computable Scientific Models from Code and Text
Authors: Varish Mulwad, Andrew Crapo, Vijay S. Kumar, James Jobin, Alfredo Gabaldon, Nurali Virani, Sharad Dixit, Narendra Joshi
Abstract summary: Domain experts cannot gain a complete understanding of the implementation of a scientific model if they are not familiar with the code. We develop a system for the automated creation and human-assisted curation of scientific models. We present experimental results obtained using a dataset of code and associated text derived from NASA's Hypersonic Aerodynamics website.
Score: 2.3746609573239756
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scientific models hold the key to better understanding and predicting the behavior of complex systems. The most comprehensive manifestation of a scientific model, including crucial assumptions and parameters that underpin its usability, is usually embedded in associated source code and documentation, which may employ a variety of (potentially outdated) programming practices and languages. Domain experts cannot gain a complete understanding of the implementation of a scientific model if they are not familiar with the code. Furthermore, rapid research and development iterations make it challenging to keep up with constantly evolving scientific model codebases. To address these challenges, we develop a system for the automated creation and human-assisted curation of a knowledge graph of computable scientific models that analyzes a model's code in the context of any associated inline comments and external documentation. Our system uses knowledge-driven as well as data-driven approaches to identify and extract relevant concepts from code and equations from textual documents to semantically annotate models using domain terminology. These models are converted into executable Python functions and then can further be composed into complex workflows to answer different forms of domain-driven questions. We present experimental results obtained using a dataset of code and associated text derived from NASA's Hypersonic Aerodynamics website.

Related papers

Towards Leveraging Large Language Model Summaries for Topic Modeling in Source Code [0.0]
Large language models (LLMs) have demonstrated remarkable program comprehension capabilities. Transformer-based topic modeling techniques offer effective ways to extract semantic information from text. This paper proposes and explores a novel approach that combines these strengths to automatically identify meaningful topics in a corpus of Python programs.
arXiv Detail & Related papers (2025-04-24T10:30:40Z)
The AI Cosmologist I: An Agentic System for Automated Data Analysis [0.0]
The AI Cosmologist implements a complete pipeline from idea generation to experimental evaluation and research dissemination. Unlike traditional auto machine-learning systems, the AI Cosmologist generates diverse implementation strategies. Results indicate that agentic systems can automate portions of the research process, potentially accelerating scientific discovery.
arXiv Detail & Related papers (2025-04-04T13:12:08Z)
Grammar-based Ordinary Differential Equation Discovery [1.5020330976600738]
We propose a novel framework for the end-to-end discovery of ordinary differential equations (ODEs). The proposed methodology combines formal formality reduction and search for efficiently navigating high-dimensional spaces. Gode proves to be more sample- and parameter-efficient than state-of-the-art transformer-based models.
arXiv Detail & Related papers (2025-04-03T14:28:13Z)
The Future of Scientific Publishing: Automated Article Generation [0.0]
This study introduces a novel software tool leveraging large language model (LLM) prompts, designed to automate the generation of academic articles from Python code. Python served as a foundational proof of concept; however, the underlying methodology and framework exhibit adaptability across various GitHub repos. The development was achieved without reliance on advanced language model agents, ensuring high fidelity in the automated generation of coherent and comprehensive academic content.
arXiv Detail & Related papers (2024-04-11T16:47:02Z)
Generative retrieval-augmented ontologic graph and multi-agent strategies for interpretive large language model-based materials design [0.0]
Transformer neural networks show promising capabilities, in particular for uses in materials analysis, design and manufacturing. Here we explore the use of large language models (LLMs) as a tool that can support engineering analysis of materials.
arXiv Detail & Related papers (2023-10-30T20:31:50Z)
Large Language Models for Scientific Synthesis, Inference and Explanation [56.41963802804953]
We show how large language models can perform scientific synthesis, inference, and explanation. We show that the large language model can augment this "knowledge" by synthesizing from the scientific literature. This approach has the further advantage that the large language model can explain the machine learning system's predictions.
arXiv Detail & Related papers (2023-10-12T02:17:59Z)
Generative Models as a Complex Systems Science: How can we make sense of large language model behavior? [75.79305790453654]
Coaxing out desired behaviors from pretrained models, while avoiding undesirable ones, has redefined NLP. We argue for a systematic effort to decompose language model behavior into categories that explain cross-task performance.
arXiv Detail & Related papers (2023-07-31T22:58:41Z)
Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents [111.15288256221764]
Grounded decoding aims to solve complex, long-horizon tasks in a robotic setting by combining the knowledge of a language model with that of grounded models. We frame this as a problem similar to probabilistic filtering: decode a sequence that has high probability both under the language model and under a set of grounded model objectives. We demonstrate how such grounded models can be obtained across three simulation and real-world domains, and that the proposed decoding strategy solves complex, long-horizon robotic tasks by leveraging the knowledge of both models.
arXiv Detail & Related papers (2023-03-01T22:58:50Z)
Constructing Effective Machine Learning Models for the Sciences: A Multidisciplinary Perspective [77.53142165205281]
We show how flexible non-linear solutions will not always improve upon manually adding transforms and interactions between variables to linear regression models. We discuss how to recognize this before constructing a data-driven model and how such analysis can help us move to intrinsically interpretable regression models.
arXiv Detail & Related papers (2022-11-21T17:48:44Z)
An Overview on Controllable Text Generation via Variational Auto-Encoders [15.97186478109836]
Recent advances in neural-based generative modeling have reignited the hopes of having computer systems capable of conversing with humans. Latent variable models (LVMs) such as variational auto-encoders (VAEs) are designed to characterize the distributional pattern of textual data. This overview gives an introduction to existing generation schemes, problems associated with text variational auto-encoders, and a review of several applications of controllable generation.
arXiv Detail & Related papers (2022-11-15T07:36:11Z)
Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models. The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z)
Leveraging the structure of dynamical systems for data-driven modeling [111.45324708884813]
We consider the impact of the training set and its structure on the quality of the long-term prediction. We show how an informed design of the training set, based on invariants of the system and the structure of the underlying attractor, significantly improves the resulting models.
arXiv Detail & Related papers (2021-12-15T20:09:20Z)
PHOTONAI -- A Python API for Rapid Machine Learning Model Development [2.414341608751139]
PHOTONAI is a high-level Python API designed to simplify and accelerate machine learning model development. It functions as a unifying framework allowing the user to easily access and combine algorithms from different toolboxes into custom algorithm sequences.
arXiv Detail & Related papers (2020-02-13T10:33:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.