Using (Not so) Large Language Models for Generating Simulation Models in a Formal DSL -- A Study on Reaction Networks
- URL: http://arxiv.org/abs/2503.01675v1
- Date: Mon, 03 Mar 2025 15:48:01 GMT
- Title: Using (Not so) Large Language Models for Generating Simulation Models in a Formal DSL -- A Study on Reaction Networks
- Authors: Justin N. Kreikemeyer, Miłosz Jankowski, Pia Wilsdorf, Adelinde M. Uhrmacher
- Abstract summary: We evaluate how a Large Language Model might be used for formalizing natural language into simulation models. We develop a synthetic data generator to serve as the basis for fine-tuning and evaluation. Our evaluation shows that our fine-tuned Mistral model can recover the ground truth simulation model in up to 84.5% of cases.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Formal languages are an integral part of modeling and simulation. They allow the distillation of knowledge into concise simulation models amenable to automatic execution, interpretation, and analysis. However, the arguably most humanly accessible means of expressing models is through natural language, which is not easily interpretable by computers. Here, we evaluate how a Large Language Model (LLM) might be used for formalizing natural language into simulation models. Existing studies only explored using very large LLMs, like the commercial GPT models, without fine-tuning model weights. To close this gap, we show how an open-weights, 7B-parameter Mistral model can be fine-tuned to translate natural language descriptions to reaction network models in a domain-specific language, offering a self-hostable, compute- and memory-efficient alternative. To this end, we develop a synthetic data generator to serve as the basis for fine-tuning and evaluation. Our quantitative evaluation shows that our fine-tuned Mistral model can recover the ground truth simulation model in up to 84.5% of cases. In addition, our small-scale user study demonstrates the model's practical potential for one-time generation as well as interactive modeling in various domains. While promising, in its current form, the fine-tuned small LLM cannot catch up with large LLMs. We conclude that higher-quality training data are required, and expect future small and open-source LLMs to offer new opportunities.
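The abstract outlines the core recipe: generate synthetic pairs of natural-language descriptions and reaction-network models in a DSL, then fine-tune an open-weights Mistral-7B on them. The sketch below is a rough, hypothetical illustration of that recipe using Hugging Face's peft and trl libraries; the checkpoint name, the instruction template, the DSL syntax, and the single synthetic pair are assumptions for illustration, not the authors' actual data or code.

```python
# Minimal, hypothetical sketch (not the authors' released code) of LoRA
# fine-tuning an open-weights 7B Mistral model on synthetic pairs of
# natural-language descriptions and reaction-network models in a DSL.
from datasets import Dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# One illustrative synthetic pair; a generator like the one described in the
# paper would sample many such pairs (species names, rate constants, and
# reaction structure here are invented).
pairs = [{
    "description": "Species A is converted into B at rate 0.1, "
                   "and B degrades at rate 0.05.",
    "model": "A -> B @ 0.1;\nB -> @ 0.05;",   # assumed DSL syntax
}]

def to_prompt(example):
    # Assumed instruction template pairing the description with the target DSL.
    return {"text": f"### Description:\n{example['description']}\n"
                    f"### Reaction network:\n{example['model']}"}

dataset = Dataset.from_list(pairs).map(to_prompt)

trainer = SFTTrainer(
    model="mistralai/Mistral-7B-v0.1",  # assumed base checkpoint
    train_dataset=dataset,
    # A LoRA adapter keeps the fine-tune compute- and memory-efficient.
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
    args=SFTConfig(
        output_dir="mistral-reaction-dsl",
        dataset_text_field="text",
        per_device_train_batch_size=1,
        num_train_epochs=1,
    ),
)
trainer.train()
```

Parameter-efficient adapters of this kind are what would make such a model self-hostable on a single GPU; a real run would of course train on the full synthetic dataset rather than a single pair.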
Related papers
- Resona: Improving Context Copying in Linear Recurrence Models with Retrieval [24.84741364872597]
We introduce __Resona__, a simple and scalable framework for augmenting linear recurrent models with retrieval.
Experiments on a variety of linear recurrent models demonstrate significant performance gains on synthetic as well as real-world natural language tasks.
arXiv Detail & Related papers (2025-03-28T23:43:33Z)
- LLM-enabled Instance Model Generation [4.52634430160579]
This work explores the generation of instance models using large language models (LLMs).
We propose a two-step approach: first, using LLMs to produce a simplified structured output containing all necessary instance model information, and then compiling this intermediate representation into a valid XMI file.
Results show that the proposed method significantly improves the usability of LLMs for instance model generation tasks.
arXiv Detail & Related papers (2025-03-28T16:34:29Z)
- Scalable Language Models with Posterior Inference of Latent Thought Vectors [52.63299874322121]
Latent-Thought Language Models (LTMs) incorporate explicit latent thought vectors that follow an explicit prior model in latent space. LTMs possess additional scaling dimensions beyond traditional LLMs, yielding a structured design space. LTMs significantly outperform conventional autoregressive models and discrete diffusion models in validation perplexity and zero-shot language modeling.
arXiv Detail & Related papers (2025-02-03T17:50:34Z)
- Scaling Diffusion Language Models via Adaptation from Autoregressive Models [105.70889434492143]
Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling.
We show that we can convert AR models ranging from 127M to 7B parameters into diffusion models DiffuGPT and DiffuLLaMA, using less than 200B tokens for training.
Our experimental results reveal that these models outperform earlier DLMs and are competitive with their AR counterparts.
arXiv Detail & Related papers (2024-10-23T14:04:22Z)
- Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
arXiv Detail & Related papers (2024-07-04T15:14:17Z)
- Automated Statistical Model Discovery with Language Models [34.03743547761152]
We introduce a method for language model driven automated statistical model discovery.
We cast our automated procedure within the principled framework of Box's Loop.
Our results highlight the promise of LM-driven model discovery.
arXiv Detail & Related papers (2024-02-27T20:33:22Z)
- Diff-eRank: A Novel Rank-Based Metric for Evaluating Large Language Models [10.677971531050611]
We introduce a rank-based metric, Diff-eRank, grounded in information theory and geometry principles.
For language models, our results show that Diff-eRank increases with model size and correlates well with conventional metrics such as loss and accuracy.
In the multi-modal context, we propose an alignment evaluation method based on the eRank, and verify that contemporary multi-modal LLMs exhibit strong alignment performance based on our method.
arXiv Detail & Related papers (2024-01-30T16:19:55Z)
- Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains.
In this paper, we introduce how to fine-tune an LLM that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z)
- Reimagining Retrieval Augmented Language Models for Answering Queries [23.373952699385427]
We present a reality check on large language models and inspect the promise of retrieval augmented language models in comparison.
Such language models are semi-parametric: they integrate model parameters with knowledge from external data sources to make their predictions.
arXiv Detail & Related papers (2023-06-01T18:08:51Z)
- Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
arXiv Detail & Related papers (2023-01-27T18:59:01Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Evaluation of HTR models without Ground Truth Material [2.4792948967354236]
The evaluation of Handwritten Text Recognition models during their development is straightforward.
But the evaluation process becomes tricky as soon as we switch from development to application.
We show that lexicon-based evaluation can compete with evaluation based on ground truth material.
arXiv Detail & Related papers (2022-01-17T01:26:09Z)