Large Language Models as Planning Domain Generators
- URL: http://arxiv.org/abs/2405.06650v1
- Date: Tue, 2 Apr 2024 19:39:23 GMT
- Title: Large Language Models as Planning Domain Generators
- Authors: James Oswald, Kavitha Srinivas, Harsha Kokel, Junkyu Lee, Michael Katz, Shirin Sohrabi,
- Abstract summary: Large language models (LLMs) can be used to generate planning domain models from simple textual descriptions.
LLMs exhibit a moderate level of proficiency in generating correct planning domains from natural language descriptions.
- Score: 21.61328475841776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Developing domain models is one of the few remaining places that require manual human labor in AI planning. Thus, in order to make planning more accessible, it is desirable to automate the process of domain model generation. To this end, we investigate if large language models (LLMs) can be used to generate planning domain models from simple textual descriptions. Specifically, we introduce a framework for automated evaluation of LLM-generated domains by comparing the sets of plans for domain instances. Finally, we perform an empirical analysis of 7 large language models, including coding and chat models across 9 different planning domains, and under three classes of natural language domain descriptions. Our results indicate that LLMs, particularly those with high parameter counts, exhibit a moderate level of proficiency in generating correct planning domains from natural language descriptions. Our code is available at https://github.com/IBM/NL2PDDL.
Related papers
- BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models [56.89958793648104]
Large Language Models (LLMs) are versatile and capable of addressing a diverse range of tasks.
Previous approaches either conduct continuous pre-training with domain-specific data or employ retrieval augmentation to support general LLMs.
We present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models.
arXiv Detail & Related papers (2024-03-27T08:57:21Z) - PARADISE: Evaluating Implicit Planning Skills of Language Models with Procedural Warnings and Tips Dataset [0.0]
We present PARADISE, an abductive reasoning task using Q&A format on practical procedural text sourced from wikiHow.
It involves warning and tip inference tasks directly associated with goals, excluding intermediary steps, with the aim of testing the ability of the models to infer implicit knowledge of the plan solely from the given goal.
Our experiments, utilizing fine-tuned language models and zero-shot prompting, reveal the effectiveness of task-specific small models over large language models in most scenarios.
arXiv Detail & Related papers (2024-03-05T18:01:59Z) - ChipNeMo: Domain-Adapted LLMs for Chip Design [19.43613652552849]
ChipNeMo aims to explore the applications of large language models (LLMs) for industrial chip design.
We adopt the following domain adaptation techniques: domain-adaptive tokenization, domain-adaptive continued pretraining, model alignment with domain-specific instructions, and domain-adapted retrieval models.
arXiv Detail & Related papers (2023-10-31T22:35:58Z) - L2CEval: Evaluating Language-to-Code Generation Capabilities of Large
Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs)
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z) - Exploring Distributional Shifts in Large Language Models for Code
Analysis [36.73114441988879]
We study how three large language models with code capabilities generalize to out-of-domain data.
We consider two fundamental applications - code summarization, and code generation.
We find that a model adapted to multiple domains simultaneously performs on par with those adapted to a single domain.
arXiv Detail & Related papers (2023-03-16T07:45:46Z) - Progressive Generation of Long Text with Pretrained Language Models [83.62523163717448]
Large-scale language models (LMs) pretrained on massive corpora of text, such as GPT-2, are powerful open-domain text generators.
It is still challenging for such models to generate coherent long passages of text, especially when the models are fine-tuned to the target domain on a small corpus.
We propose a simple but effective method of generating text in a progressive manner, inspired by generating images from low to high resolution.
arXiv Detail & Related papers (2020-06-28T21:23:05Z) - DomBERT: Domain-oriented Language Model for Aspect-based Sentiment
Analysis [71.40586258509394]
We propose DomBERT, an extension of BERT to learn from both in-domain corpus and relevant domain corpora.
Experiments are conducted on an assortment of tasks in aspect-based sentiment analysis, demonstrating promising results.
arXiv Detail & Related papers (2020-04-28T21:07:32Z) - Unsupervised Domain Clusters in Pretrained Language Models [61.832234606157286]
We show that massive pre-trained language models implicitly learn sentence representations that cluster by domains without supervision.
We propose domain data selection methods based on such models.
We evaluate our data selection methods for neural machine translation across five diverse domains.
arXiv Detail & Related papers (2020-04-05T06:22:16Z) - Few-shot Natural Language Generation for Task-Oriented Dialog [113.07438787659859]
We present FewShotWoz, the first NLG benchmark to simulate the few-shot learning setting in task-oriented dialog systems.
We develop the SC-GPT model, which is pre-trained on a large set of annotated NLG corpus to acquire the controllable generation ability.
Experiments on FewShotWoz and the large Multi-Domain-WOZ datasets show that the proposed SC-GPT significantly outperforms existing methods.
arXiv Detail & Related papers (2020-02-27T18:48:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.