NL2CMD: An Updated Workflow for Natural Language to Bash Commands Translation
- URL: http://arxiv.org/abs/2302.07845v3
- Date: Sun, 18 Jun 2023 16:27:16 GMT
- Title: NL2CMD: An Updated Workflow for Natural Language to Bash Commands Translation
- Authors: Quchen Fu, Zhongwei Teng, Marco Georgaklis, Jules White, Douglas C. Schmidt
- Abstract summary: This paper provides two contributions to research on synthesizing Bash Commands from scratch.
First, we describe a state-of-the-art translation model used to generate Bash Commands from the corresponding English text.
Second, we introduce a new NL2CMD dataset that is automatically generated, involves minimal human intervention, and is over six times larger than prior datasets.
- Score: 2.099922236065961
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Translating natural language into Bash Commands is an emerging research field
that has gained attention in recent years. Most efforts have focused on
producing more accurate translation models. To the best of our knowledge, only
two datasets are available, with one based on the other. Both datasets involve
scraping known data sources (via platforms like Stack Overflow,
crowdsourcing, etc.) and hiring experts to validate and correct either the
English text or Bash Commands. This paper provides two contributions to
research on synthesizing Bash Commands from scratch. First, we describe a
state-of-the-art translation model used to generate Bash Commands from the
corresponding English text. Second, we introduce a new NL2CMD dataset that is
automatically generated, involves minimal human intervention, and is over six
times larger than prior datasets. Since the generation pipeline does not rely
on existing Bash Commands, the distribution and types of commands can be
customized. We evaluate the performance of ChatGPT on this task and discuss the
potential of using it as a data generator. Our empirical results show how the
scale and diversity of our dataset can offer unique opportunities for semantic
parsing researchers.
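To make the data-generation idea concrete, here is a minimal sketch, not the authors' actual pipeline, of prompting an instruction-following chat model to synthesize (English text, Bash Command) pairs. The model name, prompt wording, and JSON output format are all illustrative assumptions:

```python
# Illustrative sketch only: prompting a chat LLM to synthesize (English, Bash)
# training pairs, in the spirit of the paper's automatic NL2CMD pipeline.
# The model name, prompt, and JSON schema are assumptions, not the authors' setup.
import json
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Generate 5 pairs of an English task description and the single Bash "
    "command that performs it. Respond with only a JSON list of objects "
    "with the keys 'nl' and 'cmd'. Cover utilities such as find, grep, and tar."
)

def generate_pairs() -> list[dict]:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable chat model would do
        messages=[{"role": "user", "content": PROMPT}],
    )
    # Assumes the model returns bare JSON; a real pipeline would validate the
    # commands (e.g., parse them with a Bash grammar) before keeping them.
    return json.loads(resp.choices[0].message.content)

if __name__ == "__main__":
    for pair in generate_pairs():
        print(f"{pair['nl']!r} -> {pair['cmd']!r}")
```

Because the prompt controls which utilities appear, the distribution and types of generated commands can be steered, which is the property the abstract emphasizes.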
Related papers
- LLM-Supported Natural Language to Bash Translation [3.944966059637878]
We present a novel functional-equivalence check that combines command execution with evaluation of command outputs (a toy sketch of the idea appears at the end of this page).
We show that parsing, in-context learning, in-weight learning, and constrained decoding can improve NL2SH accuracy by up to 32%.
arXiv Detail & Related papers (2025-02-07T19:35:55Z)
- Task Arithmetic for Language Expansion in Speech Translation [41.721843322787045]
We propose to expand to new language pairs by merging a model trained on the new language pairs with the existing model.
We find that the direct application of task arithmetic for ST causes the merged model to fail to follow instructions.
To eliminate language confusion, we propose an augmented task arithmetic method that merges an additional language control model.
arXiv Detail & Related papers (2024-09-17T15:25:11Z)
- Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment [50.27950279695363]
The transfer performance is often hindered when a low-resource target language is written in a different script than the high-resource source language.
Inspired by recent work that uses transliteration to address this problem, our paper proposes a transliteration-based post-pretraining alignment (PPA) method.
arXiv Detail & Related papers (2024-06-28T08:59:24Z)
- LAMPAT: Low-Rank Adaption for Multilingual Paraphrasing Using Adversarial Training [19.173992333194683]
Paraphrases are texts that convey the same meaning while using different words or sentence structures.
Previous studies have leveraged the knowledge from the machine translation field, forming a paraphrase through zero-shot machine translation in the same language.
We propose LAMPAT, the first unsupervised multilingual paraphrasing model, for which a monolingual dataset is sufficient to generate human-like and diverse sentences.
arXiv Detail & Related papers (2024-01-09T04:19:16Z)
- Sinhala-English Parallel Word Dictionary Dataset [0.554780083433538]
We introduce three parallel English-Sinhala word dictionaries (En-Si-dict-large, En-Si-dict-filtered, En-Si-dict-FastText) which help in multilingual Natural Language Processing (NLP) tasks related to English and Sinhala languages.
arXiv Detail & Related papers (2023-08-04T10:21:35Z)
- Explaining Patterns in Data with Language Models via Interpretable Autoprompting [143.4162028260874]
We introduce interpretable autoprompting (iPrompt), an algorithm that generates a natural-language string explaining the data.
iPrompt can yield meaningful insights by accurately finding ground-truth dataset descriptions.
Experiments with an fMRI dataset show the potential for iPrompt to aid in scientific discovery.
arXiv Detail & Related papers (2022-10-04T18:32:14Z)
- Continual Learning in Multilingual NMT via Language-Specific Embeddings [92.91823064720232]
The approach consists of replacing the shared vocabulary with a small language-specific vocabulary and fine-tuning the new embeddings on the new language's parallel data.
Because the parameters of the original model are not modified, its performance on the initial languages does not degrade.
arXiv Detail & Related papers (2021-10-20T10:38:57Z)
- An Empirical Study on Few-shot Knowledge Probing for Pretrained Language Models [54.74525882974022]
We show that few-shot examples can strongly boost the probing performance for both 1-hop and 2-hop relations.
In particular, we find that a simple-yet-effective approach of finetuning the bias vectors in the model outperforms existing prompt-engineering methods.
arXiv Detail & Related papers (2021-09-06T23:29:36Z)
- proScript: Partially Ordered Scripts Generation via Pre-trained Language Models [49.03193243699244]
We demonstrate for the first time that pre-trained neural language models (LMs) can be finetuned to generate high-quality scripts.
We collected a large (6.4k) crowdsourced dataset of partially ordered scripts (named proScript).
Our experiments show that our models perform well (e.g., F1=75.7 in task (i)), illustrating a new approach to overcoming previous barriers to script collection.
arXiv Detail & Related papers (2021-04-16T17:35:10Z)
- Zero-Shot Cross-lingual Semantic Parsing [56.95036511882921]
We study cross-lingual semantic parsing as a zero-shot problem without parallel data for 7 test languages.
We propose a multi-task encoder-decoder model to transfer parsing knowledge to additional languages using only English-Logical form paired data.
Our system frames zero-shot parsing as a latent-space alignment problem and finds that pre-trained models can be improved to generate logical forms with minimal cross-lingual transfer penalty.
arXiv Detail & Related papers (2021-04-15T16:08:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
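The functional-equivalence check mentioned in the LLM-Supported Natural Language to Bash Translation entry above can be sketched as follows: instead of comparing command strings, run the candidate and the reference in identical sandboxes and compare their outputs. This is a minimal sketch under assumed details (function names, fixture, timeout), not that paper's implementation:

```python
# Toy sketch: treat two Bash commands as functionally equivalent when they
# produce the same stdout in identical sandboxes. All names here are
# illustrative assumptions, not the cited paper's implementation.
import subprocess
import tempfile

def run_in_sandbox(cmd: str, setup: str = "") -> str:
    """Run `cmd` in a fresh temporary directory and return its stdout."""
    with tempfile.TemporaryDirectory() as sandbox:
        if setup:
            subprocess.run(setup, shell=True, cwd=sandbox, check=True)
        result = subprocess.run(
            cmd, shell=True, cwd=sandbox,
            capture_output=True, text=True, timeout=10,
        )
        return result.stdout

def functionally_equivalent(candidate: str, reference: str, setup: str = "") -> bool:
    """Compare what the commands do, not how they are spelled."""
    return run_in_sandbox(candidate, setup) == run_in_sandbox(reference, setup)

# Different command strings, same behavior on this fixture.
fixture = "printf 'a\\nb\\nb\\n' > f.txt"
print(functionally_equivalent("sort -u f.txt", "sort f.txt | uniq", fixture))  # True
```

String-level metrics would score `sort -u f.txt` and `sort f.txt | uniq` as different; an output comparison only fails when the fixture does not exercise the difference, which is why the cited paper combines command execution with an evaluation of command outputs.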