From Words to Code: Harnessing Data for Program Synthesis from Natural
Language
- URL: http://arxiv.org/abs/2305.01598v2
- Date: Wed, 3 May 2023 07:02:57 GMT
- Title: From Words to Code: Harnessing Data for Program Synthesis from Natural
Language
- Authors: Anirudh Khatry, Joyce Cahoon, Jordan Henkel, Shaleen Deep, Venkatesh
Emani, Avrilia Floratou, Sumit Gulwani, Vu Le, Mohammad Raza, Sherry Shi,
Mukul Singh, Ashish Tiwari
- Abstract summary: We introduce semantic reranking, a technique to rerank the programs generated by large language models (LLMs).
We also introduce temperature mixing, where we combine samples generated by LLMs using both high and low temperatures.
We observe substantial gains across domains, with improvements of up to 45% in top-1 accuracy and 34% in top-3 accuracy.
- Score: 12.665932954069476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Creating programs to correctly manipulate data is a difficult task, as the
underlying programming languages and APIs can be challenging to learn for many
users who are not skilled programmers. Large language models (LLMs) demonstrate
remarkable potential for generating code from natural language, but in the data
manipulation domain, apart from the natural language (NL) description of the
intended task, we also have the dataset on which the task is to be performed,
or the "data context". Existing approaches have utilized data context in a
limited way by simply adding relevant information from the input data into the
prompts sent to the LLM.
In this work, we utilize the available input data to execute the candidate
programs generated by the LLMs and gather their outputs. We introduce semantic
reranking, a technique to rerank the programs generated by LLMs based on three
signals coming from the program outputs: (a) semantic filtering and well-formedness-based
score tuning: do the programs even generate well-formed outputs, (b) semantic
interleaving: how do the outputs from different candidates compare to each
other, and (c) output-based score tuning: how do the outputs compare to outputs
predicted for the same task. We provide theoretical justification for semantic
interleaving. We also introduce temperature mixing, where we combine samples
generated by LLMs using both high and low temperatures. We extensively evaluate
our approach in three domains, namely databases (SQL), data science (Pandas)
and business intelligence (Excel's Power Query M) on a variety of new and
existing benchmarks. We observe substantial gains across domains, with
improvements of up to 45% in top-1 accuracy and 34% in top-3 accuracy.
Related papers
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- CodecLM: Aligning Language Models with Tailored Synthetic Data [51.59223474427153]
We introduce CodecLM, a framework for adaptively generating high-quality synthetic data for instruction-following abilities.
We first encode seed instructions into metadata, which are concise keywords generated on-the-fly to capture the target instruction distribution.
We also introduce Self-Rubrics and Contrastive Filtering during decoding to tailor data-efficient samples.
arXiv Detail & Related papers (2024-04-08T21:15:36Z)
- Code Needs Comments: Enhancing Code LLMs with Comment Augmentation [91.52444946362547]
We introduce a novel data augmentation method that generates comments for existing code, coupled with a data filtering strategy that filters out code data poorly correlated with natural language.
We conducted experiments on three code-focused Large Language Models and observed consistent improvements in performance on two widely-used programming skill benchmarks.
arXiv Detail & Related papers (2024-02-20T13:56:38Z)
- Grounding Data Science Code Generation with Input-Output Specifications [32.07033683677839]
Large language models (LLMs) have recently demonstrated a remarkable ability to generate code from natural language prompts.
LLMs can have difficulty aligning their outputs with both the NL prompt and the I/O specification.
We propose GIFT4Code, a novel approach for the instruction fine-tuning of LLMs with respect to I/O specifications.
arXiv Detail & Related papers (2024-02-12T21:32:49Z)
- Reranking for Natural Language Generation from Logical Forms: A Study based on Large Language Models [47.08364281023261]
Large language models (LLMs) have demonstrated impressive capabilities in natural language generation.
However, their output quality can be inconsistent, posing challenges for generating natural language from logical forms (LFs).
arXiv Detail & Related papers (2023-09-21T17:54:58Z)
- LeTI: Learning to Generate from Textual Interactions [60.425769582343506]
We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback.
Our focus is the code generation task, where the model produces code based on natural language instructions.
LETI iteratively fine-tunes the model, using the LM objective, on a concatenation of natural language instructions, LM-generated programs, and textual feedback.
arXiv Detail & Related papers (2023-05-17T15:53:31Z)
- AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z)
- Mixture of Soft Prompts for Controllable Data Generation [21.84489422361048]
Mixture of Soft Prompts (MSP) is proposed as a tool for data augmentation rather than direct prediction.
Our method achieves state-of-the-art results on three benchmarks when compared against strong baselines.
arXiv Detail & Related papers (2023-03-02T21:13:56Z)
- LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results.
Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results.
LEVER consistently improves over the base code LLMs (4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them.
arXiv Detail & Related papers (2023-02-16T18:23:22Z)