Non-Programmers Can Label Programs Indirectly via Active Examples: A
Case Study with Text-to-SQL
- URL: http://arxiv.org/abs/2205.12422v3
- Date: Mon, 23 Oct 2023 11:12:48 GMT
- Title: Non-Programmers Can Label Programs Indirectly via Active Examples: A
Case Study with Text-to-SQL
- Authors: Ruiqi Zhong, Charlie Snell, Dan Klein, Jason Eisner
- Abstract summary: APEL is a framework in which non-programmers select among candidate programs generated by a seed semantic parser (e.g., Codex).
For each utterance, APEL actively searches for a simple input on which the candidate programs tend to produce different outputs.
It asks the non-programmers only to choose the appropriate output, thus allowing us to infer which program is correct.
- Score: 61.950839050383514
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Can non-programmers annotate natural language utterances with complex
programs that represent their meaning? We introduce APEL, a framework in which
non-programmers select among candidate programs generated by a seed semantic
parser (e.g., Codex). Since they cannot understand the candidate programs, we
ask them to select indirectly by examining the programs' input-output examples.
For each utterance, APEL actively searches for a simple input on which the
candidate programs tend to produce different outputs. It then asks the
non-programmers only to choose the appropriate output, thus allowing us to
infer which program is correct and could be used to fine-tune the parser. As a
first case study, we recruited human non-programmers to use APEL to re-annotate
SPIDER, a text-to-SQL dataset. Our approach achieved the same annotation
accuracy as the original expert annotators (75%) and exposed many subtle errors
in the original annotations.
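To make the annotation loop concrete, here is a minimal sketch in Python, assuming a toy singer table and hypothetical helper names (an illustration of the framework, not the authors' implementation): execute each candidate SQL program on small synthesized databases, actively pick a database on which the candidates disagree, show the annotator the competing outputs, and keep only the candidates that match the annotator's choice.

```python
import sqlite3

def run_query(sql, rows):
    """Execute one candidate SQL program on a tiny in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO singer VALUES (?, ?)", rows)
    try:
        return tuple(sorted(conn.execute(sql).fetchall()))
    except sqlite3.Error:
        return None  # a crashing candidate can simply never be chosen
    finally:
        conn.close()

def most_informative_db(candidates, small_dbs):
    """Actively pick the small database on which the candidates disagree most."""
    return max(small_dbs,
               key=lambda rows: len({run_query(sql, rows) for sql in candidates}))

# Candidate programs from a seed parser for
# "How many singers are older than 30?"
candidates = [
    "SELECT COUNT(*) FROM singer WHERE age > 30",   # correct
    "SELECT COUNT(*) FROM singer WHERE age >= 30",  # plausible but wrong
]
small_dbs = [
    [("Ann", 25), ("Bob", 41)],  # uninformative: both candidates output 1
    [("Ann", 30), ("Bob", 41)],  # age exactly 30 separates the candidates
]

db = most_informative_db(candidates, small_dbs)  # picks the second database
annotator_choice = ((1,),)  # the annotator inspects db and answers "1"
survivors = [sql for sql in candidates
             if run_query(sql, db) == annotator_choice]
```

In the paper, candidate databases are synthesized and scored for informativeness while being kept small enough for a non-programmer to inspect; the sketch above shows only the disagreement-and-filter core.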
Related papers
- Weakly Supervised Semantic Parsing with Execution-based Spurious Program
Filtering [19.96076749160955]
We propose a domain-agnostic filtering mechanism based on program execution results.
We run a majority vote over these execution results to identify and filter out programs whose semantics differ significantly from those of the other programs.
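A minimal sketch of that filtering step, assuming a black-box execute function and toy programs (hypothetical names, not the paper's code): execute every candidate on a set of inputs, treat the tuple of outputs as a semantic fingerprint, and keep only the programs in the majority fingerprint class.

```python
from collections import Counter

def filter_spurious(programs, inputs, execute):
    """Keep the programs whose execution behaviour matches the majority."""
    fingerprints = {p: tuple(execute(p, x) for x in inputs) for p in programs}
    majority, _ = Counter(fingerprints.values()).most_common(1)[0]
    return [p for p in programs if fingerprints[p] == majority]

# Toy usage: the squaring program behaves differently, so it is dropped.
programs = ["lambda x: x * 2", "lambda x: x + x", "lambda x: x ** 2"]
execute = lambda p, x: eval(p)(x)
print(filter_spurious(programs, inputs=[0, 1, 3], execute=execute))
# -> ['lambda x: x * 2', 'lambda x: x + x']
```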
arXiv Detail & Related papers (2023-11-02T11:45:40Z)
- GPT is becoming a Turing machine: Here are some ways to program it [16.169056235216576]
We show that GPT-3 models can be triggered to execute programs that involve loops.
We show that prompts that may not even cover one full task example can trigger algorithmic behaviour.
arXiv Detail & Related papers (2023-03-25T00:43:41Z)
- LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results.
Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results.
LEVER consistently improves over the base code LLMs (4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them.
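The verification step can be sketched as a reranker, assuming a pool of (program, LM log-probability, execution result) triples and a trained verifier model (names hypothetical; the released system differs in detail): the selected program maximizes the product of generator and verifier probabilities.

```python
import math

def lever_select(nl_input, samples, verifier_prob):
    """Rerank sampled programs by generator probability x verifier probability.

    samples: list of (program, lm_logprob, execution_result) triples.
    verifier_prob: assumed learned model estimating P(correct | nl, program, result).
    """
    best = max(samples,
               key=lambda s: math.exp(s[1]) * verifier_prob(nl_input, s[0], s[2]))
    return best[0]
```

LEVER additionally aggregates scores across samples that share the same execution result; the sketch keeps only the reranking core.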
arXiv Detail & Related papers (2023-02-16T18:23:22Z)
- Hierarchical Programmatic Reinforcement Learning via Learning to Compose
Programs [58.94569213396991]
We propose a hierarchical programmatic reinforcement learning framework to produce program policies.
By learning to compose programs, our proposed framework can produce program policies that describe out-of-distributionally complex behaviors.
The experimental results in the Karel domain show that our proposed framework outperforms baselines.
arXiv Detail & Related papers (2023-01-30T14:50:46Z)
- UniRPG: Unified Discrete Reasoning over Table and Text as Program
Generation [32.74302320558048]
We propose UniRPG, a semantic-parsing-based approach advanced in interpretability and scalability.
UniRPG performs unified discrete reasoning over heterogeneous knowledge resources, i.e., table and text, as program generation.
It achieves tremendous improvements and enhances interpretability and scalability compared with state-of-the-art methods.
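The "discrete reasoning as program generation" idea can be illustrated with a tiny interpreter over a table (the operator set here is hypothetical; UniRPG defines its own atomic and higher-order operations over table cells and text spans).

```python
# Hypothetical operator set standing in for UniRPG's program operations.
OPS = {
    "CELL": lambda table, r, c: table[r][c],  # read one table cell
    "DIFF": lambda a, b: a - b,               # arithmetic over extracted values
}

def run_program(node, table):
    """Recursively evaluate a derivation such as ('DIFF', ('CELL', 0, 1), ...)."""
    op, *args = node
    vals = [run_program(a, table) if isinstance(a, tuple) else a for a in args]
    return OPS[op](table, *vals) if op == "CELL" else OPS[op](*vals)

table = [["2019 revenue", 120],
         ["2018 revenue", 100]]
# "How much did revenue grow from 2018 to 2019?" as a generated program:
print(run_program(("DIFF", ("CELL", 0, 1), ("CELL", 1, 1)), table))  # -> 20
```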
arXiv Detail & Related papers (2022-10-15T10:17:52Z)
- Learning from Self-Sampled Correct and Partially-Correct Programs [96.66452896657991]
We propose to let the model perform sampling during training and learn from both self-sampled fully-correct programs and partially-correct programs.
We show that our use of self-sampled correct and partially-correct programs can benefit learning and help guide the sampling process.
Our proposed method improves the pass@k performance by 3.1% to 12.3% compared to learning from a single reference program with MLE.
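A rough sketch of the data-growing step, under stated assumptions (model.sample and the partial-correctness check are hypothetical interfaces; the paper judges partial correctness via intermediate execution states):

```python
def grow_training_pool(model, nl, gold_answer, execute, prefix_reaches_gold, n=16):
    """Self-sample programs and keep fully or partially correct ones for training."""
    pool = []
    for program in model.sample(nl, n):       # assumed sampling interface
        if execute(program) == gold_answer:   # fully correct: output matches gold
            pool.append((program, "full"))
        elif prefix_reaches_gold(program, gold_answer):
            pool.append((program, "partial"))  # a prefix can still reach gold
    return pool  # mixed into the training set alongside the single reference
```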
arXiv Detail & Related papers (2022-05-28T03:31:07Z)
- Natural Language to Code Translation with Execution [82.52142893010563]
We propose execution result-based minimum Bayes risk decoding for program selection.
We show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks.
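The selection rule can be sketched in a few lines, assuming an execute function with hashable outputs (a simplification of the paper's MBR objective): among sampled programs, return one whose execution result agrees with the most other samples.

```python
from collections import Counter

def mbr_exec_select(programs, execute):
    """Pick the sampled program whose execution result is most common."""
    results = [execute(p) for p in programs]  # outputs must be hashable here
    majority_result, _ = Counter(results).most_common(1)[0]
    return programs[results.index(majority_result)]
```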
arXiv Detail & Related papers (2022-04-25T06:06:08Z)
- Enforcing Consistency in Weakly Supervised Semantic Parsing [68.2211621631765]
We explore the use of consistency between the output programs for related inputs to reduce the impact of spurious programs.
We find that a more consistent formalism leads to improved model performance even without consistency-based training.
arXiv Detail & Related papers (2021-07-13T03:48:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.