Non-Programmers Can Label Programs Indirectly via Active Examples: A
Case Study with Text-to-SQL
- URL: http://arxiv.org/abs/2205.12422v3
- Date: Mon, 23 Oct 2023 11:12:48 GMT
- Title: Non-Programmers Can Label Programs Indirectly via Active Examples: A
Case Study with Text-to-SQL
- Authors: Ruiqi Zhong, Charlie Snell, Dan Klein, Jason Eisner
- Abstract summary: APEL is a framework in which non-programmers select among candidate programs generated by a seed semantic parser (e.g., Codex).
For each utterance, APEL actively searches for a simple input on which the candidate programs tend to produce different outputs.
It asks the non-programmers only to choose the appropriate output, thus allowing us to infer which program is correct.
- Score: 61.950839050383514
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Can non-programmers annotate natural language utterances with complex
programs that represent their meaning? We introduce APEL, a framework in which
non-programmers select among candidate programs generated by a seed semantic
parser (e.g., Codex). Since they cannot understand the candidate programs, we
ask them to select indirectly by examining the programs' input-output examples.
For each utterance, APEL actively searches for a simple input on which the
candidate programs tend to produce different outputs. It then asks the
non-programmers only to choose the appropriate output, thus allowing us to
infer which program is correct and could be used to fine-tune the parser. As a
first case study, we recruited human non-programmers to use APEL to re-annotate
SPIDER, a text-to-SQL dataset. Our approach achieved the same annotation
accuracy as the original expert annotators (75%) and exposed many subtle errors
in the original annotations.
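To make the annotation loop concrete, here is a minimal sketch in Python, assuming a toy singer table and hypothetical helper names (an illustration of the framework, not the authors' implementation): execute each candidate SQL program on small synthesized databases, actively pick a database on which the candidates disagree, show the annotator the competing outputs, and keep only the candidates that match the annotator's choice.

```python
import sqlite3

def run_query(sql, rows):
    """Execute one candidate SQL program on a tiny in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO singer VALUES (?, ?)", rows)
    try:
        return tuple(sorted(conn.execute(sql).fetchall()))
    except sqlite3.Error:
        return None  # a crashing candidate can simply never be chosen
    finally:
        conn.close()

def most_informative_db(candidates, small_dbs):
    """Actively pick the small database on which the candidates disagree most."""
    return max(small_dbs,
               key=lambda rows: len({run_query(sql, rows) for sql in candidates}))

# Candidate programs from a seed parser for
# "How many singers are older than 30?"
candidates = [
    "SELECT COUNT(*) FROM singer WHERE age > 30",   # correct
    "SELECT COUNT(*) FROM singer WHERE age >= 30",  # plausible but wrong
]
small_dbs = [
    [("Ann", 25), ("Bob", 41)],  # uninformative: both candidates output 1
    [("Ann", 30), ("Bob", 41)],  # age exactly 30 separates the candidates
]

db = most_informative_db(candidates, small_dbs)  # picks the second database
annotator_choice = ((1,),)  # the annotator inspects db and answers "1"
survivors = [sql for sql in candidates
             if run_query(sql, db) == annotator_choice]
```

In the paper, candidate databases are synthesized and scored for informativeness while being kept small enough for a non-programmer to inspect; the sketch above shows only the disagreement-and-filter core.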
Related papers
- Weakly Supervised Semantic Parsing with Execution-based Spurious Program
Filtering [19.96076749160955]
We propose a domain-agnostic filtering mechanism based on program execution results.
We run a majority vote over these execution results to identify and filter out programs whose semantics differ significantly from those of the other programs.
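A minimal sketch of that filtering step, assuming a black-box execute function and toy programs (hypothetical names, not the paper's code): execute every candidate on a set of inputs, treat the tuple of outputs as a semantic fingerprint, and keep only the programs in the majority fingerprint class.

```python
from collections import Counter

def filter_spurious(programs, inputs, execute):
    """Keep the programs whose execution behaviour matches the majority."""
    fingerprints = {p: tuple(execute(p, x) for x in inputs) for p in programs}
    majority, _ = Counter(fingerprints.values()).most_common(1)[0]
    return [p for p in programs if fingerprints[p] == majority]

# Toy usage: the squaring program behaves differently, so it is dropped.
programs = ["lambda x: x * 2", "lambda x: x + x", "lambda x: x ** 2"]
execute = lambda p, x: eval(p)(x)
print(filter_spurious(programs, inputs=[0, 1, 3], execute=execute))
# -> ['lambda x: x * 2', 'lambda x: x + x']
```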
arXiv Detail & Related papers (2023-11-02T11:45:40Z)
- GPT is becoming a Turing machine: Here are some ways to program it [16.169056235216576]
We show that GPT-3 models can be triggered to execute programs that involve loops.
We show that prompts that may not even cover one full task example can trigger algorithmic behaviour.
arXiv Detail & Related papers (2023-03-25T00:43:41Z)
- LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results.
Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results.
LEVER consistently improves over the base code LLMs (4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them.
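The verification step can be sketched as a reranker, assuming a pool of (program, LM log-probability, execution result) triples and a trained verifier model (names hypothetical; the released system differs in detail): the selected program maximizes the product of generator and verifier probabilities.

```python
import math

def lever_select(nl_input, samples, verifier_prob):
    """Rerank sampled programs by generator probability x verifier probability.

    samples: list of (program, lm_logprob, execution_result) triples.
    verifier_prob: assumed learned model estimating P(correct | nl, program, result).
    """
    best = max(samples,
               key=lambda s: math.exp(s[1]) * verifier_prob(nl_input, s[0], s[2]))
    return best[0]
```

LEVER additionally aggregates scores across samples that share the same execution result; the sketch keeps only the reranking core.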
arXiv Detail & Related papers (2023-02-16T18:23:22Z)
- Hierarchical Programmatic Reinforcement Learning via Learning to Compose
Programs [58.94569213396991]
We propose a hierarchical programmatic reinforcement learning framework to produce program policies.
By learning to compose programs, our proposed framework can produce program policies that describe out-of-distributionally complex behaviors.
The experimental results in the Karel domain show that our proposed framework outperforms baselines.
arXiv Detail & Related papers (2023-01-30T14:50:46Z)
- UniRPG: Unified Discrete Reasoning over Table and Text as Program
Generation [32.74302320558048]
We propose UniRPG, a semantic-parsing-based approach advanced in interpretability and scalability.
UniRPG performs unified discrete reasoning over heterogeneous knowledge resources, i.e., table and text, as program generation.
It achieves tremendous improvements and enhances interpretability and scalability compared with state-of-the-art methods.
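The "discrete reasoning as program generation" idea can be illustrated with a tiny interpreter over a table (the operator set here is hypothetical; UniRPG defines its own atomic and higher-order operations over table cells and text spans).

```python
# Hypothetical operator set standing in for UniRPG's program operations.
OPS = {
    "CELL": lambda table, r, c: table[r][c],  # read one table cell
    "DIFF": lambda a, b: a - b,               # arithmetic over extracted values
}

def run_program(node, table):
    """Recursively evaluate a derivation such as ('DIFF', ('CELL', 0, 1), ...)."""
    op, *args = node
    vals = [run_program(a, table) if isinstance(a, tuple) else a for a in args]
    return OPS[op](table, *vals) if op == "CELL" else OPS[op](*vals)

table = [["2019 revenue", 120],
         ["2018 revenue", 100]]
# "How much did revenue grow from 2018 to 2019?" as a generated program:
print(run_program(("DIFF", ("CELL", 0, 1), ("CELL", 1, 1)), table))  # -> 20
```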
arXiv Detail & Related papers (2022-10-15T10:17:52Z)
- Learning from Self-Sampled Correct and Partially-Correct Programs [96.66452896657991]
We propose to let the model perform sampling during training and learn from both self-sampled fully-correct programs and partially-correct programs.
We show that our use of self-sampled correct and partially-correct programs can benefit learning and help guide the sampling process.
Our proposed method improves the pass@k performance by 3.1% to 12.3% compared to learning from a single reference program with MLE.
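A rough sketch of the data-growing step, under stated assumptions (model.sample and the partial-correctness check are hypothetical interfaces; the paper judges partial correctness via intermediate execution states):

```python
def grow_training_pool(model, nl, gold_answer, execute, prefix_reaches_gold, n=16):
    """Self-sample programs and keep fully or partially correct ones for training."""
    pool = []
    for program in model.sample(nl, n):       # assumed sampling interface
        if execute(program) == gold_answer:   # fully correct: output matches gold
            pool.append((program, "full"))
        elif prefix_reaches_gold(program, gold_answer):
            pool.append((program, "partial"))  # a prefix can still reach gold
    return pool  # mixed into the training set alongside the single reference
```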
arXiv Detail & Related papers (2022-05-28T03:31:07Z)
- Natural Language to Code Translation with Execution [82.52142893010563]
We propose execution result-based minimum Bayes risk decoding for program selection.
We show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks.
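The selection rule can be sketched in a few lines, assuming an execute function with hashable outputs (a simplification of the paper's MBR objective): among sampled programs, return one whose execution result agrees with the most other samples.

```python
from collections import Counter

def mbr_exec_select(programs, execute):
    """Pick the sampled program whose execution result is most common."""
    results = [execute(p) for p in programs]  # outputs must be hashable here
    majority_result, _ = Counter(results).most_common(1)[0]
    return programs[results.index(majority_result)]
```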
arXiv Detail & Related papers (2022-04-25T06:06:08Z)
- Enforcing Consistency in Weakly Supervised Semantic Parsing [68.2211621631765]
We explore the use of consistency between the output programs for related inputs to reduce the impact of spurious programs.
We find that a more consistent formalism leads to improved model performance even without consistency-based training.
arXiv Detail & Related papers (2021-07-13T03:48:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.