Semantic Scaffolds for Pseudocode-to-Code Generation
- URL: http://arxiv.org/abs/2005.05927v1
- Date: Tue, 12 May 2020 17:10:13 GMT
- Title: Semantic Scaffolds for Pseudocode-to-Code Generation
- Authors: Ruiqi Zhong, Mitchell Stern, Dan Klein
- Abstract summary: We propose a method for program generation based on semantic scaffolds, lightweight structures representing the high-level semantic and syntactic composition of a program.
By using semantic scaffolds during inference, we achieve a 10% absolute improvement in top-100 accuracy over the previous state-of-the-art.
- Score: 47.09844589656143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a method for program generation based on semantic scaffolds,
lightweight structures representing the high-level semantic and syntactic
composition of a program. By first searching over plausible scaffolds then
using these as constraints for a beam search over programs, we achieve better
coverage of the search space when compared with existing techniques. We apply
our hierarchical search method to the SPoC dataset for pseudocode-to-code
generation, in which we are given line-level natural language pseudocode
annotations and aim to produce a program satisfying execution-based test cases.
By using semantic scaffolds during inference, we achieve a 10% absolute
improvement in top-100 accuracy over the previous state-of-the-art.
Additionally, we require only 11 candidates to reach the top-3000 performance
of the previous best approach when tested against unseen problems,
demonstrating a substantial improvement in efficiency.
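The two-stage procedure described above (search over plausible scaffolds first, then search over programs constrained to each scaffold) can be sketched in miniature. The sketch below is a hypothetical simplification, not the paper's implementation: `scaffold_of` stands in for the paper's semantic-scaffold extraction (e.g. the control-flow and brace configuration of a line), and per-line candidates with log-probabilities stand in for the line-level translation model.

```python
import heapq
from collections import defaultdict

def scaffold_constrained_search(line_candidates, scaffold_of, num_scaffolds=5):
    """Two-stage hierarchical search sketch.

    line_candidates: one list per pseudocode line of (code, log_prob) options.
    scaffold_of: maps a candidate code line to its scaffold token (hypothetical
    stand-in for semantic scaffold extraction).
    Returns (score, program) pairs, best first.
    """
    # Group each line's candidates by their scaffold token.
    grouped = []
    for cands in line_candidates:
        by_scaffold = defaultdict(list)
        for code, logp in cands:
            by_scaffold[scaffold_of(code)].append((logp, code))
        grouped.append(by_scaffold)

    # Stage 1: beam search over scaffold sequences, scoring each partial
    # scaffold by the best achievable log-prob of any program matching it.
    beams = [(0.0, ())]
    for by_scaffold in grouped:
        expanded = []
        for score, prefix in beams:
            for tok, opts in by_scaffold.items():
                best_logp = max(lp for lp, _ in opts)
                expanded.append((score + best_logp, prefix + (tok,)))
        beams = heapq.nlargest(num_scaffolds, expanded)

    # Stage 2: within each surviving scaffold, emit the best-scoring program
    # consistent with it (argmax candidate per line under that scaffold token).
    programs = []
    for score, scaffold in beams:
        lines = [max(grouped[i][tok])[1] for i, tok in enumerate(scaffold)]
        programs.append((score, "\n".join(lines)))
    return programs
```

Searching over scaffolds first spreads the candidate budget across structurally distinct programs instead of spending it on near-duplicates that differ only in surface form, which is what the abstract credits for the improved search-space coverage.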
Related papers
- Let's Revise Step-by-Step: A Unified Local Search Framework for Code Generation with LLMs [16.818072348542923]
We propose a unified local search framework which effectively performs step-by-step code revision. Specifically, ReLoc explores a series of local revisions through four key algorithmic components. We develop a specialized revision reward model that evaluates code quality based on revision distance to produce fine-grained preferences.
arXiv Detail & Related papers (2025-08-10T17:11:56Z) - Seed-CTS: Unleashing the Power of Tree Search for Superior Performance in Competitive Coding Tasks [16.853404804069527]
We propose a novel token-level tree search method specifically designed for code generation.
Our approach achieves a pass rate of 0.305 on LiveCodeBench-Hard, surpassing the pass@100 performance of GPT4o-0513 (0.245).
Our findings underscore the potential of tree search to significantly enhance performance on competition-level code generation tasks.
arXiv Detail & Related papers (2024-12-17T05:10:21Z) - Prompt-based Code Completion via Multi-Retrieval Augmented Generation [15.233727939816388]
ProCC is a code completion framework leveraging prompt engineering and the contextual multi-armed bandits algorithm.
ProCC outperforms the state-of-the-art code completion technique by 8.6% on our collected open-source benchmark suite.
ProCC also allows augmenting fine-tuned techniques in a plug-and-play manner, yielding 5.6% improvement over our studied fine-tuned model.
arXiv Detail & Related papers (2024-05-13T07:56:15Z) - Top-Down Synthesis for Library Learning [46.285220926554345]
Corpus-guided top-down synthesis is a mechanism for synthesizing library functions that capture common functionality from a corpus of programs.
We present an implementation of the approach in a tool called Stitch and evaluate it against the state-of-the-art deductive library learning algorithm from DreamCoder.
arXiv Detail & Related papers (2022-11-29T21:57:42Z) - Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree.
We adapt the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z) - Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that does not force any structure on the search space: using all n-grams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z) - Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z) - On the Importance of Building High-quality Training Datasets for Neural Code Search [15.557818317497397]
We propose a data cleaning framework consisting of two subsequent filters: a rule-based syntactic filter and a model-based semantic filter.
We evaluate the effectiveness of our framework on two widely-used code search models and three manually-annotated code retrieval benchmarks.
arXiv Detail & Related papers (2022-02-14T12:02:41Z) - CoSQA: 20,000+ Web Queries for Code Search and Question Answering [63.92224685262063]
The CoSQA dataset includes 20,604 labels for pairs of natural language queries and codes.
We introduce a contrastive learning method dubbed CoCLR to enhance query-code matching.
We show that evaluated on CodeXGLUE with the same CodeBERT model, training on CoSQA improves the accuracy of code question answering by 5.1%.
arXiv Detail & Related papers (2021-05-27T15:37:21Z) - Progressively Pretrained Dense Corpus Index for Open-Domain Question Answering [87.32442219333046]
We propose a simple and resource-efficient method to pretrain the paragraph encoder.
Our method outperforms an existing dense retrieval method that uses 7 times more computational resources for pretraining.
arXiv Detail & Related papers (2020-04-30T18:09:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.