ReGAL: Refactoring Programs to Discover Generalizable Abstractions
- URL: http://arxiv.org/abs/2401.16467v2
- Date: Thu, 6 Jun 2024 17:31:07 GMT
- Title: ReGAL: Refactoring Programs to Discover Generalizable Abstractions
- Authors: Elias Stengel-Eskin, Archiki Prasad, Mohit Bansal
- Abstract summary: Refactoring for Generalizable Abstraction Learning (ReGAL) is a gradient-free method for learning a library of reusable functions via code refactorization.
We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains.
For CodeLlama-13B, ReGAL results in absolute accuracy increases of 11.5% on LOGO, 26.1% on date understanding, and 8.1% on TextCraft, outperforming GPT-3.5 in two of three domains.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While large language models (LLMs) are increasingly being used for program synthesis, they lack the global view needed to develop useful abstractions; they generally predict programs one at a time, often repeating the same functionality. Generating redundant code from scratch is both inefficient and error-prone. To address this, we propose Refactoring for Generalizable Abstraction Learning (ReGAL), a gradient-free method for learning a library of reusable functions via code refactorization, i.e., restructuring code without changing its execution output. ReGAL learns from a small set of existing programs, iteratively verifying and refining its abstractions via execution. We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains. On five datasets -- LOGO graphics generation, Date reasoning, TextCraft (a Minecraft-based text game), MATH, and TabMWP -- both open-source and proprietary LLMs improve in accuracy when predicting programs with ReGAL functions. For CodeLlama-13B, ReGAL results in absolute accuracy increases of 11.5% on LOGO, 26.1% on date understanding, and 8.1% on TextCraft, outperforming GPT-3.5 in two of three domains. Our analysis reveals ReGAL's abstractions encapsulate frequently-used subroutines as well as environment dynamics.
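To make the core idea concrete, here is a minimal sketch of ReGAL's execution-based verification step: a candidate abstraction is kept only if the refactored program reproduces the original program's output exactly. The helper names and the LOGO-like example are illustrative assumptions, not the paper's implementation.

```python
import contextlib
import io

def run_program(src: str, env: dict) -> str:
    """Execute a program and capture everything it prints."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(src, dict(env))  # copy env so runs stay independent
    return buf.getvalue()

def verify_refactoring(original: str, refactored: str, library: dict) -> bool:
    """Keep an abstraction only if the refactored program reproduces the
    original program's execution output exactly."""
    try:
        return run_program(original, {}) == run_program(refactored, library)
    except Exception:
        return False  # refactorings that crash are discarded

# A candidate helper abstracted out of repeated LOGO-like drawing code.
library: dict = {}
exec(
    "def draw_square(side):\n"
    "    for _ in range(4):\n"
    "        print(f'forward {side}'); print('left 90')",
    library,
)

original = "for _ in range(4):\n    print('forward 10'); print('left 90')"
refactored = "draw_square(10)"
print(verify_refactoring(original, refactored, library))  # True -> keep it
```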
Related papers
- Investigating the Transferability of Code Repair for Low-Resource Programming Languages [57.62712191540067]
Large language models (LLMs) have shown remarkable performance on code generation tasks.
Recent works augment the code repair process by integrating modern techniques such as chain-of-thought reasoning or distillation.
We investigate the benefits of distilling code repair for both high- and low-resource languages.
arXiv Detail & Related papers (2024-06-21T05:05:39Z)
- Learning to Reason via Program Generation, Emulation, and Search [33.11955431589091]
Program synthesis with language models (LMs) has unlocked a large set of reasoning abilities.
Not all reasoning tasks are easily expressible as code, e.g. tasks involving commonsense reasoning, moral decision-making, and sarcasm understanding.
We propose Code Generation and Emulated EXecution (CoGEX) to extend an LM's program synthesis skills to such tasks.
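As a rough illustration of the CoGEX recipe, the sketch below uses a hypothetical `llm(prompt)` call (a stand-in for whatever completion API is available; not the authors' code): the model first writes a program for the task, then emulates running it, since tasks like sarcasm understanding have no real interpreter to execute against.

```python
def llm(prompt: str) -> str:
    """Stand-in for a chat/completions API call (assumption, not CoGEX's API)."""
    raise NotImplementedError("wire this to your model of choice")

def cogex(task: str) -> str:
    # Step 1: the LM writes a program for the task, even a "soft" one.
    program = llm(f"Write a Python function solve() for this task:\n{task}")
    # Step 2: no interpreter can run commonsense logic, so the LM itself
    # emulates execution and reports what solve() would return.
    return llm(
        "Emulate executing this program step by step and report the "
        f"return value of solve():\n{program}"
    )
```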
arXiv Detail & Related papers (2024-05-25T19:40:50Z)
- LLM-Assisted Code Cleaning For Training Accurate Code Generators [53.087019724256606]
We investigate data quality for code and find that making code more structured and readable leads to improved code generation performance.
We build a novel data-cleaning pipeline that uses these principles to transform existing programs.
We evaluate our approach on two challenging algorithmic code generation benchmarks and find that fine-tuning CodeLLaMa-7B improves the performance by up to 30% compared to fine-tuning on the original dataset.
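The sketch below shows one cleaning transform in this spirit (our illustration, not the paper's pipeline): rewriting cryptic identifiers into descriptive ones with Python's `ast` module, so the transformed training data reads more like well-structured code.

```python
import ast

class Rename(ast.NodeTransformer):
    """Apply a fixed renaming map to variables, parameters, and functions."""
    def __init__(self, mapping: dict):
        self.mapping = mapping
    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node
    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node
    def visit_FunctionDef(self, node):
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)  # rename inside the body too
        return node

src = "def s(a):\n    t = 0\n    for x in a:\n        t += x\n    return t"
mapping = {"s": "sum_list", "a": "numbers", "t": "total", "x": "value"}
print(ast.unparse(Rename(mapping).visit(ast.parse(src))))
```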
arXiv Detail & Related papers (2023-11-25T02:45:50Z)
- Refactoring Programs Using Large Language Models with Few-Shot Examples [20.48175387745551]
We demonstrate the use of a large language model (LLM), GPT-3.5, to suggest less complex versions of user-written Python programs.
We show that 95.68% of programs can be refactored by generating 10 candidates each, resulting in a 17.35% reduction in the average cyclomatic complexity.
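A runnable sketch of the selection logic (our approximation of the setup, not the paper's exact pipeline): among candidate rewrites, keep those that pass the tests, then choose the one with the lowest cyclomatic complexity, approximated here by counting branch points in the AST.

```python
import ast

BRANCHES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp, ast.ExceptHandler)

def complexity(src: str) -> int:
    """Rough cyclomatic complexity: 1 + number of branch points."""
    return 1 + sum(isinstance(n, BRANCHES) for n in ast.walk(ast.parse(src)))

def pick_simplest(candidates, test):
    """Return the lowest-complexity candidate that still passes the test."""
    passing = []
    for src in candidates:
        namespace = {}
        try:
            exec(src, namespace)
            if test(namespace["f"]):
                passing.append(src)
        except Exception:
            continue  # broken candidates are simply skipped
    return min(passing, key=complexity, default=None)

candidates = [
    "def f(x):\n    if x % 2 == 0:\n        return True\n    return False",
    "def f(x):\n    return x % 2 == 0",
]
print(pick_simplest(candidates, test=lambda f: f(4) and not f(3)))
```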
arXiv Detail & Related papers (2023-11-20T11:43:45Z)
- LILO: Learning Interpretable Libraries by Compressing and Documenting Code [71.55208585024198]
We introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code.
LILO combines LLM-guided program synthesis with recent algorithmic advances in automated refactoring from Stitch.
We find that AutoDoc, an LLM-based auto-documentation step, boosts performance by helping LILO's synthesizer interpret and deploy learned abstractions.
arXiv Detail & Related papers (2023-10-30T17:55:02Z)
- Improving Unsupervised Visual Program Inference with Code Rewriting Families [21.515789221802493]
We show how code rewriting can be used to improve systems for inferring programs from visual data.
We propose Sparse Intermittent Rewrite Injection (SIRI), a framework for unsupervised bootstrapped learning.
We design a family of rewriters for visual programming domains: parameter optimization, code pruning, and code grafting.
arXiv Detail & Related papers (2023-09-26T14:44:48Z)
- Learning logic programs by discovering higher-order abstractions [20.57989636488575]
We introduce the higher-order refactoring problem.
The goal is to compress a logic program by discovering higher-order abstractions.
We implement our approach in Stevie, which formulates refactoring as a constraint optimisation problem.
arXiv Detail & Related papers (2023-08-16T12:50:10Z)
- CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks.
We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning.
In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
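A simplified rendering of that objective (our sketch under assumed tensor shapes, not the released CONCORD implementation): an InfoNCE-style loss that pulls a program's embedding toward its benign clone and away from an injected deviant.

```python
import torch
import torch.nn.functional as F

def clone_contrastive_loss(anchor, clone, deviant, temperature=0.1):
    """anchor, clone, deviant: (batch, dim) code embeddings."""
    anchor = F.normalize(anchor, dim=-1)
    pos = (anchor * F.normalize(clone, dim=-1)).sum(-1) / temperature
    neg = (anchor * F.normalize(deviant, dim=-1)).sum(-1) / temperature
    logits = torch.stack([pos, neg], dim=-1)              # (batch, 2)
    labels = torch.zeros(len(anchor), dtype=torch.long)   # clone is class 0
    return F.cross_entropy(logits, labels)

anchor = torch.randn(8, 256, requires_grad=True)
loss = clone_contrastive_loss(anchor, torch.randn(8, 256), torch.randn(8, 256))
loss.backward()  # gradients pull clones together, push deviants apart
```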
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
- Leveraging Language to Learn Program Abstractions and Search Heuristics [66.28391181268645]
We introduce LAPS (Language for Abstraction and Program Search), a technique for using natural language annotations to guide joint learning of libraries and neurally-guided search models for synthesis.
When integrated into a state-of-the-art library learning system (DreamCoder), LAPS produces higher-quality libraries and improves search efficiency and generalization.
arXiv Detail & Related papers (2021-06-18T15:08:47Z)