FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system
- URL: http://arxiv.org/abs/2410.21349v3
- Date: Thu, 02 Jan 2025 11:16:32 GMT
- Title: FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system
- Authors: Zeyuan Li, Yangfan He, Lewei He, Jianhui Wang, Tianyu Shi, Bin Lei, Yuchen Li, Qiuwu Chen
- Abstract summary: Large language models (LLMs) have achieved significant progress in automated code generation.
Challenges in supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) have led to failures in generating precise, human-intent-aligned code.
We propose Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization (FALCON).
- Score: 8.775210512734603
- Abstract: Recently, large language models (LLMs) have achieved significant progress in automated code generation. Despite their strong instruction-following capabilities, these models frequently struggle to align with user intent in coding scenarios. In particular, they are hampered by datasets that lack diversity and fail to address specialized tasks or edge cases. Furthermore, challenges in supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) lead to failures in generating precise, human-intent-aligned code. To tackle these challenges and improve the code generation performance of automated programming systems, we propose Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization (FALCON). FALCON is structured into two hierarchical levels. At the global level, long-term memory improves code quality by retaining and applying learned knowledge. At the local level, short-term memory allows for the incorporation of immediate feedback from compilers and AI systems. Additionally, we introduce meta-reinforcement learning with feedback rewards to solve the global-local bi-level optimization problem and enhance the model's adaptability across diverse code generation tasks. Extensive experiments demonstrate that our technique achieves state-of-the-art performance, outperforming other reinforcement learning methods by more than 4.5 percentage points on the MBPP benchmark and 6.1 percentage points on the HumanEval benchmark. The open-sourced code is publicly available at https://github.com/titurte/FALCON.
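As a rough illustration of the abstract's two-level design, the sketch below pairs a short-term buffer of compiler feedback with a long-term store of solved tasks inside a single feedback-reward loop. All names here (`falcon_episode`, the memory classes, the `generate`/`execute`/`reinforce` callables) are hypothetical stand-ins, not the released FALCON API.

```python
from collections import deque

class ShortTermMemory:
    """Local level: recent compiler/AI feedback for the current task."""
    def __init__(self, capacity: int = 8):
        self.buffer = deque(maxlen=capacity)

    def add(self, code: str, feedback: str) -> None:
        self.buffer.append((code, feedback))

class LongTermMemory:
    """Global level: (task, solution, lesson) triples kept across tasks."""
    def __init__(self):
        self.store = []

    def retain(self, task: str, code: str, lesson: str) -> None:
        self.store.append((task, code, lesson))

    def retrieve(self, task: str, k: int = 3):
        # Stand-in for embedding retrieval: rank entries by keyword overlap.
        words = set(task.split())
        return sorted(self.store,
                      key=lambda e: len(words & set(e[0].split())),
                      reverse=True)[:k]

def falcon_episode(task, generate, execute, reinforce, stm, ltm, rounds=4):
    """generate(task, ctx) -> code; execute(code) -> (pass_rate, report);
    reinforce(reward) applies the policy update, e.g. a policy-gradient step."""
    code = ""
    for _ in range(rounds):
        ctx = {"long_term": ltm.retrieve(task), "short_term": list(stm.buffer)}
        code = generate(task, ctx)            # conditioned on both memory levels
        pass_rate, report = execute(code)     # immediate local feedback
        stm.add(code, report)                 # local level: remember the attempt
        reinforce(pass_rate)                  # feedback-derived reward signal
        if pass_rate == 1.0:
            ltm.retain(task, code, report)    # global level: consolidate success
            break
    return code
```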
Related papers
- From RAG to Memory: Non-Parametric Continual Learning for Large Language Models [6.380729797938521]
Retrieval-augmented generation (RAG) has become the dominant way to introduce new information to large language models.
Recent RAG approaches augment vector embeddings with various structures like knowledge graphs to address some gaps, namely sense-making and associativity.
We propose HippoRAG 2, a framework that outperforms standard RAG comprehensively on factual, sense-making, and associative memory tasks.
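A minimal sketch of the associative-retrieval idea, assuming a vector index plus a knowledge-graph adjacency map; HippoRAG itself uses Personalized PageRank over an open knowledge graph, so the one-hop expansion here is a simplification.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def graph_augmented_retrieve(query_vec, passages, kg_edges, top_k=3):
    """passages: list of (passage_id, vector); kg_edges: dict id -> set of ids."""
    # 1. Dense retrieval: rank passages by similarity to the query.
    ranked = sorted(passages, key=lambda p: cosine(query_vec, p[1]), reverse=True)
    seeds = {pid for pid, _ in ranked[:top_k]}
    # 2. Associative expansion: add passages the knowledge graph links to seeds.
    expanded = set(seeds)
    for pid in seeds:
        expanded |= kg_edges.get(pid, set())
    return expanded
```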
arXiv Detail & Related papers (2025-02-20T18:26:02Z)
- UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance [65.01483640267885]
Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet code generation remains a major challenge.
We introduce UnitCoder, a systematic pipeline leveraging model-generated unit tests to guide and validate the code generation process.
Our work presents a scalable approach that leverages model-generated unit tests to guide the synthesis of high-quality code data from pre-training corpora.
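The gist of a unit-test-guided pipeline can be sketched as below: model-written tests gate each candidate, and only passing samples are kept. The `gen_tests`/`gen_code` callables are assumed interfaces, not UnitCoder's actual code, and the subprocess runner is a generic sandbox stand-in.

```python
import os
import subprocess
import sys
import tempfile

def run_candidate(code: str, tests: str, timeout: int = 10) -> bool:
    """Execute candidate code plus its tests in a subprocess; exit code is the signal."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + tests)
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, timeout=timeout)
        return proc.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)

def unit_test_guided_synthesis(task, gen_tests, gen_code, max_iters=3):
    tests = gen_tests(task)              # model-generated unit tests
    for _ in range(max_iters):
        code = gen_code(task, tests)     # generation conditioned on the tests
        if run_candidate(code, tests):
            return code                  # validated sample, usable as training data
    return None                          # discard candidates that never pass
```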
arXiv Detail & Related papers (2025-02-17T05:37:02Z)
- ReLearn: Unlearning via Learning for Large Language Models [64.2802606302194]
We propose ReLearn, a data augmentation and fine-tuning pipeline for effective unlearning.
This framework introduces Knowledge Forgetting Rate (KFR) and Knowledge Retention Rate (KRR) to measure knowledge-level preservation.
Our experiments show that ReLearn successfully achieves targeted forgetting while preserving high-quality output.
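For intuition, hypothetical set-based versions of the two rates might look like the following; the paper's exact formulas may differ.

```python
def knowledge_rates(answered_correctly, forget_probes, retain_probes):
    """answered_correctly: dict probe -> bool, evaluated after unlearning."""
    kfr = sum(not answered_correctly[p] for p in forget_probes) / len(forget_probes)
    krr = sum(answered_correctly[p] for p in retain_probes) / len(retain_probes)
    return kfr, krr  # ideally both near 1.0: targets forgotten, the rest kept
```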
arXiv Detail & Related papers (2025-02-16T16:31:00Z)
- Leveraging Metamemory Mechanisms for Enhanced Data-Free Code Generation in LLMs [44.80420740455364]
M2WF is a framework for improving large language models' one-time code generation.
Unlike prior methods, it minimizes dependency on curated data and adapts to various coding scenarios.
The code and framework will be publicly available on GitHub and HuggingFace.
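One plausible reading of a metamemory-guided, data-free workflow is two-stage prompting: first ask the model to recall similar problems from its own knowledge, then solve conditioned on that recall. The sketch below is an assumption about the general idea, not M2WF's published procedure.

```python
def metamemory_generate(task: str, llm) -> str:
    """llm: callable prompt -> completion; two-stage recall-then-solve prompting."""
    recall_prompt = (
        "Recall two programming problems similar to the task below and sketch "
        f"their solutions.\n\nTask: {task}"
    )
    recalled = llm(recall_prompt)     # self-generated exemplars, no curated data
    solve_prompt = (
        f"Using these related examples:\n{recalled}\n\n"
        f"Now write complete, correct code for: {task}"
    )
    return llm(solve_prompt)          # single-attempt generation seeded by recall
```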
arXiv Detail & Related papers (2025-01-14T07:16:43Z)
- On the Convergence of Continual Federated Learning Using Incrementally Aggregated Gradients [2.2530496464901106]
The holy grail of machine learning is to enable Continual Federated Learning (CFL) to enhance the efficiency, privacy, and scalability of AI systems while learning from streaming data.
We propose C-FLAG, a novel replay-memory-based federated strategy consisting of edge-based gradient updates on memory and aggregated gradients on the current data.
We empirically show that C-FLAG outperforms several state-of-the-art baselines on both task and class-incremental settings with respect to metrics such as accuracy and forgetting.
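A hedged sketch of one federated round in this spirit: each client mixes a gradient on its replay memory with a gradient on current data, and the server averages the results. The convex mixing weight `alpha` and the function names are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def client_update(w, grad_fn, memory_batch, current_batch, lr=0.1, alpha=0.5):
    """Mix a gradient on replayed old data with one on the streaming task."""
    g_mem = grad_fn(w, memory_batch)      # edge-based gradient update on memory
    g_cur = grad_fn(w, current_batch)     # gradient on the current data
    return w - lr * (alpha * g_mem + (1 - alpha) * g_cur)

def federated_round(w, clients, grad_fn):
    """clients: list of (memory_batch, current_batch) pairs; FedAvg aggregation."""
    updated = [client_update(w, grad_fn, m, c) for m, c in clients]
    return np.mean(updated, axis=0)
```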
arXiv Detail & Related papers (2024-11-12T17:36:20Z)
- Process Supervision-Guided Policy Optimization for Code Generation [15.943210767010045]
Reinforcement learning (RL) with unit test feedback has enhanced code generation in large language models (LLMs), but it relies on sparse rewards provided only after complete code evaluation.
We propose a Process Reward Model (PRM) that delivers dense, line-level feedback on code correctness during generation, mimicking human code refinement.
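A toy rendering of how a PRM could densify the reward: score every line prefix instead of only the finished program. `prm_score` is an assumed callable, not the paper's model.

```python
def dense_line_rewards(code: str, prm_score) -> list:
    """prm_score: callable taking a code prefix and returning a correctness score."""
    lines = code.splitlines()
    rewards = []
    for i in range(len(lines)):
        prefix = "\n".join(lines[: i + 1])
        rewards.append(prm_score(prefix))  # a reward after every line, not only at the end
    return rewards
```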
arXiv Detail & Related papers (2024-10-23T07:22:33Z)
- Bridge and Hint: Extending Pre-trained Language Models for Long-Range Code [20.60634057560564]
We propose EXPO, a framework for EXtending Pre-trained language models for lOng-range code.
EXPO incorporates two innovative memory mechanisms: Bridge Memory and Hint Memory.
We validate the effectiveness of EXPO on five popular pre-trained language models, including UniXcoder.
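The abstract does not detail the two mechanisms, but a loose sketch of the general long-code idea might carry a compressed "bridge" summary across chunk boundaries while keeping globally important "hint" lines visible to every chunk; the heuristics below are purely illustrative assumptions.

```python
def encode_long_code(code: str, encode_chunk, summarize, chunk_len: int = 100):
    """encode_chunk(chunk, context) -> encoding; summarize(chunk) -> short text."""
    lines = code.splitlines()
    # "Hints": globally important lines kept visible to every chunk.
    hints = [l for l in lines if l.lstrip().startswith(("def ", "class "))]
    bridge, outputs = "", []
    for i in range(0, len(lines), chunk_len):
        chunk = "\n".join(lines[i : i + chunk_len])
        context = bridge + "\n" + "\n".join(hints)
        outputs.append(encode_chunk(chunk, context))  # chunk sees bridge + hints
        bridge = summarize(chunk)                     # carry a summary forward
    return outputs
```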
arXiv Detail & Related papers (2024-05-18T09:06:41Z)
- DeAL: Decoding-time Alignment for Large Language Models [59.63643988872571]
Large Language Models (LLMs) are now expected to generate content aligned with human preferences.
We propose DeAL, a framework that allows the user to customize reward functions and enables Decoding-time Alignment of LLMs.
Our experiments show that we can DeAL with fine-grained trade-offs, improve adherence to alignment objectives, and address residual gaps in LLMs.
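A minimal sketch of decoding-time alignment, assuming a `propose` function that returns candidate continuations and a user-supplied `reward_fn`; DeAL's actual search procedure is more sophisticated than this rerank-and-pick loop.

```python
def reward_guided_decode(prompt, propose, reward_fn, steps=5, k=4):
    """propose(text, k) -> k candidate continuations; reward_fn(text) -> float."""
    text = prompt
    for _ in range(steps):
        candidates = propose(text, k)
        # The user-defined alignment reward, not further training, picks the winner.
        text = max((text + c for c in candidates), key=reward_fn)
    return text
```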
arXiv Detail & Related papers (2024-02-05T06:12:29Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO provides Fine-Grained Optimization by masking unexecuted code segments, so that only executed code contributes to the update.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
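The FGO idea can be sketched as a loss mask over generated lines: only lines the tests actually executed contribute to the policy update. Coverage collection is abstracted away here, and the names are illustrative, not StepCoder's code.

```python
def fgo_mask(code_lines, executed_lines):
    """0/1 mask so the RL loss ignores lines the unit tests never executed."""
    return [1.0 if (i + 1) in executed_lines else 0.0
            for i in range(len(code_lines))]

def masked_policy_loss(logprobs, advantages, mask):
    """One float per generated line; unexecuted lines contribute nothing."""
    total = sum(lp * adv * m for lp, adv, m in zip(logprobs, advantages, mask))
    return -total / (sum(mask) or 1.0)  # normalize by the number of kept lines
```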
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
- AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
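One way to read "low-memory adaptive learning rate" is an Adafactor-style factored second moment, stored per row and column instead of per element; the sketch below is that reading, not AdaLomo's verbatim algorithm.

```python
import numpy as np

def factored_adaptive_update(w, grad, row_v, col_v, lr=1e-3, beta=0.99, eps=1e-8):
    """w, grad: (m, n); row_v: (m,) and col_v: (n,) factored second moments."""
    row_v[:] = beta * row_v + (1 - beta) * (grad ** 2).mean(axis=1)
    col_v[:] = beta * col_v + (1 - beta) * (grad ** 2).mean(axis=0)
    # Rank-1 reconstruction of the per-element second moment: O(m + n) state
    # instead of AdamW's O(m * n) moment tensors.
    v_hat = np.outer(row_v, col_v) / (row_v.mean() + eps)
    return w - lr * grad / (np.sqrt(v_hat) + eps)
```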
arXiv Detail & Related papers (2023-10-16T09:04:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.