Related papers: Enhancing Code LLMs with Reinforcement Learning in Code Generation: A Survey

Enhancing Code LLMs with Reinforcement Learning in Code Generation: A Survey

URL: http://arxiv.org/abs/2412.20367v2
Date: Thu, 02 Jan 2025 09:43:43 GMT
Title: Enhancing Code LLMs with Reinforcement Learning in Code Generation: A Survey
Authors: Junqiao Wang, Zeng Zhang, Yangfan He, Yuyang Song, Tianyu Shi, Yuchen Li, Hengyuan Xu, Kunyu Wu, Guangwu Qian, Qiuwu Chen, Lewei He,
Abstract summary: reinforcement learning (RL) has emerged as a pivotal technique for code generation and optimization.<n>This paper presents a systematic survey of the application of RL in code optimization and generation.
Score: 7.7582469015328295
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the rapid evolution of large language models (LLM), reinforcement learning (RL) has emerged as a pivotal technique for code generation and optimization in various domains. This paper presents a systematic survey of the application of RL in code optimization and generation, highlighting its role in enhancing compiler optimization, resource allocation, and the development of frameworks and tools. Subsequent sections first delve into the intricate processes of compiler optimization, where RL algorithms are leveraged to improve efficiency and resource utilization. The discussion then progresses to the function of RL in resource allocation, emphasizing register allocation and system optimization. We also explore the burgeoning role of frameworks and tools in code generation, examining how RL can be integrated to bolster their capabilities. This survey aims to serve as a comprehensive resource for researchers and practitioners interested in harnessing the power of RL to advance code generation and optimization techniques.

Related papers

SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks [110.20297293596005]
Large language model (LLM) agents need to perform multi-turn interactions in real-world tasks. Existing multi-turn RL algorithms for optimizing LLM agents fail to perform effective credit assignment over multiple turns while leveraging the generalization capabilities of LLMs. We propose a novel RL algorithm, SWEET-RL, that uses a carefully designed optimization objective to train a critic model with access to additional training-time information. Our experiments demonstrate that SWEET-RL achieves a 6% absolute improvement in success and win rates on ColBench compared to other state-of-the-art multi-turn RL algorithms.
arXiv Detail & Related papers (2025-03-19T17:55:08Z)
Reinforcement Learning Enhanced LLMs: A Survey [45.57586245741664]
We will make a systematic review of the most up-to-date state of knowledge on RL-enhanced large language models (LLMs) Specifically, we detail the basics of RL; (2) introduce popular RL-enhanced LLMs; (3) review researches on two widely-used reward model-based RL techniques: Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF)
arXiv Detail & Related papers (2024-12-05T16:10:42Z)
A Reinforcement Learning Environment for Automatic Code Optimization in the MLIR Compiler [0.10923877073891444]
We introduce the first RL environment for the MLIR compiler, dedicated to facilitating MLIR compiler research. We also propose a novel formulation of the action space as a product of simpler action subspaces, enabling more efficient and effective optimizations.
arXiv Detail & Related papers (2024-09-17T10:49:45Z)
Generative AI for Deep Reinforcement Learning: Framework, Analysis, and Use Cases [60.30995339585003]
Deep reinforcement learning (DRL) has been widely applied across various fields and has achieved remarkable accomplishments. DRL faces certain limitations, including low sample efficiency and poor generalization. We present how to leverage generative AI (GAI) to address these issues and enhance the performance of DRL algorithms.
arXiv Detail & Related papers (2024-05-31T01:25:40Z)
Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods [18.771658054884693]
Large language models (LLMs) emerge as a promising avenue to augment reinforcement learning (RL) in aspects such as multi-task learning, sample efficiency, and high-level task planning. We propose a structured taxonomy to systematically categorize LLMs' functionalities in RL, including four roles: information processor, reward designer, decision-maker, and generator.
arXiv Detail & Related papers (2024-03-30T08:28:08Z)
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models. Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel. Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback. Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities. We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
Scalable Volt-VAR Optimization using RLlib-IMPALA Framework: A Reinforcement Learning Approach [11.11570399751075]
This research presents a novel framework that harnesses the potential of Deep Reinforcement Learning (DRL) The integration of our DRL agent with the RAY platform facilitates the creation of RLlib-IMPALA, a novel framework that efficiently uses RAY's resources to improve system adaptability and control.
arXiv Detail & Related papers (2024-02-24T23:25:35Z)
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components. CCCS addresses the exploration challenge by breaking the long sequences code generation task into a Curriculum of Code Completion Subtasks. FGO only optimize the model by masking the unexecuted code segments to provide Fine-Grained Optimization. Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
Reinforcement Learning-assisted Evolutionary Algorithm: A Survey and Research Opportunities [63.258517066104446]
Reinforcement learning integrated as a component in the evolutionary algorithm has demonstrated superior performance in recent years. We discuss the RL-EA integration method, the RL-assisted strategy adopted by RL-EA, and its applications according to the existing literature. In the applications of RL-EA section, we also demonstrate the excellent performance of RL-EA on several benchmarks and a range of public datasets.
arXiv Detail & Related papers (2023-08-25T15:06:05Z)
Karolos: An Open-Source Reinforcement Learning Framework for Robot-Task Environments [0.3867363075280544]
In reinforcement learning (RL) research, simulations enable benchmarks between algorithms. In this paper, we introduce Karolos, a framework developed for robotic applications. The code is open source and published on GitHub with the aim of promoting research of RL applications in robotics.
arXiv Detail & Related papers (2022-12-01T23:14:02Z)
FORLORN: A Framework for Comparing Offline Methods and Reinforcement Learning for Optimization of RAN Parameters [0.0]
This paper introduces a new framework for benchmarking the performance of an RL agent in network environments simulated with ns-3. Within this framework, we demonstrate that an RL agent without domain-specific knowledge can learn how to efficiently adjust Radio Access Network (RAN) parameters to match offline optimization in static scenarios.
arXiv Detail & Related papers (2022-09-08T12:58:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.