Related papers: Enhancing Trust in Language Model-Based Code Optimization through RLHF: A Research Design

Related papers

ORMind: A Cognitive-Inspired End-to-End Reasoning Framework for Operations Research [53.736407871322314]
We introduce ORMind, a cognitive-inspired framework that enhances optimization through counterfactual reasoning. Our approach emulates human cognition, implementing an end-to-end workflow that transforms requirements into mathematical models and executable code. It is currently being tested internally in Lenovo's AI Assistant, with plans to enhance optimization capabilities for both business and consumer customers.
arXiv Detail & Related papers (2025-06-02T05:11:21Z)
Towards Effective Code-Integrated Reasoning [89.47213509714578]
We investigate code-integrated reasoning, where models generate code when necessary and integrate feedback by executing it through a code interpreter. Tool-augmented reinforcement learning can still suffer from potential instability in the learning dynamics. We develop enhanced training strategies that balance exploration and stability, progressively building tool-use capabilities while improving reasoning performance.
arXiv Detail & Related papers (2025-05-30T11:30:18Z)
Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality. We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z)
Human-In-The-Loop Software Development Agents: Challenges and Future Directions [14.81934634773595]
At Atlassian, we deployed Human-in-the-Loop Software Development Agents to resolve Jira work items and evaluated the generated code quality using functional correctness testing and GPT-based similarity scoring. This paper highlights two major challenges: the high computational costs of unit testing and the variability in LLM-based evaluations.
arXiv Detail & Related papers (2025-04-25T01:52:59Z)
Aligning Crowd-sourced Human Feedback for Reinforcement Learning on Code Generation by Large Language Models [2.6641834518599308]
We study how AI-assisted programming and large language models (LLMs) improve software developers' ability via AI tools like GitHub Copilot and Amazon CodeWhisperer. We show that our Bayesian optimization framework supports AI alignment in code generation by distributing the feedback collection burden.
arXiv Detail & Related papers (2025-03-19T11:44:47Z)
A Survey on Post-training of Large Language Models [185.51013463503946]
Large Language Models (LLMs) have fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration. These challenges necessitate advanced post-training language models (PoLMs) to address shortcomings, such as restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance. This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms.
arXiv Detail & Related papers (2025-03-08T05:41:42Z)
Improving Retrospective Language Agents via Joint Policy Gradient Optimization [57.35348425288859]
RetroAct is a framework that jointly optimizes both task-planning and self-reflective evolution capabilities in language agents. We develop a two-stage joint optimization process that integrates imitation learning and reinforcement learning. We conduct extensive experiments across various testing environments, demonstrating that RetroAct achieves substantial improvements in task performance and decision-making processes.
arXiv Detail & Related papers (2025-03-03T12:54:54Z)
Language Models for Code Optimization: Survey, Challenges and Future Directions [7.928856221466083]
Language models (LMs) built upon deep neural networks (DNNs) have recently demonstrated breakthrough effectiveness in software engineering tasks. This study aims to provide actionable insights and references for both researchers and practitioners in this rapidly evolving field.
arXiv Detail & Related papers (2025-01-02T14:20:36Z)
Optimizing AI-Assisted Code Generation [0.8901073744693314]
AI-assisted code-generation tools have significantly transformed software development. The security, reliability, functionality, and quality of the generated code must be guaranteed. This paper examines the implementation of these goals to date and explores strategies to optimize them.
arXiv Detail & Related papers (2024-12-14T20:14:44Z)
The Fusion of Large Language Models and Formal Methods for Trustworthy AI Agents: A Roadmap [12.363424584297974]
This paper outlines a roadmap for advancing the next generation of trustworthy AI systems. We show how FMs can help LLMs generate more reliable and formally certified outputs. We acknowledge that this integration has the potential to enhance both the trustworthiness and efficiency of software engineering practices.
arXiv Detail & Related papers (2024-12-09T14:14:21Z)
Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs). The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation. We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, so that they become better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z)
Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment [103.05005690990271]
Traditional alignment strategies rely heavily on human intervention, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). We propose a novel self-alignment method that utilizes a Chain of Thought (CoT) approach, termed AlignCoT. We introduce the Mixture of insighTful Experts (MoTE) architecture, which applies mixture of experts to enhance each component of the AlignCoT process, markedly increasing alignment efficiency.
arXiv Detail & Related papers (2024-05-01T15:06:05Z)
Large Language Model-based Human-Agent Collaboration for Complex Task Solving [94.3914058341565]
We introduce the problem of Large Language Model (LLM)-based human-agent collaboration for complex task-solving. We propose a Reinforcement Learning-based Human-Agent Collaboration method, ReHAC. This approach includes a policy model designed to determine the most opportune stages for human intervention within the task-solving process.
arXiv Detail & Related papers (2024-02-20T11:03:36Z)
DeAL: Decoding-time Alignment for Large Language Models [59.63643988872571]
Large Language Models (LLMs) are nowadays expected to generate content aligned with human preferences. We propose DeAL, a framework that allows the user to customize reward functions and enables Decoding-time Alignment of LLMs. Our experiments show that we can DeAL with fine-grained trade-offs, improve adherence to alignment objectives, and address residual gaps in LLMs.
arXiv Detail & Related papers (2024-02-05T06:12:29Z)
Accelerating LLaMA Inference by Enabling Intermediate Layer Decoding via Instruction Tuning with LITE [62.13435256279566]
Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks. However, their large size makes their inference slow and computationally expensive. We show that it enables these layers to acquire 'good' generation ability without affecting the generation ability of the final layer.
arXiv Detail & Related papers (2023-10-28T04:07:58Z)
Data-Driven and SE-assisted AI Model Signal-Awareness Enhancement and Introspection [61.571331422347875]
We propose a data-driven approach to enhance models' signal-awareness. We combine the SE concept of code complexity with the AI technique of curriculum learning. We achieve up to 4.8x improvement in model signal awareness.
arXiv Detail & Related papers (2021-11-10T17:58:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.