PIPer: On-Device Environment Setup via Online Reinforcement Learning
- URL: http://arxiv.org/abs/2509.25455v1
- Date: Mon, 29 Sep 2025 20:03:05 GMT
- Title: PIPer: On-Device Environment Setup via Online Reinforcement Learning
- Authors: Alexander Kovrigin, Aleksandra Eliseeva, Konstantin Grotov, Egor Bogomolov, Yaroslav Zharov
- Abstract summary: Automated environment setup methods could assist developers by providing fully configured environments for arbitrary repositories without manual effort. Recent studies reveal that even state-of-the-art Large Language Models (LLMs) achieve limited success in automating this task. We combine supervised fine-tuning for generating correct scripts and Reinforcement Learning with Verifiable Rewards (RLVR) to adapt the model to the task of environment setup. On EnvBench-Python, our method enables Qwen3-8B (a model runnable on consumer hardware) to perform on par with larger models, Qwen3-32B and GPT-4o.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Environment setup, the process of configuring the system to work with a specific software project, represents a persistent challenge in Software Engineering (SE). Automated environment setup methods could assist developers by providing fully configured environments for arbitrary repositories without manual effort. This also helps SE researchers to scale execution-based benchmarks. However, recent studies reveal that even state-of-the-art Large Language Models (LLMs) achieve limited success in automating this task. To address this limitation, we tune a specialized model for environment setup. We combine supervised fine-tuning for generating correct Bash scripts and Reinforcement Learning with Verifiable Rewards (RLVR) to adapt it to the task of environment setup. On EnvBench-Python, our method enables Qwen3-8B (a model runnable on consumer hardware) to perform on par with larger models, Qwen3-32B and GPT-4o. The training code and model checkpoints are available online: https://github.com/JetBrains-Research/PIPer.
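The abstract's core idea, a reward that can be verified by actually executing the generated Bash script, can be sketched roughly as follows. This is a minimal illustrative assumption, not the authors' implementation: the function name, the binary success criterion, and the sandboxing-by-temporary-directory are all hypothetical simplifications (the paper evaluates against EnvBench-Python, which uses richer checks).

```python
import os
import subprocess
import tempfile

def setup_reward(bash_script: str, timeout: int = 60) -> float:
    """Hypothetical verifiable reward for an environment-setup script:
    run the generated Bash script in a scratch directory and return
    1.0 if it exits successfully, 0.0 on failure or timeout."""
    with tempfile.TemporaryDirectory() as workdir:
        script_path = os.path.join(workdir, "setup.sh")
        with open(script_path, "w") as f:
            f.write(bash_script)
        try:
            result = subprocess.run(
                ["bash", script_path],
                cwd=workdir,          # isolate side effects to the scratch dir
                capture_output=True,  # keep stdout/stderr out of the training log
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return 0.0
        return 1.0 if result.returncode == 0 else 0.0
```

Because the reward is computed from execution alone, no learned judge is needed; this is what makes the reward "verifiable" in the RLVR sense.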
Related papers
- SWE-World: Building Software Engineering Agents in Docker-Free Environments [91.17484806743641]
SWE-World is a Docker-free framework that replaces physical execution environments with a learned surrogate for training and evaluating software engineering agents. We show that SWE-World raises Qwen2.5-Coder-32B from 6.2% to 52.0% via Docker-free SFT, 55.0% with Docker-free RL, and 68.2% with further TTS.
arXiv Detail & Related papers (2026-02-03T11:44:39Z) - Environment-Aware Code Generation: How far are We? [52.69113158357018]
It is unclear whether large language models (LLMs) can reliably generate executable code tailored to a user's specific environment. We present the first systematic study of Environment-Aware Code Generation (EACG), where generated code must be functionally correct and directly executable under arbitrary software configurations. Our results show that current LLMs struggle with environment-specific code generation, while our adaptations improve environment compatibility and executability.
arXiv Detail & Related papers (2026-01-18T04:58:15Z) - EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis [101.67583081810136]
Large language models (LLMs) are expected to be trained to act as agents in various real-world environments. This process relies on rich and varied tool-interaction sandboxes. We propose EnvScaler, an automated framework for scalable tool-interaction environments.
arXiv Detail & Related papers (2026-01-09T14:32:06Z) - Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments [70.42705564227548]
We propose an automated environment construction pipeline for large language models (LLMs). This enables the creation of high-quality training environments that provide detailed and measurable feedback without relying on external tools. We also introduce a verifiable reward mechanism that evaluates both the precision of tool use and the completeness of task execution.
arXiv Detail & Related papers (2025-08-12T09:45:19Z) - EnvBench: A Benchmark for Automated Environment Setup [76.02998475135824]
Large Language Models have enabled researchers to focus on practical repository-level tasks in the software engineering domain. Existing studies on environment setup introduce innovative agentic strategies, but their evaluation is often based on small datasets. To address this gap, we introduce EnvBench, a comprehensive environment setup benchmark.
arXiv Detail & Related papers (2025-03-18T17:19:12Z) - PyPackIT: Automated Research Software Engineering for Scientific Python Applications on GitHub [0.0]
PyPackIT is a user-friendly, ready-to-use software that enables scientists to focus on the scientific aspects of their projects. PyPackIT offers a robust project infrastructure including a build-ready Python package skeleton, a fully operational documentation and test suite, and a control center for dynamic project management.
arXiv Detail & Related papers (2025-03-06T19:41:55Z) - Repo2Run: Automated Building Executable Environment for Code Repository at Scale [8.795746370609855]
We introduce Repo2Run, an agent that automates building executable test environments for arbitrary repositories at scale. Repo2Run iteratively builds the Docker image, runs unit tests based on the feedback from the build, and synthesizes the Dockerfile. The resulting Dockerfile can then be used to create Docker container environments for running code and tests.
arXiv Detail & Related papers (2025-02-19T12:51:35Z) - WebArena: A Realistic Web Environment for Building Autonomous Agents [92.3291458543633]
We build an environment for language-guided agents that is highly realistic and reproducible.
We focus on agents that perform tasks on the web, and create an environment with fully functional websites from four common domains.
We release a set of benchmark tasks focusing on evaluating the functional correctness of task completions.
arXiv Detail & Related papers (2023-07-25T22:59:32Z) - Learning Task Automata for Reinforcement Learning using Hidden Markov Models [37.69303106863453]
This paper proposes a novel pipeline for learning non-Markovian task specifications as succinct finite-state task automata.
We learn a product MDP, a model composed of the specification's automaton and the environment's MDP, by treating the product MDP as a partially observable MDP and using the well-known Baum-Welch algorithm for learning hidden Markov models.
Our learnt task automaton enables the decomposition of a task into its constituent sub-tasks, which improves the rate at which an RL agent can later synthesise an optimal policy.
arXiv Detail & Related papers (2022-08-25T02:58:23Z) - NLPGym -- A toolkit for evaluating RL agents on Natural Language Processing Tasks [2.5760935151452067]
We release NLPGym, an open-source Python toolkit that provides interactive textual environments for standard NLP tasks.
We present experimental results for 6 tasks using different RL algorithms which serve as baselines for further research.
arXiv Detail & Related papers (2020-11-16T20:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.