Reinforcement Learning for Dynamic Workflow Optimization in CI/CD Pipelines
- URL: http://arxiv.org/abs/2601.11647v1
- Date: Thu, 15 Jan 2026 05:19:46 GMT
- Title: Reinforcement Learning for Dynamic Workflow Optimization in CI/CD Pipelines
- Authors: Aniket Abhishek Soni, Milan Parikh, Rashi Nimesh Kumar Dhenia, Jubin Abhishek Soni, Ayush Raj Jha, Sneja Mitinbhai Shah,
- Abstract summary: This paper proposes a reinforcement learning (RL) based approach to dynamically optimize CI/CD pipeline.<n>The pipeline is modeled as a Markov Decision Process, and an RL agent is trained to make runtime decisions such as selecting full, partial, or no test execution.<n> Experimental results show that the RL optimized pipeline achieves up to a 30 percent improvement in throughput and approximately a 25 percent reduction in test execution time.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continuous Integration and Continuous Deployment (CI/CD) pipelines are central to modern software delivery, yet their static workflows often introduce inefficiencies as systems scale. This paper proposes a reinforcement learning (RL) based approach to dynamically optimize CI/CD pipeline workflows. The pipeline is modeled as a Markov Decision Process, and an RL agent is trained to make runtime decisions such as selecting full, partial, or no test execution in order to maximize throughput while minimizing testing overhead. A configurable CI/CD simulation environment is developed to evaluate the approach across build, test, and deploy stages. Experimental results show that the RL optimized pipeline achieves up to a 30 percent improvement in throughput and approximately a 25 percent reduction in test execution time compared to static baselines, while maintaining a defect miss rate below 5 percent. The agent learns to selectively skip or abbreviate tests for low risk commits, accelerating feedback cycles without significantly increasing failure risk. These results demonstrate the potential of reinforcement learning to enable adaptive and intelligent DevOps workflows, providing a practical pathway toward more efficient, resilient, and sustainable CI/CD automation.
Related papers
- Real-Time Generative Policy via Langevin-Guided Flow Matching for Autonomous Driving [22.3805998088591]
DACER-F is a flow matching algorithm for generative policies in autonomous driving systems.<n>It achieves a score of 775.8 on the humanoid-stand task and surpassing prior methods.
arXiv Detail & Related papers (2026-03-03T05:35:53Z) - AceGRPO: Adaptive Curriculum Enhanced Group Relative Policy Optimization for Autonomous Machine Learning Engineering [52.67783579040657]
AceGRPO is a machine learning system that prioritizes tasks at the agent's learning frontier to maximize learning efficiency.<n>Our trained Ace-30B model achieves a 100% valid submission rate on MLE-Bench-Lite, approaches the performance of proprietary frontier models, and outperforms larger open-source baselines.
arXiv Detail & Related papers (2026-02-08T10:55:03Z) - SCPL: Enhancing Neural Network Training Throughput with Decoupled Local Losses and Model Parallelism [2.4349098308669594]
This paper introduces a new training methodology, Supervised Contrastive Parallel Learning ( SCPL), that addresses this issue by decoupling BP and transforming a long gradient flow into multiple short ones.<n> Detailed experiments are presented to demonstrate the efficiency and effectiveness of our model compared to BP, Early Exit, GPipe, and Associated Learning (AL), a state-of-the-art method for decoupling backpropagation.<n> SCPL provides a practical pathway for organizations to develop and deploy advanced information systems more cost-effectively and with greater agility.
arXiv Detail & Related papers (2026-01-20T09:19:30Z) - Towards High Data Efficiency in Reinforcement Learning with Verifiable Reward [54.708851958671794]
We propose a Data-Efficient Policy Optimization pipeline that combines optimized strategies for both offline and online data selection.<n>In offline phase, we curate a high-quality subset of training samples based on diversity, influence, and appropriate difficulty.<n>During online RLVR training, we introduce a sample-level explorability metric to dynamically filter samples with low exploration potential.
arXiv Detail & Related papers (2025-09-01T10:04:20Z) - KAT-V1: Kwai-AutoThink Technical Report [50.84483585850113]
We present Kwaipilot-AutoThink (KAT), an open-source 40B large language model developed to address the overthinking problem in reasoning-intensive tasks.<n>KAT dynamically switches between reasoning and non-reasoning modes based on task complexity.<n>We also propose Step-SRPO, a reinforcement learning algorithm that incorporates intermediate supervision into the GRPO framework.
arXiv Detail & Related papers (2025-07-11T04:07:10Z) - Fast Adaptation with Behavioral Foundation Models [82.34700481726951]
Unsupervised zero-shot reinforcement learning has emerged as a powerful paradigm for pretraining behavioral foundation models.<n>Despite promising results, zero-shot policies are often suboptimal due to errors induced by the unsupervised training process.<n>We propose fast adaptation strategies that search in the low-dimensional task-embedding space of the pre-trained BFM to rapidly improve the performance of its zero-shot policies.
arXiv Detail & Related papers (2025-04-10T16:14:17Z) - DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal [55.13854171147104]
Large Language Models (LLMs) have revolutionized various domains, including natural language processing, data analysis, and software development.<n>We present Dynamic Action Re-Sampling (DARS), a novel inference time compute scaling approach for coding agents.<n>We evaluate our approach on SWE-Bench Lite benchmark, demonstrating that this scaling strategy achieves a pass@k score of 55% with Claude 3.5 Sonnet V2.
arXiv Detail & Related papers (2025-03-18T14:02:59Z) - AutoLoop: Fast Visual SLAM Fine-tuning through Agentic Curriculum Learning [1.282543877006303]
We present AutoLoop, a novel approach that combines automated curriculum learning with efficient fine-tuning for visual SLAM systems.<n>Our method employs a DDPG (Deep Deterministic Policy Gradient) agent to dynamically adjust loop closure weights during training.<n> Experiments conducted on TartanAir for training and validated across multiple benchmarks including KITTI, EuRoC, ICL-NUIM and TUM RGB-D demonstrate that AutoLoop achieves comparable or superior performance.
arXiv Detail & Related papers (2025-01-15T21:22:09Z) - SPEQ: Offline Stabilization Phases for Efficient Q-Learning in High Update-To-Data Ratio Reinforcement Learning [51.10866035483686]
High update-to-data (UTD) ratio algorithms in reinforcement learning (RL) improve sample efficiency but incur high computational costs, limiting real-world scalability.<n>We propose Offline Stabilization Phases for Efficient Q-Learning (SPEQ), an RL algorithm that combines low-UTD online training with periodic offline stabilization phases.<n>During these phases, Q-functions are fine-tuned with high UTD ratios on a fixed replay buffer, reducing redundant updates on suboptimal data.
arXiv Detail & Related papers (2025-01-15T09:04:19Z) - FlowTS: Time Series Generation via Rectified Flow [67.41208519939626]
FlowTS is an ODE-based model that leverages rectified flow with straight-line transport in probability space.<n>For unconditional setting, FlowTS achieves state-of-the-art performance, with context FID scores of 0.019 and 0.011 on Stock and ETTh datasets.<n>For conditional setting, we have achieved superior performance in solar forecasting.
arXiv Detail & Related papers (2024-11-12T03:03:23Z) - Optimal Parallelization Strategies for Active Flow Control in Deep Reinforcement Learning-Based Computational Fluid Dynamics [29.49913315698914]
Deep Reinforcement Learning (DRL) has emerged as a promising approach for handling highly dynamic and nonlinear Active Flow Control (AFC) problems.
This study focuses on optimizing DRL-based algorithms in parallel settings.
We achieve a significant boost in parallel efficiency from around 49% to approximately 78%.
arXiv Detail & Related papers (2024-02-18T09:07:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.