Balancing Specialization and Centralization: A Multi-Agent Reinforcement Learning Benchmark for Sequential Industrial Control
- URL: http://arxiv.org/abs/2510.20408v1
- Date: Thu, 23 Oct 2025 10:21:54 GMT
- Title: Balancing Specialization and Centralization: A Multi-Agent Reinforcement Learning Benchmark for Sequential Industrial Control
- Authors: Tom Maus, Asma Atamna, Tobias Glasmachers
- Abstract summary: This study introduces an enhanced industry-inspired benchmark environment that combines tasks from two existing benchmarks, SortingEnv and ContainerGym. We evaluate two control strategies: a modular architecture with specialized agents and a monolithic agent governing the full system, while also analyzing the impact of action masking.
- Score: 0.2676349883103403
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous control of multi-stage industrial processes requires both local specialization and global coordination. Reinforcement learning (RL) offers a promising approach, but its industrial adoption remains limited due to challenges such as reward design, modularity, and action space management. Many academic benchmarks differ markedly from industrial control problems, limiting their transferability to real-world applications. This study introduces an enhanced industry-inspired benchmark environment that combines tasks from two existing benchmarks, SortingEnv and ContainerGym, into a sequential recycling scenario with sorting and pressing operations. We evaluate two control strategies: a modular architecture with specialized agents and a monolithic agent governing the full system, while also analyzing the impact of action masking. Our experiments show that without action masking, agents struggle to learn effective policies, with the modular architecture performing better. When action masking is applied, both architectures improve substantially, and the performance gap narrows considerably. These results highlight the decisive role of action space constraints and suggest that the advantages of specialization diminish as action complexity is reduced. The proposed benchmark thus provides a valuable testbed for exploring practical and robust multi-agent RL solutions in industrial automation, while contributing to the ongoing debate on centralization versus specialization.
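The decisive role of action masking reported in the abstract can be illustrated with a minimal sketch (plain Python/NumPy; the function name, toy Q-values, and the recycling-themed comment are illustrative assumptions, not the paper's implementation). The idea: invalid actions are set to negative infinity before greedy selection, so the agent can never pick them and the effective action space it must explore shrinks.

```python
import numpy as np

def masked_greedy_action(q_values, action_mask):
    """Greedy action selection restricted to valid actions.

    Invalid actions are set to -inf so argmax can never pick them;
    this shrinks the effective action space the agent must explore."""
    masked_q = np.where(action_mask, q_values, -np.inf)
    return int(np.argmax(masked_q))

# Toy example: 4 discrete actions; actions 1 and 3 are currently invalid
# (e.g., a press cannot start while its container is empty).
q = np.array([0.2, 1.5, 0.5, 1.2])
mask = np.array([True, False, True, False])
print(masked_greedy_action(q, mask))  # -> 2 (best among valid actions)
```

Without the mask, the agent would pick action 1 here and would have to learn from reward signals alone that it is invalid; with the mask, that burden disappears, which is consistent with the narrowing performance gap the paper reports.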
Related papers
- Integrating Diverse Assignment Strategies into DETRs [61.61489761918158]
Label assignment is a critical component in object detectors, particularly within DETR-style frameworks. We propose LoRA-DETR, a flexible and lightweight framework that seamlessly integrates diverse assignment strategies into any DETR-style detector.
arXiv Detail & Related papers (2026-01-14T07:28:54Z)
- Sample-Efficient Neurosymbolic Deep Reinforcement Learning [49.60927398960061]
We propose a neuro-symbolic Deep RL approach that integrates background symbolic knowledge to improve sample efficiency. Online reasoning is performed to guide the training process through two mechanisms. We show improved performance over a state-of-the-art reward machine baseline.
arXiv Detail & Related papers (2026-01-06T09:28:53Z)
- Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs [26.165537937650413]
Training agents to operate under strict constraints during deployment presents significant challenges. We propose a curriculum learning strategy that gradually tightens constraints during training, enabling the agent to incrementally master the deployment requirements.
arXiv Detail & Related papers (2025-11-04T16:14:56Z)
- MARS: Optimizing Dual-System Deep Research via Multi-Agent Reinforcement Learning [82.14973479594367]
Large Language Models (LLMs) for complex reasoning tasks require innovative approaches that bridge intuitive and deliberate cognitive processes. This paper introduces a Multi-Agent System for Deep ReSearch (MARS) enabling seamless integration of System 1's fast, intuitive thinking with System 2's deliberate reasoning.
arXiv Detail & Related papers (2025-10-06T15:42:55Z)
- OnePiece: Bringing Context Engineering and Reasoning to Industrial Cascade Ranking System [61.12400636463362]
OnePiece is a unified framework that seamlessly integrates LLM-style context engineering and reasoning into both retrieval and ranking models. OnePiece has been deployed in the main personalized search scenario of Shopee and achieves consistent online gains across different key business metrics.
arXiv Detail & Related papers (2025-09-22T17:59:07Z)
- InternBootcamp Technical Report: Boosting LLM Reasoning with Verifiable Task Scaling [71.37579508777843]
Large language models (LLMs) have revolutionized artificial intelligence by enabling complex reasoning capabilities. To address this gap, we present InternBootcamp, an open-source framework comprising 1000+ domain-diverse task environments.
arXiv Detail & Related papers (2025-08-12T05:00:00Z)
- Novel Multi-Agent Action Masked Deep Reinforcement Learning for General Industrial Assembly Lines Balancing Problems [1.8434042562191815]
This paper introduces a novel mathematical model of a generic industrial assembly line formulated as a Markov Decision Process (MDP). The proposed model is employed to create a virtual environment for training Deep Reinforcement Learning (DRL) agents to optimize task and resource scheduling.
arXiv Detail & Related papers (2025-07-22T14:34:36Z)
- Application of LLM Guided Reinforcement Learning in Formation Control with Collision Avoidance [1.1718316049475228]
Multi-Agent Systems (MAS) excel at accomplishing complex objectives through the collaborative efforts of individual agents. In this paper, we introduce a novel framework that aims to overcome the challenge of designing an effective reward function. By conditioning large language models (LLMs) on the prioritization of tasks, our framework generates reward functions that can be dynamically adjusted online.
arXiv Detail & Related papers (2025-07-22T09:26:00Z)
- Control-Optimized Deep Reinforcement Learning for Artificially Intelligent Autonomous Systems [8.766411351797885]
Deep reinforcement learning (DRL) has become a powerful tool for complex decision-making in machine learning and AI. Traditional methods often assume perfect action execution, overlooking the uncertainties and deviations between an agent's selected actions and the actual system response. This work advances AI by developing a novel control-optimized DRL framework that explicitly models and compensates for action execution mismatches.
arXiv Detail & Related papers (2025-06-30T21:25:52Z)
- Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making [9.34311343273189]
Agents Co-Evolution (ACE) is a synergistic framework between Large Language Models (LLMs) and Reinforcement Learning (RL). ACE introduces a dual-role trajectory refinement mechanism where LLMs act as both Policy Actor and Value Critic during RL's training. Through experiments across multiple power grid operation challenges with action spaces exceeding 60K discrete actions, ACE demonstrates superior performance over existing RL methods.
arXiv Detail & Related papers (2025-06-03T06:52:37Z)
- ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation [96.86211867758652]
Low-Rank Adaptation (LoRA) is widely adopted for downstream fine-tuning of foundation models. We propose ThanoRA, a Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation framework.
arXiv Detail & Related papers (2025-05-24T11:01:45Z)
- Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains [92.36624674516553]
Reinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing mathematical reasoning and coding performance of large language models (LLMs). We investigate the effectiveness and scalability of RLVR across diverse real-world domains including medicine, chemistry, psychology, economics, and education. We utilize a generative scoring technique that yields soft, model-based reward signals to overcome limitations posed by binary verifications.
arXiv Detail & Related papers (2025-03-31T08:22:49Z)
- Multi-Agent Reinforcement Learning for Microprocessor Design Space Exploration [71.95914457415624]
Microprocessor architects are increasingly resorting to domain-specific customization in the quest for high-performance and energy-efficiency.
We propose an alternative formulation that leverages Multi-Agent RL (MARL) to tackle this problem.
Our evaluation shows that the MARL formulation consistently outperforms single-agent RL baselines.
arXiv Detail & Related papers (2022-11-29T17:10:24Z)
- Safe-Critical Modular Deep Reinforcement Learning with Temporal Logic through Gaussian Processes and Control Barrier Functions [3.5897534810405403]
Reinforcement learning (RL) is a promising approach but has seen limited success in real-world applications. In this paper, we propose a learning-based control framework consisting of several components. We show that such an ECBF-based modular deep RL algorithm achieves near-perfect success rates and guarantees safety with high probability.
arXiv Detail & Related papers (2021-09-07T00:51:12Z)
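The control barrier function (CBF) idea in the last entry above can be sketched as a discrete-time safety filter layered on top of an RL policy. This is a toy 1D illustration under assumed dynamics x_next = x + u with safe set {x >= 0}; the function name, candidate-action grid, and fallback rule are hypothetical, not taken from the paper.

```python
def cbf_safety_filter(x, u_nominal, h, step, alpha=0.5,
                      candidates=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Pick the action closest to the RL policy's choice (u_nominal)
    that still satisfies the discrete-time CBF condition
        h(step(x, u)) >= (1 - alpha) * h(x),
    i.e. the barrier value may shrink by at most a factor alpha per step."""
    feasible = [u for u in candidates if h(step(x, u)) >= (1 - alpha) * h(x)]
    if not feasible:  # no candidate is certifiably safe: least-unsafe fallback
        return max(candidates, key=lambda u: h(step(x, u)))
    return min(feasible, key=lambda u: abs(u - u_nominal))

# Toy system: 1D integrator, safe set {x >= 0} encoded by h(x) = x.
step = lambda x, u: x + u
h = lambda x: x
# The policy wants u = -1.0 at x = 0.4, which would leave the safe set;
# the filter overrides it with the nearest action that stays safe.
print(cbf_safety_filter(0.4, -1.0, h, step))  # -> 0.0
```

The filter only intervenes when the nominal action would violate the barrier condition, which is why such an architecture can sit modularly between a learned policy and the plant without retraining the policy.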
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.