Concept and the implementation of a tool to convert industry 4.0
environments modeled as FSM to an OpenAI Gym wrapper
- URL: http://arxiv.org/abs/2006.16035v1
- Date: Mon, 29 Jun 2020 13:28:41 GMT
- Title: Concept and the implementation of a tool to convert industry 4.0
environments modeled as FSM to an OpenAI Gym wrapper
- Authors: Kallil M. C. Zielinski and Marcelo Teixeira and Richardson Ribeiro and
Dalcimar Casanova
- Abstract summary: This work presents the concept and the implementation of a tool that allows us to convert any dynamic system modeled as an FSM to the open-source Gym wrapper.
In the first tests of the proposed tool, we show traditional Q-learning and Deep Q-learning methods running over two simple environments.
- Score: 2.594420805049218
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Industry 4.0 systems have a high demand for optimization in their tasks,
whether to minimize cost, maximize production, or even synchronize their
actuators to finish or speed up the manufacture of a product. Those challenges
make industrial environments a suitable scenario to apply all modern
reinforcement learning (RL) concepts. The main difficulty, however, is the lack
of such industrial environments. To address this, this work presents the concept
and the implementation of a tool that allows us to convert any dynamic system
modeled as an FSM to the open-source Gym wrapper. After that, it is possible to
employ any RL methods to optimize any desired task. In the first tests of the
proposed tool, we show traditional Q-learning and Deep Q-learning methods
running over two simple environments.
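The abstract describes the conversion only at a high level. As a rough sketch of the idea, not the authors' tool, an FSM given as a plain transition table (transitions[state][event] -> next_state) can be exposed through the classic Gym interface the paper targets; the FSMEnv class, the reward shaping, and the marked-state convention below are assumptions made for illustration.

```python
import gym
from gym import spaces


class FSMEnv(gym.Env):
    """Minimal sketch: wrap a finite-state machine as a Gym environment.
    Events play the role of actions; reaching a marked state ends the episode."""

    def __init__(self, transitions, initial_state, marked_states):
        super().__init__()
        self.transitions = transitions            # transitions[state][event] -> next_state
        self.initial_state = initial_state
        self.marked_states = marked_states
        self.states = list(transitions.keys())
        self.events = sorted({e for t in transitions.values() for e in t})
        self.observation_space = spaces.Discrete(len(self.states))
        self.action_space = spaces.Discrete(len(self.events))
        self.state = initial_state

    def reset(self):
        self.state = self.initial_state
        return self.states.index(self.state)

    def step(self, action):
        event = self.events[action]
        enabled = event in self.transitions[self.state]
        if enabled:
            self.state = self.transitions[self.state][event]
        done = self.state in self.marked_states
        # Placeholder reward: reward marked states, penalize disabled events.
        reward = 1.0 if done else (-1.0 if not enabled else -0.01)
        return self.states.index(self.state), reward, done, {}
```

Once the FSM is wrapped this way, any Gym-compatible RL method can be applied. A compact tabular Q-learning loop of the kind mentioned in the abstract might look like this (the hyperparameters and the toy three-state machine are illustrative):

```python
import numpy as np


def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, eps=0.1, max_steps=200):
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            # epsilon-greedy action selection
            a = env.action_space.sample() if np.random.rand() < eps else int(Q[s].argmax())
            s2, r, done, _ = env.step(a)
            Q[s, a] += alpha * (r + gamma * (0.0 if done else Q[s2].max()) - Q[s, a])
            s = s2
            if done:
                break
    return Q


# Hypothetical three-state machine: idle --start--> busy --finish--> done
transitions = {"idle": {"start": "busy"}, "busy": {"finish": "done"}, "done": {}}
Q = q_learning(FSMEnv(transitions, initial_state="idle", marked_states={"done"}))
```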
Related papers
- ToolRL: Reward is All Tool Learning Needs [54.16305891389931]
Large Language Models (LLMs) often undergo supervised fine-tuning (SFT) to acquire tool use capabilities.
Recent advancements in reinforcement learning (RL) have demonstrated promising reasoning and generalization abilities.
We present the first comprehensive study on reward design for tool selection and application tasks within the RL paradigm.
arXiv Detail & Related papers (2025-04-16T21:45:32Z)
- Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors [104.5401871607713]
This paper proposes Weak-for-Strong Harnessing (W4S), a novel framework that customizes smaller, cost-efficient language models to design and optimize workflows for harnessing stronger models.
W4S formulates workflow design as a multi-turn Markov decision process and introduces reinforcement learning for agentic workflow optimization.
Empirical results demonstrate the superiority of W4S: our 7B meta-agent, trained with just one GPU hour, outperforms the strongest baseline by 2.9% to 24.6% across eleven benchmarks.
arXiv Detail & Related papers (2025-04-07T07:27:31Z)
- SortingEnv: An Extendable RL-Environment for an Industrial Sorting Process [0.0]
We present a novel reinforcement learning (RL) environment designed to both optimize industrial sorting systems and study agent behavior in evolving spaces.
In simulating material flow within a sorting process, our environment follows the idea of a digital twin, with operational parameters such as belt speed and occupancy level.
It comes in two variants: a basic version focusing on discrete belt speed adjustments and an advanced version introducing multiple sorting modes and enhanced material composition observations.
arXiv Detail & Related papers (2025-03-13T15:38:25Z)
- Yi-Lightning Technical Report [65.64771297971843]
Yi-Lightning is our latest flagship large language model (LLM).
It achieves exceptional performance, ranking 6th overall on Chatbot Arena.
We observe a notable disparity between traditional, static benchmark results and real-world, dynamic human preferences.
arXiv Detail & Related papers (2024-12-02T08:22:56Z)
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs for downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
- Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning [53.3760591018817]
We propose a new benchmarking environment for aquatic navigation using recent advances in the integration between game engines and Deep Reinforcement Learning.
Specifically, we focus on PPO, one of the most widely accepted algorithms, and we propose advanced training techniques.
Our empirical evaluation shows that a well-designed combination of these ingredients can achieve promising results.
arXiv Detail & Related papers (2024-05-30T23:20:23Z)
- ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling [15.673219028826173]
We introduce a semi-automated data synthesis framework designed for optimization modeling issues, named OR-Instruct.
We train various open-source LLMs with a capacity of 7 billion parameters (dubbed ORLMs).
The resulting model demonstrates significantly enhanced optimization modeling capabilities, achieving state-of-the-art performance across the NL4OPT, MAMO, and IndustryOR benchmarks.
arXiv Detail & Related papers (2024-05-28T01:55:35Z)
- Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror the multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
- An Architecture for Deploying Reinforcement Learning in Industrial Environments [3.18294468240512]
We present an OPC UA-based, Operational Technology (OT)-aware RL architecture.
We define an OPC UA information model that allows for a generalized, plug-and-play-like approach to exchanging the RL agent.
By means of solving a toy example, we show that this architecture can be used to determine the optimal policy.
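The summary leaves the agent-to-plant exchange abstract. Purely as an illustration (the node identifiers, the use of the asyncua client, and the agent hooks are assumptions, not the paper's information model), an RL agent could close the loop over OPC UA roughly like this:

```python
import asyncio
from asyncua import Client


async def control_loop(agent, url="opc.tcp://localhost:4840", steps=100):
    """Hypothetical loop: read an observation node, let the RL agent choose an
    action, write it back, then read the resulting reward."""
    async with Client(url=url) as client:
        # Placeholder node ids; a real deployment would resolve them from
        # the information model exposed by the OPC UA server.
        obs_node = client.get_node("ns=2;s=Observation")
        act_node = client.get_node("ns=2;s=Action")
        rew_node = client.get_node("ns=2;s=Reward")
        for _ in range(steps):
            obs = await obs_node.read_value()
            action = agent.act(obs)                    # any policy: Q-table, DQN, ...
            await act_node.write_value(float(action))  # type must match the server's variant type
            reward = await rew_node.read_value()
            agent.observe(obs, action, reward)         # hypothetical learning/update hook

# asyncio.run(control_loop(my_agent))  # my_agent: whatever plug-and-play agent is loaded
```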
arXiv Detail & Related papers (2023-06-02T10:22:01Z)
- A Mini Review on the utilization of Reinforcement Learning with OPC UA [0.9208007322096533]
Reinforcement Learning (RL) is a powerful machine learning paradigm that has been applied in various fields such as robotics, natural language processing and game playing.
The key to fully exploiting this potential is the seamless integration of RL into existing industrial systems.
This work serves to bridge this gap by providing a brief technical overview of both technologies and carrying out a semi-exhaustive literature review.
arXiv Detail & Related papers (2023-05-24T13:03:48Z)
- A framework for fully autonomous design of materials via multiobjective optimization and active learning: challenges and next steps [2.6047112351202784]
We present an active learning process based on multiobjective black-box optimization with continuously updated machine learning models.
This workflow is built on open-source technologies for real-time data streaming and modular multiobjective optimization software development.
We demonstrate a proof of concept for this workflow through the autonomous operation of a continuous-flow chemistry laboratory.
arXiv Detail & Related papers (2023-04-15T01:34:16Z)
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort toward efficient adaptation of existing models and to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
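The summary already names the trainable pieces; a simplified PyTorch sketch of that recipe (not the released eP-ALM code, and assuming a language model that accepts inputs_embeds, as Hugging Face causal LMs do) could look like:

```python
import torch
import torch.nn as nn


class PerceptualPrefixLM(nn.Module):
    """Sketch: freeze a vision encoder and a language model (>99% of weights);
    train only one linear projection and one prepended soft token."""

    def __init__(self, vision_encoder, language_model, vis_dim, lm_dim):
        super().__init__()
        self.vision_encoder = vision_encoder
        self.language_model = language_model
        for p in self.vision_encoder.parameters():
            p.requires_grad = False                              # frozen perception
        for p in self.language_model.parameters():
            p.requires_grad = False                              # frozen LM
        self.proj = nn.Linear(vis_dim, lm_dim)                   # trainable
        self.soft_token = nn.Parameter(torch.zeros(1, 1, lm_dim))  # trainable

    def forward(self, image, text_embeds):
        vis = self.vision_encoder(image)                 # (B, N, vis_dim)
        vis = self.proj(vis)                             # (B, N, lm_dim)
        prefix = self.soft_token.expand(vis.size(0), -1, -1)
        inputs = torch.cat([prefix, vis, text_embeds], dim=1)
        return self.language_model(inputs_embeds=inputs)
```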
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
- REIN-2: Giving Birth to Prepared Reinforcement Learning Agents Using Reinforcement Learning Agents [0.0]
In this paper, we introduce a meta-learning scheme that shifts the objective of learning to solve a task into the objective of learning to learn to solve a task (or a set of tasks).
Our model, named REIN-2, is a meta-learning scheme formulated within the RL framework, the goal of which is to develop a meta-RL agent that learns how to produce other RL agents.
Compared to traditional state-of-the-art Deep RL algorithms, experimental results show remarkable performance of our model in popular OpenAI Gym environments.
arXiv Detail & Related papers (2021-10-11T10:13:49Z)
- PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics [89.81550748680245]
We introduce a new differentiable physics benchmark called PlasticineLab.
In each task, the agent uses manipulators to deform the plasticine into the desired configuration.
We evaluate several existing reinforcement learning (RL) methods and gradient-based methods on this benchmark.
arXiv Detail & Related papers (2021-04-07T17:59:23Z)
- RL-CycleGAN: Reinforcement Learning Aware Simulation-To-Real [74.45688231140689]
We introduce the RL-scene consistency loss for image translation, which ensures that the translation operation is invariant with respect to the Q-values associated with the image.
We obtain RL-CycleGAN, a new approach for simulation-to-real-world transfer for reinforcement learning.
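As one possible reading of the RL-scene consistency loss described above (not the paper's exact formulation), the idea can be sketched as a penalty on any change in Q-values caused by the sim-to-real translation:

```python
import torch.nn.functional as F


def rl_scene_consistency_loss(q_net, generator, sim_images):
    """Illustrative sketch: the Q-function should assign (approximately) the same
    values to a simulated image and to its CycleGAN-style translation."""
    translated = generator(sim_images)     # sim -> "real" translation
    q_sim = q_net(sim_images)              # (B, num_actions)
    q_translated = q_net(translated)
    return F.mse_loss(q_translated, q_sim.detach())
```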
arXiv Detail & Related papers (2020-06-16T08:58:07Z)