Quantum-Inspired DRL Approach with LSTM and OU Noise for Cut Order Planning Optimization
- URL: http://arxiv.org/abs/2508.16611v1
- Date: Wed, 13 Aug 2025 05:00:50 GMT
- Title: Quantum-Inspired DRL Approach with LSTM and OU Noise for Cut Order Planning Optimization
- Authors: Yulison Herry Chrisnanto, Julian Evan Chrisnanto
- Abstract summary: Cut order planning (COP) is a critical challenge in the textile industry, directly impacting fabric utilization and production costs. We propose a novel Quantum-Inspired Deep Reinforcement Learning framework that integrates Long Short-Term Memory networks with Ornstein-Uhlenbeck noise. A comparative analysis reveals that the proposed approach achieves fabric cost savings of up to 13% compared to conventional methods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cut order planning (COP) is a critical challenge in the textile industry, directly impacting fabric utilization and production costs. Conventional methods based on static heuristics and catalog-based estimations often struggle to adapt to dynamic production environments, resulting in suboptimal solutions and increased waste. In response, we propose a novel Quantum-Inspired Deep Reinforcement Learning (QI-DRL) framework that integrates Long Short-Term Memory (LSTM) networks with Ornstein-Uhlenbeck (OU) noise. This hybrid approach is designed to explicitly address key research questions regarding the benefits of quantum-inspired probabilistic representations, the role of LSTM-based memory in capturing sequential dependencies, and the effectiveness of OU noise in facilitating smooth exploration and faster convergence. Extensive training over 1000 episodes demonstrates robust performance, with an average reward of 0.81 (±0.03) and a steady decrease in prediction loss to 0.15 (±0.02). A comparative analysis reveals that the proposed approach achieves fabric cost savings of up to 13% compared to conventional methods. Furthermore, statistical evaluations indicate low variability and stable convergence. Although the simulation model makes several simplifying assumptions, these promising results underscore the potential of this scalable and adaptive framework to enhance manufacturing efficiency and pave the way for future innovations in COP optimization.
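The paper does not publish its implementation, so the following is a minimal sketch of the standard discretized Ornstein-Uhlenbeck process of the kind typically added to a deterministic policy's actions in continuous-control DRL (e.g., DDPG). The class name and the mu/theta/sigma/dt defaults are conventional illustrative assumptions, not values from the paper.

```python
import numpy as np

class OrnsteinUhlenbeckNoise:
    """Temporally correlated exploration noise for continuous actions.

    Discretized OU process: dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1).
    Hyperparameter defaults below are common DDPG-style choices, assumed here
    for illustration; the paper's actual settings are not published.
    """

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=None):
        self.mu = mu * np.ones(size)
        self.theta = theta  # mean-reversion rate: how strongly x is pulled back to mu
        self.sigma = sigma  # scale of the random perturbation
        self.dt = dt        # discretization step
        self.rng = np.random.default_rng(seed)
        self.state = self.mu.copy()

    def reset(self):
        """Reset the process to its mean, typically at the start of an episode."""
        self.state = self.mu.copy()

    def sample(self):
        """Advance the process one step and return the new noise vector."""
        dx = (self.theta * (self.mu - self.state) * self.dt
              + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.state.shape))
        self.state += dx
        return self.state

# Usage: perturb a policy's continuous action, then clip to the valid range.
noise = OrnsteinUhlenbeckNoise(size=3, seed=0)
action = np.clip(np.array([0.4, -0.1, 0.7]) + noise.sample(), -1.0, 1.0)
```

Because consecutive samples are coupled through the mean-reverting state, the perturbed actions drift smoothly rather than jittering independently at each step, which is the "smooth exploration" property the abstract attributes to OU noise.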
Related papers
- Unbiased Dynamic Pruning for Efficient Group-Based Policy Optimization [60.87651283510059]
Group Relative Policy Optimization (GRPO) effectively scales LLM reasoning but incurs prohibitive computational costs. We propose Dynamic Pruning Policy Optimization (DPPO), a framework that enables dynamic pruning while preserving unbiased gradient estimation. To mitigate the data sparsity induced by pruning, we introduce Dense Prompt Packing, a window-based greedy strategy.
arXiv Detail & Related papers (2026-03-04T14:48:53Z) - Difficulty-Estimated Policy Optimization [38.86673795561421]
We propose Difficulty-Estimated Policy Optimization (DEPO), a novel framework designed to optimize the efficiency and robustness of reasoning alignment. Our approach significantly lowers the computational barrier for training high-performance reasoning models, offering a more sustainable path for reasoning scaling.
arXiv Detail & Related papers (2026-02-06T04:12:23Z) - MARINE: Theoretical Optimization and Design for Multi-Agent Recursive IN-context Enhancement [5.852607388888843]
Large Language Model (LLM)-based agents demonstrate advanced reasoning capabilities, yet practical constraints frequently limit outputs to single responses. This paper introduces MARINE, a framework that reconceptualizes test-time reasoning as iterative refinement of a persistent reference trajectory. The proposed MARINE delivers higher-quality samples to alignment and optimization processes than traditional sampling-and-ranking strategies.
arXiv Detail & Related papers (2025-12-05T11:19:18Z) - Rectifying LLM Thought from Lens of Optimization [48.98086817378953]
Long chain-of-thought (CoT) prompting enables thorough exploration and deliberation. Despite advances, long-CoT LLMs often exhibit suboptimal reasoning behaviors. We introduce RePro, a novel approach to refine LLM reasoning during post-training.
arXiv Detail & Related papers (2025-12-01T17:41:08Z) - Thompson Sampling via Fine-Tuning of LLMs [68.1722422968855]
We propose an alternative based on Thompson sampling that eliminates the need for scalable acquisition functions. Our approach, Thompson Sampling via Fine-Tuning (ToSFiT), leverages the prior knowledge embedded in prompt-conditioned language models and adapts incrementally toward the posterior. Our analysis reveals the critical role of careful adaptation to the posterior probability of maximality, a principle that underpins our ToSFiT algorithm.
arXiv Detail & Related papers (2025-10-15T09:13:59Z) - EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving [64.15371139980802]
Large Language Models (LLMs) have recently advanced the field of Automated Theorem Proving (ATP). We show that different test-time scaling strategies for ATP models introduce significant computational overhead for inference. We propose two complementary methods that can be integrated into a unified EconRL pipeline for amplified benefits.
arXiv Detail & Related papers (2025-09-16T03:00:13Z) - TCPO: Thought-Centric Preference Optimization for Effective Embodied Decision-making [75.29820290660065]
This paper proposes Thought-Centric Preference Optimization (TCPO) for effective embodied decision-making. It emphasizes the alignment of the model's intermediate reasoning process, mitigating the problem of model degradation. Experiments in the ALFWorld environment demonstrate an average success rate of 26.67%, achieving a 6% improvement over RL4VLM.
arXiv Detail & Related papers (2025-09-10T11:16:21Z) - DynScaling: Efficient Verifier-free Inference Scaling via Dynamic and Integrated Sampling [20.605487145370752]
Inference-time scaling has proven effective in boosting large language model (LLM) performance through increased test-time computation. Yet, its practical application is often hindered by reliance on external verifiers or a lack of optimization for realistic computational constraints. We propose DynScaling, which addresses these limitations through two primary innovations: an integrated parallel-sequential sampling strategy and a bandit-based dynamic budget allocation framework.
arXiv Detail & Related papers (2025-06-19T05:40:54Z) - Preference Optimization for Combinatorial Optimization Problems [54.87466279363487]
Reinforcement Learning (RL) has emerged as a powerful tool for neural optimization, enabling models to learn to solve complex problems without requiring expert knowledge. Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast action spaces. We propose Preference Optimization, a novel method that transforms quantitative reward signals into qualitative preference signals via statistical comparison modeling.
arXiv Detail & Related papers (2025-05-13T16:47:00Z) - A Simple and Effective Reinforcement Learning Method for Text-to-Image Diffusion Fine-tuning [61.403275660120606]
Reinforcement learning (RL)-based fine-tuning has emerged as a powerful approach for aligning diffusion models with black-box objectives. We propose leave-one-out PPO (LOOP), a novel RL method for diffusion fine-tuning. Our results demonstrate that LOOP effectively improves diffusion models on various black-box objectives, and achieves a better balance between computational efficiency and performance.
arXiv Detail & Related papers (2025-03-02T13:43:53Z) - Offline Reinforcement Learning via Inverse Optimization [3.0586855806896054]
We propose a novel offline Reinforcement Learning (ORL) algorithm for continuous state and action spaces. To mitigate the distribution shift commonly observed in ORL problems, we employ a robust and non-causal Model Predictive Control expert. Unlike the existing literature, our robust MPC expert enjoys an exact and tractable convex reformulation.
arXiv Detail & Related papers (2025-02-27T12:11:44Z) - Reward-Guided Speculative Decoding for Efficient LLM Reasoning [80.55186052123196]
We introduce Reward-Guided Speculative Decoding (RSD), a novel framework aimed at improving the efficiency of inference in large language models (LLMs). RSD incorporates a controlled bias to prioritize high-reward outputs, in contrast to existing speculative decoding methods that enforce strict unbiasedness. RSD delivers significant efficiency gains over decoding with the target model only, while achieving significantly better accuracy than parallel decoding methods on average.
arXiv Detail & Related papers (2025-01-31T17:19:57Z) - Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences. To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model. Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
arXiv Detail & Related papers (2024-05-26T05:38:50Z) - Pseudo-Bayesian Optimization [7.556071491014536]
We study an axiomatic framework that elicits the minimal requirements to guarantee black-box optimization convergence. We show how using simple local regression, and a suitable "randomized prior" construction to quantify uncertainty, not only guarantees convergence but also consistently outperforms state-of-the-art benchmarks.
arXiv Detail & Related papers (2023-10-15T07:55:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.