UCPO: A Universal Constrained Combinatorial Optimization Method via Preference Optimization
- URL: http://arxiv.org/abs/2511.10148v1
- Date: Fri, 14 Nov 2025 01:35:14 GMT
- Title: UCPO: A Universal Constrained Combinatorial Optimization Method via Preference Optimization
- Authors: Zhanhong Fang, Debing Wang, Jinbiao Chen, Jiahai Wang, Zizhen Zhang,
- Abstract summary: Universal Constrained Preference Optimization (UCPO) is a novel plug-and-play framework that seamlessly integrates preference learning into existing neural solvers.<n>UCPO embeds constraint satisfaction directly into a preference-based objective.<n>It achieves exceptional performance with as little as 1% of the original training budget.
- Score: 20.001544937354577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural solvers have demonstrated remarkable success in combinatorial optimization, often surpassing traditional heuristics in speed, solution quality, and generalization. However, their efficacy deteriorates significantly when confronted with complex constraints that cannot be effectively managed through simple masking mechanisms. To address this limitation, we introduce Universal Constrained Preference Optimization (UCPO), a novel plug-and-play framework that seamlessly integrates preference learning into existing neural solvers via a specially designed loss function, without requiring architectural modifications. UCPO embeds constraint satisfaction directly into a preference-based objective, eliminating the need for meticulous hyperparameter tuning. Leveraging a lightweight warm-start fine-tuning protocol, UCPO enables pre-trained models to consistently produce near-optimal, feasible solutions on challenging constraint-laden tasks, achieving exceptional performance with as little as 1\% of the original training budget.
Related papers
- Towards Efficient Constraint Handling in Neural Solvers for Routing Problems [53.35866378109893]
We present Construct-and-Refine, the first general and efficient constraint-handling framework for neural routing solvers.<n>CaR achieves superior feasibility, solution quality, and efficiency compared to both classical and neural state-of-the-art solvers.
arXiv Detail & Related papers (2026-02-17T21:06:23Z) - Optimizing Optimizers for Fast Gradient-Based Learning [53.81268610971847]
We lay the theoretical foundation for automating design in gradient-based learning.<n>By treating gradient loss signals as a function that translates to parameter motions, the problem reduces to a family of convex optimization problems.
arXiv Detail & Related papers (2025-12-06T09:50:41Z) - A Markovian Framing of WaveFunctionCollapse for Procedurally Generating Aesthetically Complex Environments [5.114029940159893]
Procedural content generation often requires satisfying both designer-specified objectives and adjacency constraints implicitly imposed by the underlying tile set.<n>We reformulate WaveFunctionCol (WFC) as a Markov Decision Process (MDP)<n>We find that joint optimization not only struggles as task complexity increases, but consistently underperforms relative to optimization over the WFC-MDP.
arXiv Detail & Related papers (2025-09-12T01:51:01Z) - TCPO: Thought-Centric Preference Optimization for Effective Embodied Decision-making [75.29820290660065]
This paper proposes Thought-Centric Preference Optimization ( TCPO) for effective embodied decision-making.<n>It emphasizes the alignment of the model's intermediate reasoning process, mitigating the problem of model degradation.<n>Experiments in the ALFWorld environment demonstrate an average success rate of 26.67%, achieving a 6% improvement over RL4VLM.
arXiv Detail & Related papers (2025-09-10T11:16:21Z) - Preference Optimization for Combinatorial Optimization Problems [54.87466279363487]
Reinforcement Learning (RL) has emerged as a powerful tool for neural optimization, enabling models learns that solve complex problems without requiring expert knowledge.<n>Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast action spaces.<n>We propose Preference Optimization, a novel method that transforms quantitative reward signals into qualitative preference signals via statistical comparison modeling.
arXiv Detail & Related papers (2025-05-13T16:47:00Z) - BOPO: Neural Combinatorial Optimization via Best-anchored and Objective-guided Preference Optimization [17.694852175354555]
We propose Best-anchored and Objective-guided Preference Optimization (BOPO), a training paradigm that leverages solution preferences via objective values.<n>Experiments on Job-shop Problem (JSP), Traveling Salesman Problem (TSP), and Flexible Job-shop Scheduling Problem (FJSP) show BOPO outperforms state-of-the-art neural methods.
arXiv Detail & Related papers (2025-03-10T17:45:30Z) - SOMTP: Self-Supervised Learning-Based Optimizer for MPC-Based Safe Trajectory Planning Problems in Robotics [13.129654942805846]
Model Predictive Control (MP)-based trajectory planning has been widely used in, and Control Barrier (CBF) can improve its constraints.
In this paper, we propose a self-supervised learning algorithm for CBF-MPC trajectory planning.
arXiv Detail & Related papers (2024-05-15T09:38:52Z) - Learning Constrained Optimization with Deep Augmented Lagrangian Methods [54.22290715244502]
A machine learning (ML) model is trained to emulate a constrained optimization solver.
This paper proposes an alternative approach, in which the ML model is trained to predict dual solution estimates directly.
It enables an end-to-end training scheme is which the dual objective is as a loss function, and solution estimates toward primal feasibility, emulating a Dual Ascent method.
arXiv Detail & Related papers (2024-03-06T04:43:22Z) - Diffusing the Optimal Topology: A Generative Optimization Approach [6.375982344506753]
Topology optimization seeks to find the best design that satisfies a set of constraints while maximizing system performance.
Traditional iterative optimization methods like SIMP can be computationally expensive and get stuck in local minima.
We propose a Generative Optimization method that integrates classic optimization like SIMP as a refining mechanism for the topology generated by a deep generative model.
arXiv Detail & Related papers (2023-03-17T03:47:10Z) - Accelerated Federated Learning with Decoupled Adaptive Optimization [53.230515878096426]
federated learning (FL) framework enables clients to collaboratively learn a shared model while keeping privacy of training data on clients.
Recently, many iterations efforts have been made to generalize centralized adaptive optimization methods, such as SGDM, Adam, AdaGrad, etc., to federated settings.
This work aims to develop novel adaptive optimization methods for FL from the perspective of dynamics of ordinary differential equations (ODEs)
arXiv Detail & Related papers (2022-07-14T22:46:43Z) - A Globally Convergent Evolutionary Strategy for Stochastic Constrained
Optimization with Applications to Reinforcement Learning [0.6445605125467573]
Evolutionary strategies have been shown to achieve competing levels of performance for complex optimization problems in reinforcement learning.
Convergence guarantees for evolutionary strategies to optimize constrained problems are however lacking in the literature.
arXiv Detail & Related papers (2022-02-21T17:04:51Z) - On Constraints in First-Order Optimization: A View from Non-Smooth
Dynamical Systems [99.59934203759754]
We introduce a class of first-order methods for smooth constrained optimization.
Two distinctive features of our approach are that projections or optimizations over the entire feasible set are avoided.
The resulting algorithmic procedure is simple to implement even when constraints are nonlinear.
arXiv Detail & Related papers (2021-07-17T11:45:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.