BOPO: Neural Combinatorial Optimization via Best-anchored and Objective-guided Preference Optimization
- URL: http://arxiv.org/abs/2503.07580v2
- Date: Sat, 22 Mar 2025 08:59:25 GMT
- Title: BOPO: Neural Combinatorial Optimization via Best-anchored and Objective-guided Preference Optimization
- Authors: Zijun Liao, Jinbiao Chen, Debing Wang, Zizhen Zhang, Jiahai Wang
- Abstract summary: Preference Optimization for Combinatorial Optimization (POCO) is a training paradigm that leverages solution preferences via objective values. POCO is architecture-agnostic, enabling integration with existing NCO models, and establishes preference optimization as a principled framework for combinatorial optimization.
- Score: 17.694852175354555
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural Combinatorial Optimization (NCO) has emerged as a promising approach for NP-hard problems. However, prevailing RL-based methods suffer from low sample efficiency due to sparse rewards and underused solutions. We propose Preference Optimization for Combinatorial Optimization (POCO), a training paradigm that leverages solution preferences via objective values. It introduces: (1) an efficient preference pair construction to better explore and exploit solutions, and (2) a novel loss function that adaptively scales gradients via objective differences, removing reliance on reward models or reference policies. Experiments on Job-Shop Scheduling (JSP), Traveling Salesman (TSP), and Flexible Job-Shop Scheduling (FJSP) show POCO outperforms state-of-the-art neural methods, substantially reducing optimality gaps while keeping inference efficient. POCO is architecture-agnostic, enabling seamless integration with existing NCO models, and establishes preference optimization as a principled framework for combinatorial optimization.
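A minimal sketch of the two ingredients as the abstract describes them, under our own naming (`preference_pairs`, `bopo_loss`, and the scaling constant `alpha` are illustrative, not the paper's notation): the best sampled solution anchors every preference pair, and a logistic preference loss is scaled by each pair's objective gap, so no reward model or reference policy is needed.

```python
import math

def preference_pairs(solutions, objectives):
    """Best-anchored pair construction (sketch): anchor every other
    sampled solution to the best one found for the instance."""
    best = min(range(len(solutions)), key=lambda i: objectives[i])  # minimization
    return [(best, i) for i in range(len(solutions)) if i != best]

def bopo_loss(logp, objectives, pairs, alpha=1.0):
    """Objective-guided preference loss (sketch): a logistic preference
    loss whose effective gradient is scaled by each pair's objective gap.
    logp[i] is the model's log-probability of generating solution i."""
    total = 0.0
    for w, l in pairs:  # w preferred (lower objective), l dispreferred
        gap = objectives[l] - objectives[w]          # objective difference >= 0
        margin = alpha * gap * (logp[w] - logp[l])   # gap-scaled log-prob margin
        total += math.log(1.0 + math.exp(-margin))   # -log sigmoid(margin)
    return total / max(len(pairs), 1)
```

Because the margin vanishes when two solutions have equal objectives and grows with the gap, pairs with larger objective differences contribute larger gradients, which matches the adaptive-scaling behavior the abstract claims.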
Related papers
- Advancing CMA-ES with Learning-Based Cooperative Coevolution for Scalable Optimization [12.899626317088885]
This paper introduces LCC, a pioneering learning-based cooperative coevolution framework.
LCC dynamically schedules decomposition strategies during optimization processes.
It offers certain advantages over state-of-the-art baselines in terms of optimization effectiveness and resource consumption.
arXiv Detail & Related papers (2025-04-24T14:09:22Z) - Unlearning Works Better Than You Think: Local Reinforcement-Based Selection of Auxiliary Objectives [1.1743167854433303]
Local Reinforcement-Based Selection of Auxiliary Objectives (LRSAO) is a novel approach that selects auxiliary objectives using reinforcement learning (RL).
We analyze and evaluate LRSAO on the black-box complexity version of the non-monotonic Jump function.
Our approach improves on this result, achieving a complexity of $\Theta(n^2 / \ell^2 + n \log n)$.
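A loose, hypothetical sketch of the selection rule (the value table `q`, `eps`, and the reset-on-negative-reward "unlearning" step are our reading of the title and abstract, not the paper's algorithm):

```python
import random

def select_objective(q, eps=0.1):
    """Epsilon-greedy choice over learned values of the auxiliary objectives."""
    if random.random() < eps:
        return random.randrange(len(q))
    return max(range(len(q)), key=q.__getitem__)

def update_value(q, a, reward, lr=0.5):
    """'Unlearning' step (assumption ours): progress reinforces objective a,
    while a harmful step resets its estimate rather than averaging it in."""
    q[a] = 0.0 if reward < 0 else q[a] + lr * (reward - q[a])
```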
arXiv Detail & Related papers (2025-04-19T23:00:24Z) - Scalable Min-Max Optimization via Primal-Dual Exact Pareto Optimization [66.51747366239299]
We propose a smooth variant of the min-max problem based on the augmented Lagrangian.
The proposed algorithm scales better with the number of objectives than subgradient-based strategies.
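One standard way to realize such a smoothing, sketched under our own assumptions (`fs`/`grads` are the objectives and their gradient functions; the updates are a textbook augmented-Lagrangian scheme, not necessarily the paper's exact method):

```python
import numpy as np

def smooth_minmax(fs, grads, x0, rho=10.0, lr=0.01, steps=500):
    """min_x max_i f_i(x) rewritten as min_{x,t} t s.t. f_i(x) <= t and
    smoothed with an augmented Lagrangian; primal gradient steps on (x, t)
    alternate with multiplier (dual) updates."""
    x = np.asarray(x0, dtype=float)
    t = max(f(x) for f in fs)
    lam = np.zeros(len(fs))
    for _ in range(steps):
        viol = np.array([f(x) for f in fs]) - t            # constraint values f_i(x) - t
        active = np.maximum(0.0, lam + rho * viol)         # smoothed multipliers
        gx = sum(a * g(x) for a, g in zip(active, grads))  # d/dx of the AL
        gt = 1.0 - active.sum()                            # d/dt of the AL
        x, t = x - lr * gx, t - lr * gt
        lam = np.maximum(0.0, lam + rho * viol)            # dual ascent
    return x, t
```

Per iteration this touches each objective once with simple vector updates, which is the kind of per-objective cost that scales better than maintaining subgradients of the max.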
arXiv Detail & Related papers (2025-03-16T11:05:51Z) - Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness [27.43137305486112]
We propose a novel Self-supervised Preference Optimization (SPO) framework, which constructs a self-supervised preference degree loss combined with the alignment loss.
The results demonstrate that SPO can be seamlessly integrated with existing preference optimization methods to achieve state-of-the-art performance.
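A hedged sketch of how such a combined objective could look (the squared-error degree term and the coefficients `beta`, `lam` are our illustrative choices, not the paper's formulation):

```python
import math

def spo_loss(logp_w, logp_l, degree_pred, degree_label, beta=0.1, lam=1.0):
    """Alignment loss plus a self-supervised preference-degree loss: an
    auxiliary head predicts how strongly one response is preferred over
    the other, and its error regularizes the alignment term."""
    margin = beta * (logp_w - logp_l)
    align = math.log(1.0 + math.exp(-margin))    # -log sigmoid(margin)
    degree = (degree_pred - degree_label) ** 2   # self-supervised degree term
    return align + lam * degree
```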
arXiv Detail & Related papers (2024-09-26T12:37:26Z) - Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO).
Our method is versatile and can be readily adapted to various preference optimization frameworks.
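A hypothetical sketch of a DRO-style adaptive scaling (the discrete scale set and entropic smoothing are our assumptions; the paper's uncertainty set may differ):

```python
import math

def adaptive_pref_loss(margin, scales=(0.5, 1.0, 2.0, 4.0), tau=0.1):
    """Per-pair adaptive scaling: the loss scale is chosen adversarially
    from a small uncertainty set, smoothed by an entropic penalty tau,
    i.e., a soft worst case tau * log-sum-exp(loss / tau)."""
    losses = [math.log(1.0 + math.exp(-s * margin)) for s in scales]
    m = max(l / tau for l in losses)  # stabilize the log-sum-exp
    return tau * (m + math.log(sum(math.exp(l / tau - m) for l in losses)))
```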
arXiv Detail & Related papers (2024-06-04T20:33:22Z) - LLM as a Complementary Optimizer to Gradient Descent: A Case Study in Prompt Tuning [69.95292905263393]
In this paper, we show that gradient-based methods and high-level LLM reasoning are complementary and can effectively collaborate within a combined optimization framework.
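A sketch of such a collaboration loop under stated assumptions (`score`, `grad_step`, and `llm_propose` are placeholders for the task metric, a local gradient refinement step, and any LLM call that returns a candidate prompt):

```python
def collaborative_prompt_search(score, grad_step, llm_propose, prompt, rounds=10):
    """Alternate local gradient refinement with global LLM-proposed rewrites,
    keeping whichever candidate scores best on the task so far."""
    best, best_score = prompt, score(prompt)
    for _ in range(rounds):
        for cand in (grad_step(best), llm_propose(best, best_score)):
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s
    return best
```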
arXiv Detail & Related papers (2024-05-30T06:24:14Z) - Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences. To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model. Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
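A minimal sketch of the combined objective the summary describes, a DPO-style preference term plus an SFT term on the preferred response (coefficients and naming are illustrative):

```python
import math

def regularized_pref_loss(logp_w, logp_l, ref_w, ref_l, beta=0.1, eta=1.0):
    """DPO-style preference loss plus an SFT (maximum-likelihood) term on
    the preferred response, the latter acting as the implicit regularizer."""
    margin = beta * ((logp_w - ref_w) - (logp_l - ref_l))
    dpo = math.log(1.0 + math.exp(-margin))  # -log sigmoid(margin)
    sft = -logp_w                            # supervised term on the chosen response
    return dpo + eta * sft
```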
arXiv Detail & Related papers (2024-05-26T05:38:50Z) - Halfway Escape Optimization: A Quantum-Inspired Solution for General Optimization Problems [6.3816899727206895]
This paper first proposes the Halfway Escape Optimization (HEO) algorithm, a quantum-inspired metaheuristic designed to address general optimization problems.
After introducing the HEO mechanisms, the study presents a comprehensive evaluation of HEO's performance against widely used optimization algorithms.
Tests of HEO on Pressure Vessel Design and Tubular Column Design demonstrate its feasibility and potential for real-time applications.
arXiv Detail & Related papers (2024-05-05T08:43:07Z) - Instance-Conditioned Adaptation for Large-scale Generalization of Neural Combinatorial Optimization [15.842155380912002]
This work proposes a novel Instance-Conditioned Adaptation Model (ICAM) for better large-scale generalization of neural combinatorial optimization.
In particular, we design a powerful yet lightweight instance-conditioned routing adaptation module for the NCO model.
We develop an efficient three-stage reinforcement learning-based training scheme that enables the model to learn cross-scale features without any labeled optimal solution.
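One plausible form of such an adaptation module, sketched under our own assumptions (a scale- and distance-dependent bias added to attention scores; `alpha` is illustrative):

```python
import numpy as np

def adapted_scores(scores, dist, n_nodes, alpha=0.1):
    """Shift attention compatibilities by an instance-conditioned bias:
    on larger instances, distant nodes are penalized more strongly, so the
    same policy adapts its effective receptive field to the instance scale."""
    return scores - alpha * np.log(n_nodes) * dist
```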
arXiv Detail & Related papers (2024-05-03T08:00:19Z) - Towards Efficient Exact Optimization of Language Model Alignment [93.39181634597877]
Direct preference optimization (DPO) was proposed to directly optimize the policy from preference data.
We show that DPO, derived from the optimal solution of the problem, leads in practice to a compromised, mean-seeking approximation of that optimal solution.
We propose efficient exact optimization (EXO) of the alignment objective.
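In symbols, the distinction the summary draws can be sketched as follows (notation ours; the mean-seeking vs. mode-seeking reading follows the abstract's claim):

```latex
% The KL-regularized alignment objective
%   max_pi  E[r(x, y)] - beta * KL(pi || pi_ref)
% has the exact (optimal) solution
\pi^{*}(y \mid x) \;\propto\; \pi_{\mathrm{ref}}(y \mid x)\,
  \exp\!\bigl(r(x, y)/\beta\bigr).
% Exact optimization targets the reverse KL, which is mode-seeking:
\mathcal{L}_{\mathrm{EXO}}(\theta) = \mathrm{KL}\bigl(\pi_{\theta}\,\Vert\,\pi^{*}\bigr),
% whereas DPO in practice behaves like the forward, mean-seeking direction:
\mathcal{L}_{\mathrm{DPO}}(\theta) \approx \mathrm{KL}\bigl(\pi^{*}\,\Vert\,\pi_{\theta}\bigr).
```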
arXiv Detail & Related papers (2024-02-01T18:51:54Z) - Symmetric Replay Training: Enhancing Sample Efficiency in Deep Reinforcement Learning for Combinatorial Optimization [42.92248233465095]
We propose a simple but effective method, called symmetric replay training (SRT), which can be easily integrated into various deep reinforcement learning (DRL) methods.
Our method leverages high-reward samples to encourage exploration of symmetric regions without additional online interactions, i.e., for free.
Experimental results demonstrate the consistent improvement of our method in sample efficiency across diverse DRL methods applied to real-world tasks.
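A minimal sketch of the idea for tour-like solutions (using a rotation of the visiting order as the reward-preserving symmetry is our illustrative choice; `policy_logp` is a placeholder for the policy's log-likelihood):

```python
import random

def symmetric_replay_loss(policy_logp, tours, rewards, k=2):
    """Replay the k highest-reward tours under a reward-preserving symmetry
    and maximize their likelihood: an imitation-style loss that needs no
    extra environment interaction."""
    top = sorted(range(len(tours)), key=rewards.__getitem__, reverse=True)[:k]
    loss = 0.0
    for i in top:
        t = tours[i]
        shift = random.randrange(len(t))
        loss += -policy_logp(t[shift:] + t[:shift])  # symmetric variant as target
    return loss / k
```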
arXiv Detail & Related papers (2023-06-02T05:34:01Z) - Online Control of Adaptive Large Neighborhood Search using Deep Reinforcement Learning [4.374837991804085]
We introduce a Deep Reinforcement Learning based approach called DR-ALNS that selects operators, adjusts parameters, and controls the acceptance criterion throughout the search.
We evaluate the proposed method on an orienteering problem with stochastic weights and time windows, as presented in an IJCAI competition.
The results show that our approach outperforms vanilla ALNS, ALNS tuned with Bayesian optimization, and two state-of-the-art DRL approaches.
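A hypothetical sketch of the control loop (`env`, `agent`, and the operator lists are placeholders; the acceptance rule is standard simulated annealing, with the temperature set by the agent rather than a fixed schedule):

```python
import math
import random

def dr_alns_episode(env, agent, destroy_ops, repair_ops, iters=100):
    """At each iteration the learned agent picks the destroy/repair operators,
    a destroy severity, and the acceptance temperature."""
    sol = best = env.initial_solution()
    for _ in range(iters):
        d, r, severity, temp = agent.act(env.observe(sol))
        cand = repair_ops[r](destroy_ops[d](sol, severity))
        delta = env.cost(cand) - env.cost(sol)
        if delta < 0 or random.random() < math.exp(-delta / max(temp, 1e-9)):
            sol = cand                       # SA-style acceptance
        if env.cost(sol) < env.cost(best):
            best = sol
        agent.learn(reward=-delta)           # placeholder policy-update hook
    return best
```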
arXiv Detail & Related papers (2022-11-01T21:33:46Z) - Accelerated Federated Learning with Decoupled Adaptive Optimization [53.230515878096426]
The federated learning (FL) framework enables clients to collaboratively learn a shared model while keeping their training data private.
Recently, many efforts have been made to generalize centralized adaptive optimization methods, such as SGDM, Adam, and AdaGrad, to federated settings.
This work aims to develop novel adaptive optimization methods for FL from the perspective of the dynamics of ordinary differential equations (ODEs).
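For context, a sketch of one well-known decoupled design in this family, FedAdam-style server-side adaptivity (assumptions ours; the ODE-derived method of this paper may differ):

```python
import numpy as np

def fedadam_round(server_w, client_ws, m, v, lr=0.1, b1=0.9, b2=0.99, eps=1e-8):
    """Clients train locally as usual; the server treats the average client
    delta as a pseudo-gradient and applies an Adam-style step to it."""
    delta = np.mean([w - server_w for w in client_ws], axis=0)  # pseudo-gradient
    m = b1 * m + (1 - b1) * delta
    v = b2 * v + (1 - b2) * delta ** 2
    return server_w + lr * m / (np.sqrt(v) + eps), m, v
```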
arXiv Detail & Related papers (2022-07-14T22:46:43Z) - Multi-Fidelity Bayesian Optimization via Deep Neural Networks [19.699020509495437]
In many applications, the objective function can be evaluated at multiple fidelities to enable a trade-off between cost and accuracy.
We propose Deep Neural Network Multi-Fidelity Bayesian Optimization (DNN-MFBO) that can flexibly capture all kinds of complicated relationships between the fidelities.
We show the advantages of our method in both synthetic benchmark datasets and real-world applications in engineering design.
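A hypothetical sketch of a sequential multi-fidelity surrogate in this spirit (`nets` is a list of per-fidelity regressors, ordered from lowest to highest fidelity; the chaining scheme is our assumption):

```python
import numpy as np

def multi_fidelity_predict(nets, x):
    """Each fidelity's network sees the input together with the previous
    fidelity's prediction, capturing nonlinear relations between fidelities."""
    h = np.zeros(0)
    for net in nets:  # ordered low -> high fidelity
        h = np.atleast_1d(net(np.concatenate([x, h])))
    return h
```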
arXiv Detail & Related papers (2020-07-06T23:28:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.