BOPO: Neural Combinatorial Optimization via Best-anchored and Objective-guided Preference Optimization
- URL: http://arxiv.org/abs/2503.07580v2
- Date: Sat, 22 Mar 2025 08:59:25 GMT
- Title: BOPO: Neural Combinatorial Optimization via Best-anchored and Objective-guided Preference Optimization
- Authors: Zijun Liao, Jinbiao Chen, Debing Wang, Zizhen Zhang, Jiahai Wang
- Abstract summary: Preference Optimization for Combinatorial Optimization (POCO) is a training paradigm that leverages solution preferences via objective values. POCO is architecture-agnostic, enabling integration with existing NCO models, and establishes preference optimization as a principled framework for combinatorial optimization.
- Score: 17.694852175354555
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural Combinatorial Optimization (NCO) has emerged as a promising approach for NP-hard problems. However, prevailing RL-based methods suffer from low sample efficiency due to sparse rewards and underused solutions. We propose Preference Optimization for Combinatorial Optimization (POCO), a training paradigm that leverages solution preferences via objective values. It introduces: (1) an efficient preference pair construction to better explore and exploit solutions, and (2) a novel loss function that adaptively scales gradients via objective differences, removing reliance on reward models or reference policies. Experiments on Job-Shop Scheduling (JSP), Traveling Salesman (TSP), and Flexible Job-Shop Scheduling (FJSP) show POCO outperforms state-of-the-art neural methods, substantially reducing optimality gaps while keeping inference efficient. POCO is architecture-agnostic, enabling seamless integration with existing NCO models, and establishes preference optimization as a principled framework for combinatorial optimization.
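A minimal sketch of the two ingredients as the abstract describes them, under our own naming (`preference_pairs`, `bopo_loss`, and the scaling constant `alpha` are illustrative, not the paper's notation): the best sampled solution anchors every preference pair, and a logistic preference loss is scaled by each pair's objective gap, so no reward model or reference policy is needed.

```python
import math

def preference_pairs(solutions, objectives):
    """Best-anchored pair construction (sketch): anchor every other
    sampled solution to the best one found for the instance."""
    best = min(range(len(solutions)), key=lambda i: objectives[i])  # minimization
    return [(best, i) for i in range(len(solutions)) if i != best]

def bopo_loss(logp, objectives, pairs, alpha=1.0):
    """Objective-guided preference loss (sketch): a logistic preference
    loss whose effective gradient is scaled by each pair's objective gap.
    logp[i] is the model's log-probability of generating solution i."""
    total = 0.0
    for w, l in pairs:  # w preferred (lower objective), l dispreferred
        gap = objectives[l] - objectives[w]          # objective difference >= 0
        margin = alpha * gap * (logp[w] - logp[l])   # gap-scaled log-prob margin
        total += math.log(1.0 + math.exp(-margin))   # -log sigmoid(margin)
    return total / max(len(pairs), 1)
```

Because the margin vanishes when two solutions have equal objectives and grows with the gap, pairs with larger objective differences contribute larger gradients, which matches the adaptive-scaling behavior the abstract claims.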
Related papers
- Advancing CMA-ES with Learning-Based Cooperative Coevolution for Scalable Optimization [12.899626317088885]
This paper introduces LCC, a pioneering learning-based cooperative coevolution framework.
LCC dynamically schedules decomposition strategies during optimization processes.
It offers certain advantages over state-of-the-art baselines in terms of optimization effectiveness and resource consumption.
arXiv Detail & Related papers (2025-04-24T14:09:22Z) - Unlearning Works Better Than You Think: Local Reinforcement-Based Selection of Auxiliary Objectives [1.1743167854433303]
Local Reinforcement-Based Selection of Auxiliary Objectives (LRSAO) is a novel approach that selects auxiliary objectives using reinforcement learning (RL).
We analyze and evaluate LRSAO on the black-box complexity version of the non-monotonic Jump function.
Our approach improves on this result, achieving a complexity of $\Theta(n^2 / \ell^2 + n \log n)$.
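A loose, hypothetical sketch of the selection rule (the value table `q`, `eps`, and the reset-on-negative-reward "unlearning" step are our reading of the title and abstract, not the paper's algorithm):

```python
import random

def select_objective(q, eps=0.1):
    """Epsilon-greedy choice over learned values of the auxiliary objectives."""
    if random.random() < eps:
        return random.randrange(len(q))
    return max(range(len(q)), key=q.__getitem__)

def update_value(q, a, reward, lr=0.5):
    """'Unlearning' step (assumption ours): progress reinforces objective a,
    while a harmful step resets its estimate rather than averaging it in."""
    q[a] = 0.0 if reward < 0 else q[a] + lr * (reward - q[a])
```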
arXiv Detail & Related papers (2025-04-19T23:00:24Z) - Scalable Min-Max Optimization via Primal-Dual Exact Pareto Optimization [66.51747366239299]
We propose a smooth variant of the min-max problem based on the augmented Lagrangian.
The proposed algorithm scales better with the number of objectives than subgradient-based strategies.
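One standard way to realize such a smoothing, sketched under our own assumptions (`fs`/`grads` are the objectives and their gradient functions; the updates are a textbook augmented-Lagrangian scheme, not necessarily the paper's exact method):

```python
import numpy as np

def smooth_minmax(fs, grads, x0, rho=10.0, lr=0.01, steps=500):
    """min_x max_i f_i(x) rewritten as min_{x,t} t s.t. f_i(x) <= t and
    smoothed with an augmented Lagrangian; primal gradient steps on (x, t)
    alternate with multiplier (dual) updates."""
    x = np.asarray(x0, dtype=float)
    t = max(f(x) for f in fs)
    lam = np.zeros(len(fs))
    for _ in range(steps):
        viol = np.array([f(x) for f in fs]) - t            # constraint values f_i(x) - t
        active = np.maximum(0.0, lam + rho * viol)         # smoothed multipliers
        gx = sum(a * g(x) for a, g in zip(active, grads))  # d/dx of the AL
        gt = 1.0 - active.sum()                            # d/dt of the AL
        x, t = x - lr * gx, t - lr * gt
        lam = np.maximum(0.0, lam + rho * viol)            # dual ascent
    return x, t
```

Per iteration this touches each objective once with simple vector updates, which is the kind of per-objective cost that scales better than maintaining subgradients of the max.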
arXiv Detail & Related papers (2025-03-16T11:05:51Z) - Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness [27.43137305486112]
We propose a novel Self-supervised Preference Optimization (SPO) framework, which constructs a self-supervised preference degree loss combined with the alignment loss.
The results demonstrate that SPO can be seamlessly integrated with existing preference optimization methods to achieve state-of-the-art performance.
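A hedged sketch of how such a combined objective could look (the squared-error degree term and the coefficients `beta`, `lam` are our illustrative choices, not the paper's formulation):

```python
import math

def spo_loss(logp_w, logp_l, degree_pred, degree_label, beta=0.1, lam=1.0):
    """Alignment loss plus a self-supervised preference-degree loss: an
    auxiliary head predicts how strongly one response is preferred over
    the other, and its error regularizes the alignment term."""
    margin = beta * (logp_w - logp_l)
    align = math.log(1.0 + math.exp(-margin))    # -log sigmoid(margin)
    degree = (degree_pred - degree_label) ** 2   # self-supervised degree term
    return align + lam * degree
```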
arXiv Detail & Related papers (2024-09-26T12:37:26Z) - Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO).
Our method is versatile and can be readily adapted to various preference optimization frameworks.
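A hypothetical sketch of a DRO-style adaptive scaling (the discrete scale set and entropic smoothing are our assumptions; the paper's uncertainty set may differ):

```python
import math

def adaptive_pref_loss(margin, scales=(0.5, 1.0, 2.0, 4.0), tau=0.1):
    """Per-pair adaptive scaling: the loss scale is chosen adversarially
    from a small uncertainty set, smoothed by an entropic penalty tau,
    i.e., a soft worst case tau * log-sum-exp(loss / tau)."""
    losses = [math.log(1.0 + math.exp(-s * margin)) for s in scales]
    m = max(l / tau for l in losses)  # stabilize the log-sum-exp
    return tau * (m + math.log(sum(math.exp(l / tau - m) for l in losses)))
```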
arXiv Detail & Related papers (2024-06-04T20:33:22Z) - LLM as a Complementary Optimizer to Gradient Descent: A Case Study in Prompt Tuning [69.95292905263393]
In this paper, we show that gradient-based methods and high-level LLM reasoning are complementary and can effectively collaborate within a combined optimization framework.
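A sketch of such a collaboration loop under stated assumptions (`score`, `grad_step`, and `llm_propose` are placeholders for the task metric, a local gradient refinement step, and any LLM call that returns a candidate prompt):

```python
def collaborative_prompt_search(score, grad_step, llm_propose, prompt, rounds=10):
    """Alternate local gradient refinement with global LLM-proposed rewrites,
    keeping whichever candidate scores best on the task so far."""
    best, best_score = prompt, score(prompt)
    for _ in range(rounds):
        for cand in (grad_step(best), llm_propose(best, best_score)):
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s
    return best
```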
arXiv Detail & Related papers (2024-05-30T06:24:14Z) - Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences. To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model. Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
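A minimal sketch of the combined objective the summary describes, a DPO-style preference term plus an SFT term on the preferred response (coefficients and naming are illustrative):

```python
import math

def regularized_pref_loss(logp_w, logp_l, ref_w, ref_l, beta=0.1, eta=1.0):
    """DPO-style preference loss plus an SFT (maximum-likelihood) term on
    the preferred response, the latter acting as the implicit regularizer."""
    margin = beta * ((logp_w - ref_w) - (logp_l - ref_l))
    dpo = math.log(1.0 + math.exp(-margin))  # -log sigmoid(margin)
    sft = -logp_w                            # supervised term on the chosen response
    return dpo + eta * sft
```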
arXiv Detail & Related papers (2024-05-26T05:38:50Z) - Halfway Escape Optimization: A Quantum-Inspired Solution for General Optimization Problems [6.3816899727206895]
This paper first proposes the Halfway Escape Optimization (HEO) algorithm, a quantum-inspired metaheuristic designed to address general optimization problems.
After introducing the HEO mechanisms, the study presents a comprehensive evaluation of HEO's performance against widely used optimization algorithms.
Tests of HEO on Pressure Vessel Design and Tubular Column Design demonstrate its feasibility and potential for real-time applications.
arXiv Detail & Related papers (2024-05-05T08:43:07Z) - Instance-Conditioned Adaptation for Large-scale Generalization of Neural Combinatorial Optimization [15.842155380912002]
This work proposes a novel Instance-Conditioned Adaptation Model (ICAM) for better large-scale generalization of neural combinatorial optimization.
In particular, we design a powerful yet lightweight instance-conditioned routing adaptation module for the NCO model.
We develop an efficient three-stage reinforcement learning-based training scheme that enables the model to learn cross-scale features without any labeled optimal solution.
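One plausible form of such an adaptation module, sketched under our own assumptions (a scale- and distance-dependent bias added to attention scores; `alpha` is illustrative):

```python
import numpy as np

def adapted_scores(scores, dist, n_nodes, alpha=0.1):
    """Shift attention compatibilities by an instance-conditioned bias:
    on larger instances, distant nodes are penalized more strongly, so the
    same policy adapts its effective receptive field to the instance scale."""
    return scores - alpha * np.log(n_nodes) * dist
```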
arXiv Detail & Related papers (2024-05-03T08:00:19Z) - Towards Efficient Exact Optimization of Language Model Alignment [93.39181634597877]
Direct preference optimization (DPO) was proposed to directly optimize the policy from preference data.
We show that DPO, derived from the optimal solution of the problem, leads in practice to a compromised, mean-seeking approximation of that optimal solution.
We propose efficient exact optimization (EXO) of the alignment objective.
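In symbols, the distinction the summary draws can be sketched as follows (notation ours; the mean-seeking vs. mode-seeking reading follows the abstract's claim):

```latex
% The KL-regularized alignment objective
%   max_pi  E[r(x, y)] - beta * KL(pi || pi_ref)
% has the exact (optimal) solution
\pi^{*}(y \mid x) \;\propto\; \pi_{\mathrm{ref}}(y \mid x)\,
  \exp\!\bigl(r(x, y)/\beta\bigr).
% Exact optimization targets the reverse KL, which is mode-seeking:
\mathcal{L}_{\mathrm{EXO}}(\theta) = \mathrm{KL}\bigl(\pi_{\theta}\,\Vert\,\pi^{*}\bigr),
% whereas DPO in practice behaves like the forward, mean-seeking direction:
\mathcal{L}_{\mathrm{DPO}}(\theta) \approx \mathrm{KL}\bigl(\pi^{*}\,\Vert\,\pi_{\theta}\bigr).
```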
arXiv Detail & Related papers (2024-02-01T18:51:54Z) - Symmetric Replay Training: Enhancing Sample Efficiency in Deep Reinforcement Learning for Combinatorial Optimization [42.92248233465095]
We propose a simple but effective method, called symmetric replay training (SRT), which can be easily integrated into various deep reinforcement learning (DRL) methods.
Our method leverages high-reward samples to encourage exploration of symmetric regions without additional online interactions, i.e., for free.
Experimental results demonstrate the consistent improvement of our method in sample efficiency across diverse DRL methods applied to real-world tasks.
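A minimal sketch of the idea for tour-like solutions (using a rotation of the visiting order as the reward-preserving symmetry is our illustrative choice; `policy_logp` is a placeholder for the policy's log-likelihood):

```python
import random

def symmetric_replay_loss(policy_logp, tours, rewards, k=2):
    """Replay the k highest-reward tours under a reward-preserving symmetry
    and maximize their likelihood: an imitation-style loss that needs no
    extra environment interaction."""
    top = sorted(range(len(tours)), key=rewards.__getitem__, reverse=True)[:k]
    loss = 0.0
    for i in top:
        t = tours[i]
        shift = random.randrange(len(t))
        loss += -policy_logp(t[shift:] + t[:shift])  # symmetric variant as target
    return loss / k
```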
arXiv Detail & Related papers (2023-06-02T05:34:01Z) - Online Control of Adaptive Large Neighborhood Search using Deep Reinforcement Learning [4.374837991804085]
We introduce a Deep Reinforcement Learning based approach called DR-ALNS that selects operators, adjusts parameters, and controls the acceptance criterion throughout the search.
We evaluate the proposed method on an orienteering problem with stochastic weights and time windows, as presented in an IJCAI competition.
The results show that our approach outperforms vanilla ALNS, ALNS tuned with Bayesian optimization, and two state-of-the-art DRL approaches.
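A hypothetical sketch of the control loop (`env`, `agent`, and the operator lists are placeholders; the acceptance rule is standard simulated annealing, with the temperature set by the agent rather than a fixed schedule):

```python
import math
import random

def dr_alns_episode(env, agent, destroy_ops, repair_ops, iters=100):
    """At each iteration the learned agent picks the destroy/repair operators,
    a destroy severity, and the acceptance temperature."""
    sol = best = env.initial_solution()
    for _ in range(iters):
        d, r, severity, temp = agent.act(env.observe(sol))
        cand = repair_ops[r](destroy_ops[d](sol, severity))
        delta = env.cost(cand) - env.cost(sol)
        if delta < 0 or random.random() < math.exp(-delta / max(temp, 1e-9)):
            sol = cand                       # SA-style acceptance
        if env.cost(sol) < env.cost(best):
            best = sol
        agent.learn(reward=-delta)           # placeholder policy-update hook
    return best
```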
arXiv Detail & Related papers (2022-11-01T21:33:46Z) - Accelerated Federated Learning with Decoupled Adaptive Optimization [53.230515878096426]
The federated learning (FL) framework enables clients to collaboratively learn a shared model while keeping their training data private.
Recently, many efforts have been made to generalize centralized adaptive optimization methods, such as SGDM, Adam, and AdaGrad, to federated settings.
This work aims to develop novel adaptive optimization methods for FL from the perspective of the dynamics of ordinary differential equations (ODEs).
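For context, a sketch of one well-known decoupled design in this family, FedAdam-style server-side adaptivity (assumptions ours; the ODE-derived method of this paper may differ):

```python
import numpy as np

def fedadam_round(server_w, client_ws, m, v, lr=0.1, b1=0.9, b2=0.99, eps=1e-8):
    """Clients train locally as usual; the server treats the average client
    delta as a pseudo-gradient and applies an Adam-style step to it."""
    delta = np.mean([w - server_w for w in client_ws], axis=0)  # pseudo-gradient
    m = b1 * m + (1 - b1) * delta
    v = b2 * v + (1 - b2) * delta ** 2
    return server_w + lr * m / (np.sqrt(v) + eps), m, v
```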
arXiv Detail & Related papers (2022-07-14T22:46:43Z) - Multi-Fidelity Bayesian Optimization via Deep Neural Networks [19.699020509495437]
In many applications, the objective function can be evaluated at multiple fidelities to enable a trade-off between cost and accuracy.
We propose Deep Neural Network Multi-Fidelity Bayesian Optimization (DNN-MFBO) that can flexibly capture all kinds of complicated relationships between the fidelities.
We show the advantages of our method in both synthetic benchmark datasets and real-world applications in engineering design.
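A hypothetical sketch of a sequential multi-fidelity surrogate in this spirit (`nets` is a list of per-fidelity regressors, ordered from lowest to highest fidelity; the chaining scheme is our assumption):

```python
import numpy as np

def multi_fidelity_predict(nets, x):
    """Each fidelity's network sees the input together with the previous
    fidelity's prediction, capturing nonlinear relations between fidelities."""
    h = np.zeros(0)
    for net in nets:  # ordered low -> high fidelity
        h = np.atleast_1d(net(np.concatenate([x, h])))
    return h
```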
arXiv Detail & Related papers (2020-07-06T23:28:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.