Evolutionary Preference Sampling for Pareto Set Learning
- URL: http://arxiv.org/abs/2404.08414v1
- Date: Fri, 12 Apr 2024 11:58:13 GMT
- Title: Evolutionary Preference Sampling for Pareto Set Learning
- Authors: Rongguang Ye, Longcan Chen, Jinyuan Zhang, Hisao Ishibuchi
- Abstract summary: We consider preference sampling as an evolutionary process to generate preference vectors for neural network training.
Our proposed method has a faster convergence speed than baseline algorithms on 7 testing problems.
- Score: 7.306693705576791
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Pareto Set Learning (PSL) has been proposed for learning the entire Pareto set using a neural network. PSL employs preference vectors to scalarize multiple objectives, facilitating the learning of mappings from preference vectors to specific Pareto optimal solutions. Previous PSL methods have shown their effectiveness in solving artificial multi-objective optimization problems (MOPs) with uniform preference vector sampling. The quality of the learned Pareto set is influenced by the preference vector sampling strategy, which must be chosen according to the shape of the Pareto front. However, a fixed preference sampling strategy cannot simultaneously adapt to the Pareto fronts of multiple MOPs. To address this limitation, this paper proposes an Evolutionary Preference Sampling (EPS) strategy to efficiently sample preference vectors. Inspired by evolutionary algorithms, we consider preference sampling as an evolutionary process that generates preference vectors for neural network training. We integrate the EPS strategy into five advanced PSL methods. Extensive experiments demonstrate that our proposed method converges faster than the baseline algorithms on 7 test problems. Our implementation is available at https://github.com/rG223/EPS.
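To make the abstract's two core ideas concrete, here is a minimal, illustrative sketch of (a) scalarizing multiple objectives with a preference vector (using the common weighted Tchebycheff form, which the abstract does not commit to) and (b) evolving a population of preference vectors. All function names and the selection/mutation details are assumptions for illustration, not the paper's actual EPS algorithm.

```python
import numpy as np

def tchebycheff(objs, pref, ideal):
    """Weighted Tchebycheff scalarization: collapses an objective vector
    into one scalar (to minimize) under a given preference vector."""
    return np.max(pref * (objs - ideal))

def sample_preferences(n, m, rng):
    """Uniform sampling on the (m-1)-simplex, i.e. Dirichlet(1, ..., 1):
    normalized exponential draws."""
    w = rng.exponential(size=(n, m))
    return w / w.sum(axis=1, keepdims=True)

def evolve_preferences(prefs, losses, rng, sigma=0.05):
    """One evolutionary step over preference vectors: keep the half with
    the highest scalarized loss (the harder regions of the front), create
    perturbed children, and re-project everything onto the simplex."""
    order = np.argsort(-losses)                     # hardest regions first
    parents = prefs[order[: len(prefs) // 2]]
    children = np.abs(parents + rng.normal(scale=sigma, size=parents.shape))
    children /= children.sum(axis=1, keepdims=True)
    return np.vstack([parents, children])
```

In a PSL training loop, one would alternate between sampling such preference vectors, training the network on the scalarized losses they induce, and evolving the sample toward regions where the learned mapping is still poor.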
Related papers
- Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment [74.25832963097658]
Multi-Objective Alignment (MOA) aims to align responses with multiple human preference objectives.
We find that DPO-based MOA approaches suffer from widespread preference conflicts in the data.
arXiv Detail & Related papers (2025-02-20T08:27:00Z)
- Vector Optimization with Gaussian Process Bandits [7.049738935364297]
Learning problems in which multiple objectives must be considered simultaneously often arise in various fields, including engineering, drug design, and environmental management.
Traditional methods for dealing with multiple black-box objective functions have limitations in incorporating objective preferences and exploring the solution space accordingly.
We propose Vector Optimization with Gaussian Process (VOGP), a probably approximately correct adaptive elimination algorithm that performs black-box vector optimization using Gaussian process bandits.
arXiv Detail & Related papers (2024-12-03T14:47:46Z)
- Adaptive Selection of Sampling-Reconstruction in Fourier Compressed Sensing [13.775902519100075]
Compressed sensing (CS) has emerged to overcome the inefficiency of Nyquist sampling.
Deep learning-based reconstruction has been a promising alternative to optimization-based reconstruction.
arXiv Detail & Related papers (2024-09-18T06:51:29Z)
- Geometric-Averaged Preference Optimization for Soft Preference Labels [78.2746007085333]
Many algorithms for aligning LLMs with human preferences assume that human preferences are binary and deterministic.
In this work, we introduce the distributional soft preference labels and improve Direct Preference Optimization (DPO) with a weighted geometric average of the LLM output likelihood in the loss function.
arXiv Detail & Related papers (2024-09-10T17:54:28Z)
- Pareto Front Shape-Agnostic Pareto Set Learning in Multi-Objective Optimization [6.810571151954673]
Existing methods rely on the mapping of preference vectors in the objective space to optimal solutions in the decision space.
Our proposed method can handle any shape of the Pareto front and learn the Pareto set without requiring prior knowledge.
arXiv Detail & Related papers (2024-08-11T14:09:40Z)
- Traversing Pareto Optimal Policies: Provably Efficient Multi-Objective Reinforcement Learning [14.260168974085376]
This paper investigates multi-objective reinforcement learning (MORL), which focuses on learning optimal policies in the presence of multiple reward functions.
Despite MORL's success, there is still a lack of satisfactory understanding of various MORL optimization targets and efficient learning algorithms.
arXiv Detail & Related papers (2024-07-24T17:58:49Z)
- Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO)
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z)
- Data-Driven Preference Sampling for Pareto Front Learning [10.70174844791007]
We propose a data-driven preference vector sampling framework for Pareto front learning.
We use the posterior information of the objective functions to adjust the parameters of the sampling distribution flexibly.
We design the distribution of the preference vector as a mixture of Dirichlet distributions to improve the performance of the model.
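The Dirichlet-mixture design mentioned above can be sketched in a few lines: draw a mixture component, then draw a preference vector from that component's Dirichlet distribution. The function name and parameterization below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sample_dirichlet_mixture(n, alphas, weights, rng):
    """Draw n preference vectors from a mixture of Dirichlet distributions.
    alphas: (K, m) concentration parameters, one row per mixture component;
    weights: (K,) mixture weights (normalized internally).
    Each row of the result lies on the (m-1)-simplex."""
    alphas = np.asarray(alphas, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    comps = rng.choice(len(weights), size=n, p=weights)   # pick a component per sample
    return np.stack([rng.dirichlet(alphas[k]) for k in comps])
```

Adjusting the concentration parameters `alphas` shifts probability mass toward chosen regions of the simplex, which is what lets the sampling distribution be tuned from posterior information about the objectives.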
arXiv Detail & Related papers (2024-04-12T11:06:22Z)
- Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization [105.3612692153615]
We propose a new axis based on eliciting preferences jointly over instruction-response pairs.
Joint preferences over instruction and response pairs can significantly enhance the alignment of large language models.
arXiv Detail & Related papers (2024-03-31T02:05:40Z)
- Thompson sampling for improved exploration in GFlowNets [75.89693358516944]
Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy.
We show in two domains that TS-GFN yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work.
arXiv Detail & Related papers (2023-06-30T14:19:44Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based ZO algorithm (ZO-RL) that learns a sampling policy for generating the perturbations in ZO optimization, instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
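For context on the estimator whose variance ZO-RL targets, here is a minimal sketch of the standard two-point zeroth-order gradient estimate with random Gaussian perturbations; a learned sampling policy would replace the Gaussian draws. The function name `zo_gradient` is a hypothetical helper, not the paper's code.

```python
import numpy as np

def zo_gradient(f, x, n_dirs=20, mu=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate: average finite differences
    of f along random Gaussian directions u, i.e.
        g ~ mean_u [ (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u ].
    Only function evaluations are needed, no derivatives."""
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros_like(x, dtype=float)
    for _ in range(n_dirs):
        u = rng.normal(size=x.shape)           # random perturbation direction
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / n_dirs
```

The estimate is unbiased in expectation but noisy for small `n_dirs`, which is exactly why the choice of perturbation distribution matters.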
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.