Interactive Hyperparameter Optimization in Multi-Objective Problems via
Preference Learning
- URL: http://arxiv.org/abs/2309.03581v3
- Date: Thu, 11 Jan 2024 14:46:13 GMT
- Title: Interactive Hyperparameter Optimization in Multi-Objective Problems via
Preference Learning
- Authors: Joseph Giovanelli, Alexander Tornede, Tanja Tornede, Marius Lindauer
- Abstract summary: We propose a human-centered interactive HPO approach tailored towards multi-objective machine learning (ML). Instead of relying on the user to guess the most suitable indicator for their needs, our approach automatically learns an appropriate indicator.
- Score: 65.51668094117802
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hyperparameter optimization (HPO) is important for leveraging the full
potential of machine learning (ML). In practice, users are often interested in
multi-objective (MO) problems, i.e., optimizing potentially conflicting
objectives, like accuracy and energy consumption. To tackle this, the vast
majority of MO-ML algorithms return a Pareto front of non-dominated machine
learning models to the user. Optimizing the hyperparameters of such algorithms
is non-trivial as evaluating a hyperparameter configuration entails evaluating
the quality of the resulting Pareto front. In the literature, there are known
indicators that assess the quality of a Pareto front (e.g., hypervolume, R2) by
quantifying different properties (e.g., volume, proximity to a reference
point). However, choosing the indicator that leads to the desired Pareto front
can be difficult for a user. In this paper, we propose a human-centered
interactive HPO approach tailored towards multi-objective ML that leverages
preference learning to extract desiderata from users and uses them to guide the
optimization. Instead of relying on the user to guess the most suitable
indicator for their needs, our approach automatically learns an appropriate
indicator. Concretely, we leverage pairwise comparisons of distinct Pareto
fronts to learn such an appropriate quality indicator. Then, we optimize the
hyperparameters of the underlying MO-ML algorithm towards this learned
indicator using a state-of-the-art HPO approach. In an experimental study
targeting the environmental impact of ML, we demonstrate that our approach
leads to substantially better Pareto fronts than optimizing based on a
wrong indicator pre-selected by the user, and performs comparably when an
advanced user knows which indicator to pick.
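To make the learned-indicator idea concrete, here is a minimal, hypothetical sketch in Python, not the authors' implementation: each Pareto front is summarized by a few standard features (a 2-D hypervolume, a crude spread measure, and the front size), and a Bradley-Terry-style logistic model is fit on pairwise front comparisons so that the learned weights define a scalar quality indicator. The function names, the feature set, and the linear-in-features assumption are all illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def hypervolume_2d(front, ref):
    """Hypervolume dominated by a 2-D Pareto front (minimization) w.r.t. a reference point."""
    pts = front[np.argsort(front[:, 0])]          # sort by the first objective
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < prev_f2:                          # each point adds a new rectangular slice
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv

def front_features(front, ref):
    """Illustrative feature vector describing a Pareto front (not the paper's feature set)."""
    pairwise = np.linalg.norm(front[:, None, :] - front[None, :, :], axis=-1)
    return np.array([hypervolume_2d(front, ref),  # volume-based quality
                     pairwise.mean(),             # crude diversity/spread proxy
                     float(len(front))])          # number of trade-off solutions

def fit_indicator(comparisons, ref):
    """comparisons: list of (preferred_front, other_front) pairs, each an (n, 2) array."""
    X, y = [], []
    for better, worse in comparisons:
        diff = front_features(better, ref) - front_features(worse, ref)
        X.extend([diff, -diff])                   # symmetric Bradley-Terry training pairs
        y.extend([1, 0])
    model = LogisticRegression(fit_intercept=False).fit(np.array(X), np.array(y))
    return model.coef_.ravel()                    # weights of the learned indicator

def learned_indicator(front, weights, ref):
    """Score of a front under the learned indicator; maximize this in the outer HPO loop."""
    return float(weights @ front_features(front, ref))
```

In the outer loop, a standard HPO tool (the abstract refers to a state-of-the-art HPO approach without naming it here) would evaluate each hyperparameter configuration by running the MO-ML algorithm, extracting its Pareto front, and maximizing `learned_indicator` as the single surrogate objective.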
Related papers
- Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization [75.1240295759264]
We propose an effective framework for Bridging and Modeling Correlations in pairwise data, named BMC.
We increase the consistency and informativeness of the pairwise preference signals through targeted modifications.
We identify that DPO alone is insufficient to model these correlations and capture nuanced variations.
arXiv Detail & Related papers (2024-08-14T11:29:47Z)
- Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO).
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z)
- Hyperparameter Importance Analysis for Multi-Objective AutoML [14.336028105614824]
In this paper, we propose the first method for assessing the importance of hyperparameters in the context of multi-objective hyperparameter optimization.
Specifically, we compute a priori scalarizations of the objectives and determine the importance of the hyperparameters for different objective trade-offs (a rough sketch of this scalarization idea appears after this list).
Our findings not only offer valuable guidance for hyperparameter tuning in MOO tasks but also advance the understanding of HPI in complex optimization scenarios.
arXiv Detail & Related papers (2024-05-13T11:00:25Z)
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts.
RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
arXiv Detail & Related papers (2024-02-12T22:47:57Z)
- Multi-Objective Bayesian Optimization with Active Preference Learning [18.066263838953223]
We propose a Bayesian optimization (BO) approach to identifying the most preferred solution in a multi-objective optimization (MOO) problem.
To minimize the interaction cost with the decision maker (DM), we also propose an active learning strategy for the preference estimation.
arXiv Detail & Related papers (2023-11-22T15:24:36Z)
- Multi-objective hyperparameter optimization with performance uncertainty [62.997667081978825]
This paper presents results on multi-objective hyperparameter optimization under uncertainty in the evaluation of machine learning algorithms.
We combine the sampling strategy of Tree-structured Parzen Estimators (TPE) with the metamodel obtained after training a Gaussian Process Regression (GPR) model with heterogeneous noise.
Experimental results on three analytical test functions and three ML problems show the improvement over multi-objective TPE and GPR.
arXiv Detail & Related papers (2022-09-09T14:58:43Z)
- Fair and Green Hyperparameter Optimization via Multi-objective and Multiple Information Source Bayesian Optimization [0.19116784879310028]
FanG-HPO uses subsets of a large dataset (i.e., information sources) to obtain cheap approximations of both accuracy and fairness.
Experiments consider two benchmark (fairness) datasets and two machine learning algorithms.
arXiv Detail & Related papers (2022-05-18T10:07:21Z)
- Self-Evolutionary Optimization for Pareto Front Learning [34.17125297176668]
Multi-objective optimization (MOO) approaches have been proposed for multitasking problems.
Recent MOO methods approximate the set of optimal solutions (the Pareto front) with a single unified model.
We show that Pareto front learning (PFL) can be re-formulated into another MOO problem with multiple objectives, each of which corresponds to different preference weights for the tasks.
arXiv Detail & Related papers (2021-10-07T13:38:57Z)
- Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z)
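As a side note on the "Hyperparameter Importance Analysis for Multi-Objective AutoML" entry above, here is a rough sketch of what an a priori scalarization followed by a per-hyperparameter importance estimate could look like; the surrogate model, the min-max normalization, and the use of permutation importance are my own assumptions, not necessarily that paper's method.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

def scalarize(objectives, weights):
    """Weighted sum of min-max-normalized objective values (lower is better for each)."""
    norm = (objectives - objectives.min(axis=0)) / (np.ptp(objectives, axis=0) + 1e-12)
    return norm @ weights

def importance_for_tradeoff(configs, objectives, weights):
    """configs: (n, d) numeric hyperparameter matrix; objectives: (n, k) observed costs."""
    y = scalarize(objectives, weights)
    surrogate = RandomForestRegressor(n_estimators=200, random_state=0).fit(configs, y)
    result = permutation_importance(surrogate, configs, y, n_repeats=10, random_state=0)
    return result.importances_mean                # one importance score per hyperparameter

# Example: equal trade-off between, e.g., error rate and energy consumption
# importances = importance_for_tradeoff(X, np.column_stack([err, energy]), np.array([0.5, 0.5]))
```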