Regularized GLISp for sensor-guided human-in-the-loop optimization
- URL: http://arxiv.org/abs/2511.04751v1
- Date: Thu, 06 Nov 2025 19:00:56 GMT
- Title: Regularized GLISp for sensor-guided human-in-the-loop optimization
- Authors: Matteo Cercola, Michele Lomuscio, Dario Piga, Simone Formentin
- Abstract summary: We introduce a sensor-guided regularized extension of GLISp that integrates measurable descriptors into the preference-learning loop. This injects grey-box structure, combining subjective feedback with quantitative sensor information. Numerical evaluations on an analytical benchmark and on a human-in-the-loop vehicle suspension tuning task show faster convergence and superior final solutions.
- Score: 0.769971486557519
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human-in-the-loop calibration is often addressed via preference-based optimization, where algorithms learn from pairwise comparisons rather than explicit cost evaluations. While effective, methods such as Preferential Bayesian Optimization or Global optimization based on active preference learning with radial basis functions (GLISp) treat the system as a black box and ignore informative sensor measurements. In this work, we introduce a sensor-guided regularized extension of GLISp that integrates measurable descriptors into the preference-learning loop through a physics-informed hypothesis function and a least-squares regularization term. This injects grey-box structure, combining subjective feedback with quantitative sensor information while preserving the flexibility of preference-based search. Numerical evaluations on an analytical benchmark and on a human-in-the-loop vehicle suspension tuning task show faster convergence and superior final solutions compared to baseline GLISp.
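The abstract describes fitting a preference-based RBF surrogate that is additionally pulled toward a physics-informed hypothesis evaluated on sensor descriptors via a least-squares term. A minimal sketch of that idea in Python, assuming a squared-hinge penalty for pairwise preferences and an inverse-quadratic RBF; the paper's exact surrogate, margin, weights, and solver may differ, and names such as `h_vals` and `lam` are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X, centers, eps=1.0):
    # Inverse-quadratic RBF, one common choice for GLISp-style surrogates.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return 1.0 / (1.0 + eps * d2)

def fit_surrogate(X, prefs, h_vals, lam=0.5, margin=0.1, ridge=1e-3):
    """Fit RBF weights beta so the surrogate f = Phi @ beta respects
    pairwise preferences and stays close to the sensor-based hypothesis.

    X      : (n, d) sampled configurations
    prefs  : list of (i, j) pairs meaning sample i was preferred over j
    h_vals : (n,) physics-informed hypothesis h evaluated at each sample
    lam    : weight of the least-squares regularization toward h_vals
    """
    Phi = rbf_kernel(X, X)

    def loss(beta):
        f = Phi @ beta
        # Squared-hinge penalty: i preferred over j => f[i] + margin <= f[j].
        pref_pen = sum(max(0.0, f[i] - f[j] + margin) ** 2 for i, j in prefs)
        # Least-squares pull toward the sensor-guided hypothesis values.
        reg = lam * np.sum((f - h_vals) ** 2)
        return pref_pen + reg + ridge * beta @ beta

    res = minimize(loss, np.zeros(len(X)))
    return res.x, Phi
```

With `lam = 0` this reduces to a purely preference-driven (black-box) fit; increasing `lam` injects the grey-box structure the abstract refers to by anchoring the surrogate to the measurable descriptors.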
Related papers
- POP: Prior-fitted Optimizer Policies [20.784587787548436]
We introduce POP (Prior-fitted Optimizer Policies), a meta-learned model that predicts coordinate-wise update steps from contextual information. Our model is trained on millions of synthetic optimization problems sampled from nonfitted objectives.
arXiv Detail & Related papers (2026-02-17T10:27:07Z) - How Sampling Shapes LLM Alignment: From One-Shot Optima to Iterative Dynamics [65.67654005892469]
We show that proper instance-dependent sampling can yield stronger ranking guarantees, while skewed on-policy sampling can induce excessive concentration under structured preferences. We then analyze iterative alignment dynamics in which the learned policy feeds back into future sampling and reference policies. Our theoretical insights extend to Direct Preference Optimization, indicating the phenomena we captured are common to a broader class of preference-alignment methods.
arXiv Detail & Related papers (2026-02-12T17:11:08Z) - Understanding Optimization in Deep Learning with Central Flows [95.5647720254338]
We develop theory that can describe the dynamics of optimization in a complex regime. Our results suggest that central flows can be a valuable theoretical tool for reasoning about optimization in deep learning.
arXiv Detail & Related papers (2024-10-31T17:58:13Z) - Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z) - Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO)
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z) - Optimizer's Information Criterion: Dissecting and Correcting Bias in Data-Driven Optimization [16.57676001669012]
In data-driven optimization, the sample performance of the obtained decision typically incurs an optimistic bias against the true performance. Common techniques to correct this bias, such as cross-validation, require repeatedly solving additional optimization problems and are therefore expensive. We develop a general bias correction approach that directly approximates the first-order bias and does not require solving any additional optimization problems.
arXiv Detail & Related papers (2023-06-16T07:07:58Z) - A unified surrogate-based scheme for black-box and preference-based optimization [2.561649173827544]
We show that black-box and preference-based optimization problems are closely related and can be solved using the same family of approaches.
We propose the generalized Metric Response Surface (gMRS) algorithm, an optimization scheme that is a generalization of the popular MSRS framework.
arXiv Detail & Related papers (2022-02-03T08:47:54Z) - GLISp-r: A preference-based optimization algorithm with convergence guarantees [2.517173388598129]
We propose an extension of a preference-based optimization procedure called GLISp-r.
In GLISp-r, we propose a different criterion to use when looking for new candidate samples that is inspired by MSRS.
Compared to GLISp, GLISp-r is less likely to get stuck on local optima of the preference-based optimization problem.
arXiv Detail & Related papers (2022-02-02T16:34:15Z) - An automatic differentiation system for the age of differential privacy [65.35244647521989]
We introduce Tritium, an automatic differentiation-based sensitivity analysis framework for differentially private (DP) machine learning (ML).
arXiv Detail & Related papers (2021-09-22T08:07:42Z) - Directed particle swarm optimization with Gaussian-process-based function forecasting [15.733136147164032]
Particle swarm optimization (PSO) is an iterative search method that moves a set of candidate solutions around a search space toward the best known global and local solutions with randomized step lengths.
We show that our algorithm attains desirable properties for exploratory and exploitative behavior.
arXiv Detail & Related papers (2021-02-08T13:02:57Z) - Human Preference-Based Learning for High-dimensional Optimization of Exoskeleton Walking Gaits [55.59198568303196]
This work presents LineCoSpar, a human-in-the-loop preference-based framework to learn user preferences in high dimensions.
In simulations and human trials, we empirically verify that LineCoSpar is a sample-efficient approach for high-dimensional preference optimization.
This result has implications for exoskeleton gait synthesis, an active field with applications to clinical use and patient rehabilitation.
arXiv Detail & Related papers (2020-03-13T22:02:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.