Human Strategic Steering Improves Performance of Interactive
Optimization
- URL: http://arxiv.org/abs/2005.01291v1
- Date: Mon, 4 May 2020 06:56:52 GMT
- Title: Human Strategic Steering Improves Performance of Interactive
Optimization
- Authors: Fabio Colella, Pedram Daee, Jussi Jokinen, Antti Oulasvirta, Samuel
Kaski
- Abstract summary: In recommender systems, the action is to choose what to recommend, and the optimization task is to recommend items the user prefers.
We argue that this fundamental assumption can be extensively violated by human users, who are not passive feedback sources.
We designed a function optimization task where a human and an optimization algorithm collaborate to find the maximum of a 1-dimensional function.
- Score: 33.54512897507445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A central concern in an interactive intelligent system is optimization of its
actions, to be maximally helpful to its human user. In recommender systems for
instance, the action is to choose what to recommend, and the optimization task
is to recommend items the user prefers. The optimization is done based on
the user's earlier feedback (e.g. "likes" and "dislikes"), and the algorithms
assume the feedback to be faithful. That is, when the user clicks "like," they
actually prefer the item. We argue that this fundamental assumption can be
extensively violated by human users, who are not passive feedback sources.
Instead, they are in control, actively steering the system towards their goal.
To verify this hypothesis, that humans steer and are able to improve
performance by steering, we designed a function optimization task where a human
and an optimization algorithm collaborate to find the maximum of a
1-dimensional function. At each iteration, the optimization algorithm queries
the user for the value of a hidden function $f$ at a point $x$, and the user,
who sees the hidden function, provides an answer about $f(x)$. Our study on 21
participants shows that users who understand how the optimization works,
strategically provide biased answers (answers not equal to $f(x)$), which
results in the algorithm finding the optimum significantly faster. Our work
highlights that next-generation intelligent systems will need user models
capable of helping users who steer systems to pursue their goals.
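The collaborative loop described above (the optimizer queries a point $x$, the user answers about $f(x)$, possibly untruthfully) can be sketched in a toy simulation. The optimizer below is a simple finite-difference gradient ascent on the *reported* values, a stand-in for the study's actual Bayesian optimizer; the hidden function, the strategic reporting rule, and all constants are illustrative assumptions, not the study's setup.

```python
import math

OPT = 0.73  # location of the hidden maximum (known to the user, not the algorithm)

def hidden_f(x):
    """The hidden 1-D function; in the study, only the human sees it."""
    return math.exp(-20.0 * (x - OPT) ** 2)

def faithful_user(x):
    """Reports f(x) exactly, as the optimizer's feedback model assumes."""
    return hidden_f(x)

def strategic_user(x):
    """Reports a biased value (not equal to f(x)) whose slope always points
    firmly at the optimum -- a toy model of steering, not the participants'
    actual behaviour."""
    return 1.0 - abs(x - OPT)

def optimize(user, n_iters=30, eta=0.05):
    """Finite-difference gradient ascent on reported values (a stand-in
    for the study's Bayesian optimizer). Returns the best queried point,
    judged by the TRUE hidden function."""
    xs = [0.10, 0.12]              # two initial queries
    ys = [user(x) for x in xs]
    for _ in range(n_iters):
        dx = xs[-1] - xs[-2]
        if dx == 0.0:              # degenerate step; stop querying
            break
        grad = (ys[-1] - ys[-2]) / dx          # slope of reported values
        x_next = min(1.0, max(0.0, xs[-1] + eta * grad))
        xs.append(x_next)
        ys.append(user(x_next))
    best = max(xs, key=hidden_f)
    return best, hidden_f(best)

best_faithful, val_faithful = optimize(faithful_user)
best_strategic, val_strategic = optimize(strategic_user)
print(f"faithful:  best x = {best_faithful:.3f}, f = {val_faithful:.4f}")
print(f"strategic: best x = {best_strategic:.3f}, f = {val_strategic:.4f}")
```

With these settings the strategically biased reports pull the queries to within a few hundredths of the optimum in roughly 15 iterations, while faithful reporting leaves gradient ascent crawling along the flat tail of the Gaussian, mirroring the finding that steering speeds up convergence.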
Related papers
- Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z)
- Prompt Optimization with Human Feedback [69.95991134172282]
We study the problem of prompt optimization with human feedback (POHF).
We introduce our algorithm, named automated POHF (APOHF).
The results demonstrate that our APOHF can efficiently find a good prompt using a small number of preference feedback instances.
arXiv Detail & Related papers (2024-05-27T16:49:29Z)
- Cooperative Bayesian Optimization for Imperfect Agents [32.15315995944448]
Two agents jointly choose at which points to query the function, but each controls only one variable.
We formulate the solution as sequential decision-making, where the agent we control models the user as a computationally rational agent with prior knowledge about the function.
We show that strategic planning of the queries enables better identification of the global maximum of the function as long as the user avoids excessive exploration.
arXiv Detail & Related papers (2024-03-07T12:16:51Z)
- Localized Zeroth-Order Prompt Optimization [54.964765668688806]
We propose a novel algorithm, localized zeroth-order prompt optimization (ZOPO).
ZOPO incorporates a Neural Tangent Kernel-based derived Gaussian process into standard zeroth-order optimization for an efficient search of well-performing local optima in prompt optimization.
Remarkably, ZOPO outperforms existing baselines in terms of both the optimization performance and the query efficiency.
arXiv Detail & Related papers (2024-03-05T14:18:15Z)
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Optimizing Algorithms From Pairwise User Preferences [23.87058308494074]
We introduce SortCMA to optimize algorithm parameter configurations in high dimensions based on pairwise user preferences.
We apply this method to tuning a commercial depth sensor without ground truth, and to robot social navigation.
arXiv Detail & Related papers (2023-08-08T20:36:59Z)
- DeepHive: A multi-agent reinforcement learning approach for automated discovery of swarm-based optimization policies [0.0]
The state of each agent within the swarm is defined as its current position and function value within a design space.
The proposed approach is tested on various benchmark optimization functions and compared to the performance of other global optimization strategies.
arXiv Detail & Related papers (2023-03-29T18:08:08Z)
- Judging Adam: Studying the Performance of Optimization Methods on ML4SE Tasks [2.8961929092154697]
We test the performance of various optimizers on deep learning models for source code.
We find that the choice of an optimizer can have a significant impact on the model quality.
We suggest that the ML4SE community should consider using RAdam instead of Adam as the default optimizer for code-related deep learning tasks.
arXiv Detail & Related papers (2023-03-06T22:49:20Z)
- Experience in Engineering Complex Systems: Active Preference Learning with Multiple Outcomes and Certainty Levels [1.5257326975704795]
Black-box optimization refers to the problem whose objective function and/or constraint sets are either unknown, inaccessible, or non-existent.
The so-called Active Preference Learning algorithm has been developed to exploit this specific information.
Our approach aims to extend the algorithm in such a way that it can exploit further information effectively.
arXiv Detail & Related papers (2023-02-27T15:55:37Z)
- Reverse engineering learned optimizers reveals known and novel mechanisms [50.50540910474342]
Learned optimizers are algorithms that can themselves be trained to solve optimization problems.
Our results help elucidate the previously murky understanding of how learned optimizers work, and establish tools for interpreting future learned optimizers.
arXiv Detail & Related papers (2020-11-04T07:12:43Z)
- Optimizing Interactive Systems via Data-Driven Objectives [70.3578528542663]
We propose an approach that infers the objective directly from observed user interactions.
These inferences can be made regardless of prior knowledge and across different types of user behavior.
We introduce the Interactive System Optimizer (ISO), a novel algorithm that uses these inferred objectives for optimization.
arXiv Detail & Related papers (2020-06-19T20:49:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.