Human Strategic Steering Improves Performance of Interactive
Optimization
- URL: http://arxiv.org/abs/2005.01291v1
- Date: Mon, 4 May 2020 06:56:52 GMT
- Title: Human Strategic Steering Improves Performance of Interactive
Optimization
- Authors: Fabio Colella, Pedram Daee, Jussi Jokinen, Antti Oulasvirta, Samuel
Kaski
- Abstract summary: In recommender systems, the action is to choose what to recommend, and the optimization task is to recommend items the user prefers.
We argue that this fundamental assumption can be extensively violated by human users, who are not passive feedback sources.
We designed a function optimization task where a human and an optimization algorithm collaborate to find the maximum of a 1-dimensional function.
- Score: 33.54512897507445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A central concern in an interactive intelligent system is optimization of its
actions, to be maximally helpful to its human user. In recommender systems for
instance, the action is to choose what to recommend, and the optimization task
is to recommend items the user prefers. The optimization is done based on
the user's earlier feedback (e.g. "likes" and "dislikes"), and the algorithms
assume the feedback to be faithful. That is, when the user clicks "like," they
actually prefer the item. We argue that this fundamental assumption can be
extensively violated by human users, who are not passive feedback sources.
Instead, they are in control, actively steering the system towards their goal.
To verify this hypothesis, that humans steer and are able to improve
performance by steering, we designed a function optimization task where a human
and an optimization algorithm collaborate to find the maximum of a
1-dimensional function. At each iteration, the optimization algorithm queries
the user for the value of a hidden function $f$ at a point $x$, and the user,
who sees the hidden function, provides an answer about $f(x)$. Our study on 21
participants shows that users who understand how the optimization works,
strategically provide biased answers (answers not equal to $f(x)$), which
results in the algorithm finding the optimum significantly faster. Our work
highlights that next-generation intelligent systems will need user models
capable of helping users who steer systems to pursue their goals.
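The collaborative loop described above (the optimizer queries a point $x$, the user answers about $f(x)$, possibly untruthfully) can be sketched in a toy simulation. The optimizer below is a simple finite-difference gradient ascent on the *reported* values, a stand-in for the study's actual Bayesian optimizer; the hidden function, the strategic reporting rule, and all constants are illustrative assumptions, not the study's setup.

```python
import math

OPT = 0.73  # location of the hidden maximum (known to the user, not the algorithm)

def hidden_f(x):
    """The hidden 1-D function; in the study, only the human sees it."""
    return math.exp(-20.0 * (x - OPT) ** 2)

def faithful_user(x):
    """Reports f(x) exactly, as the optimizer's feedback model assumes."""
    return hidden_f(x)

def strategic_user(x):
    """Reports a biased value (not equal to f(x)) whose slope always points
    firmly at the optimum -- a toy model of steering, not the participants'
    actual behaviour."""
    return 1.0 - abs(x - OPT)

def optimize(user, n_iters=30, eta=0.05):
    """Finite-difference gradient ascent on reported values (a stand-in
    for the study's Bayesian optimizer). Returns the best queried point,
    judged by the TRUE hidden function."""
    xs = [0.10, 0.12]              # two initial queries
    ys = [user(x) for x in xs]
    for _ in range(n_iters):
        dx = xs[-1] - xs[-2]
        if dx == 0.0:              # degenerate step; stop querying
            break
        grad = (ys[-1] - ys[-2]) / dx          # slope of reported values
        x_next = min(1.0, max(0.0, xs[-1] + eta * grad))
        xs.append(x_next)
        ys.append(user(x_next))
    best = max(xs, key=hidden_f)
    return best, hidden_f(best)

best_faithful, val_faithful = optimize(faithful_user)
best_strategic, val_strategic = optimize(strategic_user)
print(f"faithful:  best x = {best_faithful:.3f}, f = {val_faithful:.4f}")
print(f"strategic: best x = {best_strategic:.3f}, f = {val_strategic:.4f}")
```

With these settings the strategically biased reports pull the queries to within a few hundredths of the optimum in roughly 15 iterations, while faithful reporting leaves gradient ascent crawling along the flat tail of the Gaussian, mirroring the finding that steering speeds up convergence.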
Related papers
- Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z)
- Prompt Optimization with Human Feedback [69.95991134172282]
We study the problem of prompt optimization with human feedback (POHF).
We introduce our algorithm, named automated POHF (APOHF).
The results demonstrate that our APOHF can efficiently find a good prompt using a small number of preference feedback instances.
arXiv Detail & Related papers (2024-05-27T16:49:29Z)
- Cooperative Bayesian Optimization for Imperfect Agents [32.15315995944448]
Two agents jointly choose at which points to query the function, but each controls only one variable.
We formulate the solution as sequential decision-making, where the agent we control models the user as a computationally rational agent with prior knowledge about the function.
We show that strategic planning of the queries enables better identification of the global maximum of the function as long as the user avoids excessive exploration.
arXiv Detail & Related papers (2024-03-07T12:16:51Z)
- Localized Zeroth-Order Prompt Optimization [54.964765668688806]
We propose a novel algorithm, localized zeroth-order prompt optimization (ZOPO).
ZOPO incorporates a Neural Tangent Kernel-based derived Gaussian process into standard zeroth-order optimization for an efficient search of well-performing local optima in prompt optimization.
Remarkably, ZOPO outperforms existing baselines in terms of both the optimization performance and the query efficiency.
arXiv Detail & Related papers (2024-03-05T14:18:15Z)
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Optimizing Algorithms From Pairwise User Preferences [23.87058308494074]
We introduce SortCMA to optimize algorithm parameter configurations in high dimensions based on pairwise user preferences.
We apply this method to tuning a commercial depth sensor without ground truth, and to robot social navigation.
arXiv Detail & Related papers (2023-08-08T20:36:59Z)
- DeepHive: A multi-agent reinforcement learning approach for automated discovery of swarm-based optimization policies [0.0]
The state of each agent within the swarm is defined as its current position and function value within a design space.
The proposed approach is tested on various benchmark optimization functions and compared to the performance of other global optimization strategies.
arXiv Detail & Related papers (2023-03-29T18:08:08Z)
- Judging Adam: Studying the Performance of Optimization Methods on ML4SE Tasks [2.8961929092154697]
We test the performance of various optimizers on deep learning models for source code.
We find that the choice of an optimizer can have a significant impact on the model quality.
We suggest that the ML4SE community should consider using RAdam instead of Adam as the default optimizer for code-related deep learning tasks.
arXiv Detail & Related papers (2023-03-06T22:49:20Z)
- Experience in Engineering Complex Systems: Active Preference Learning with Multiple Outcomes and Certainty Levels [1.5257326975704795]
Black-box optimization refers to the problem whose objective function and/or constraint sets are either unknown, inaccessible, or non-existent.
The so-called Active Preference Learning algorithm has been developed to exploit this specific information.
Our approach aims to extend the algorithm in such a way that it can exploit further information effectively.
arXiv Detail & Related papers (2023-02-27T15:55:37Z)
- Reverse engineering learned optimizers reveals known and novel mechanisms [50.50540910474342]
Learned optimizers are algorithms that can themselves be trained to solve optimization problems.
Our results help elucidate the previously murky understanding of how learned optimizers work, and establish tools for interpreting future learned optimizers.
arXiv Detail & Related papers (2020-11-04T07:12:43Z)
- Optimizing Interactive Systems via Data-Driven Objectives [70.3578528542663]
We propose an approach that infers the objective directly from observed user interactions.
These inferences can be made regardless of prior knowledge and across different types of user behavior.
We introduce the Interactive System Optimizer (ISO), a novel algorithm that uses these inferred objectives for optimization.
arXiv Detail & Related papers (2020-06-19T20:49:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.