Gamifying optimization: a Wasserstein distance-based analysis of human
  search
- URL: http://arxiv.org/abs/2112.06292v1
- Date: Sun, 12 Dec 2021 18:23:46 GMT
- Title: Gamifying optimization: a Wasserstein distance-based analysis of human
  search
- Authors: Antonio Candelieri, Andrea Ponti, Francesco Archetti
- Abstract summary: This paper outlines a theoretical framework to characterise humans' decision-making strategies under uncertainty.
The key element in this paper is the representation of behavioural patterns of human learners as a discrete probability distribution.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The main objective of this paper is to outline a theoretical framework to
characterise humans' decision-making strategies under uncertainty, in
particular active learning in a black-box optimization task and the trade-off
between information gathering (exploration) and reward seeking (exploitation).
Humans' decision making according to these two objectives can be modelled in
terms of Pareto rationality. If a decision set contains a Pareto efficient
strategy, a rational decision maker should always select the dominant strategy
over its dominated alternatives. A distance from the Pareto frontier determines
whether a choice is Pareto rational. To collect data about humans' strategies
we have used a gaming application that shows the game field, with previous
decisions and observations, as well as the score obtained. The key element in
this paper is the representation of behavioural patterns of human learners as a
discrete probability distribution. This maps the problem of the
characterization of humans' behaviour into a space whose elements are
probability distributions structured by a distance between histograms, namely
the Wasserstein distance (WST). The distributional analysis gives new insights
about human search strategies and their deviations from Pareto rationality.
Since uncertainty is one of the two objectives defining the Pareto
frontier, the analysis has been performed for three different uncertainty
quantification measures to identify which best explains the Pareto-compliant
behavioural patterns. Besides the analysis of individual patterns, WST has also
enabled a global analysis through the computation of barycenters and WST k-means clustering.
A further analysis has been performed by a decision tree to relate non-Paretian
behaviour, characterized by excessive exploitation, to the dynamics of the
evolution of the reward seeking process.
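
The two building blocks named in the abstract, distance from the Pareto frontier and the Wasserstein distance between histograms, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the (reward estimate, uncertainty) tuples, the Euclidean distance to the frontier, and the assumption that both histograms share the same equally spaced bins are all hypothetical choices made for illustration.

```python
from itertools import accumulate
from math import dist


def pareto_front(points):
    """Return the points not dominated by any other point.

    Assumption: every objective (e.g. reward estimate, uncertainty)
    is to be maximized.
    """
    def dominates(a, b):
        # a dominates b if it is at least as good everywhere and better somewhere
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))

    return [p for p in points if not any(dominates(q, p) for q in points)]


def distance_to_front(choice, points):
    """Euclidean distance from a choice to the nearest Pareto-efficient point."""
    return min(dist(choice, f) for f in pareto_front(points))


def wasserstein_1d(p, q, bin_width=1.0):
    """1-Wasserstein distance between two histograms on shared, equally spaced bins.

    In one dimension this equals the area between the two cumulative
    distribution functions.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    cdf_p = accumulate(x / total_p for x in p)
    cdf_q = accumulate(x / total_q for x in q)
    return bin_width * sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


# Hypothetical candidate decisions as (reward estimate, uncertainty) pairs.
candidates = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9), (0.4, 0.4)]
front = pareto_front(candidates)                   # (0.4, 0.4) is dominated
gap = distance_to_front((0.4, 0.4), candidates)    # positive: not Pareto rational
shift = wasserstein_1d([1, 0, 0], [0, 0, 1])       # all mass moves two bins
```

Under these assumptions, a choice at distance zero from the frontier would count as Pareto rational, and two players could be compared through the Wasserstein distance between their histograms.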
 
      
Related papers
- Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time [52.230936493691985]
 We propose SITAlign, an inference framework that addresses the multifaceted nature of alignment by maximizing a primary objective while satisfying threshold-based constraints on secondary criteria.
We provide theoretical insights by deriving sub-optimality bounds of our satisficing-based inference alignment approach.
 arXiv  Detail & Related papers  (2025-05-29T17:56:05Z)
- The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a   Reasoning Model will Think [81.38614558541772]
 We introduce the CoT Encyclopedia, a framework for analyzing and steering model reasoning.
Our method automatically extracts diverse reasoning criteria from model-generated CoTs.
We show that this framework produces more interpretable and comprehensive analyses than existing methods.
 arXiv  Detail & Related papers  (2025-05-15T11:31:02Z)
- Reinforcement Learning with Human Feedback: Learning Dynamic Choices via
  Pessimism [91.52263068880484]
 We study offline Reinforcement Learning with Human Feedback (RLHF).
We aim to learn the human's underlying reward and the MDP's optimal policy from a set of trajectories induced by human choices.
RLHF is challenging for multiple reasons: large state space but limited human feedback, the bounded rationality of human decisions, and the off-policy distribution shift.
 arXiv  Detail & Related papers  (2023-05-29T01:18:39Z)
- Revealed Multi-Objective Utility Aggregation in Human Driving [15.976506570992292]
 A central design problem in game theoretic analysis is the estimation of the players' utilities.
Based on the concept of rationalisability, we develop algorithms for estimating multi-objective aggregation parameters.
We show that irrespective of the specific solution concept used for solving the games, a data-driven estimation of utility aggregation significantly improves the predictive accuracy of behaviour models.
 arXiv  Detail & Related papers  (2023-03-13T19:29:17Z)
- Ground(less) Truth: A Causal Framework for Proxy Labels in
  Human-Algorithm Decision-Making [29.071173441651734]
 We identify five sources of target variable bias that can impact the validity of proxy labels in human-AI decision-making tasks.
We develop a causal framework to disentangle the relationship between each bias.
We conclude by discussing opportunities to better address target variable bias in future research.
 arXiv  Detail & Related papers  (2023-02-13T16:29:11Z)
- Reinforcement Learning with a Terminator [80.34572413850186]
 We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
 arXiv  Detail & Related papers  (2022-05-30T18:40:28Z)
- From Cognitive to Computational Modeling: Text-based Risky
  Decision-Making Guided by Fuzzy Trace Theory [5.154015755506085]
 Fuzzy trace theory (FTT) is a powerful paradigm that explains human decision-making by incorporating gists.
We propose a computational framework which combines the effects of the underlying semantics and sentiments on text-based decision-making.
In particular, we introduce Category-2- to learn categorical gists and categorical sentiments, and demonstrate how our computational model can be optimised to predict risky decision-making in groups and individuals.
 arXiv  Detail & Related papers  (2022-05-15T02:25:28Z)
- Learning MDPs from Features: Predict-Then-Optimize for Sequential
  Decision Problems by Reinforcement Learning [52.74071439183113]
 We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
 arXiv  Detail & Related papers  (2021-06-06T23:53:31Z)
- Uncertainty quantification and exploration-exploitation trade-off in
  humans [0.0]
 The main objective of this paper is to outline a theoretical framework to analyse how humans' decision-making strategies under uncertainty manage the trade-off between information gathering (exploration) and reward seeking (exploitation).
 arXiv  Detail & Related papers  (2021-02-05T16:03:04Z)
- Identification of Unexpected Decisions in Partially Observable
  Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
 We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
 arXiv  Detail & Related papers  (2020-12-23T15:09:28Z)
- Evaluations and Methods for Explanation through Robustness Analysis [117.7235152610957]
 We establish a novel set of evaluation criteria for such feature based explanations by analysis.
We obtain new explanations that are loosely necessary and sufficient for a prediction.
We extend the explanation to extract the set of features that would move the current prediction to a target class.
 arXiv  Detail & Related papers  (2020-05-31T05:52:05Z)
- Invariant Rationalization [84.1861516092232]
 A typical rationalization criterion, i.e. maximum mutual information (MMI), finds the rationale that maximizes the prediction performance based only on the rationale.
We introduce a game-theoretic invariant rationalization criterion where the rationales are constrained to enable the same predictor to be optimal across different environments.
We show both theoretically and empirically that the proposed rationales can rule out spurious correlations, generalize better to different test scenarios, and align better with human judgments.
 arXiv  Detail & Related papers  (2020-03-22T00:50:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     