Improving Confidence in the Estimation of Values and Norms
- URL: http://arxiv.org/abs/2004.01056v1
- Date: Thu, 2 Apr 2020 15:03:03 GMT
- Title: Improving Confidence in the Estimation of Values and Norms
- Authors: Luciano Cavalcante Siebert, Rijk Mercuur, Virginia Dignum, Jeroen van
den Hoven, Catholijn Jonker
- Abstract summary: This paper analyses to what extent an AA is able to estimate the values and norms of a simulated human agent based on its actions in the ultimatum game.
We present two methods to reduce ambiguity in profiling the SHAs: one based on search space exploration and another based on counterfactual analysis.
- Score: 3.8323580808203785
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous agents (AA) will increasingly be interacting with us in our daily
lives. While we want the benefits attached to AAs, it is essential that their
behavior is aligned with our values and norms. Hence, an AA will need to
estimate the values and norms of the humans it interacts with, which is not a
straightforward task when solely observing an agent's behavior. This paper
analyses to what extent an AA is able to estimate the values and norms of a
simulated human agent (SHA) based on its actions in the ultimatum game. We
present two methods to reduce ambiguity in profiling the SHAs: one based on
search space exploration and another based on counterfactual analysis. We found
that both methods are able to increase the confidence in estimating human
values and norms, but differ in their applicability, the latter being more
efficient when the number of interactions with the agent is to be minimized.
These insights are useful to improve the alignment of AAs with human values and
norms.
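The abstract's second method, counterfactual analysis, can be illustrated with a minimal, hypothetical sketch. Here candidate profiles are reduced to a single number (a responder's minimum acceptable offer in the ultimatum game) standing in for the paper's richer value/norm profiles; the function names and the "pick the offer that best splits the surviving profiles" heuristic are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: profiling an ultimatum-game responder.
# A "profile" is the minimum offer (out of 10) the responder accepts —
# a stand-in for a full value/norm profile.

def consistent_profiles(profiles, observations):
    """Keep only profiles whose predicted response matches every
    observed (offer, accepted) pair."""
    return [
        t for t in profiles
        if all((offer >= t) == accepted for offer, accepted in observations)
    ]

def most_informative_offer(profiles, candidate_offers):
    """Counterfactual-style query: choose the offer whose accept/reject
    outcomes split the remaining profiles most evenly, so that either
    answer rules out as many profiles as possible."""
    def split_imbalance(offer):
        accepts = sum(1 for t in profiles if offer >= t)
        return abs(2 * accepts - len(profiles))
    return min(candidate_offers, key=split_imbalance)

profiles = list(range(1, 9))              # minimum acceptable offers 1..8
observations = [(8, True), (2, False)]    # an offer of 8 accepted, 2 rejected
remaining = consistent_profiles(profiles, observations)
print(remaining)                          # profiles 3..8 survive
print(most_informative_offer(remaining, range(1, 10)))  # 5 splits them 3/3
```

Under this toy model, each well-chosen counterfactual query halves the surviving profile set, which matches the abstract's observation that counterfactual analysis is more efficient when interactions with the agent must be minimized.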
Related papers
- Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance [73.19687314438133]
Reliance is influenced by numerous factors within the interactional context of a generation.
We introduce Rel-A.I., an in situ, system-level evaluation approach to measure reliance.
arXiv Detail & Related papers (2024-07-10T18:00:05Z)
- Why not both? Complementing explanations with uncertainty, and the role of self-confidence in Human-AI collaboration [12.47276164048813]
We conduct an empirical study to identify how uncertainty estimates and model explanations affect users' reliance, understanding, and trust towards a model.
We also discuss how the latter may distort the outcome of an analysis based on agreement and switching percentages.
arXiv Detail & Related papers (2023-04-27T12:24:33Z)
- Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark [61.43264961005614]
We develop a benchmark of 134 Choose-Your-Own-Adventure games containing over half a million rich, diverse scenarios.
We evaluate agents' tendencies to be power-seeking, cause disutility, and commit ethical violations.
Our results show that agents can act both competently and morally, so concrete progress can be made in machine ethics.
arXiv Detail & Related papers (2023-04-06T17:59:03Z)
- Value Engineering for Autonomous Agents [3.6130723421895947]
Previous approaches have treated values as labels associated with some actions or states of the world, rather than as integral components of agent reasoning.
We propose a new AMA paradigm grounded in moral and social psychology, where values are instilled into agents as context-dependent goals.
We argue that this type of normative reasoning, where agents are endowed with an understanding of norms' moral implications, leads to value-awareness in autonomous agents.
arXiv Detail & Related papers (2023-02-17T08:52:15Z)
- Comparing Psychometric and Behavioral Predictors of Compliance During Human-AI Interactions [5.893351309010412]
A common hypothesis in adaptive AI research is that minor differences in people's predisposition to trust can significantly impact their likelihood of complying with recommendations from the AI.
We benchmark a popular measure of this kind against behavioral predictors of compliance.
This suggests a general property that individual differences in initial behavior are more predictive than differences in self-reported trust attitudes.
arXiv Detail & Related papers (2023-02-03T16:56:25Z)
- Aligning to Social Norms and Values in Interactive Narratives [89.82264844526333]
We focus on creating agents that act in alignment with socially beneficial norms and values in interactive narratives or text-based games.
We introduce the GALAD agent that uses the social commonsense knowledge present in specially trained language models to contextually restrict its action space to only those actions that are aligned with socially beneficial values.
arXiv Detail & Related papers (2022-05-04T09:54:33Z)
- Empirical Estimates on Hand Manipulation are Recoverable: A Step Towards Individualized and Explainable Robotic Support in Everyday Activities [80.37857025201036]
A key challenge for robotic systems is to figure out the behavior of another agent.
Drawing correct inferences is especially challenging when (confounding) factors are not controlled experimentally.
We propose equipping robots with the necessary tools to conduct observational studies on people.
arXiv Detail & Related papers (2022-01-27T22:15:56Z)
- ACP++: Action Co-occurrence Priors for Human-Object Interaction Detection [102.9428507180728]
A common problem in the task of human-object interaction (HOI) detection is that numerous HOI classes have only a small number of labeled examples.
We observe that there exist natural correlations and anti-correlations among human-object interactions.
We present techniques to learn these priors and leverage them for more effective training, especially on rare classes.
arXiv Detail & Related papers (2021-09-09T06:02:50Z)
- Training Value-Aligned Reinforcement Learning Agents Using a Normative Prior [10.421378728492437]
There is an increasing prospect that an agent trained to perform a task optimally, using only a measure of task performance as feedback, can violate societal norms for acceptable behavior or cause harm.
We introduce an approach to value-aligned reinforcement learning, in which we train an agent with two reward signals: a standard task performance reward, plus a normative behavior reward.
We show how variations on a policy shaping technique can balance these two sources of reward and produce policies that are both effective and perceived as being more normative.
arXiv Detail & Related papers (2021-04-19T17:33:07Z)
- Detecting Human-Object Interactions with Action Co-occurrence Priors [108.31956827512376]
A common problem in human-object interaction (HOI) detection task is that numerous HOI classes have only a small number of labeled examples.
We observe that there exist natural correlations and anti-correlations among human-object interactions.
We present techniques to learn these priors and leverage them for more effective training, especially in rare classes.
arXiv Detail & Related papers (2020-07-17T02:47:45Z)
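The action co-occurrence idea behind the two ACP entries above can be sketched in a few lines. This is only a minimal illustration of turning label co-occurrence counts into conditional priors that let frequent classes lend signal to rare ones; the label names and annotations are invented for the example, and the actual ACP/ACP++ methods differ in detail.

```python
import numpy as np

# Toy multi-label annotations: each set lists the interaction classes
# present in one image. Names are illustrative, not from the papers.
labels = ["hold", "sip", "wash", "fill"]
annotations = [
    {"hold", "sip"},
    {"hold", "sip"},
    {"hold", "fill"},
    {"wash", "fill"},
]

n = len(labels)
idx = {c: i for i, c in enumerate(labels)}
counts = np.zeros((n, n))
for ann in annotations:          # count pairwise label co-occurrences
    for a in ann:
        for b in ann:
            counts[idx[a], idx[b]] += 1

# Conditional prior P(column | row): how often class b appears given
# class a; rows are normalized by each class's own frequency (diagonal).
prior = counts / counts.diagonal()[:, None]
print(prior[idx["hold"], idx["sip"]])  # 2/3: "sip" occurs in 2 of 3 "hold" images
```

Such a prior matrix can then regularize or re-weight training so that a rare class correlated with a frequent one inherits some of its supervision.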
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.