Improving Confidence in the Estimation of Values and Norms
- URL: http://arxiv.org/abs/2004.01056v1
- Date: Thu, 2 Apr 2020 15:03:03 GMT
- Title: Improving Confidence in the Estimation of Values and Norms
- Authors: Luciano Cavalcante Siebert, Rijk Mercuur, Virginia Dignum, Jeroen van
den Hoven, Catholijn Jonker
- Abstract summary: This paper analyses to what extent an AA is able to estimate the values and norms of a simulated human agent based on its actions in the ultimatum game.
We present two methods to reduce ambiguity in profiling the SHAs: one based on search space exploration and another based on counterfactual analysis.
- Score: 3.8323580808203785
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous agents (AA) will increasingly be interacting with us in our daily
lives. While we want the benefits attached to AAs, it is essential that their
behavior is aligned with our values and norms. Hence, an AA will need to
estimate the values and norms of the humans it interacts with, which is not a
straightforward task when solely observing an agent's behavior. This paper
analyses to what extent an AA is able to estimate the values and norms of a
simulated human agent (SHA) based on its actions in the ultimatum game. We
present two methods to reduce ambiguity in profiling the SHAs: one based on
search space exploration and another based on counterfactual analysis. We found
that both methods are able to increase the confidence in estimating human
values and norms, but differ in their applicability, the latter being more
efficient when the number of interactions with the agent is to be minimized.
These insights are useful to improve the alignment of AAs with human values and
norms.
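Below is a minimal, hypothetical sketch of the setting described in the abstract; it is not the authors' code. An autonomous agent keeps a set of candidate value/norm profiles for a simulated human agent (SHA), prunes them against observed ultimatum-game offers (search-space exploration), and then picks the hypothetical game that would best separate the remaining candidates (counterfactual analysis). The profile attributes, the decision rule, and all numbers are illustrative assumptions.

```python
# Hypothetical sketch of profiling an SHA in the ultimatum game (not the authors' code).
from itertools import product

# Assumed candidate profiles: a "fairness" value weight and a reciprocity norm flag.
CANDIDATE_PROFILES = [
    {"fairness": f, "reciprocity": r}
    for f, r in product((0.0, 0.25, 0.5, 0.75, 1.0), (False, True))
]


def predicted_offer(profile, endowment):
    """Offer the SHA is predicted to make under a profile (illustrative rule)."""
    offer = profile["fairness"] * endowment / 2      # fairness pulls toward an even split
    if profile["reciprocity"]:
        offer += 1                                   # the norm nudges the offer upward
    return min(endowment, round(offer))


def filter_by_observations(candidates, observations):
    """Search-space exploration: keep only profiles consistent with observed offers.

    `observations` is a list of (endowment, observed_offer) pairs.
    """
    return [p for p in candidates
            if all(predicted_offer(p, e) == o for e, o in observations)]


def best_counterfactual_query(candidates, possible_endowments):
    """Counterfactual analysis: choose the hypothetical game whose predicted
    offers best separate the remaining candidate profiles."""
    def distinct_predictions(endowment):
        return len({predicted_offer(p, endowment) for p in candidates})
    return max(possible_endowments, key=distinct_predictions)


if __name__ == "__main__":
    observed = [(10, 5)]                             # one observed offer of 5 out of 10
    remaining = filter_by_observations(CANDIDATE_PROFILES, observed)
    print("profiles still consistent:", remaining)
    print("most informative counterfactual endowment:",
          best_counterfactual_query(remaining, possible_endowments=(4, 10, 16, 30)))
```

In this toy example, several profiles remain consistent after a single observation, and the counterfactual query selects the endowment whose predicted offers differ most across them, so one additional (real or hypothetical) interaction removes as much ambiguity as possible.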
Related papers
- Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge [51.93909886542317]
We show how a single aggregate correlation score can obscure differences between human behavior and automatic evaluation methods.
We propose stratifying results by human label uncertainty to provide a more robust analysis of automatic evaluation performance.
arXiv Detail & Related papers (2024-10-03T03:08:29Z) - IQA-EVAL: Automatic Evaluation of Human-Model Interactive Question Answering [10.338962367542331]
We introduce an automatic evaluation framework IQA-EVAL to achieve Interactive Question Answering Evaluations.
We also introduce a LLM-based Evaluation Agent (LEA) that can simulate human behaviors to generate interactions with IQA models.
We show that our evaluation framework with GPT-4 as the backbone model achieves a high correlation with human evaluations on the IQA task.
arXiv Detail & Related papers (2024-08-24T10:34:20Z) - Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance [73.19687314438133]
We study how reliance is affected by contextual features of an interaction.
We find that contextual characteristics significantly affect human reliance behavior.
Our results show that calibration and language quality alone are insufficient in evaluating the risks of human-LM interactions.
arXiv Detail & Related papers (2024-07-10T18:00:05Z) - Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards
and Ethical Behavior in the MACHIAVELLI Benchmark [61.43264961005614]
We develop a benchmark of 134 Choose-Your-Own-Adventure games containing over half a million rich, diverse scenarios.
We evaluate agents' tendencies to be power-seeking, cause disutility, and commit ethical violations.
Our results show that agents can both act competently and morally, so concrete progress can be made in machine ethics.
arXiv Detail & Related papers (2023-04-06T17:59:03Z) - Value Engineering for Autonomous Agents [3.6130723421895947]
Previous approaches have treated values as labels associated with some actions or states of the world, rather than as integral components of agent reasoning.
We propose a new artificial moral agent (AMA) paradigm grounded in moral and social psychology, where values are instilled into agents as context-dependent goals.
We argue that this type of normative reasoning, where agents are endowed with an understanding of norms' moral implications, leads to value-awareness in autonomous agents.
arXiv Detail & Related papers (2023-02-17T08:52:15Z) - Comparing Psychometric and Behavioral Predictors of Compliance During
Human-AI Interactions [5.893351309010412]
A common hypothesis in adaptive AI research is that minor differences in people's predisposition to trust can significantly impact their likelihood of complying with recommendations from the AI.
We benchmark a popular measure of this kind against behavioral predictors of compliance.
Our results suggest, as a general property, that individual differences in initial behavior are more predictive of compliance than differences in self-reported trust attitudes.
arXiv Detail & Related papers (2023-02-03T16:56:25Z) - Aligning to Social Norms and Values in Interactive Narratives [89.82264844526333]
We focus on creating agents that act in alignment with socially beneficial norms and values in interactive narratives or text-based games.
We introduce the GALAD agent that uses the social commonsense knowledge present in specially trained language models to contextually restrict its action space to only those actions that are aligned with socially beneficial values.
arXiv Detail & Related papers (2022-05-04T09:54:33Z) - ACP++: Action Co-occurrence Priors for Human-Object Interaction
Detection [102.9428507180728]
A common problem in the task of human-object interaction (HOI) detection is that numerous HOI classes have only a small number of labeled examples.
We observe that there exist natural correlations and anti-correlations among human-object interactions.
We present techniques to learn these priors and leverage them for more effective training, especially on rare classes.
arXiv Detail & Related papers (2021-09-09T06:02:50Z) - Training Value-Aligned Reinforcement Learning Agents Using a Normative
Prior [10.421378728492437]
It is increasingly a prospect that an agent trained to perform a task optimally, using only a measure of task performance as feedback, can violate societal norms for acceptable behavior or cause harm.
We introduce an approach to value-aligned reinforcement learning, in which we train an agent with two reward signals: a standard task performance reward, plus a normative behavior reward.
We show how variations on a policy shaping technique can balance these two sources of reward and produce policies that are both effective and perceived as being more normative.
arXiv Detail & Related papers (2021-04-19T17:33:07Z) - Detecting Human-Object Interactions with Action Co-occurrence Priors [108.31956827512376]
A common problem in human-object interaction (HOI) detection task is that numerous HOI classes have only a small number of labeled examples.
We observe that there exist natural correlations and anti-correlations among human-object interactions.
We present techniques to learn these priors and leverage them for more effective training, especially in rare classes.
arXiv Detail & Related papers (2020-07-17T02:47:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.