Related papers: Contrastive Learning from Exploratory Actions: Leveraging Natural Interactions for Preference Elicitation

URL: http://arxiv.org/abs/2501.01367v1
Date: Thu, 02 Jan 2025 17:26:01 GMT
Title: Contrastive Learning from Exploratory Actions: Leveraging Natural Interactions for Preference Elicitation
Authors: Nathaniel Dennler, Stefanos Nikolaidis, Maja Matarić
Abstract summary: We propose contrastive learning from exploratory actions (CLEA) to learn trajectory features that are aligned with features that users care about. CLEA features outperformed self-supervised features when eliciting user preferences over four metrics: completeness, simplicity, minimality, and explainability.
Score: 6.033491390990401
License: http://creativecommons.org/licenses/by/4.0/
Abstract: People have a variety of preferences for how robots behave. To understand and reason about these preferences, robots aim to learn a reward function that describes how aligned robot behaviors are with a user's preferences. Good representations of a robot's behavior can significantly reduce the time and effort required for a user to teach the robot their preferences. Specifying these representations -- what "features" of the robot's behavior matter to users -- remains a difficult problem; features learned from raw data lack semantic meaning, and features learned from user data require users to engage in tedious labeling processes. Our key insight is that users tasked with customizing a robot are intrinsically motivated to produce labels through exploratory search; they explore behaviors that they find interesting and ignore behaviors that are irrelevant. To harness this novel data source of exploratory actions, we propose contrastive learning from exploratory actions (CLEA) to learn trajectory features that are aligned with features that users care about. We learned CLEA features from exploratory actions users performed in an open-ended signal design activity (N=25) with a Kuri robot, and evaluated CLEA features through a second user study with a different set of users (N=42). CLEA features outperformed self-supervised features when eliciting user preferences over four metrics: completeness, simplicity, minimality, and explainability.

Related papers

Focusing Robot Open-Ended Reinforcement Learning Through Users' Purposes [1.0013553984400492]
Open-Ended Learning (OEL) autonomous robots can acquire new skills and knowledge through direct interaction with their environment. We propose a solution called Purpose-Directed Open-Ended Learning (POEL).
arXiv Detail & Related papers (2025-03-16T17:22:11Z)
Improving User Experience in Preference-Based Optimization of Reward Functions for Assistive Robots [5.523009758632668]
We show that CMA-ES-IG prioritizes the user's experience of the preference learning process. We show that users find our algorithm more intuitive than previous approaches across both physical and social robot tasks.
arXiv Detail & Related papers (2024-11-17T21:52:58Z)
Learning Object Properties Using Robot Proprioception via Differentiable Robot-Object Interaction [52.12746368727368]
Differentiable simulation has become a powerful tool for system identification. Our approach calibrates object properties by using information from the robot, without relying on data from the object itself. We demonstrate the effectiveness of our method on a low-cost robotic platform.
arXiv Detail & Related papers (2024-10-04T20:48:38Z)
Adaptive Language-Guided Abstraction from Contrastive Explanations [53.48583372522492]
It is necessary to determine which features of the environment are relevant before determining how these features should be used to compute reward. End-to-end methods for joint feature and reward learning often yield brittle reward functions that are sensitive to spurious state features. This paper describes a method named ALGAE, which uses language models to iteratively identify human-meaningful features.
arXiv Detail & Related papers (2024-09-12T16:51:58Z)
Rethinking Annotator Simulation: Realistic Evaluation of Whole-Body PET Lesion Interactive Segmentation Methods [26.55942230051388]
We evaluate interactive segmentation models through either real user studies or simulated annotators. Real user studies are expensive and often limited in scale, while simulated annotators, also known as robot users, tend to overestimate model performance. We propose a more realistic robot user that reduces the user shift by incorporating human factors such as click variation and inter-annotator disagreement.
arXiv Detail & Related papers (2024-04-02T10:19:17Z)
What Matters to You? Towards Visual Representation Alignment for Robot Learning [81.30964736676103]
When operating in service of people, robots need to optimize rewards aligned with end-user preferences. We propose Representation-Aligned Preference-based Learning (RAPL), a method for solving the visual representation alignment problem.
arXiv Detail & Related papers (2023-10-11T23:04:07Z)
Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies. The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization [112.40598205054994]
We formalize this idea as a completely unsupervised objective for optimizing interfaces. We conduct an observational study on 540K examples of users operating various keyboard and eye gaze interfaces for typing, controlling simulated robots, and playing video games. The results show that our mutual information scores are predictive of the ground-truth task completion metrics in a variety of domains.
arXiv Detail & Related papers (2022-05-24T21:57:18Z)
Learning Reward Functions from Scale Feedback [11.941038991430837]
A common framework is to iteratively query the user about which of two presented robot trajectories they prefer. We propose scale feedback, where the user utilizes a slider to give more nuanced information. We demonstrate the performance benefit of slider feedback in simulations, and validate our approach in two user studies.
arXiv Detail & Related papers (2021-10-01T09:45:18Z)
A Neural Topical Expansion Framework for Unstructured Persona-oriented Dialogue Generation [52.743311026230714]
Persona Exploration and Exploitation (PEE) is able to extend the predefined user persona description with semantically correlated content. PEE consists of two main modules: persona exploration and persona exploitation. Our approach outperforms state-of-the-art baselines in terms of both automatic and human evaluations.
arXiv Detail & Related papers (2020-02-06T08:24:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.