DegustaBot: Zero-Shot Visual Preference Estimation for Personalized Multi-Object Rearrangement
- URL: http://arxiv.org/abs/2407.08876v1
- Date: Thu, 11 Jul 2024 21:28:02 GMT
- Title: DegustaBot: Zero-Shot Visual Preference Estimation for Personalized Multi-Object Rearrangement
- Authors: Benjamin A. Newman, Pranay Gupta, Kris Kitani, Yonatan Bisk, Henny Admoni, Chris Paxton
- Abstract summary: We present DegustaBot, an algorithm for visual preference learning that solves household multi-object rearrangement tasks according to personal preference.
We collect a large dataset of naturalistic personal preferences in a simulated table-setting task.
We find that 50% of our model's predictions are likely to be found acceptable by at least 20% of people.
- Score: 53.86523017756224
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: De gustibus non est disputandum ("there is no accounting for others' tastes") is a common Latin maxim describing how many solutions in life are determined by people's personal preferences. Many household tasks, in particular, can only be considered fully successful when they account for personal preferences such as the visual aesthetic of the scene. For example, setting a table could be optimized by arranging utensils according to traditional rules of Western table setting decorum, without considering the color, shape, or material of each object, but this may not be a completely satisfying solution for a given person. Toward this end, we present DegustaBot, an algorithm for visual preference learning that solves household multi-object rearrangement tasks according to personal preference. To do this, we use internet-scale pre-trained vision-and-language foundation models (VLMs) with novel zero-shot visual prompting techniques. To evaluate our method, we collect a large dataset of naturalistic personal preferences in a simulated table-setting task, and conduct a user study in order to develop two novel metrics for determining success based on personal preference. This is a challenging problem and we find that 50% of our model's predictions are likely to be found acceptable by at least 20% of people.
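The abstract describes using pre-trained VLMs with zero-shot visual prompting to select arrangements matching a user's taste, but gives no implementation details. The sketch below illustrates only the general pattern of scoring rendered candidate scenes with a model and picking the top-rated one; `query_vlm` is a hypothetical stub (here a toy heuristic), not the paper's actual prompting technique.

```python
# Illustrative sketch of zero-shot visual preference scoring with a VLM.
# `query_vlm` is a hypothetical stand-in for a real vision-and-language
# model call; it is stubbed so the selection logic below is runnable.

def query_vlm(image, prompt):
    """Hypothetical VLM call. A real system would send a rendered scene
    image plus a text prompt to a pre-trained model and parse a score;
    here we pretend the model favors symmetric layouts."""
    return image["symmetry"]

def rank_arrangements(candidates, prompt):
    """Score each candidate arrangement with the (stubbed) VLM and
    return names ordered from most to least preferred."""
    scored = [(query_vlm(img, prompt), name) for name, img in candidates.items()]
    return [name for _, name in sorted(scored, reverse=True)]

candidates = {
    "traditional": {"symmetry": 0.9},
    "casual": {"symmetry": 0.4},
    "cluttered": {"symmetry": 0.1},
}
prompt = "Which table setting would this user find most visually pleasing?"
ranking = rank_arrangements(candidates, prompt)
```

In a real pipeline the dictionary of toy features would be replaced by rendered images of candidate object placements, and the prompt would encode the personalization question.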
Related papers
- ComPO: Community Preferences for Language Model Personalization [122.54846260663922]
ComPO is a method to personalize preference optimization in language models.
We collect and release ComPRed, a question answering dataset with community-level preferences from Reddit.
arXiv Detail & Related papers (2024-10-21T14:02:40Z)
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to capture potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z)
- Scaling Up Personalized Image Aesthetic Assessment via Task Vector Customization [37.66059382315255]
We present a unique approach that leverages readily available databases for general image aesthetic assessment and image quality assessment.
By determining optimal combinations of task vectors, known to represent specific traits of each database, we successfully create personalized models for individuals.
arXiv Detail & Related papers (2024-07-09T18:42:41Z)
- Personalized Language Modeling from Personalized Human Feedback [49.344833339240566]
Reinforcement Learning from Human Feedback (RLHF) is commonly used to fine-tune large language models to better align with human preferences.
In this work, we aim to address this problem by developing methods for building personalized language models.
arXiv Detail & Related papers (2024-02-06T04:18:58Z)
- Everyone Deserves A Reward: Learning Customized Human Preferences [25.28261194665836]
Reward models (RMs) are essential for aligning large language models with human preferences to improve interaction quality.
We propose a three-stage customized RM learning scheme, then empirically verify its effectiveness on both general preference datasets and our DSP set.
We find several ways to better preserve the general preferring ability while training the customized RMs.
arXiv Detail & Related papers (2023-09-06T16:03:59Z)
- TidyBot: Personalized Robot Assistance with Large Language Models [46.629932362863386]
A key challenge is determining the proper place to put each object.
One person may prefer storing shirts in the drawer, while another may prefer them on the shelf.
We show that robots can combine language-based planning and perception with the few-shot summarization capabilities of large language models.
arXiv Detail & Related papers (2023-05-09T17:52:59Z)
- Eliciting User Preferences for Personalized Multi-Objective Decision Making through Comparative Feedback [76.7007545844273]
We propose a multi-objective decision making framework that accommodates different user preferences over objectives.
Our model consists of a Markov decision process with a vector-valued reward function, with each user having an unknown preference vector.
We suggest an algorithm that finds a nearly optimal policy for the user using a small number of comparison queries.
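The entry above describes recovering a user's unknown preference vector over objectives from a small number of comparison queries. The toy sketch below is not the paper's algorithm; it only illustrates the comparison-query idea in two dimensions, assuming a linear utility (the dot product of the hidden preference vector with a reward vector) and using simple bisection on the preference direction's angle.

```python
import math

# Toy illustration of learning a hidden 2-D preference vector from
# comparison queries. Simple angular bisection, assuming linear utility
# w . r; this is NOT the paper's algorithm, just the query interface idea.

def prefers(w, r_a, r_b):
    """Comparison oracle: does a user with hidden preference vector w
    prefer reward vector r_a over r_b under the linear model w . r?"""
    return w[0] * r_a[0] + w[1] * r_a[1] >= w[0] * r_b[0] + w[1] * r_b[1]

def estimate_angle(oracle, queries=20):
    """Bisect on the angle of the preference direction in [0, pi/2].
    Each query compares two unit reward vectors aimed at the midpoints
    of the two halves of the current interval; the preferred side must
    contain the hidden direction, so the interval halves each step."""
    lo, hi = 0.0, math.pi / 2
    for _ in range(queries):
        mid = (lo + hi) / 2
        r_lo = (math.cos((lo + mid) / 2), math.sin((lo + mid) / 2))
        r_hi = (math.cos((mid + hi) / 2), math.sin((mid + hi) / 2))
        if oracle(r_lo, r_hi):
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

true_w = (math.cos(1.0), math.sin(1.0))   # hidden preference direction
angle = estimate_angle(lambda a, b: prefers(true_w, a, b))
```

Twenty queries narrow the interval to roughly (pi/2) / 2^20, so the recovered angle matches the hidden direction to well under a thousandth of a radian.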
arXiv Detail & Related papers (2023-02-07T23:58:19Z)
- One for All: Simultaneous Metric and Preference Learning over Multiple Users [17.083305162005136]
We study the simultaneous preference and metric learning from a crowd of respondents.
Our model jointly learns a distance metric that characterizes the crowd's general measure of item similarities, along with each individual respondent's preferences.
We demonstrate the performance of our model on both simulated data and on a dataset of color preference judgements.
arXiv Detail & Related papers (2022-07-07T22:47:13Z)
- My House, My Rules: Learning Tidying Preferences with Graph Neural Networks [8.57914821832517]
We present NeatNet: a novel Variational Autoencoder architecture using Graph Neural Network layers.
We extract a low-dimensional latent preference vector from a user by observing how they arrange scenes.
Given any set of objects, this vector can then be used to generate an arrangement which is tailored to that user's spatial preferences.
arXiv Detail & Related papers (2021-11-04T19:17:19Z)
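NeatNet, per the entry above, is a Graph-Neural-Network VAE that encodes a user's observed arrangements into a low-dimensional latent preference vector and decodes that vector into arrangements for new objects. The sketch below is a drastically simplified toy of just that encode/decode interface: here the "preference vector" is merely the user's average displacement from a default layout, standing in for a learned latent code.

```python
# Drastically simplified toy of the latent-preference-vector interface.
# NeatNet itself uses GNN encoder/decoder layers in a VAE; here the
# "encoder" is just an average offset from a default layout, to show how
# a low-dimensional vector can personalize arrangements of new objects.

def extract_preference(observed, defaults):
    """Average (dx, dy) displacement of the user's placements from the
    default positions -- a stand-in for an encoder's latent vector."""
    n = len(observed)
    dx = sum(o[0] - d[0] for o, d in zip(observed, defaults)) / n
    dy = sum(o[1] - d[1] for o, d in zip(observed, defaults)) / n
    return (dx, dy)

def generate_arrangement(defaults, pref):
    """Apply the preference vector to a default layout for a new set of
    objects -- a stand-in for the decoder."""
    return [(x + pref[0], y + pref[1]) for x, y in defaults]

# This user consistently places items one unit right of the defaults.
defaults = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
observed = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
pref = extract_preference(observed, defaults)
new_layout = generate_arrangement([(0.0, 1.0)], pref)
```

The point of the real model, which this toy elides, is that the latent vector captures relational, per-object-type preferences rather than a single global shift.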
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.