Personal Comfort Estimation in Partial Observable Environment using
Reinforcement Learning
- URL: http://arxiv.org/abs/2112.00971v2
- Date: Fri, 3 Dec 2021 01:42:45 GMT
- Title: Personal Comfort Estimation in Partial Observable Environment using
Reinforcement Learning
- Authors: Shashi Suman, Ali Etemad, Francois Rivest
- Abstract summary: Most smart homes learn a uniform model to represent the thermal preferences of users.
Each user having a different thermal sensation poses a challenge for smart homes in learning a personalized preference for each occupant.
A smart home with a single optimal policy may fail to provide comfort when a new user with different preferences is integrated into the home.
- Score: 8.422257363944295
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The technology used in smart homes has improved to learn user
preferences from feedback in order to provide convenience to the user in the
home environment. Most smart homes learn a uniform model to represent the
thermal preferences of users, which generally fails when the pool of occupants
includes people of different ages, genders, and locations. Each user having a
different thermal sensation poses a challenge for smart homes: learning a
personalized preference for each occupant without forgetting the policies of
others. A smart home with a single optimal policy may fail to provide comfort
when a new user with different preferences is integrated into the home. In this
paper, we propose POSHS, a Bayesian reinforcement learning algorithm that can
approximate the current occupant state in a partially observable environment
using the occupant's thermal preference, and then decide whether it is a new
occupant or one belonging to the pool of previously observed users. We then
compare the POSHS algorithm with an LSTM-based algorithm for learning and
estimating the current state of the occupant while also taking optimal actions
to reduce the time steps required to set the preferences. We perform these
experiments with up to five simulated human models, each based on hierarchical
reinforcement learning. The results show that POSHS can approximate the current
user state from its temperature and humidity preferences alone, and also reduce
the number of time steps required to set the optimal temperature and humidity
for the human model in the presence of the smart home.
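The occupant-identification step described in the abstract can be illustrated with a minimal Bayesian belief update over known occupants: the home keeps a posterior over previously observed users, updates it from temperature/humidity preference observations, and flags a new occupant when no known model explains the data. This is an illustrative sketch only, not the POSHS implementation; the Gaussian preference models, the occupant names, and the novelty threshold are all assumptions.

```python
import math

def gaussian_pdf(x, mean, std):
    """Likelihood of one observation under a Gaussian preference model."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def update_belief(belief, obs, models):
    """One Bayesian update of the posterior over known occupants.

    belief: {occupant: prior probability}
    obs:    (temperature, humidity) preference observation
    models: {occupant: ((temp_mean, temp_std), (hum_mean, hum_std))}
    """
    temp, hum = obs
    posterior = {}
    for occ, ((tm, ts), (hm, hs)) in models.items():
        likelihood = gaussian_pdf(temp, tm, ts) * gaussian_pdf(hum, hm, hs)
        posterior[occ] = belief[occ] * likelihood
    total = sum(posterior.values())
    if total < 1e-12:          # no known occupant explains the data well
        return None            # treat as a new occupant
    return {occ: p / total for occ, p in posterior.items()}

# Two known occupants with different preferred set-points (assumed values).
models = {
    "alice": ((22.0, 1.0), (45.0, 5.0)),
    "bob":   ((26.0, 1.0), (60.0, 5.0)),
}
belief = {"alice": 0.5, "bob": 0.5}
for obs in [(22.5, 47.0), (21.8, 44.0)]:   # observations near Alice's preferences
    belief = update_belief(belief, obs, models)
print(max(belief, key=belief.get))  # the most probable occupant
```

In the full POMDP setting the posterior would also condition on the occupant's responses to the home's actions, but the normalize-by-likelihood step shown here is the core of distinguishing a returning user from a new one.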
Related papers
- ComPO: Community Preferences for Language Model Personalization [122.54846260663922]
ComPO is a method to personalize preference optimization in language models.
We collect and release ComPRed, a question answering dataset with community-level preferences from Reddit.
arXiv Detail & Related papers (2024-10-21T14:02:40Z)
- DegustaBot: Zero-Shot Visual Preference Estimation for Personalized Multi-Object Rearrangement [53.86523017756224]
We present DegustaBot, an algorithm for visual preference learning that solves household multi-object rearrangement tasks according to personal preference.
We collect a large dataset of naturalistic personal preferences in a simulated table-setting task.
We find that 50% of our model's predictions are likely to be found acceptable by at least 20% of people.
arXiv Detail & Related papers (2024-07-11T21:28:02Z)
- Contrastive Preference Learning: Learning from Human Feedback without RL [71.77024922527642]
We introduce Contrastive Preference Learning (CPL), an algorithm for learning optimal policies from preferences without learning reward functions.
CPL is fully off-policy, uses only a simple contrastive objective, and can be applied to arbitrary MDPs.
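The "simple contrastive objective" can be sketched as a logistic loss that pushes the policy to assign higher (temperature-scaled) log-probability to the actions of the preferred segment than to those of the rejected one. This toy sketch follows the abstract's description; the temperature hyperparameter and two-segment interface are assumptions, not the authors' implementation.

```python
import math

def cpl_loss(logp_pref, logp_rej, alpha=0.1):
    """Contrastive preference loss over two behavior segments.

    logp_pref / logp_rej: per-step log-probabilities of the actions in the
    preferred / rejected segment under the current policy.
    alpha is a temperature hyperparameter (assumed value).
    """
    s_pref = alpha * sum(logp_pref)
    s_rej = alpha * sum(logp_rej)
    # Numerically stable -log softmax(s_pref over {s_pref, s_rej}).
    m = max(s_pref, s_rej)
    return -(s_pref - m) + math.log(math.exp(s_pref - m) + math.exp(s_rej - m))

# Toy check: a policy that assigns higher probability to the preferred
# segment's actions incurs a lower loss.
good = cpl_loss([-0.1, -0.2], [-2.0, -2.5])
bad = cpl_loss([-2.0, -2.5], [-0.1, -0.2])
print(good < bad)  # True
```

Because the loss depends only on logged segments and the current policy's action log-probabilities, no reward model or on-policy rollout is needed, which is what makes the method fully off-policy.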
arXiv Detail & Related papers (2023-10-20T16:37:56Z)
- Eliciting User Preferences for Personalized Multi-Objective Decision Making through Comparative Feedback [76.7007545844273]
We propose a multi-objective decision making framework that accommodates different user preferences over objectives.
Our model consists of a Markov decision process with a vector-valued reward function, with each user having an unknown preference vector.
We suggest an algorithm that finds a nearly optimal policy for the user using a small number of comparison queries.
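The idea of narrowing down an unknown preference vector with few comparison queries can be illustrated, in the simplest two-objective case, as a bisection over a scalar preference weight. This sketch and its query interface are assumptions for illustration, not the authors' algorithm.

```python
def elicit_weight(prefers_high, lo=0.0, hi=1.0, queries=20):
    """Binary-search a scalar preference weight from comparison answers.

    prefers_high(w_mid) asks the user to compare two policies whose
    scalarized values cross at weight w_mid, and returns True if the user
    prefers the policy that is better for weights above w_mid.
    """
    for _ in range(queries):
        mid = (lo + hi) / 2
        if prefers_high(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Simulated user whose true preference weight is 0.73 (assumed value).
true_w = 0.73
w_hat = elicit_weight(lambda mid: true_w > mid)
print(round(w_hat, 3))  # close to 0.73
```

Each comparison halves the feasible interval, so the number of queries needed grows only logarithmically with the desired precision, which is the sense in which a "small number of comparison queries" suffices.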
arXiv Detail & Related papers (2023-02-07T23:58:19Z)
- Latent User Intent Modeling for Sequential Recommenders [92.66888409973495]
Sequential recommender models learn to predict the next items a user is likely to interact with based on their interaction history on the platform.
Most sequential recommenders, however, lack a higher-level understanding of user intents, which often drive user behaviors online.
Intent modeling is thus critical for understanding users and optimizing long-term user experience.
arXiv Detail & Related papers (2022-11-17T19:00:24Z)
- RESUS: Warm-Up Cold Users via Meta-Learning Residual User Preferences in CTR Prediction [14.807495564177252]
Click-Through Rate (CTR) prediction on cold users is a challenging task in recommender systems.
We propose a novel and efficient approach named RESUS, which decouples the learning of global preference knowledge contributed by collective users from the learning of residual preferences for individual users.
Our approach is efficient and effective in improving CTR prediction accuracy on cold users, compared with various state-of-the-art methods.
arXiv Detail & Related papers (2022-10-28T11:57:58Z)
- Cohort comfort models -- Using occupants' similarity to predict personal thermal preference with less data [0.0]
We introduce Cohort Comfort Models, a new framework for predicting how new occupants would perceive their thermal environment.
Our framework is capable of exploiting available background information such as physical characteristics and one-time on-boarding surveys.
arXiv Detail & Related papers (2022-08-05T10:21:03Z)
- Modeling Dynamic User Preference via Dictionary Learning for Sequential Recommendation [133.8758914874593]
Capturing the dynamics in user preference is crucial to better predict user future behaviors because user preferences often drift over time.
Many existing recommendation algorithms -- including both shallow and deep ones -- often model such dynamics independently.
This paper considers the problem of embedding a user's sequential behavior into the latent space of user preferences.
arXiv Detail & Related papers (2022-04-02T03:23:46Z)
- Targeting occupant feedback using digital twins: Adaptive spatial-temporal thermal preference sampling to optimize personal comfort models [0.0]
This paper outlines a scenario-based (virtual experiment) method for optimizing data sampling using a smartwatch to achieve comparable accuracy in a personal thermal preference model with less data.
The proposed Build2Vec method achieves 18-23% higher overall sampling quality than the spaces-based and square-grid-based sampling methods.
arXiv Detail & Related papers (2022-02-22T07:38:23Z)
- Personal thermal comfort models using digital twins: Preference prediction with BIM-extracted spatial-temporal proximity data from Build2Vec [0.0]
This research aims to build upon an existing vector-based spatial model, called Build2Vec, for predicting indoor environmental preferences.
The framework uses longitudinal intensive thermal comfort subjective feedback from smartwatch-based ecological momentary assessments (EMAs).
The results of a test implementation show 14-28% accuracy improvement over a set of baselines that use conventional thermal preference prediction input variables.
arXiv Detail & Related papers (2021-10-30T07:43:11Z)
- Learning User Preferences in Non-Stationary Environments [42.785926822853746]
We introduce a novel model for online non-stationary recommendation systems.
We show that our algorithm outperforms other static algorithms even when preferences do not change over time.
arXiv Detail & Related papers (2021-01-29T10:26:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.