CM-DQN: A Value-Based Deep Reinforcement Learning Model to Simulate Confirmation Bias
- URL: http://arxiv.org/abs/2407.07454v2
- Date: Tue, 16 Jul 2024 04:29:04 GMT
- Title: CM-DQN: A Value-Based Deep Reinforcement Learning Model to Simulate Confirmation Bias
- Authors: Jiacheng Shen, Lihan Feng
- Abstract summary: We propose CM-DQN, a new Deep Reinforcement Learning algorithm that simulates the human decision-making process.
We test it in the Lunar Lander environment with confirmatory bias, disconfirmatory bias, and no bias to observe the learning effects.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In human decision-making tasks, individuals learn through trials and prediction errors. When learning a task, some individuals are more influenced by good outcomes, while others weigh bad outcomes more heavily. Such confirmation bias can lead to different learning effects. In this study, we propose CM-DQN, a new Deep Reinforcement Learning algorithm that applies different update strategies for positive and negative prediction errors, to simulate the human decision-making process when the task's states are continuous and the actions are discrete. We test in the Lunar Lander environment with confirmatory bias, disconfirmatory bias, and no bias to observe the learning effects. Moreover, as a contrast experiment, we apply the confirmation model, which uses the same idea as our proposed algorithm, to a multi-armed bandit problem (an environment with discrete states and discrete actions) to algorithmically simulate the impact of different confirmation biases on the decision-making process. In both experiments, confirmatory bias leads to a better learning effect. Our code can be found at https://github.com/Patrickhshs/CM-DQN.
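The core mechanism, applying different update strategies to positive and negative prediction errors, can be illustrated with a minimal sketch. The Python code below implements a confirmation-biased update rule in a tabular two-armed bandit; it is an illustration under assumptions, not the authors' implementation, and the names `alpha_plus`/`alpha_minus` and all constants are ours.

```python
import random

# Sketch of a confirmation-biased value update in a two-armed bandit.
# Not the authors' CM-DQN code: names and constants are assumptions.

alpha_plus = 0.4   # learning rate for positive prediction errors
alpha_minus = 0.1  # learning rate for negative prediction errors
# alpha_plus > alpha_minus models a confirmatory bias; swapping the
# two values would model a disconfirmatory bias instead.

q_values = [0.0, 0.0]      # value estimates for the two arms
reward_probs = [0.3, 0.7]  # true Bernoulli reward probabilities

for t in range(1000):
    # epsilon-greedy action selection
    if random.random() < 0.1:
        action = random.randrange(2)
    else:
        action = max(range(2), key=lambda a: q_values[a])

    reward = 1.0 if random.random() < reward_probs[action] else 0.0
    delta = reward - q_values[action]  # prediction error

    # asymmetric update: the sign of the prediction error selects
    # which learning rate is applied
    alpha = alpha_plus if delta > 0 else alpha_minus
    q_values[action] += alpha * delta

print(q_values)
```

In CM-DQN, the same asymmetry is applied to the temporal-difference errors of a deep Q-network rather than to tabular values.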
Related papers
- Extrinsicaly Rewarded Soft Q Imitation Learning with Discriminator [0.0]
Supervised learning methods such as Behavioral Cloning do not require sampling data, but they usually suffer from distribution shift.
Soft Q imitation learning (SQIL) addresses these problems; it was shown to learn efficiently by combining Behavioral Cloning and soft Q-learning with constant rewards.
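As a rough illustration of the constant-reward idea (a sketch under assumptions, not SQIL's actual code): demonstration transitions are stored with reward 1, the agent's own transitions with reward 0, and soft Q-learning is then run on the mixed buffer.

```python
from collections import deque
import random

# Sketch of SQIL-style replay buffers (assumed structure): expert
# demonstrations get constant reward 1, agent experience gets reward 0.

demo_buffer = deque(maxlen=10_000)   # (state, action, reward=1, next_state)
agent_buffer = deque(maxlen=10_000)  # (state, action, reward=0, next_state)

def store_demo(state, action, next_state):
    demo_buffer.append((state, action, 1.0, next_state))

def store_agent(state, action, next_state):
    agent_buffer.append((state, action, 0.0, next_state))

def sample_batch(batch_size=64):
    # Sample demonstrations and agent experience in equal proportion,
    # then train soft Q-learning on the mixed batch.
    half = batch_size // 2
    batch = random.sample(demo_buffer, min(half, len(demo_buffer)))
    batch += random.sample(agent_buffer, min(half, len(agent_buffer)))
    return batch
```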
arXiv Detail & Related papers (2024-01-30T06:22:19Z)
- R-Tuning: Instructing Large Language Models to Say `I Don't Know' [66.11375475253007]
Large language models (LLMs) have revolutionized numerous domains with their impressive performance, but they still face challenges.
Previous instruction tuning methods force the model to complete a sentence regardless of whether it has the relevant knowledge.
We present a new approach called Refusal-Aware Instruction Tuning (R-Tuning).
Experimental results demonstrate R-Tuning effectively improves a model's ability to answer known questions and refrain from answering unknown questions.
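A minimal sketch of the refusal-aware idea (the split criterion and refusal phrasing are assumptions, not the paper's exact pipeline): questions the model already answers correctly keep their answers, while the rest are rewritten to express uncertainty.

```python
# Sketch of refusal-aware training-data construction in the spirit of
# R-Tuning. The split criterion and refusal text are assumptions.

def build_refusal_aware_data(examples, model_answer):
    """examples: list of (question, ground_truth) pairs.
    model_answer: callable mapping a question to the model's answer."""
    tuning_data = []
    for question, truth in examples:
        if model_answer(question) == truth:
            # The model "knows" this: train it to answer.
            tuning_data.append((question, truth))
        else:
            # The model does not know this: train it to refuse.
            tuning_data.append((question, "I don't know."))
    return tuning_data
```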
arXiv Detail & Related papers (2023-11-16T08:45:44Z)
- When Do Curricula Work in Federated Learning? [56.88941905240137]
We find that curriculum learning largely alleviates non-IIDness.
The more disparate the data distributions across clients, the more they benefit from curriculum learning.
We propose a novel client selection technique that benefits from the real-world disparity in the clients.
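For illustration, a generic easy-to-hard curriculum can be built by sorting samples with a difficulty proxy; the scoring function below is an assumption, not the paper's federated method.

```python
# Generic easy-to-hard curriculum sketch: sort training samples by an
# assumed difficulty score, then expose progressively larger, harder
# prefixes to the learner. Not the paper's federated implementation.

def curriculum_order(samples, difficulty):
    """samples: list of training examples.
    difficulty: callable returning a scalar (higher = harder)."""
    return sorted(samples, key=difficulty)

def paced_batches(ordered_samples, num_stages=4):
    # Stage k trains on the easiest k/num_stages fraction of the data.
    n = len(ordered_samples)
    for stage in range(1, num_stages + 1):
        yield ordered_samples[: n * stage // num_stages]
```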
arXiv Detail & Related papers (2022-12-24T11:02:35Z)
- Increasing Students' Engagement to Reminder Emails Through Multi-Armed Bandits [60.4933541247257]
This paper presents a real-world adaptive experiment on how students engage with instructors' weekly email reminders to build their time management habits.
Using Multi-Armed Bandit (MAB) algorithms in adaptive experiments can increase students' chances of obtaining better outcomes.
We highlight problems with these adaptive algorithms, such as the possible exploitation of one arm when there is no significant difference between arms.
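For context, a minimal Thompson-sampling bandit of the kind such adaptive experiments typically use (a generic sketch, not the paper's exact setup):

```python
import random

# Minimal Thompson sampling for Bernoulli arms (e.g., two email
# reminder variants with a click/no-click outcome). Generic sketch,
# not the experiment's actual implementation.

successes = [1, 1]  # Beta prior alpha per arm
failures = [1, 1]   # Beta prior beta per arm

def choose_arm():
    samples = [random.betavariate(successes[a], failures[a]) for a in range(2)]
    return max(range(2), key=lambda a: samples[a])

def update(arm, clicked):
    if clicked:
        successes[arm] += 1
    else:
        failures[arm] += 1
```

Even when the arms do not truly differ, such a sampler can still concentrate traffic on one arm, which is the kind of exploitation issue highlighted above.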
arXiv Detail & Related papers (2022-08-10T00:30:52Z)
- How trial-to-trial learning shapes mappings in the mental lexicon: Modelling Lexical Decision with Linear Discriminative Learning [0.4450536872346657]
This study investigates whether trial-to-trial learning can be detected in an unprimed lexical decision experiment.
We used the Discriminative Lexicon Model (DLM), a model of the mental lexicon with meaning representations from distributional semantics.
Our results support the possibility that our lexical knowledge is subject to continuous change.
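Trial-to-trial learning in discriminative learning models is often formalized with an incremental delta (Widrow-Hoff) rule; the sketch below shows that rule in generic form, with dimensions and learning rate assumed, not the study's actual code.

```python
import numpy as np

# Incremental Widrow-Hoff (delta rule) update: after each trial, the
# linear mapping from form vectors to meaning vectors is nudged toward
# the observed outcome. Generic sketch; dimensions and rate are assumed.

rng = np.random.default_rng(0)
n_form, n_meaning = 50, 30
F = np.zeros((n_form, n_meaning))  # linear form-to-meaning mapping
eta = 0.01                         # learning rate

def trial_update(cue, target):
    """cue: form vector (n_form,); target: meaning vector (n_meaning,)."""
    global F
    prediction = cue @ F
    F += eta * np.outer(cue, target - prediction)

# one simulated trial
trial_update(rng.standard_normal(n_form), rng.standard_normal(n_meaning))
```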
arXiv Detail & Related papers (2022-07-01T13:49:30Z)
- Agree to Disagree: Diversity through Disagreement for Better Transferability [54.308327969778155]
We propose D-BAT (Diversity-By-disAgreement Training), which enforces agreement among the models on the training data while encouraging them to disagree on out-of-distribution data.
We show how D-BAT naturally emerges from the notion of generalized discrepancy.
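A rough sketch of the agree-on-train, disagree-on-OOD objective (the disagreement term below is a simplified stand-in, not the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

# Rough sketch of a D-BAT-style objective for a second model h2 trained
# after h1: fit the labels on in-distribution data, and disagree with
# h1 on out-of-distribution inputs. The agreement penalty below is a
# simplified stand-in for the paper's formulation.

def dbat_loss(h1, h2, x_train, y_train, x_ood, lam=1.0):
    task_loss = F.cross_entropy(h2(x_train), y_train)
    with torch.no_grad():
        p1 = F.softmax(h1(x_ood), dim=-1)
    p2 = F.softmax(h2(x_ood), dim=-1)
    # penalize agreement on OOD data: high when both models put mass
    # on the same classes
    agreement = (p1 * p2).sum(dim=-1).mean()
    return task_loss + lam * agreement
```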
arXiv Detail & Related papers (2022-02-09T12:03:02Z)
- Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning (AL) are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
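For context, uncertainty-based acquisition scores unlabeled points by predictive uncertainty; the sketch below uses plain entropy as a simplified stand-in for BALD, not the paper's exact implementation.

```python
import numpy as np

# Generic uncertainty-based acquisition sketch: pick the unlabeled
# points whose predictive distribution has the highest entropy.
# BALD additionally subtracts the expected entropy across stochastic
# forward passes; this simplified version uses plain entropy.

def predictive_entropy(probs):
    """probs: (n_points, n_classes) predicted class probabilities."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def select_queries(probs, k=10):
    return np.argsort(-predictive_entropy(probs))[:k]
```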
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
- Are Bias Mitigation Techniques for Deep Learning Effective? [24.84797949716142]
We introduce an improved evaluation protocol, sensible metrics, and a new dataset.
We evaluate seven state-of-the-art algorithms using the same network architecture.
We find that algorithms exploit hidden biases, are unable to scale to multiple forms of bias, and are highly sensitive to the choice of tuning set.
arXiv Detail & Related papers (2021-04-01T00:14:45Z)
- A framework for predicting, interpreting, and improving Learning Outcomes [0.0]
We develop an Embibe Score Quotient model (ESQ) to predict test scores based on observed academic, behavioral and test-taking features of a student.
ESQ can be used to predict the future scoring potential of a student as well as offer personalized learning nudges.
arXiv Detail & Related papers (2020-10-06T11:22:27Z)
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction [96.90215318875859]
We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from corrective feedback.
We propose a new algorithm, DisCor, which computes an approximation to the optimal data distribution for corrective feedback and uses it to re-weight the transitions used for training.
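The re-weighting idea can be sketched roughly as follows; the error estimator and constants are assumptions based on the general scheme, not the paper's exact derivation.

```python
import numpy as np

# Rough sketch of DisCor-style transition re-weighting: transitions
# whose bootstrapped targets are estimated to be more erroneous get
# down-weighted. delta_hat_next is an assumed estimator of the
# accumulated target-value error; gamma and tau are assumed constants.

gamma, tau = 0.99, 10.0

def discor_weights(delta_hat_next):
    """delta_hat_next: (batch,) estimated error of the target values
    at the next state-action pairs."""
    w = np.exp(-gamma * delta_hat_next / tau)
    return w / w.sum()  # normalized importance weights for the batch
```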
arXiv Detail & Related papers (2020-03-16T16:18:52Z)
- A New Framework for Query Efficient Active Imitation Learning [5.167794607251493]
A human expert knows the rewards and unsafe states based on their preferences and objectives, but querying that expert is expensive.
We propose a new imitation learning (IL) framework that actively and interactively learns a model of the user's reward function with efficient queries.
We evaluate the proposed method with a simulated human on a state-based 2D navigation task, robotic control tasks, and image-based video games.
arXiv Detail & Related papers (2019-12-30T18:12:27Z)