CM-DQN: A Value-Based Deep Reinforcement Learning Model to Simulate Confirmation Bias
- URL: http://arxiv.org/abs/2407.07454v2
- Date: Tue, 16 Jul 2024 04:29:04 GMT
- Title: CM-DQN: A Value-Based Deep Reinforcement Learning Model to Simulate Confirmation Bias
- Authors: Jiacheng Shen, Lihan Feng
- Abstract summary: We propose CM-DQN, a new Deep Reinforcement Learning algorithm that simulates the human decision-making process.
We test it in the Lunar Lander environment with confirmatory bias, disconfirmatory bias, and no bias to observe the learning effects.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In human decision-making tasks, individuals learn through trials and prediction errors. When learning a task, some individuals are more influenced by good outcomes, while others weigh bad outcomes more heavily. Such confirmation bias can lead to different learning effects. In this study, we propose CM-DQN, a new Deep Reinforcement Learning algorithm that applies different update strategies for positive and negative prediction errors, to simulate the human decision-making process when the task's states are continuous and the actions are discrete. We test in the Lunar Lander environment with confirmatory bias, disconfirmatory bias, and no bias to observe the learning effects. Moreover, as a contrast experiment, we apply the confirmation model, which uses the same idea as our proposed algorithm, to a multi-armed bandit problem (an environment with discrete states and discrete actions) to algorithmically simulate the impact of different confirmation biases on the decision-making process. In both experiments, confirmatory bias leads to a better learning effect. Our code can be found at https://github.com/Patrickhshs/CM-DQN.
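The core mechanism, applying different update strategies to positive and negative prediction errors, can be illustrated with a minimal sketch. The Python code below implements a confirmation-biased update rule in a tabular two-armed bandit; it is an illustration under assumptions, not the authors' implementation, and the names `alpha_plus`/`alpha_minus` and all constants are ours.

```python
import random

# Sketch of a confirmation-biased value update in a two-armed bandit.
# Not the authors' CM-DQN code: names and constants are assumptions.

alpha_plus = 0.4   # learning rate for positive prediction errors
alpha_minus = 0.1  # learning rate for negative prediction errors
# alpha_plus > alpha_minus models a confirmatory bias; swapping the
# two values would model a disconfirmatory bias instead.

q_values = [0.0, 0.0]      # value estimates for the two arms
reward_probs = [0.3, 0.7]  # true Bernoulli reward probabilities

for t in range(1000):
    # epsilon-greedy action selection
    if random.random() < 0.1:
        action = random.randrange(2)
    else:
        action = max(range(2), key=lambda a: q_values[a])

    reward = 1.0 if random.random() < reward_probs[action] else 0.0
    delta = reward - q_values[action]  # prediction error

    # asymmetric update: the sign of the prediction error selects
    # which learning rate is applied
    alpha = alpha_plus if delta > 0 else alpha_minus
    q_values[action] += alpha * delta

print(q_values)
```

In CM-DQN, the same asymmetry is applied to the temporal-difference errors of a deep Q-network rather than to tabular values.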
Related papers
- Extrinsicaly Rewarded Soft Q Imitation Learning with Discriminator [0.0]
Supervised learning methods such as Behavioral Cloning do not require sampling data, but they usually suffer from distribution shift.
Soft Q imitation learning (SQIL) addresses these problems; it was shown to learn efficiently by combining Behavioral Cloning and soft Q-learning with constant rewards.
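As a rough illustration of the constant-reward idea (a sketch under assumptions, not SQIL's actual code): demonstration transitions are stored with reward 1, the agent's own transitions with reward 0, and soft Q-learning is then run on the mixed buffer.

```python
from collections import deque
import random

# Sketch of SQIL-style replay buffers (assumed structure): expert
# demonstrations get constant reward 1, agent experience gets reward 0.

demo_buffer = deque(maxlen=10_000)   # (state, action, reward=1, next_state)
agent_buffer = deque(maxlen=10_000)  # (state, action, reward=0, next_state)

def store_demo(state, action, next_state):
    demo_buffer.append((state, action, 1.0, next_state))

def store_agent(state, action, next_state):
    agent_buffer.append((state, action, 0.0, next_state))

def sample_batch(batch_size=64):
    # Sample demonstrations and agent experience in equal proportion,
    # then train soft Q-learning on the mixed batch.
    half = batch_size // 2
    batch = random.sample(demo_buffer, min(half, len(demo_buffer)))
    batch += random.sample(agent_buffer, min(half, len(agent_buffer)))
    return batch
```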
arXiv Detail & Related papers (2024-01-30T06:22:19Z)
- R-Tuning: Instructing Large Language Models to Say `I Don't Know' [66.11375475253007]
Large language models (LLMs) have revolutionized numerous domains with their impressive performance, but they still face challenges.
Previous instruction tuning methods force the model to complete a sentence regardless of whether it has the relevant knowledge.
We present a new approach called Refusal-Aware Instruction Tuning (R-Tuning).
Experimental results demonstrate R-Tuning effectively improves a model's ability to answer known questions and refrain from answering unknown questions.
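A minimal sketch of the refusal-aware idea (the split criterion and refusal phrasing are assumptions, not the paper's exact pipeline): questions the model already answers correctly keep their answers, while the rest are rewritten to express uncertainty.

```python
# Sketch of refusal-aware training-data construction in the spirit of
# R-Tuning. The split criterion and refusal text are assumptions.

def build_refusal_aware_data(examples, model_answer):
    """examples: list of (question, ground_truth) pairs.
    model_answer: callable mapping a question to the model's answer."""
    tuning_data = []
    for question, truth in examples:
        if model_answer(question) == truth:
            # The model "knows" this: train it to answer.
            tuning_data.append((question, truth))
        else:
            # The model does not know this: train it to refuse.
            tuning_data.append((question, "I don't know."))
    return tuning_data
```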
arXiv Detail & Related papers (2023-11-16T08:45:44Z)
- When Do Curricula Work in Federated Learning? [56.88941905240137]
We find that curriculum learning largely alleviates non-IIDness.
The more disparate the data distributions across clients, the more they benefit from curriculum learning.
We propose a novel client selection technique that benefits from the real-world disparity in the clients.
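For illustration, a generic easy-to-hard curriculum can be built by sorting samples with a difficulty proxy; the scoring function below is an assumption, not the paper's federated method.

```python
# Generic easy-to-hard curriculum sketch: sort training samples by an
# assumed difficulty score, then expose progressively larger, harder
# prefixes to the learner. Not the paper's federated implementation.

def curriculum_order(samples, difficulty):
    """samples: list of training examples.
    difficulty: callable returning a scalar (higher = harder)."""
    return sorted(samples, key=difficulty)

def paced_batches(ordered_samples, num_stages=4):
    # Stage k trains on the easiest k/num_stages fraction of the data.
    n = len(ordered_samples)
    for stage in range(1, num_stages + 1):
        yield ordered_samples[: n * stage // num_stages]
```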
arXiv Detail & Related papers (2022-12-24T11:02:35Z)
- Increasing Students' Engagement to Reminder Emails Through Multi-Armed Bandits [60.4933541247257]
This paper presents a real-world adaptive experiment on how students engage with instructors' weekly email reminders to build their time management habits.
Using Multi-Armed Bandit (MAB) algorithms in adaptive experiments can increase students' chances of obtaining better outcomes.
We highlight problems with these adaptive algorithms, such as the possible exploitation of one arm when there is no significant difference between arms.
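For context, a minimal Thompson-sampling bandit of the kind such adaptive experiments typically use (a generic sketch, not the paper's exact setup):

```python
import random

# Minimal Thompson sampling for Bernoulli arms (e.g., two email
# reminder variants with a click/no-click outcome). Generic sketch,
# not the experiment's actual implementation.

successes = [1, 1]  # Beta prior alpha per arm
failures = [1, 1]   # Beta prior beta per arm

def choose_arm():
    samples = [random.betavariate(successes[a], failures[a]) for a in range(2)]
    return max(range(2), key=lambda a: samples[a])

def update(arm, clicked):
    if clicked:
        successes[arm] += 1
    else:
        failures[arm] += 1
```

Even when the arms do not truly differ, such a sampler can still concentrate traffic on one arm, which is the kind of exploitation issue highlighted above.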
arXiv Detail & Related papers (2022-08-10T00:30:52Z)
- How trial-to-trial learning shapes mappings in the mental lexicon: Modelling Lexical Decision with Linear Discriminative Learning [0.4450536872346657]
This study investigates whether trial-to-trial learning can be detected in an unprimed lexical decision experiment.
We used the Discriminative Lexicon Model (DLM), a model of the mental lexicon with meaning representations from distributional semantics.
Our results support the possibility that our lexical knowledge is subject to continuous change.
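Trial-to-trial learning in discriminative learning models is often formalized with an incremental delta (Widrow-Hoff) rule; the sketch below shows that rule in generic form, with dimensions and learning rate assumed, not the study's actual code.

```python
import numpy as np

# Incremental Widrow-Hoff (delta rule) update: after each trial, the
# linear mapping from form vectors to meaning vectors is nudged toward
# the observed outcome. Generic sketch; dimensions and rate are assumed.

rng = np.random.default_rng(0)
n_form, n_meaning = 50, 30
F = np.zeros((n_form, n_meaning))  # linear form-to-meaning mapping
eta = 0.01                         # learning rate

def trial_update(cue, target):
    """cue: form vector (n_form,); target: meaning vector (n_meaning,)."""
    global F
    prediction = cue @ F
    F += eta * np.outer(cue, target - prediction)

# one simulated trial
trial_update(rng.standard_normal(n_form), rng.standard_normal(n_meaning))
```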
arXiv Detail & Related papers (2022-07-01T13:49:30Z)
- Agree to Disagree: Diversity through Disagreement for Better Transferability [54.308327969778155]
We propose D-BAT (Diversity-By-disAgreement Training), which enforces agreement among the models on the training data while encouraging them to disagree on out-of-distribution data.
We show how D-BAT naturally emerges from the notion of generalized discrepancy.
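A rough sketch of the agree-on-train, disagree-on-OOD objective (the disagreement term below is a simplified stand-in, not the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

# Rough sketch of a D-BAT-style objective for a second model h2 trained
# after h1: fit the labels on in-distribution data, and disagree with
# h1 on out-of-distribution inputs. The agreement penalty below is a
# simplified stand-in for the paper's formulation.

def dbat_loss(h1, h2, x_train, y_train, x_ood, lam=1.0):
    task_loss = F.cross_entropy(h2(x_train), y_train)
    with torch.no_grad():
        p1 = F.softmax(h1(x_ood), dim=-1)
    p2 = F.softmax(h2(x_ood), dim=-1)
    # penalize agreement on OOD data: high when both models put mass
    # on the same classes
    agreement = (p1 * p2).sum(dim=-1).mean()
    return task_loss + lam * agreement
```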
arXiv Detail & Related papers (2022-02-09T12:03:02Z)
- Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning (AL) are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
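For context, uncertainty-based acquisition scores unlabeled points by predictive uncertainty; the sketch below uses plain entropy as a simplified stand-in for BALD, not the paper's exact implementation.

```python
import numpy as np

# Generic uncertainty-based acquisition sketch: pick the unlabeled
# points whose predictive distribution has the highest entropy.
# BALD additionally subtracts the expected entropy across stochastic
# forward passes; this simplified version uses plain entropy.

def predictive_entropy(probs):
    """probs: (n_points, n_classes) predicted class probabilities."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def select_queries(probs, k=10):
    return np.argsort(-predictive_entropy(probs))[:k]
```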
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
- Are Bias Mitigation Techniques for Deep Learning Effective? [24.84797949716142]
We introduce an improved evaluation protocol, sensible metrics, and a new dataset.
We evaluate seven state-of-the-art algorithms using the same network architecture.
We find that algorithms exploit hidden biases, are unable to scale to multiple forms of bias, and are highly sensitive to the choice of tuning set.
arXiv Detail & Related papers (2021-04-01T00:14:45Z)
- A framework for predicting, interpreting, and improving Learning Outcomes [0.0]
We develop an Embibe Score Quotient model (ESQ) to predict test scores based on observed academic, behavioral and test-taking features of a student.
ESQ can be used to predict the future scoring potential of a student as well as offer personalized learning nudges.
arXiv Detail & Related papers (2020-10-06T11:22:27Z)
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction [96.90215318875859]
We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from corrective feedback.
We propose a new algorithm, DisCor, which computes an approximation to the optimal data distribution for corrective feedback and uses it to re-weight the transitions used for training.
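The re-weighting idea can be sketched roughly as follows; the error estimator and constants are assumptions based on the general scheme, not the paper's exact derivation.

```python
import numpy as np

# Rough sketch of DisCor-style transition re-weighting: transitions
# whose bootstrapped targets are estimated to be more erroneous get
# down-weighted. delta_hat_next is an assumed estimator of the
# accumulated target-value error; gamma and tau are assumed constants.

gamma, tau = 0.99, 10.0

def discor_weights(delta_hat_next):
    """delta_hat_next: (batch,) estimated error of the target values
    at the next state-action pairs."""
    w = np.exp(-gamma * delta_hat_next / tau)
    return w / w.sum()  # normalized importance weights for the batch
```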
arXiv Detail & Related papers (2020-03-16T16:18:52Z)
- A New Framework for Query Efficient Active Imitation Learning [5.167794607251493]
A human expert knows the rewards and unsafe states based on their preferences and objectives, but querying that expert is expensive.
We propose a new imitation learning (IL) framework that actively and interactively learns a model of the user's reward function with efficient queries.
We evaluate the proposed method with a simulated human on a state-based 2D navigation task, robotic control tasks, and image-based video games.
arXiv Detail & Related papers (2019-12-30T18:12:27Z)