Increasing Students' Engagement to Reminder Emails Through Multi-Armed
Bandits
- URL: http://arxiv.org/abs/2208.05090v1
- Date: Wed, 10 Aug 2022 00:30:52 GMT
- Title: Increasing Students' Engagement to Reminder Emails Through Multi-Armed
Bandits
- Authors: Fernando J. Yanez, Angela Zavaleta-Bernuy, Ziwen Han, Michael Liut,
Anna Rafferty, Joseph Jay Williams
- Abstract summary: This paper shows a real-world adaptive experiment on how students engage with instructors' weekly email reminders to build their time management habits.
Using Multi-Armed Bandit (MAB) algorithms in adaptive experiments can increase students' chances of obtaining better outcomes.
We highlight problems with these adaptive algorithms - such as possible exploitation of an arm when there is no significant difference.
- Score: 60.4933541247257
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Conducting randomized experiments in education settings raises the question
of how we can use machine learning techniques to improve educational
interventions. Using Multi-Armed Bandit (MAB) algorithms like Thompson
Sampling (TS) in adaptive experiments can increase students' chances of
obtaining better outcomes by increasing the probability of assignment to the
optimal condition (arm), even before an intervention completes. This is an
advantage over traditional A/B testing, which may allocate an equal number of
students to both optimal and non-optimal conditions. The problem is the
exploration-exploitation trade-off. Even though adaptive policies aim to
collect enough information to allocate more students to better arms reliably,
past work shows that this may not be enough exploration to draw reliable
conclusions about whether arms differ. Hence, it is of interest to provide
additional uniform random (UR) exploration throughout the experiment. This
paper shows a real-world adaptive experiment on how students engage with
instructors' weekly email reminders to build their time management habits. Our
metric of interest is the email open rate, which tracks the arms represented by
different subject lines. These are delivered following different allocation
algorithms: UR, TS, and what we identified as TS† - which combines both TS
and UR rewards to update its priors. We highlight problems with these adaptive
algorithms - such as possible exploitation of an arm when there is no
significant difference - and address their causes and consequences. Future
directions include studying situations where the early choice of the optimal
arm is not ideal and how adaptive algorithms can address them.
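To make the allocation mechanics concrete, below is a minimal Beta-Bernoulli sketch of how Thompson Sampling could pick a subject line (arm) for each email, and how a TS†-style variant might fold uniform-random (UR) observations into the same posterior. This is an illustrative reading of the setup, not the authors' code: the class and function names, the Beta(1, 1) prior, the 20% UR share, and the ground-truth open rates are all assumptions.

```python
import random

class ThompsonSampler:
    """Beta-Bernoulli Thompson Sampling over email subject lines (arms).

    Each arm keeps a Beta(alpha, beta) posterior over its open rate.
    """

    def __init__(self, n_arms, prior=(1.0, 1.0)):
        self.alpha = [prior[0]] * n_arms
        self.beta = [prior[1]] * n_arms

    def choose_arm(self):
        # Sample an open rate from each arm's posterior and send the
        # subject line whose sampled rate is highest.
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, opened):
        # opened is 1 if the student opened the email, 0 otherwise.
        self.alpha[arm] += opened
        self.beta[arm] += 1 - opened


def assign_condition(sampler, n_arms, ur_probability=0.0):
    """Pick an arm, optionally reserving a share of students for UR exploration."""
    if random.random() < ur_probability:
        return random.randrange(n_arms), "UR"
    return sampler.choose_arm(), "TS"


# TS†-style bookkeeping: the posterior is updated with rewards from both
# TS-assigned and UR-assigned emails (one interpretation of "combines both
# TS and UR rewards to update its priors").
sampler = ThompsonSampler(n_arms=2)
for _ in range(1000):
    arm, policy = assign_condition(sampler, n_arms=2, ur_probability=0.2)
    true_open_rate = [0.30, 0.35][arm]   # hypothetical ground-truth open rates
    opened = 1 if random.random() < true_open_rate else 0
    sampler.update(arm, opened)          # TS† uses both UR and TS outcomes

print(sampler.alpha, sampler.beta)
```

In a plain TS condition, ur_probability would be 0 and only TS-assigned outcomes would update the posterior; the 20% split shown here only illustrates how a combined TS/UR update could work.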
Related papers
- Contextual Bandits with Arm Request Costs and Delays [19.263086804406786]
We introduce a novel extension of the contextual bandit problem, where new sets of arms can be requested with time delays and associated costs.
In this setting, the learner can select multiple arms from a decision set, with each selection taking one unit of time.
We design algorithms that can effectively select arms and determine the appropriate time to request new arms, thereby minimizing their regret.
arXiv Detail & Related papers (2024-10-17T00:44:50Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z) - ASPEST: Bridging the Gap Between Active Learning and Selective
Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z) - Contextual Bandits in a Survey Experiment on Charitable Giving:
Within-Experiment Outcomes versus Policy Learning [21.9468085255912]
We design and implement an adaptive experiment (a "contextual bandit") to learn a targeted treatment assignment policy.
The goal is to use a participant's survey responses to determine which charity to expose them to in a donation solicitation.
We evaluate alternative experimental designs by collecting pilot data and then conducting a simulation study.
arXiv Detail & Related papers (2022-11-22T04:44:17Z) - Algorithms for Adaptive Experiments that Trade-off Statistical Analysis
with Reward: Combining Uniform Random Assignment and Reward Maximization [50.725191156128645]
Multi-armed bandit algorithms like Thompson Sampling can be used to conduct adaptive experiments.
We present simulations for 2-arm experiments that explore two algorithms combining the benefits of uniform random assignment for statistical analysis with those of reward maximization.
arXiv Detail & Related papers (2021-12-15T22:11:58Z) - Challenges in Statistical Analysis of Data Collected by a Bandit
Algorithm: An Empirical Exploration in Applications to Adaptively Randomized
Experiments [11.464963616709671]
Multi-armed bandit algorithms have been argued for decades as useful for adaptively randomized experiments.
We applied the bandit algorithm Thompson Sampling (TS) to run adaptive experiments in three university classes.
We show that collecting data with TS can as much as double the False Positive Rate (FPR) and the False Negative Rate (FNR); a simulation sketch after this list illustrates the effect.
arXiv Detail & Related papers (2021-03-22T22:05:18Z) - Continuous Mean-Covariance Bandits [39.820490484375156]
We propose a novel Continuous Mean-Covariance Bandit (CMCB) model to take into account option correlation.
In CMCB, there is a learner who sequentially chooses weight vectors on given options and observes random feedback according to the decisions.
We propose novel algorithms with optimal regrets (within logarithmic factors) and provide matching lower bounds to validate their optimality.
arXiv Detail & Related papers (2021-02-24T06:37:05Z) - Resource Allocation in Multi-armed Bandit Exploration: Overcoming
Sublinear Scaling with Adaptive Parallelism [107.48538091418412]
We study exploration in multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls.
We focus in particular on the allocation of distributed computing resources, where we may obtain results faster by allocating more resources per pull.
arXiv Detail & Related papers (2020-10-31T18:19:29Z)
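As a hedged illustration of the analysis concern raised in the "Challenges in Statistical Analysis of Data Collected by a Bandit Algorithm" entry above, the sketch below simulates 2-arm experiments in which both subject lines have the same true open rate, assigns students either uniformly at random or via Thompson Sampling, and measures how often a standard two-proportion z-test falsely declares a difference. The sample size, number of replications, priors, and 0.05 threshold are illustrative assumptions rather than the referenced paper's setup.

```python
import math
import random

def run_experiment(n_students, assign_ts, p_true=(0.3, 0.3)):
    """Simulate one 2-arm email experiment and return the z-test p-value."""
    alpha, beta = [1.0, 1.0], [1.0, 1.0]   # Beta(1, 1) priors (assumption)
    opens, sends = [0, 0], [0, 0]
    for _ in range(n_students):
        if assign_ts:
            # Thompson Sampling assignment from the current posteriors.
            samples = [random.betavariate(alpha[a], beta[a]) for a in range(2)]
            arm = 0 if samples[0] >= samples[1] else 1
        else:
            arm = random.randrange(2)        # uniform random assignment
        opened = 1 if random.random() < p_true[arm] else 0
        sends[arm] += 1
        opens[arm] += opened
        alpha[arm] += opened
        beta[arm] += 1 - opened
    # Two-proportion z-test on the final open rates.
    if min(sends) == 0:
        return 1.0
    p1, p2 = opens[0] / sends[0], opens[1] / sends[1]
    p_pool = sum(opens) / sum(sends)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / sends[0] + 1 / sends[1]))
    if se == 0:
        return 1.0
    z = (p1 - p2) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def false_positive_rate(assign_ts, reps=2000, n_students=200):
    """Fraction of null experiments (equal arms) declared significant at 0.05."""
    return sum(run_experiment(n_students, assign_ts) < 0.05
               for _ in range(reps)) / reps

print("FPR with UR assignment:", false_positive_rate(assign_ts=False))
print("FPR with TS assignment:", false_positive_rate(assign_ts=True))
```

Under these assumptions the TS-collected data typically yields a noticeably higher false positive rate than the UR baseline, consistent with the FPR inflation that paper reports.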