Learning to Infer User Hidden States for Online Sequential Advertising
- URL: http://arxiv.org/abs/2009.01453v1
- Date: Thu, 3 Sep 2020 05:12:26 GMT
- Authors: Zhaoqing Peng, Junqi Jin, Lan Luo, Yaodong Yang, Rui Luo, Jun Wang,
Weinan Zhang, Haiyang Xu, Miao Xu, Chuan Yu, Tiejian Luo, Han Li, Jian Xu,
Kun Gai
- Abstract summary: We propose our Deep Intents Sequential Advertising (DISA) method to address these issues.
The key to interpretability is understanding a consumer's purchase intent, which is, however, unobservable (referred to as hidden states).
- Score: 52.169666997331724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To drive purchases in online advertising, it is of great interest to
advertisers to optimize the sequential advertising strategy, whose performance
and interpretability are both important. The lack of interpretability in
existing deep reinforcement learning methods makes them difficult to
understand, diagnose, and further optimize. In this paper, we propose our Deep
Intents Sequential Advertising (DISA) method to address these issues. The key
to interpretability is understanding a consumer's purchase intent, which is,
however, unobservable (referred to as hidden states). In this paper, we model this
intention as a latent variable and formulate the problem as a Partially
Observable Markov Decision Process (POMDP) where the underlying intents are
inferred based on the observable behaviors. Large-scale industrial offline and
online experiments demonstrate our method's superior performance over several
baselines. The inferred hidden states are analyzed, and the results prove the
rationality of our inference.
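The POMDP formulation above can be illustrated with a standard discrete Bayes filter that maintains a belief over hidden purchase intents and updates it from observed behaviors. A minimal sketch follows; the intent states, transition matrix, and observation likelihoods are illustrative assumptions, not the paper's learned parameters.

```python
import numpy as np

# Hypothetical hidden intents (states) and observable behaviors;
# DISA infers such intents from data, these values are made up.
INTENTS = ["browsing", "comparing", "ready_to_buy"]
BEHAVIORS = ["impression", "click", "add_to_cart"]

# P(s' | s): row-stochastic transition matrix over intents.
T = np.array([
    [0.70, 0.25, 0.05],
    [0.10, 0.60, 0.30],
    [0.05, 0.15, 0.80],
])

# P(o | s): likelihood of each observed behavior given the intent.
O = np.array([
    [0.80, 0.15, 0.05],
    [0.50, 0.40, 0.10],
    [0.20, 0.30, 0.50],
])

def belief_update(belief, obs_idx):
    """One Bayes-filter step: predict with T, correct with O, renormalize."""
    predicted = belief @ T                # prior over the next intent
    unnorm = predicted * O[:, obs_idx]    # weight by observation likelihood
    return unnorm / unnorm.sum()

# Start with a uniform belief, then observe a click and an add-to-cart.
b = np.ones(3) / 3
for o in ["click", "add_to_cart"]:
    b = belief_update(b, BEHAVIORS.index(o))

print({s: round(p, 3) for s, p in zip(INTENTS, b)})
```

After observing a click followed by an add-to-cart, the belief mass shifts toward the buying-inclined intent, which is the kind of inferred hidden state an advertiser can inspect and act on.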
Related papers
- Understanding the performance gap between online and offline alignment algorithms [63.137832242488926]
We show that offline algorithms train policies to become good at pairwise classification, while online algorithms are good at generation.
This hints at a unique interplay between discriminative and generative capabilities, which is greatly impacted by the sampling process.
Our study sheds light on the pivotal role of on-policy sampling in AI alignment, and hints at certain fundamental challenges of offline alignment algorithms.
arXiv Detail & Related papers (2024-05-14T09:12:30Z) - Sequential Decision Making with Expert Demonstrations under Unobserved Heterogeneity [22.0059059325909]
We study the problem of online sequential decision-making given auxiliary demonstrations from experts who made their decisions based on unobserved contextual information.
This setting arises in many application domains, such as self-driving cars, healthcare, and finance.
We propose the Experts-as-Priors algorithm (ExPerior) to establish an informative prior distribution over the learner's decision-making problem.
arXiv Detail & Related papers (2024-04-10T18:00:17Z) - Online Ad Procurement in Non-stationary Autobidding Worlds [10.871587311621974]
We introduce a primal-dual algorithm for online decision making with multi-dimensional decision variables, bandit feedback, and long-term uncertain constraints.
We show that our algorithm achieves low regret in many worlds when procurement outcomes are generated through procedures that are adversarial, adversarially corrupted, periodic, and ergodic.
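A primal-dual loop of this kind can be sketched in a few lines: a dual variable prices the long-term budget constraint, the primal step picks a bid against that price, and the dual step adjusts the price from bandit feedback. The reward/spend model below is a toy assumption for illustration, not the paper's procurement setting.

```python
import random

random.seed(0)

BUDGET_PER_ROUND = 1.0   # long-term constraint: average spend <= 1.0
ETA = 0.1                # dual step size
ROUNDS = 1000

lam = 0.0                # dual variable: shadow price of the budget
total_spend = 0.0

for t in range(ROUNDS):
    # Primal step: choose the bid maximizing an assumed concave reward
    # 2b - b^2 minus the lambda-weighted cost, over a crude bid grid.
    candidates = [0.5, 1.0, 1.5, 2.0]
    bid = max(candidates, key=lambda b: (2.0 * b - b * b) - lam * b)

    # Bandit feedback: spend is observed only for the chosen bid.
    spend = bid * random.uniform(0.8, 1.2)
    total_spend += spend

    # Dual step: raise the price when over budget, lower it when under,
    # projecting back onto lambda >= 0.
    lam = max(0.0, lam + ETA * (spend - BUDGET_PER_ROUND))

print(f"avg spend: {total_spend / ROUNDS:.3f}, final lambda: {lam:.3f}")
```

The dual update keeps average spend near the per-round budget without knowing the spend distribution in advance, which is the essence of handling long-term constraints under bandit feedback.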
arXiv Detail & Related papers (2023-07-10T00:41:08Z) - Adversarial Constrained Bidding via Minimax Regret Optimization with
Causality-Aware Reinforcement Learning [18.408964908248855]
Existing approaches to constrained bidding typically assume i.i.d. training and test conditions.
We propose a practical Minimax Regret Optimization (MiRO) approach that interleaves between a teacher finding adversarial environments for tutoring and a learner meta-learning its policy over the given distribution of environments.
Our method, MiRO with Causality-aware reinforcement Learning (MiROCL), outperforms prior methods by over 30%.
arXiv Detail & Related papers (2023-06-12T13:31:58Z) - Accelerating exploration and representation learning with offline
pre-training [52.6912479800592]
We show that exploration and representation learning can be improved by separately learning two different models from a single offline dataset.
We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward can significantly improve the sample efficiency on the challenging NetHack benchmark.
arXiv Detail & Related papers (2023-03-31T18:03:30Z) - Targeted Advertising on Social Networks Using Online Variational Tensor
Regression [19.586412285513962]
We propose what we believe is the first contextual bandit framework for online targeted advertising.
The proposed framework is designed to accommodate any number of feature vectors in the form of a multi-mode tensor.
We empirically confirm that the proposed UCB algorithm achieves a significant improvement in influence tasks over the benchmarks.
arXiv Detail & Related papers (2022-08-22T22:10:45Z) - Inverse Online Learning: Understanding Non-Stationary and Reactionary
Policies [79.60322329952453]
We show how to develop interpretable representations of how agents make decisions.
By understanding the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse to this online learning problem.
We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
arXiv Detail & Related papers (2022-03-14T17:40:42Z) - Techniques Toward Optimizing Viewability in RTB Ad Campaigns Using
Reinforcement Learning [0.0]
Reinforcement learning (RL) is an effective technique for training decision-making agents through interactions with their environment.
In digital advertising, real-time bidding (RTB) is a common method of allocating advertising inventory through real-time auctions.
arXiv Detail & Related papers (2021-05-21T21:56:12Z) - Reannealing of Decaying Exploration Based On Heuristic Measure in Deep
Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing, that aims at encouraging exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z) - Inverse Active Sensing: Modeling and Understanding Timely
Decision-Making [111.07204912245841]
We develop a framework for the general setting of evidence-based decision-making under endogenous, context-dependent time pressure.
We demonstrate how it enables modeling intuitive notions of surprise, suspense, and optimality in decision strategies.
arXiv Detail & Related papers (2020-06-25T02:30:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.