Sequential Search with Off-Policy Reinforcement Learning
- URL: http://arxiv.org/abs/2202.00245v1
- Date: Tue, 1 Feb 2022 06:52:40 GMT
- Title: Sequential Search with Off-Policy Reinforcement Learning
- Authors: Dadong Miao, Yanan Wang, Guoyu Tang, Lin Liu, Sulong Xu, Bo Long, Yun
Xiao, Lingfei Wu, Yunjiang Jiang
- Abstract summary: We propose a highly scalable hybrid learning model that consists of an RNN learning framework and an attention model.
As a novel optimization step, we fit multiple short user sequences in a single RNN pass within a training batch, by solving a greedy knapsack problem on the fly.
We also explore the use of off-policy reinforcement learning in multi-session personalized search ranking.
- Score: 48.88165680363482
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent years have seen a significant amount of interest in Sequential
Recommendation (SR), which aims to understand and model the sequential user
behaviors and the interactions between users and items over time. Surprisingly,
despite the huge success Sequential Recommendation has achieved, there is
little study on Sequential Search (SS), a twin learning task that takes into
account a user's current and past search queries, in addition to behavior on
historical query sessions. The SS learning task is even more important than the
counterpart SR task for most e-commerce companies due to its much larger
online serving demands as well as traffic volume.
To this end, we propose a highly scalable hybrid learning model that consists
of an RNN learning framework leveraging all features in short-term user-item
interactions, and an attention model utilizing selected item-only features from
long-term interactions. As a novel optimization step, we fit multiple short
user sequences in a single RNN pass within a training batch, by solving a
greedy knapsack problem on the fly. Moreover, we explore the use of off-policy
reinforcement learning in multi-session personalized search ranking.
Specifically, we design a pairwise Deep Deterministic Policy Gradient model
that efficiently captures users' long term reward in terms of pairwise
classification error. Extensive ablation experiments demonstrate the significant
improvement that each component brings to its state-of-the-art baseline, on a
variety of offline and online metrics.
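The abstract does not spell out how the greedy knapsack step works, so the following is only a minimal sketch of one plausible reading: several short user sequences are packed into a fixed-length row so that one RNN pass covers all of them with little padding. The first-fit-decreasing heuristic, the function name `pack_sequences`, and the `capacity` parameter are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def pack_sequences(lengths: Sequence[int], capacity: int) -> List[List[int]]:
    """Greedily pack sequence indices into rows of at most `capacity` steps.

    Hypothetical helper: uses the first-fit-decreasing heuristic, i.e. visit
    sequences from longest to shortest and place each one into the first row
    that still has enough room, opening a new row only when none fits.
    """
    if any(n > capacity for n in lengths):
        raise ValueError("a sequence is longer than the row capacity")

    # Indices sorted by sequence length, longest first.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)

    rows: List[List[int]] = []  # sequence indices assigned to each row
    free: List[int] = []        # remaining capacity of each row

    for i in order:
        for r, room in enumerate(free):
            if lengths[i] <= room:
                rows[r].append(i)
                free[r] -= lengths[i]
                break
        else:
            # No existing row has room, so open a new one.
            rows.append([i])
            free.append(capacity - lengths[i])
    return rows


# Six user sequences with these lengths, packed into rows of 8 time steps.
lengths = [7, 2, 5, 3, 1, 4]
rows = pack_sequences(lengths, capacity=8)
print(rows)  # [[0, 4], [2, 3], [5, 1]]
```

In this toy case the 22 real time steps fit into 3 rows of 8 slots, whereas padding each sequence to the longest length would need 6 rows of 7 slots. A real training loop would presumably also need to reset the RNN hidden state at each boundary between packed sequences so that one user's history does not leak into the next.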
Related papers
- Multi-granularity Interest Retrieval and Refinement Network for Long-Term User Behavior Modeling in CTR Prediction [68.90783662117936]
Click-through Rate (CTR) prediction is crucial for online personalization platforms.
Recent advancements have shown that modeling rich user behaviors can significantly improve the performance of CTR prediction.
We propose Multi-granularity Interest Retrieval and Refinement Network (MIRRN).
arXiv Detail & Related papers (2024-11-22T15:29:05Z)
- SEMINAR: Search Enhanced Multi-modal Interest Network and Approximate Retrieval for Lifelong Sequential Recommendation [16.370075234443245]
We propose a unified lifelong multi-modal sequence model called SEMINAR (Search Enhanced Multi-Modal Interest Network and Approximate Retrieval).
Specifically, a network called Pretraining Search Unit learns the lifelong sequences of multi-modal query-item pairs in a pretraining-finetuning manner.
To accelerate the online retrieval speed of multi-modal embedding, we propose a multi-modal codebook-based product quantization strategy.
arXiv Detail & Related papers (2024-07-15T13:33:30Z)
- SA-LSPL:Sequence-Aware Long- and Short- Term Preference Learning for next POI recommendation [19.40796508546581]
Point of Interest (POI) recommendation aims to recommend the POI for users at a specific time.
We propose a novel approach called Sequence-Aware Long- and Short-Term Preference Learning (SA-LSPL) for next-POI recommendation.
arXiv Detail & Related papers (2024-03-30T13:40:25Z)
- Multi-Behavior Sequential Recommendation with Temporal Graph Transformer [66.10169268762014]
We tackle the dynamic user-item relation learning with the awareness of multi-behavior interactive patterns.
We propose a new Temporal Graph Transformer (TGT) recommendation framework to jointly capture dynamic short-term and long-range user-item interactive patterns.
arXiv Detail & Related papers (2022-06-06T15:42:54Z)
- Boosting the Learning for Ranking Patterns [6.142272540492935]
This paper formulates the problem of learning pattern ranking functions as a multi-criteria decision making problem.
Our approach aggregates different interestingness measures into a single weighted linear ranking function, using an interactive learning procedure.
Experiments conducted on well-known datasets show that our approach significantly reduces the running time and returns precise pattern ranking.
arXiv Detail & Related papers (2022-03-05T10:22:44Z)
- Hyper Meta-Path Contrastive Learning for Multi-Behavior Recommendation [61.114580368455236]
User purchasing prediction with multi-behavior information remains a challenging problem for current recommendation systems.
We propose the concept of hyper meta-path to construct hyper meta-paths or hyper meta-graphs to explicitly illustrate the dependencies among different behaviors of a user.
Thanks to the recent success of graph contrastive learning, we leverage it to learn embeddings of user behavior patterns adaptively instead of assigning a fixed scheme to understand the dependencies among different behaviors.
arXiv Detail & Related papers (2021-09-07T04:28:09Z)
- Sequence Adaptation via Reinforcement Learning in Recommender Systems [8.909115457491522]
We propose the SAR model, which learns the sequential patterns and adjusts the sequence length of user-item interactions in a personalized manner.
In addition, we optimize a joint loss function to align the accuracy of the sequential recommendations with the expected cumulative rewards of the critic network.
Our experimental evaluation on four real-world datasets demonstrates the superiority of our proposed model over several baseline approaches.
arXiv Detail & Related papers (2021-07-31T13:56:46Z)
- Dynamic Memory based Attention Network for Sequential Recommendation [79.5901228623551]
We propose a novel long sequential recommendation model called Dynamic Memory-based Attention Network (DMAN).
It segments the overall long behavior sequence into a series of sub-sequences, then trains the model and maintains a set of memory blocks to preserve long-term interests of users.
Based on the dynamic memory, the user's short-term and long-term interests can be explicitly extracted and combined for efficient joint recommendation.
arXiv Detail & Related papers (2021-02-18T11:08:54Z)
- Dynamic Embeddings for Interaction Prediction [2.5758502140236024]
In recommender systems (RSs), predicting the next item that a user interacts with is critical for user retention.
Recent studies have shown the effectiveness of modeling the mutual interactions between users and items using separate user and item embeddings.
We propose a novel method called DeePRed that addresses some of their limitations.
arXiv Detail & Related papers (2020-11-10T16:04:46Z)
- Sequential Recommender via Time-aware Attentive Memory Network [67.26862011527986]
We propose a temporal gating methodology to improve attention mechanism and recurrent units.
We also propose a Multi-hop Time-aware Attentive Memory network to integrate long-term and short-term preferences.
Our approach is scalable for candidate retrieval tasks and can be viewed as a non-linear generalization of latent factorization for dot-product based Top-K recommendation.
arXiv Detail & Related papers (2020-05-18T11:29:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.