Two-Stage Constrained Actor-Critic for Short Video Recommendation
- URL: http://arxiv.org/abs/2302.01680v3
- Date: Tue, 9 Jan 2024 09:59:23 GMT
- Title: Two-Stage Constrained Actor-Critic for Short Video Recommendation
- Authors: Qingpeng Cai, Zhenghai Xue, Chi Zhang, Wanqi Xue, Shuchang Liu, Ruohan
Zhan, Xueliang Wang, Tianyou Zuo, Wentao Xie, Dong Zheng, Peng Jiang, Kun Gai
- Abstract summary: We formulate the problem of short video recommendation as a Constrained Markov Decision Process (CMDP).
We propose a novel two-stage constrained actor-critic method: stage one learns a policy for each auxiliary signal, and stage two optimizes the main signal while staying close to those policies.
Our method significantly outperforms other baselines in terms of both watch time and interactions.
- Score: 23.12631658373264
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The wide popularity of short videos on social media poses new opportunities
and challenges to optimize recommender systems on the video-sharing platforms.
Users sequentially interact with the system and provide complex and
multi-faceted responses, including watch time and various types of interactions
with multiple videos. On the one hand, the platform aims to optimize the
users' cumulative watch time (main goal) in the long term, which can be effectively
optimized by Reinforcement Learning. On the other hand, the platform also
needs to satisfy the constraint of accommodating the responses of multiple user
interactions (auxiliary goals) such as like, follow, and share. In this paper, we
formulate the problem of short video recommendation as a Constrained Markov
Decision Process (CMDP). We find that traditional constrained reinforcement
learning algorithms do not work well in this setting. We propose a novel
two-stage constrained actor-critic method: At stage one, we learn individual
policies to optimize each auxiliary signal. At stage two, we learn a policy to
(i) optimize the main signal and (ii) stay close to policies learned at the
first stage, which effectively guarantees the performance of this main policy
on the auxiliaries. Through extensive offline evaluations, we demonstrate the
effectiveness of our method over alternatives in both optimizing the main goal
and balancing the others. We further show the advantage of our method in
live experiments of short video recommendations, where it significantly
outperforms other baselines in terms of both watch time and interactions. Our
approach has been fully launched in the production system to optimize user
experiences on the platform.
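The two-stage idea in the abstract can be illustrated with a small sketch. The abstract does not say how "staying close" to the stage-one policies is measured, so the code below assumes a weighted KL penalty over a discrete action set, for which the regularized optimum has a closed form. The function names, the weights, and the toy numbers are all hypothetical and are not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def stage_two_policy(main_advantage, aux_policies, weights):
    """Maximizer of  sum_a pi(a) * A(a) - sum_i w_i * KL(pi || pi_i).

    Assumption (not from the source): closeness is a weighted KL penalty.
    The optimum is then pi(a) proportional to
    exp((A(a) + sum_i w_i * log pi_i(a)) / sum_i w_i).
    Auxiliary policies must assign strictly positive probability to each action.
    """
    total = sum(weights)
    logits = []
    for a, adv in enumerate(main_advantage):
        log_prior = sum(w * math.log(pol[a]) for w, pol in zip(weights, aux_policies))
        logits.append((adv + log_prior) / total)
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def regularized_objective(policy, main_advantage, aux_policies, weights):
    """Main-signal gain minus the weighted KL penalties to the auxiliary policies."""
    gain = sum(p * a for p, a in zip(policy, main_advantage))
    penalty = sum(w * kl_divergence(policy, pol) for w, pol in zip(weights, aux_policies))
    return gain - penalty


# Toy example: three candidate actions, two stage-one (auxiliary) policies.
main_advantage = [1.0, 0.0, -1.0]          # hypothetical watch-time advantages
aux_policies = [[0.2, 0.5, 0.3],           # e.g. a policy tuned for "like"
                [0.3, 0.3, 0.4]]           # e.g. a policy tuned for "follow"
weights = [1.0, 1.0]                       # hypothetical closeness weights

policy = stage_two_policy(main_advantage, aux_policies, weights)
print(policy)
```

Larger weights pull the stage-two policy toward the auxiliary policies, while smaller weights let the main-signal advantage dominate. In a learned actor-critic setup the same trade-off would appear as a loss term rather than a closed-form solve.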
Related papers
- Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets [62.280729345770936]
We introduce the task of Alignable Video Retrieval (AVR).
Given a query video, our approach can identify well-alignable videos from a large collection of clips and temporally synchronize them to the query.
Our experiments on 3 datasets, including large-scale Kinetics700, demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-02T20:00:49Z)
- A Model-based Multi-Agent Personalized Short-Video Recommender System [19.03089585214444]
We propose an RL-based industrial short-video recommender ranking framework.
Our proposed framework adopts a model-based learning approach to alleviate the sample selection bias.
Our proposed approach has been deployed in our real large-scale short-video sharing platform.
arXiv Detail & Related papers (2024-05-03T04:34:36Z)
- PLOT-TAL -- Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization [7.682021482980697]
We introduce a novel approach to temporal action localization (TAL) in few-shot learning.
We propose a multi-prompt learning framework enhanced with optimal transport.
Our experiments demonstrate significant improvements in action localization accuracy and robustness in few-shot settings.
arXiv Detail & Related papers (2024-03-27T18:08:14Z)
- A Large Language Model Enhanced Sequential Recommender for Joint Video and Comment Recommendation [77.42486522565295]
We propose a novel recommendation approach called LSVCR to jointly conduct personalized video and comment recommendation.
Our approach consists of two key components, namely a sequential recommendation (SR) model and a supplemental large language model (LLM) recommender.
In particular, we achieve a significant overall gain of 4.13% in comment watch time.
arXiv Detail & Related papers (2024-03-20T13:14:29Z)
- Constrained Reinforcement Learning for Short Video Recommendation [18.492477839791274]
Short videos on social media platforms pose new challenges for optimizing recommender systems.
We propose a two-stage reinforcement learning approach based on the actor-critic framework.
Our approach has been fully launched in the production system to optimize user experiences.
arXiv Detail & Related papers (2022-05-26T09:36:20Z)
- On component interactions in two-stage recommender systems [82.38014314502861]
Two-stage recommenders are used by many online platforms, including YouTube, LinkedIn, and Pinterest.
We show that interactions between the ranker and the nominators substantially affect the overall performance.
In particular, using a Mixture-of-Experts approach, we train the nominators to specialize on different subsets of the item pool.
arXiv Detail & Related papers (2021-06-28T20:53:23Z)
- Semi-Supervised Action Recognition with Temporal Contrastive Learning [50.08957096801457]
We learn a two-pathway temporal contrastive model using unlabeled videos at two different speeds.
We considerably outperform video extensions of sophisticated state-of-the-art semi-supervised image recognition methods.
arXiv Detail & Related papers (2021-02-04T17:28:35Z)
- SOAC: The Soft Option Actor-Critic Architecture [25.198302636265286]
Methods have been proposed for concurrently learning low-level intra-option policies and a high-level option selection policy.
Existing methods typically suffer from two major challenges: ineffective exploration and unstable updates.
We present a novel and stable off-policy approach that builds on the maximum entropy model to address these challenges.
arXiv Detail & Related papers (2020-06-25T13:06:59Z)
- Scene-Adaptive Video Frame Interpolation via Meta-Learning [54.87696619177496]
We propose to adapt the model to each video by making use of additional information that is readily available at test time.
We obtain significant performance gains with only a single gradient update without any additional parameters.
arXiv Detail & Related papers (2020-04-02T02:46:44Z)
- Delving into 3D Action Anticipation from Streaming Videos [99.0155538452263]
Action anticipation aims to recognize the action with a partial observation.
We introduce several complementary evaluation metrics and present a basic model based on frame-wise action classification.
We also explore multi-task learning strategies by incorporating auxiliary information from two aspects: the full action representation and the class-agnostic action label.
arXiv Detail & Related papers (2019-06-15T10:30:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.