Constrained Reinforcement Learning for Short Video Recommendation
- URL: http://arxiv.org/abs/2205.13248v1
- Date: Thu, 26 May 2022 09:36:20 GMT
- Title: Constrained Reinforcement Learning for Short Video Recommendation
- Authors: Qingpeng Cai, Ruohan Zhan, Chi Zhang, Jie Zheng, Guangwei Ding,
Pinghua Gong, Dong Zheng, Peng Jiang
- Abstract summary: Short videos on social media platforms pose new challenges for optimizing recommender systems.
We propose a two-stage reinforcement learning approach based on an actor-critic framework.
Our approach has been fully launched in the production system to optimize user experiences.
- Score: 18.492477839791274
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The wide popularity of short videos on social media poses new opportunities
and challenges to optimize recommender systems on the video-sharing platforms.
Users provide complex and multi-faceted responses towards recommendations,
including watch time and various types of interactions with videos. As a
result, established recommendation algorithms that concern a single objective
are not adequate to meet this new demand of optimizing comprehensive user
experiences. In this paper, we formulate the problem of short video
recommendation as a constrained Markov Decision Process (MDP), where platforms
want to optimize the main goal of user watch time in the long term, with the
constraint of accommodating the auxiliary responses of user interactions such
as sharing/downloading videos.
To solve the constrained MDP, we propose a two-stage reinforcement learning
approach based on an actor-critic framework. At stage one, we learn individual
policies to optimize each auxiliary response. At stage two, we learn a policy
to (i) optimize the main response and (ii) stay close to policies learned at
the first stage, which effectively guarantees the performance of this main
policy on the auxiliaries. Through extensive simulations, we demonstrate the
effectiveness of our approach over alternatives both in optimizing the main
goal and in balancing the others. We further show the advantage of our
approach in live experiments of short video recommendations, where it
significantly outperforms other baselines in terms of watch time and
interactions from video views. Our approach has been fully launched in the
production system to optimize user experiences on the platform.
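The stage-two idea described in the abstract (optimize the main response while staying close to the stage-one policies) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a finite set of candidate actions, a per-action advantage estimate for watch time, and a KL penalty toward each auxiliary policy weighted by a coefficient; none of these details are given in the abstract, and the name stage_two_policy and its parameters are hypothetical. Under those assumptions, the policy that maximizes expected advantage minus the weighted KL penalties has a closed form: a softmax over the advantage plus the weighted log-probabilities of the auxiliary policies.

```python
import math


def stage_two_policy(advantages, aux_policies, lambdas):
    """Return the distribution pi over a finite action set that maximizes

        sum_a pi(a) * A(a)  -  sum_i lambda_i * KL(pi || pi_i)

    Hypothetical inputs (assumptions for illustration only):
      advantages   -- list of main-response advantages A(a), one per action
      aux_policies -- list of stage-one policies pi_i, each a list of
                      strictly positive probabilities over the same actions
      lambdas      -- list of positive penalty weights, one per pi_i
    """
    total = sum(lambdas)
    if total <= 0:
        raise ValueError("the penalty weights must sum to a positive value")

    # Closed form: pi(a) is proportional to
    #   exp((A(a) + sum_i lambda_i * log pi_i(a)) / sum_i lambda_i)
    logits = []
    for a, adv in enumerate(advantages):
        log_prior = sum(
            lam * math.log(policy[a]) for lam, policy in zip(lambdas, aux_policies)
        )
        logits.append((adv + log_prior) / total)

    # Subtract the maximum logit before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    norm = sum(weights)
    return [w / norm for w in weights]


# Example call: three candidate videos, two auxiliary policies
# (for instance one trained for shares and one for downloads).
policy = stage_two_policy(
    advantages=[1.0, 0.0, -1.0],
    aux_policies=[[0.2, 0.5, 0.3], [0.1, 0.3, 0.6]],
    lambdas=[1.0, 1.0],
)
```

Larger weights in lambdas pull the result toward a geometric mixture of the auxiliary policies, while smaller weights let the watch-time advantage dominate. With zero advantages and a single auxiliary policy, the function returns that auxiliary policy unchanged.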
Related papers
- Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets [62.280729345770936]
We introduce the task of Alignable Video Retrieval (AVR).
Given a query video, our approach can identify well-alignable videos from a large collection of clips and temporally synchronize them to the query.
Our experiments on 3 datasets, including large-scale Kinetics700, demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-02T20:00:49Z)
- A Model-based Multi-Agent Personalized Short-Video Recommender System [19.03089585214444]
We propose an RL-based industrial short-video recommender ranking framework.
Our proposed framework adopts a model-based learning approach to alleviate the sample selection bias.
Our proposed approach has been deployed in our real large-scale short-video sharing platform.
arXiv Detail & Related papers (2024-05-03T04:34:36Z)
- A Large Language Model Enhanced Sequential Recommender for Joint Video and Comment Recommendation [77.42486522565295]
We propose a novel recommendation approach called LSVCR to jointly conduct personalized video and comment recommendation.
Our approach consists of two key components: a sequential recommendation (SR) model and a supplemental large language model (LLM) recommender.
In particular, we achieve a significant overall gain of 4.13% in comment watch time.
arXiv Detail & Related papers (2024-03-20T13:14:29Z)
- Conditional Modeling Based Automatic Video Summarization [70.96973928590958]
The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story.
Video summarization methods rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully understand the content of the video.
A new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries.
arXiv Detail & Related papers (2023-11-20T20:24:45Z)
- Adversarial Batch Inverse Reinforcement Learning: Learn to Reward from Imperfect Demonstration for Interactive Recommendation [23.048841953423846]
We focus on the problem of learning to reward, which is fundamental to reinforcement learning.
Some previous approaches introduce additional procedures for learning to reward, thereby increasing the complexity of optimization.
We propose a novel batch inverse reinforcement learning paradigm that achieves the desired properties.
arXiv Detail & Related papers (2023-10-30T13:43:20Z)
- Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL [62.824464372594576]
We aim to enhance arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization.
We identify a previously overlooked objective of query dependency in such optimization.
We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
arXiv Detail & Related papers (2023-09-13T01:12:52Z)
- Two-Stage Constrained Actor-Critic for Short Video Recommendation [23.12631658373264]
We formulate the problem of short video recommendation as a Constrained Markov Decision Process (CMDP).
We propose a novel two-stage constrained actor-critic method to optimize each auxiliary signal.
Our method significantly outperforms other baselines in terms of both watch time and interactions.
arXiv Detail & Related papers (2023-02-03T12:02:54Z)
- Offline Meta-level Model-based Reinforcement Learning Approach for Cold-Start Recommendation [27.17948754183511]
Reinforcement learning has shown great promise in optimizing long-term user interest in recommender systems.
Existing RL-based recommendation methods need a large number of interactions for each user to learn a robust recommendation policy.
We propose a meta-level model-based reinforcement learning approach for fast user adaptation.
arXiv Detail & Related papers (2020-12-04T08:58:35Z)
- Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on this approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC).
arXiv Detail & Related papers (2020-06-10T11:18:57Z)
- Delving into 3D Action Anticipation from Streaming Videos [99.0155538452263]
Action anticipation aims to recognize the action with a partial observation.
We introduce several complementary evaluation metrics and present a basic model based on frame-wise action classification.
We also explore multi-task learning strategies by incorporating auxiliary information from two aspects: the full action representation and the class-agnostic action label.
arXiv Detail & Related papers (2019-06-15T10:30:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.