Constrained Reinforcement Learning for Short Video Recommendation
- URL: http://arxiv.org/abs/2205.13248v1
- Date: Thu, 26 May 2022 09:36:20 GMT
- Title: Constrained Reinforcement Learning for Short Video Recommendation
- Authors: Qingpeng Cai, Ruohan Zhan, Chi Zhang, Jie Zheng, Guangwei Ding,
Pinghua Gong, Dong Zheng, Peng Jiang
- Abstract summary: Short videos on social media platforms pose new challenges for optimizing recommender systems.
We propose a two-stage reinforcement learning approach based on the actor-critic framework.
Our approach has been fully launched in the production system to optimize user experiences.
- Score: 18.492477839791274
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The wide popularity of short videos on social media poses new opportunities
and challenges for optimizing recommender systems on video-sharing platforms.
Users provide complex and multi-faceted responses towards recommendations,
including watch time and various types of interactions with videos. As a
result, established recommendation algorithms that concern a single objective
are not adequate to meet this new demand of optimizing comprehensive user
experiences. In this paper, we formulate the problem of short video
recommendation as a constrained Markov Decision Process (MDP), where platforms
want to optimize the main goal of user watch time in the long term, with the
constraint of accommodating the auxiliary responses of user interactions such
as sharing/downloading videos.
To solve the constrained MDP, we propose a two-stage reinforcement learning
approach based on the actor-critic framework. At stage one, we learn individual
policies to optimize each auxiliary response. At stage two, we learn a policy
to (i) optimize the main response and (ii) stay close to policies learned at
the first stage, which effectively guarantees the performance of this main
policy on the auxiliaries. Through extensive simulations, we demonstrate
the effectiveness of our approach over alternatives both in optimizing the main
goal and in balancing the others. We further show the advantage of our
approach in live experiments of short video recommendations, where it
significantly outperforms other baselines in terms of watch time and
interactions from video views. Our approach has been fully launched in the
production system to optimize user experiences on the platform.
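The stage-two objective described in the abstract (optimize the main response while staying close to the stage-one policies) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: a discrete action space, softmax policies, KL divergence as the closeness measure, and the hypothetical names `stage_two_loss` and `penalty_weights`.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for discrete distributions over the last axis.
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def stage_two_loss(main_logits, actions, advantages, aux_probs, penalty_weights):
    """Hypothetical stage-two actor loss (illustrative, not the authors' formula).

    main_logits:     (batch, n_actions) logits of the main policy
    actions:         (batch,) indices of the actions that were taken
    advantages:      (batch,) critic advantage estimates for the main response
    aux_probs:       list of (batch, n_actions) action distributions, one per
                     auxiliary policy learned at stage one
    penalty_weights: list of floats, one closeness weight per auxiliary policy
    """
    probs = softmax(main_logits)
    chosen = probs[np.arange(len(actions)), actions]
    # Policy-gradient term: raise the probability of high-advantage actions.
    pg_term = -np.mean(advantages * np.log(chosen + 1e-12))
    # Closeness term: penalize divergence from each stage-one policy.
    closeness = sum(
        w * np.mean(kl_divergence(aux, probs))
        for w, aux in zip(penalty_weights, aux_probs)
    )
    return pg_term + closeness
```

With zero advantages the loss reduces to the closeness penalty alone, so it vanishes when the main policy matches every auxiliary policy and grows as the main policy drifts away from them.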
Related papers
- Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment [74.25832963097658]
Multi-Objective Alignment (MOA) aims to align responses with multiple human preference objectives.
We find that DPO-based MOA approaches suffer from widespread preference conflicts in the data.
arXiv Detail & Related papers (2025-02-20T08:27:00Z)
- Interactive Visualization Recommendation with Hier-SUCB [52.11209329270573]
We propose an interactive personalized visualization recommendation (PVisRec) system that learns from user feedback from previous interactions.
For more interactive and accurate recommendations, we propose Hier-SUCB, a contextual semi-bandit in the PVisRec setting.
arXiv Detail & Related papers (2025-02-05T17:14:45Z)
- Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM [54.2320450886902]
Text-to-video models have made remarkable advancements through optimization on high-quality text-video pairs.
Current automatic methods for refining prompts encounter challenges such as Modality-Inconsistency, Cost-Discrepancy, and Model-Unaware.
We introduce Prompt-A-Video, which excels in crafting Video-Centric, Labor-Free and Preference-Aligned prompts tailored to a specific video diffusion model.
arXiv Detail & Related papers (2024-12-19T18:32:21Z)
- A Model-based Multi-Agent Personalized Short-Video Recommender System [19.03089585214444]
We propose an RL-based industrial short-video recommender ranking framework.
Our proposed framework adopts a model-based learning approach to alleviate the sample selection bias.
Our proposed approach has been deployed in our real large-scale short-video sharing platform.
arXiv Detail & Related papers (2024-05-03T04:34:36Z)
- User Welfare Optimization in Recommender Systems with Competing Content Creators [65.25721571688369]
In this study, we perform system-side user welfare optimization under a competitive game setting among content creators.
We propose an algorithmic solution for the platform, which dynamically computes a sequence of weights for each user based on their satisfaction of the recommended content.
These weights are then utilized to design mechanisms that adjust the recommendation policy or the post-recommendation rewards, thereby influencing creators' content production strategies.
arXiv Detail & Related papers (2024-04-28T21:09:52Z)
- A Large Language Model Enhanced Sequential Recommender for Joint Video and Comment Recommendation [77.42486522565295]
We propose a novel recommendation approach called LSVCR to jointly conduct personalized video and comment recommendation.
Our approach consists of two key components, namely sequential recommendation (SR) model and supplemental large language model (LLM) recommender.
In particular, we achieve a significant overall gain of 4.13% in comment watch time.
arXiv Detail & Related papers (2024-03-20T13:14:29Z)
- Conditional Modeling Based Automatic Video Summarization [70.96973928590958]
The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story.
Video summarization methods rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully understand the content of the video.
A new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries.
arXiv Detail & Related papers (2023-11-20T20:24:45Z)
- Adversarial Batch Inverse Reinforcement Learning: Learn to Reward from Imperfect Demonstration for Interactive Recommendation [23.048841953423846]
We focus on the problem of learning to reward, which is fundamental to reinforcement learning.
Some previous approaches introduce additional procedures for learning to reward, thereby increasing the complexity of optimization.
We propose a novel batch inverse reinforcement learning paradigm that achieves the desired properties.
arXiv Detail & Related papers (2023-10-30T13:43:20Z)
- Two-Stage Constrained Actor-Critic for Short Video Recommendation [23.12631658373264]
We formulate the problem of short video recommendation as a Constrained Markov Decision Process (CMDP).
We propose a novel two-stage constrained actor-critic method to optimize each auxiliary signal.
Our method significantly outperforms other baselines in terms of both watch time and interactions.
arXiv Detail & Related papers (2023-02-03T12:02:54Z)
- Offline Meta-level Model-based Reinforcement Learning Approach for Cold-Start Recommendation [27.17948754183511]
Reinforcement learning has shown great promise in optimizing long-term user interest in recommender systems.
Existing RL-based recommendation methods need a large number of interactions for each user to learn a robust recommendation policy.
We propose a meta-level model-based reinforcement learning approach for fast user adaptation.
arXiv Detail & Related papers (2020-12-04T08:58:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.