Related papers: Generative Sequential Notification Optimization via Multi-Objective Decision Transformers

Generative Sequential Notification Optimization via Multi-Objective Decision Transformers

URL: http://arxiv.org/abs/2509.02458v1
Date: Tue, 02 Sep 2025 16:09:02 GMT
Title: Generative Sequential Notification Optimization via Multi-Objective Decision Transformers
Authors: Borja Ocejo, Ruofan Wang, Ke Liu, Rohit K. Patra, Haotian Shen, David Liu, Yiwen Yuan, Gokulraj Mohanasundaram, Fedor Borisyuk, Prakruthi Prabhakar,
Abstract summary: We present a Decision Transformer based framework that reframes policy learning as return-conditioned supervised learning.<n>Our contributions include a real-world comparison with CQL, a multi-reward design suitable for non-episodic tasks, and a quantile regression approach to return-to-go conditioning.
Score: 9.542285455613927
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Notifications are an important communication channel for delivering timely and relevant information. Optimizing their delivery involves addressing complex sequential decision-making challenges under constraints such as message utility and user fatigue. Offline reinforcement learning (RL) methods, such as Conservative Q-Learning (CQL), have been applied to this problem but face practical challenges at scale, including instability, sensitivity to distribution shifts, limited reproducibility, and difficulties with explainability in high-dimensional recommendation settings. We present a Decision Transformer (DT) based framework that reframes policy learning as return-conditioned supervised learning, improving robustness, scalability, and modeling flexibility. Our contributions include a real-world comparison with CQL, a multi-reward design suitable for non-episodic tasks, a quantile regression approach to return-to-go conditioning, and a production-ready system with circular buffer-based sequence processing for near-real-time inference. Extensive offline and online experiments in a deployed notification system show that our approach improves notification utility and overall session activity while minimizing user fatigue. Compared to a multi-objective CQL-based agent, the DT-based approach achieved a +0.72% increase in sessions for notification decision-making at LinkedIn by making notification recommendation more relevant.

Related papers

Deep Q-Learning-Based Intelligent Scheduling for ETL Optimization in Heterogeneous Data Environments [10.31577390735368]
This paper proposes an intelligent scheduling optimization framework based on deep Q-learning.<n>The framework formalizes the scheduling process as a Markov Decision Process.<n>It enables adaptive decision-making by a reinforcement learning agent in high-dimensional state spaces.
arXiv Detail & Related papers (2025-12-15T07:38:47Z)
TS-DP: Reinforcement Speculative Decoding For Temporal Adaptive Diffusion Policy Acceleration [64.32072516882947]
Diffusion Policy excels in embodied control but suffers from high inference latency and computational cost.<n>We propose Temporal-aware Reinforcement-based Speculative Diffusion Policy (TS-DP)<n>TS-DP achieves up to 4.17 times faster inference with over 94% accepted drafts, reaching an inference frequency of 25 Hz.
arXiv Detail & Related papers (2025-12-13T07:53:14Z)
Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training [47.26632817047513]
Reinforcement learning applied to large language models (LLMs) for reasoning tasks is often bottlenecked by unstable gradient estimates.<n>We propose Reinforce-Ada, an adaptive sampling framework for online RL post-training of LLMs.<n>Unlike conventional two-stage allocation methods, Reinforce-Ada interleaves estimation and sampling in an online successive elimination process.
arXiv Detail & Related papers (2025-10-06T16:34:09Z)
Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? [65.18157595903124]
This work investigates iterative approximate evaluation for arbitrary prompts.<n>It introduces Model Predictive Prompt Selection (MoPPS), a Bayesian risk-predictive framework.<n>MoPPS reliably predicts prompt difficulty and accelerates training with significantly reduced rollouts.
arXiv Detail & Related papers (2025-07-07T03:20:52Z)
Lightweight Task-Oriented Semantic Communication Empowered by Large-Scale AI Models [66.57755931421285]
Large-scale artificial intelligence (LAI) models pose significant challenges for real-time communication scenarios.<n>This paper proposes utilizing knowledge distillation (KD) techniques to extract and condense knowledge from LAI models.<n>We propose a fast distillation method featuring a pre-stored compression mechanism that eliminates the need for repetitive inference.
arXiv Detail & Related papers (2025-06-16T08:42:16Z)
Federated In-Context Learning: Iterative Refinement for Improved Answer Quality [62.72381208029899]
In-context learning (ICL) enables language models to generate responses without modifying their parameters by leveraging examples provided in the input.<n>We propose Federated In-Context Learning (Fed-ICL), a general framework that enhances ICL through an iterative, collaborative process.<n>Fed-ICL progressively refines responses by leveraging multi-round interactions between clients and a central server, improving answer quality without the need to transmit model parameters.
arXiv Detail & Related papers (2025-06-09T05:33:28Z)
Scalable In-Context Q-Learning [68.9917436397079]
We propose textbfScalable textbfIn-textbfContext textbfQ-textbfLearning (textbfSICQL) to steer in-context reinforcement learning.<n>textbfSICQL harnesses dynamic programming and world modeling to steer ICRL toward efficient reward and task generalization.
arXiv Detail & Related papers (2025-06-02T04:21:56Z)
Context-aware Constrained Reinforcement Learning Based Energy-Efficient Power Scheduling for Non-stationary XR Data Traffic [8.526578240549794]
In downlink transmission, an efficient power scheduling (EEPS) is essential for conserving power resource while delivering data packets within hard-latency.<n>Traditional algorithms show promise in EEPS but struggle with context non-stationary data constraints.<n>To overcome these challenges, this paper models overcome these challenges with a proposed context-aware constrained reinforcement learning algorithm.
arXiv Detail & Related papers (2025-03-12T13:37:19Z)
Retentive Decision Transformer with Adaptive Masking for Reinforcement Learning based Recommendation Systems [17.750449033873036]
Reinforcement Learning-based Recommender Systems (RLRS) have shown promise across a spectrum of applications. Yet, they grapple with challenges, notably in crafting reward functions and harnessing large pre-existing datasets. Recent advancements in offline RLRS provide a solution for how to address these two challenges.
arXiv Detail & Related papers (2024-03-26T12:08:58Z)
Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks. We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level. We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
Self-Sustaining Multiple Access with Continual Deep Reinforcement Learning for Dynamic Metaverse Applications [17.436875530809946]
The Metaverse is a new paradigm that aims to create a virtual environment consisting of numerous worlds, each of which will offer a different set of services. To deal with such a dynamic and complex scenario, one potential approach is to adopt self-sustaining strategies. This paper investigates the problem of multiple access in multi-channel environments to maximize the throughput of the intelligent agent.
arXiv Detail & Related papers (2023-09-18T22:02:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.