DRDT: Dynamic Reflection with Divergent Thinking for LLM-based
Sequential Recommendation
- URL: http://arxiv.org/abs/2312.11336v1
- Date: Mon, 18 Dec 2023 16:41:22 GMT
- Title: DRDT: Dynamic Reflection with Divergent Thinking for LLM-based
Sequential Recommendation
- Authors: Yu Wang, Zhiwei Liu, Jianguo Zhang, Weiran Yao, Shelby Heinecke,
Philip S. Yu
- Abstract summary: We introduce a novel reasoning principle: Dynamic Reflection with Divergent Thinking.
Our methodology is dynamic reflection, a process that emulates human learning through probing, critiquing, and reflecting.
We evaluate our approach on three datasets using six pre-trained LLMs.
- Score: 53.62727171363384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rise of Large Language Models (LLMs) has sparked interest in their
application to sequential recommendation tasks as they can provide supportive
item information. However, due to the inherent complexities of sequential
recommendation, such as sequential patterns across datasets, noise within
sequences, and the temporal evolution of user preferences, existing LLM
reasoning strategies, such as in-context learning and chain-of-thought, are not
fully effective. To address these challenges, we introduce a novel reasoning
principle: Dynamic Reflection with Divergent Thinking within a
retriever-reranker framework. Our approach starts with a collaborative
in-context demonstration retriever, which collects sequences exhibiting
collaborative behaviors as in-context examples. Following this, we abstract
high-level user preferences across multiple aspects, providing a more nuanced
understanding of user interests and circumventing the noise within the raw
sequences. The cornerstone of our methodology is dynamic reflection, a process
that emulates human learning through probing, critiquing, and reflecting, using
user feedback to tailor the analysis more effectively to the target user in a
temporal manner. We evaluate our approach on three datasets using six
pre-trained LLMs. The superior performance observed across these models
demonstrates the efficacy of our reasoning strategy, notably achieved without
the need to fine-tune the LLMs. With our principle, we managed to outperform
GPT-3.5-Turbo on three datasets using 7B models (e.g., Vicuna-7b and Openchat-7b)
on NDCG@10. This research not only highlights the potential of LLMs in
enhancing sequential recommendation systems but also underscores the importance
of developing tailored reasoning strategies to fully harness their
capabilities.
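
The abstract only outlines the retriever-reranker pipeline, so the following is a minimal, illustrative Python sketch of how such a loop could be wired together; every name here (retrieve_demos, drdt_rerank, the llm callable) and every prompt is an assumption reconstructed from the abstract, not the authors' released code. The ndcg_at_10 helper at the end shows how the reported NDCG@10 metric is typically computed for a single held-out next item.

```python
# Hedged sketch of a DRDT-style retriever-reranker loop, reconstructed from the
# abstract alone.  Function names, prompts, and hyperparameters are illustrative.
import math
from typing import Callable, List


def retrieve_demos(target_seq: List[str], all_seqs: List[List[str]], k: int = 3) -> List[List[str]]:
    """Collaborative in-context demonstration retriever: pick the k training
    sequences that overlap most with the target user's interaction history."""
    return sorted(all_seqs, key=lambda s: len(set(s) & set(target_seq)), reverse=True)[:k]


def drdt_rerank(target_seq: List[str],
                candidates: List[str],
                all_seqs: List[List[str]],
                llm: Callable[[str], str],
                n_reflections: int = 2) -> str:
    """Retrieve demonstrations, abstract multi-aspect preferences, then iteratively
    rerank candidates while reflecting on recent (temporal) feedback.  The LLM is
    used frozen, i.e. without any fine-tuning."""
    demos = retrieve_demos(target_seq, all_seqs)
    # Abstract high-level preferences instead of reasoning over the raw, noisy sequence.
    analysis = llm(f"Demonstrations: {demos}\nHistory: {target_seq}\n"
                   "Summarize this user's preferences across several aspects.")
    ranking = ""
    for _ in range(n_reflections):
        ranking = llm(f"Preferences: {analysis}\nCandidates: {candidates}\n"
                      "Rank the candidates for the user's next interaction.")
        # Dynamic reflection: probe and critique the analysis against the most
        # recently observed items, then revise it before the next rerank.
        analysis = llm(f"Previous analysis: {analysis}\nPrevious ranking: {ranking}\n"
                       f"Recently observed items: {target_seq[-3:]}\n"
                       "Critique the analysis and revise it.")
    return ranking


def ndcg_at_10(ranked_items: List[str], ground_truth: str) -> float:
    """NDCG@10 for the common single-ground-truth setup: 1 / log2(rank + 1) if the
    held-out item appears in the top 10, else 0 (the ideal DCG is 1)."""
    top10 = ranked_items[:10]
    if ground_truth not in top10:
        return 0.0
    return 1.0 / math.log2(top10.index(ground_truth) + 2)
```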
Related papers
- Enhancing Sequential Recommendations through Multi-Perspective Reflections and Iteration [16.10791252542592]
Sequence recommendation (SeqRec) aims to predict the next item a user will interact with by understanding user intentions and leveraging collaborative filtering information.
Large language models (LLMs) have shown great promise in recommendation tasks through prompt-based methods, fixed reflection libraries, and fine-tuning techniques.
MoRE introduces three reflectors for generating LLM-based reflections on explicit preferences, implicit preferences, and collaborative signals.
arXiv Detail & Related papers (2024-09-10T09:58:55Z)
- Towards Boosting LLMs-driven Relevance Modeling with Progressive Retrieved Behavior-augmented Prompting [23.61061000692023]
This study proposes leveraging user interactions recorded in search logs to yield insights into users' implicit search intentions.
We propose ProRBP, a novel Progressive Retrieved Behavior-augmented Prompting framework for integrating search scenario-oriented knowledge with Large Language Models.
arXiv Detail & Related papers (2024-08-18T11:07:38Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- AQA-Bench: An Interactive Benchmark for Evaluating LLMs' Sequential Reasoning Ability [29.1826948551409]
AQA-Bench is a novel benchmark to assess the sequential reasoning capabilities of large language models.
We build AQA-Bench with three different algorithms, namely binary search, depth-first search, and breadth-first search.
Our investigations reveal several interesting findings.
arXiv Detail & Related papers (2024-02-14T18:59:33Z)
- Representation Learning with Large Language Models for Recommendation [34.46344639742642]
We propose RLMRec, a model-agnostic framework that enhances recommenders with large language model (LLM)-empowered representation learning.
RLMRec incorporates auxiliary textual signals, develops a user/item profiling paradigm empowered by LLMs, and aligns the semantic space of LLMs with the representation space of collaborative relational signals.
arXiv Detail & Related papers (2023-10-24T15:51:13Z)
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks and observe similar performance improvements in code generation tasks.
arXiv Detail & Related papers (2023-10-16T05:21:50Z)
- Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL [62.824464372594576]
We aim to enhance arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization.
We identify a previously overlooked objective of query dependency in such optimization.
We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
arXiv Detail & Related papers (2023-09-13T01:12:52Z)
- Robust Reinforcement Learning Objectives for Sequential Recommender Systems [7.44049827436013]
We develop recommender systems that incorporate direct user feedback in the form of rewards, enhancing personalization for users.
However, employing RL algorithms presents challenges, including off-policy training, expansive action spaces, and the scarcity of datasets with sufficient reward signals.
We introduce an enhanced methodology aimed at providing a more effective solution to these challenges.
arXiv Detail & Related papers (2023-05-30T08:09:08Z)
- S^3-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization [104.87483578308526]
We propose the model S3-Rec, which stands for Self-Supervised learning for Sequential Recommendation.
For our task, we devise four auxiliary self-supervised objectives to learn the correlations among attribute, item, subsequence, and sequence.
Extensive experiments conducted on six real-world datasets demonstrate the superiority of our proposed method over existing state-of-the-art methods.
arXiv Detail & Related papers (2020-08-18T11:44:10Z)
- Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on this approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC).
arXiv Detail & Related papers (2020-06-10T11:18:57Z)
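
The final entry above describes a sequential recommender whose encoder is augmented with two output layers, one trained with a self-supervised next-item loss and one with RL. The PyTorch sketch below is a hedged reconstruction of that dual-head idea from the summary alone; the GRU backbone, layer sizes, reward definition, and loss weighting are assumptions, not the SQN/SAC authors' implementation.

```python
# Hedged sketch of a dual-head sequential recommender in the spirit of SQN:
# one supervised (self-supervised) next-item head plus one Q-learning head on
# top of a shared sequence encoder.  All architectural choices are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualHeadRecommender(nn.Module):
    def __init__(self, n_items: int, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(n_items, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)  # any sequential backbone works
        self.sup_head = nn.Linear(dim, n_items)            # self-supervised / next-item head
        self.q_head = nn.Linear(dim, n_items)               # RL (Q-value) head

    def forward(self, seq: torch.Tensor):
        h, _ = self.encoder(self.embed(seq))
        state = h[:, -1]                                     # last hidden state summarizes the sequence
        return self.sup_head(state), self.q_head(state)


def sqn_style_loss(model, seq, next_item, reward, gamma: float = 0.9):
    """Joint loss: cross-entropy on the observed next item plus a one-step TD
    error on its Q-value (the reward is assumed given, e.g. larger for a
    purchase than for a click)."""
    logits, q = model(seq)
    ce = F.cross_entropy(logits, next_item)
    with torch.no_grad():
        next_seq = torch.cat([seq[:, 1:], next_item.unsqueeze(1)], dim=1)
        _, q_next = model(next_seq)
        target = reward + gamma * q_next.max(dim=1).values
    td = F.mse_loss(q.gather(1, next_item.unsqueeze(1)).squeeze(1), target)
    return ce + td


# Usage sketch:
#   model = DualHeadRecommender(n_items=10_000)
#   loss = sqn_style_loss(model, batch_seq, batch_next_item, batch_reward)
```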
This list is automatically generated from the titles and abstracts of the papers on this site.