End-to-end Training for Recommendation with Language-based User Profiles
- URL: http://arxiv.org/abs/2410.18870v2
- Date: Wed, 12 Feb 2025 21:37:54 GMT
- Title: End-to-end Training for Recommendation with Language-based User Profiles
- Authors: Zhaolin Gao, Joyce Zhou, Yijia Dai, Thorsten Joachims,
- Abstract summary: We introduce LangPTune, the first end-to-end training framework to optimize large language models (LLMs) generated user profiles.<n>Our method significantly outperforms zero-shot approaches by explicitly training the LLM for the recommendation objective.
- Score: 21.61482456379204
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is a growing interest in natural language-based user profiles for recommender systems, which aims to enhance transparency and scrutability compared with embedding-based methods. Existing studies primarily generate these profiles using zero-shot inference from large language models (LLMs), but their quality remains insufficient, leading to suboptimal recommendation performance. In this paper, we introduce LangPTune, the first end-to-end training framework to optimize LLM-generated user profiles. Our method significantly outperforms zero-shot approaches by explicitly training the LLM for the recommendation objective. Through extensive evaluations across diverse training configurations and benchmarks, we demonstrate that LangPTune not only surpasses zero-shot baselines but can also matches the performance of state-of-the-art embedding-based methods. Finally, we investigate whether the training procedure preserves the interpretability of these profiles compared to zero-shot inference through both GPT-4 simulations and crowdworker user studies. Implementation of LangPTune can be found at https://github.com/ZhaolinGao/LangPTune.
Related papers
- SciNUP: Natural Language User Interest Profiles for Scientific Literature Recommendation [15.029309551125962]
Natural language (NL) user profiles in recommender systems offer greater transparency and user control.<n>There is scarcity of large-scale, publicly available test collections for evaluating NL profile-based recommendation.<n>We introduce SciNUP, a novel synthetic dataset for scholarly recommendation.
arXiv Detail & Related papers (2025-10-24T11:28:08Z) - PITA: Preference-Guided Inference-Time Alignment for LLM Post-Training [9.093854840532062]
PITA is a novel framework that integrates preference feedback directly into the LLM's token generation.<n> PITA learns a small preference-based guidance policy to modify token probabilities at inference time without fine-tuning.<n>We evaluate PITA across diverse tasks, including mathematical reasoning and sentiment classification.
arXiv Detail & Related papers (2025-07-26T21:46:32Z) - Debiasing Online Preference Learning via Preference Feature Preservation [64.55924745257951]
Recent preference learning frameworks simplify human preferences with binary pairwise comparisons and scalar rewards.<n>This could make large language models' responses biased to mostly preferred features, and would be exacerbated during the iterations of online preference learning steps.<n>We propose Preference Feature Preservation to maintain the distribution of human preference features and utilize such rich signals throughout the online preference learning process.
arXiv Detail & Related papers (2025-06-06T13:19:07Z) - Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF [67.48004037550064]
We propose an active learning approach to efficiently select prompt and preference pairs.
Our method evaluates the gradients of all potential preference annotations to assess their impact on model updates.
Experimental results demonstrate that our method outperforms the baseline by up to 5% in win rates against the chosen completion.
arXiv Detail & Related papers (2025-03-28T04:22:53Z) - Clear Preferences Leave Traces: Reference Model-Guided Sampling for Preference Learning [59.11519451499754]
Direct Preference Optimization (DPO) has emerged as a de-facto approach for aligning language models with human preferences.
Recent work has shown DPO's effectiveness relies on training data quality.
We discover that reference model probability space naturally detects high-quality training samples.
arXiv Detail & Related papers (2025-01-25T07:21:50Z) - Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback [50.84142264245052]
This work introduces the Align-SLM framework to enhance the semantic understanding of textless Spoken Language Models (SLMs)
Our approach generates multiple speech continuations from a given prompt and uses semantic metrics to create preference data for Direct Preference Optimization (DPO)
We evaluate the framework using ZeroSpeech 2021 benchmarks for lexical and syntactic modeling, the spoken version of the StoryCloze dataset for semantic coherence, and other speech generation metrics, including the GPT4-o score and human evaluation.
arXiv Detail & Related papers (2024-11-04T06:07:53Z) - GenUP: Generative User Profilers as In-Context Learners for Next POI Recommender Systems [8.789624590579903]
Traditional POI recommendation systems often lack transparency, interpretability, and scrutability.
We propose a method that generates natural language (NL) user profiles from large-scale, location-based social network (LBSN) check-ins.
These NL profiles capture user preferences, routines, and behaviors, improving POI prediction accuracy while offering enhanced transparency.
arXiv Detail & Related papers (2024-10-28T00:39:22Z) - STAR: A Simple Training-free Approach for Recommendations using Large Language Models [36.18841135511487]
Current state-of-the-art methods rely on fine-tuning large language models (LLMs) to achieve optimal results.
We propose a framework that utilizes LLMs and can be applied to various recommendation tasks without the need for fine-tuning.
Our method achieves Hits@10 performance of +23.8% on Beauty, +37.5% on Toys & Games, and -1.8% on Sports & Outdoors.
arXiv Detail & Related papers (2024-10-21T19:34:40Z) - MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time [50.41806216615488]
Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from extensive text corpora.
To make LLMs more usable, aligning them with human preferences is essential.
We propose an effective method, textbf MetaAlign, which aims to help LLMs dynamically align with various explicit or implicit preferences specified at inference time.
arXiv Detail & Related papers (2024-10-18T05:31:13Z) - A Prompting-Based Representation Learning Method for Recommendation with Large Language Models [2.1161973970603998]
We introduce the Prompting-Based Representation Learning Method for Recommendation (P4R) to boost the linguistic abilities of Large Language Models (LLMs) in Recommender Systems.
In our P4R framework, we utilize the LLM prompting strategy to create personalized item profiles.
In our evaluation, we compare P4R with state-of-the-art Recommender models and assess the quality of prompt-based profile generation.
arXiv Detail & Related papers (2024-09-25T07:06:14Z) - Guided Profile Generation Improves Personalization with LLMs [3.2685922749445617]
In modern commercial systems, including Recommendation, Ranking, and E-Commerce platforms, there is a trend towards incorporating Personalization context as input into Large Language Models (LLMs)
We propose Guided Profile Generation (GPG), a general method designed to generate personal profiles in natural language.
Our experimental results show that GPG improves LLM's personalization ability across different tasks, for example, it increases 37% accuracy in predicting personal preference compared to directly feeding the LLMs with raw personal context.
arXiv Detail & Related papers (2024-09-19T21:29:56Z) - Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient [57.9629676017527]
We propose an optimization-based structural pruning that learns the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model.<n>We achieve this by learning an underlying Bernoulli distribution to sample binary pruning masks.<n>Experiments conducted on LLaMA, LLaMA-2, LLaMA-3, Vicuna, and Mistral models demonstrate the promising performance of our method in efficiency and effectiveness.
arXiv Detail & Related papers (2024-06-15T09:31:03Z) - Show, Don't Tell: Aligning Language Models with Demonstrated Feedback [54.10302745921713]
Demonstration ITerated Task Optimization (DITTO) directly aligns language model outputs to a user's demonstrated behaviors.
We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts.
arXiv Detail & Related papers (2024-06-02T23:13:56Z) - Analyzing and Adapting Large Language Models for Few-Shot Multilingual
NLU: Are We There Yet? [82.02076369811402]
Supervised fine-tuning (SFT), supervised instruction tuning (SIT) and in-context learning (ICL) are three alternative, de facto standard approaches to few-shot learning.
We present an extensive and systematic comparison of the three approaches, testing them on 6 high- and low-resource languages, three different NLU tasks, and a myriad of language and domain setups.
Our observations show that supervised instruction tuning has the best trade-off between performance and resource requirements.
arXiv Detail & Related papers (2024-03-04T10:48:13Z) - Uncertainty-Aware Explainable Recommendation with Large Language Models [15.229417987212631]
We develop a model that utilizes the ID vectors of user and item inputs as prompts for GPT-2.
We employ a joint training mechanism within a multi-task learning framework to optimize both the recommendation task and explanation task.
Our method achieves 1.59 DIV, 0.57 USR and 0.41 FCR on the Yelp, TripAdvisor and Amazon dataset respectively.
arXiv Detail & Related papers (2024-01-31T14:06:26Z) - Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN)
At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself.
This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
arXiv Detail & Related papers (2024-01-02T18:53:13Z) - LlamaRec: Two-Stage Recommendation using Large Language Models for
Ranking [10.671747198171136]
We propose a two-stage framework using large language models for ranking-based recommendation (LlamaRec)
In particular, we use small-scale sequential recommenders to retrieve candidates based on the user interaction history.
LlamaRec consistently achieves datasets superior performance in both recommendation performance and efficiency.
arXiv Detail & Related papers (2023-10-25T06:23:48Z) - Read-only Prompt Optimization for Vision-Language Few-shot Learning [20.66798356082751]
Learnable prompts can affect the internal representation within the self-attention module.
We propose a novel approach, Read-only Prompt Optimization (RPO)
Our experiments demonstrate that RPO outperforms CLIP and CoCoOp in base-to-new generalization and domain generalization.
arXiv Detail & Related papers (2023-08-29T01:22:30Z) - LLMRec: Benchmarking Large Language Models on Recommendation Task [54.48899723591296]
The application of Large Language Models (LLMs) in the recommendation domain has not been thoroughly investigated.
We benchmark several popular off-the-shelf LLMs on five recommendation tasks, including rating prediction, sequential recommendation, direct recommendation, explanation generation, and review summarization.
The benchmark results indicate that LLMs displayed only moderate proficiency in accuracy-based tasks such as sequential and direct recommendation.
arXiv Detail & Related papers (2023-08-23T16:32:54Z) - ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation [43.270424225285105]
We focus on adapting and empowering a pure large language model for zero-shot and few-shot recommendation tasks.
We propose Retrieval-enhanced Large Language models (ReLLa) for recommendation tasks in both zero-shot and few-shot settings.
arXiv Detail & Related papers (2023-08-22T02:25:04Z) - LLM-Rec: Personalized Recommendation via Prompting Large Language Models [62.481065357472964]
Large language models (LLMs) have showcased their ability to harness commonsense knowledge and reasoning.
Recent advances in large language models (LLMs) have showcased their remarkable ability to harness commonsense knowledge and reasoning.
This study introduces a novel approach, coined LLM-Rec, which incorporates four distinct prompting strategies of text enrichment for improving personalized text-based recommendations.
arXiv Detail & Related papers (2023-07-24T18:47:38Z) - Recommendation as Instruction Following: A Large Language Model
Empowered Recommendation Approach [83.62750225073341]
We consider recommendation as instruction following by large language models (LLMs)
We first design a general instruction format for describing the preference, intention, task form and context of a user in natural language.
Then we manually design 39 instruction templates and automatically generate a large amount of user-personalized instruction data.
arXiv Detail & Related papers (2023-05-11T17:39:07Z) - Offline RL for Natural Language Generation with Implicit Language Q
Learning [87.76695816348027]
Large language models can be inconsistent when it comes to completing user specified tasks.
We propose a novel RL method, that combines both the flexible utility framework of RL with the ability of supervised learning.
In addition to empirically validating ILQL, we present a detailed empirical analysis situations where offline RL can be useful in natural language generation settings.
arXiv Detail & Related papers (2022-06-05T18:38:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.