Refine-POI: Reinforcement Fine-Tuned Large Language Models for Next Point-of-Interest Recommendation
- URL: http://arxiv.org/abs/2506.21599v2
- Date: Mon, 30 Jun 2025 11:36:12 GMT
- Title: Refine-POI: Reinforcement Fine-Tuned Large Language Models for Next Point-of-Interest Recommendation
- Authors: Peibo Li, Shuang Ao, Hao Xue, Yang Song, Maarten de Rijke, Johan Barthélemy, Tomasz Bednarz, Flora D. Salim
- Abstract summary: Large language models (LLMs) have been adopted for next point-of-interest (POI) recommendation tasks. We propose Refine-POI, a reinforcement fine-tuning framework for next POI recommendation.
- Score: 51.08869388483333
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large language models (LLMs) have been adopted for next point-of-interest (POI) recommendation tasks. Typical LLM-based recommenders fall into two categories: prompt-based and supervised fine-tuning (SFT)-based models. Prompt-based models generally offer greater output flexibility but deliver lower accuracy, whereas SFT-based models achieve higher performance yet face a fundamental mismatch: next POI recommendation data does not naturally suit supervised fine-tuning. In SFT, the model is trained to reproduce the exact ground truth, but each training example provides only a single target POI, so there is no ground truth for producing a top-k list. To address this, we propose Refine-POI, a reinforcement fine-tuning framework for next POI recommendation. We introduce recommendation-driven rewards that enable LLMs to learn to generate top-k recommendation lists using only one ground-truth POI per example. Experiments on real-world datasets demonstrate that Refine-POI achieves state-of-the-art top-k recommendation performance.
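Because a single ground-truth POI is enough to score an entire generated top-k list, a rule-based reward can stand in for token-level supervision during reinforcement fine-tuning. Below is a minimal sketch of such a recommendation-driven reward; the comma-separated output format, the format bonus, and the reciprocal-rank shaping are illustrative assumptions, not the exact design used in Refine-POI.

```python
import re

def recommendation_reward(completion: str, true_poi: int, k: int = 10) -> float:
    """Score a generated top-k POI list against a single ground-truth POI.

    Assumed output format: a comma-separated list of POI ids, e.g. "123, 45, 9".
    The reciprocal-rank shaping and the format bonus are illustrative choices.
    """
    ids = [int(tok) for tok in re.findall(r"\d+", completion)][:k]
    if not ids:
        return -1.0                                      # unparsable output: penalize
    reward = 0.1 if len(set(ids)) == len(ids) else 0.0   # small bonus for a duplicate-free list
    if true_poi in ids:
        rank = ids.index(true_poi) + 1                   # 1-based position of the hit
        reward += 1.0 / rank                             # earlier hits earn more reward
    return reward
```

A GRPO- or PPO-style trainer would then sample several candidate lists per prompt and use this scalar reward as the learning signal.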
Related papers
- Evaluating Position Bias in Large Language Model Recommendations [3.430780143519032]
Large Language Models (LLMs) are being increasingly explored as general-purpose tools for recommendation tasks. We show that LLM-based recommendation models suffer from position bias, where the order of candidate items in a prompt can disproportionately influence the recommendations produced by LLMs. We introduce a new prompting strategy, Ranking via Iterative SElection, to mitigate the position bias of LLM-based recommendation models.
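One way to realize the iterative-selection idea described above: rather than asking the LLM to rank all candidates in one shot, repeatedly ask it to pick the single best remaining item and remove it from the pool, so the final list depends far less on the original candidate order. The `pick_best` callback below is a hypothetical stand-in for the per-step LLM prompt, not the paper's exact procedure.

```python
from typing import Callable, List

def iterative_selection_rank(
    items: List[str],
    pick_best: Callable[[List[str]], int],  # hypothetical LLM call: index of best remaining item
    k: int = 10,
) -> List[str]:
    """Build a top-k list by repeatedly selecting one item and removing it from the pool."""
    remaining = list(items)
    ranking: List[str] = []
    while remaining and len(ranking) < k:
        idx = pick_best(remaining)          # one LLM query per selection step
        ranking.append(remaining.pop(idx))
    return ranking
```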
arXiv Detail & Related papers (2025-08-04T03:30:26Z) - $\text{R}^2\text{ec}$: Towards Large Recommender Models with Reasoning [50.291998724376654]
We propose $\text{R}^2\text{ec}$, a unified large recommender model with intrinsic reasoning capabilities. RecPO is a corresponding reinforcement learning framework that optimizes both the reasoning and recommendation capabilities of $\text{R}^2\text{ec}$ simultaneously in a single policy update. Experiments on three datasets with various baselines verify the effectiveness of $\text{R}^2\text{ec}$, showing relative improvements of 68.67% in Hit@5 and 45.21% in NDCG@20.
arXiv Detail & Related papers (2025-05-22T17:55:43Z) - A Survey of Direct Preference Optimization [103.59317151002693]
Large Language Models (LLMs) have demonstrated unprecedented generative capabilities. Their alignment with human values remains critical for ensuring helpful and harmless deployments. Direct Preference Optimization (DPO) has recently gained prominence as a streamlined alternative to reinforcement learning from human feedback (RLHF).
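For reference, the DPO objective that the survey centers on fits in a few lines: the policy is pushed to assign a larger log-probability margin to the preferred response than a frozen reference model does. A minimal PyTorch sketch for one preference pair (all inputs are tensors of summed token log-probabilities):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor, policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor, ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss for one (chosen, rejected) pair."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp        # implicit reward of preferred response
    rejected_ratio = policy_rejected_logp - ref_rejected_logp  # implicit reward of dispreferred response
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio))
```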
arXiv Detail & Related papers (2025-03-12T08:45:15Z) - MPPO: Multi Pair-wise Preference Optimization for LLMs with Arbitrary Negative Samples [22.521746860874305]
This study introduces the MPPO algorithm, which leverages the average likelihood of model responses to fit the reward function. Through a comparison of Point-wise, Pair-wise, and List-wise implementations, we found that the Pair-wise approach achieves the best performance. Experimental results demonstrate MPPO's outstanding performance across various benchmarks.
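The abstract names the ingredients (length-averaged likelihood as an implicit reward, a pair-wise loss over one positive and arbitrarily many negatives); the sketch below combines them in the most direct way and should be read as an assumed formulation rather than MPPO's exact loss.

```python
import torch
import torch.nn.functional as F

def pairwise_avg_likelihood_loss(pos_logp: torch.Tensor, pos_len: int,
                                 neg_logps: torch.Tensor, neg_lens: torch.Tensor,
                                 beta: float = 1.0) -> torch.Tensor:
    """Pair the positive against each negative, using length-averaged log-likelihood
    as the implicit reward (no reference model); the exact form is an assumption.

    `pos_logp` is a scalar summed log-prob; `neg_logps`/`neg_lens` have shape (N,).
    """
    r_pos = pos_logp / pos_len        # average log-likelihood of the preferred response
    r_negs = neg_logps / neg_lens     # average log-likelihood of each negative response
    return -F.logsigmoid(beta * (r_pos - r_negs)).mean()
```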
arXiv Detail & Related papers (2024-12-13T14:18:58Z) - SPRec: Self-Play to Debias LLM-based Recommendation [23.875509546540904]
Large language models (LLMs) have attracted significant attention in recommendation systems. We propose SPRec, a novel self-play framework designed to mitigate over-recommendation and improve fairness without requiring additional data or manual intervention.
arXiv Detail & Related papers (2024-12-12T12:53:30Z) - Direct Preference Optimization for LLM-Enhanced Recommendation Systems [33.54698201942643]
Large Language Models (LLMs) have exhibited remarkable performance across a wide range of domains. We propose DPO4Rec, a framework that integrates DPO into LLM-enhanced recommendation systems. Extensive experiments show that DPO4Rec significantly improves re-ranking performance over strong baselines.
arXiv Detail & Related papers (2024-10-08T11:42:37Z) - On Softmax Direct Preference Optimization for Recommendation [50.896117978746]
We propose Softmax-DPO (S-DPO) to instill ranking information into the LM to help LM-based recommenders distinguish preferred items from negatives.
Specifically, we incorporate multiple negatives in user preference data and devise an alternative version of DPO loss tailored for LM-based recommenders.
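Reading "multiple negatives" plus the softmax in the most direct way, each negative's reward gap against the preferred item can be folded into a single logsumexp term inside the sigmoid. The sketch below follows that reading and is an assumption, not a verified reproduction of the S-DPO loss.

```python
import torch
import torch.nn.functional as F

def multi_negative_dpo_loss(policy_chosen: torch.Tensor, ref_chosen: torch.Tensor,
                            policy_rejected: torch.Tensor, ref_rejected: torch.Tensor,
                            beta: float = 0.1) -> torch.Tensor:
    """DPO-style loss with N negatives folded in through a logsumexp (softmax) term.

    `policy_chosen`/`ref_chosen` are scalar summed log-probs of the preferred item;
    `policy_rejected`/`ref_rejected` have shape (N,) for the negatives.
    """
    chosen_ratio = policy_chosen - ref_chosen          # implicit reward of the preferred item
    rejected_ratios = policy_rejected - ref_rejected   # implicit rewards of the negatives
    gaps = beta * (rejected_ratios - chosen_ratio)     # how much each negative beats the positive
    return -F.logsigmoid(-torch.logsumexp(gaps, dim=0))
```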
arXiv Detail & Related papers (2024-06-13T15:16:11Z) - Multi-Reference Preference Optimization for Large Language Models [56.84730239046117]
We introduce a novel closed-form formulation for direct preference optimization using multiple reference models.
The resulting algorithm, Multi-Reference Preference Optimization (MRPO), leverages broader prior knowledge from diverse reference models.
Our experiments demonstrate that LLMs finetuned with MRPO generalize better in various preference data, regardless of data scarcity or abundance.
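The closed form itself is not given in the abstract; one plausible reading is that reference log-probabilities from several reference models are aggregated (here by a weighted average, an illustrative assumption) before entering a DPO-style loss:

```python
import torch
import torch.nn.functional as F

def multi_reference_dpo_loss(policy_chosen: torch.Tensor, policy_rejected: torch.Tensor,
                             ref_chosen_list, ref_rejected_list,
                             weights, beta: float = 0.1) -> torch.Tensor:
    """DPO-style loss where the reference term combines several reference models.

    `ref_chosen_list`/`ref_rejected_list` hold scalar log-probs, one per reference model;
    the weighted-average aggregation is an assumption, not MRPO's closed form.
    """
    ref_chosen = sum(w * r for w, r in zip(weights, ref_chosen_list))
    ref_rejected = sum(w * r for w, r in zip(weights, ref_rejected_list))
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin)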
arXiv Detail & Related papers (2024-05-26T00:29:04Z) - Model Extrapolation Expedites Alignment [135.12769233630362]
We propose a method called ExPO to expedite alignment training with human preferences. We demonstrate that ExPO boosts a DPO model trained with only 20% of the steps to outperform the fully-trained one. We show that ExPO notably improves existing open-source LLMs on the leading AlpacaEval 2.0 and MT-Bench benchmarks.
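ExPO is described elsewhere as simple weight extrapolation: move past the (partially) aligned checkpoint along the SFT-to-aligned direction. Assuming that reading, the whole method is a few lines, with the extrapolation coefficient `alpha` as the only hyperparameter.

```python
from typing import Dict
import torch

def expo_extrapolate(sft_weights: Dict[str, torch.Tensor],
                     aligned_weights: Dict[str, torch.Tensor],
                     alpha: float = 0.3) -> Dict[str, torch.Tensor]:
    """Extrapolate past the aligned model: theta = theta_aligned + alpha * (theta_aligned - theta_sft).
    The weight-extrapolation reading of ExPO is an assumption based on related write-ups."""
    return {name: aligned_weights[name] + alpha * (aligned_weights[name] - sft_weights[name])
            for name in aligned_weights}
```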
arXiv Detail & Related papers (2024-04-25T17:39:50Z) - Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting [65.00288634420812]
Pairwise Ranking Prompting (PRP) is a technique to significantly reduce the burden on Large Language Models (LLMs).
Our results are the first in the literature to achieve state-of-the-art ranking performance on standard benchmarks using moderate-sized open-sourced LLMs.
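A minimal sketch of the pairwise idea: ask the LLM which of two candidates better serves the query, and aggregate the comparisons into a ranking (here by win counting; PRP also describes sorting-based variants). The `prefers_first` callback is a stand-in for the actual pairwise prompt.

```python
from itertools import combinations
from typing import Callable, List

def pairwise_ranking(query: str, docs: List[str],
                     prefers_first: Callable[[str, str, str], bool]) -> List[str]:
    """Rank documents by counting pairwise wins judged by an LLM.
    `prefers_first(query, a, b)` stands in for one pairwise prompt."""
    wins = [0] * len(docs)
    for i, j in combinations(range(len(docs)), 2):
        # query the model in both orders to offset within-pair position bias
        if prefers_first(query, docs[i], docs[j]):
            wins[i] += 1
        else:
            wins[j] += 1
        if prefers_first(query, docs[j], docs[i]):
            wins[j] += 1
        else:
            wins[i] += 1
    order = sorted(range(len(docs)), key=lambda i: wins[i], reverse=True)
    return [docs[i] for i in order]
```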
arXiv Detail & Related papers (2023-06-30T11:32:25Z)