BEAR: Towards Beam-Search-Aware Optimization for Recommendation with Large Language Models
- URL: http://arxiv.org/abs/2601.22925v1
- Date: Fri, 30 Jan 2026 12:45:02 GMT
- Title: BEAR: Towards Beam-Search-Aware Optimization for Recommendation with Large Language Models
- Authors: Weiqin Yang, Bohao Wang, Zhenxiang Xu, Jiawei Chen, Shengjia Zhang, Jingbang Chen, Canghong Jin, Can Wang,
- Abstract summary: BEAR (Beam-SEarch-Aware Regularization) is a novel fine-tuning objective that explicitly accounts for beam search behavior during training. It significantly outperforms strong baselines in experiments across four real-world datasets.
- Score: 8.738350030995894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have witnessed a rapid surge in research leveraging Large Language Models (LLMs) for recommendation. These methods typically employ supervised fine-tuning (SFT) to adapt LLMs to recommendation scenarios, and utilize beam search during inference to efficiently retrieve $B$ top-ranked recommended items. However, we identify a critical training-inference inconsistency: while SFT optimizes the overall probability of positive items, it does not guarantee that such items will be retrieved by beam search even if they possess high overall probabilities. Due to the greedy pruning mechanism, beam search can prematurely discard a positive item once its prefix probability is insufficient. To address this inconsistency, we propose BEAR (Beam-SEarch-Aware Regularization), a novel fine-tuning objective that explicitly accounts for beam search behavior during training. Rather than directly simulating beam search for each instance during training, which is computationally prohibitive, BEAR enforces a relaxed necessary condition: each token in a positive item must rank within the top-$B$ candidate tokens at each decoding step. This objective effectively mitigates the risk of incorrect pruning while incurring negligible computational overhead compared to standard SFT. Extensive experiments across four real-world datasets demonstrate that BEAR significantly outperforms strong baselines. Code will be released upon acceptance.
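The training-inference inconsistency and the relaxed top-$B$ condition can be illustrated with a small toy sketch. Everything in it is hypothetical: a hand-written next-token table stands in for a fine-tuned LLM, each item is a two-token identifier, and `beam_search`, `item_log_prob` and `satisfies_top_b_condition` are illustrative names, not BEAR's code (which the abstract says will only be released upon acceptance). The sketch only checks the per-step condition; it does not implement BEAR's regularization loss, whose exact form the abstract does not give.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy "LLM": next-token distributions keyed by the prefix generated so far.
# Each recommendable item is a 2-token sequence (not BEAR's actual item tokenization).
NEXT_TOKEN_PROBS: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"a": 0.45, "b": 0.30, "c": 0.25},
    ("a",): {"a1": 0.40, "a2": 0.35, "a3": 0.25},
    ("b",): {"b1": 0.50, "b2": 0.30, "b3": 0.20},
    ("c",): {"c1": 0.96, "c2": 0.04},
}
ITEM_LEN = 2


def beam_search(beam_size: int) -> List[Tuple[Tuple[str, ...], float]]:
    """Standard beam search: keep the beam_size highest-scoring prefixes after each step."""
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]  # (prefix, log-prob)
    for _ in range(ITEM_LEN):
        candidates = [
            (prefix + (tok,), score + math.log(p))
            for prefix, score in beams
            for tok, p in NEXT_TOKEN_PROBS[prefix].items()
        ]
        # Greedy pruning: prefixes outside the top beam_size are discarded for good.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams


def item_log_prob(item: Tuple[str, ...]) -> float:
    """Overall sequence log-probability, the quantity SFT pushes up for positive items."""
    return sum(math.log(NEXT_TOKEN_PROBS[item[:t]][item[t]]) for t in range(len(item)))


def satisfies_top_b_condition(item: Tuple[str, ...], beam_size: int) -> bool:
    """Relaxed necessary condition: every token of the item ranks within the
    top-B candidate tokens at its own decoding step."""
    for t, tok in enumerate(item):
        dist = NEXT_TOKEN_PROBS[item[:t]]
        rank = sorted(dist, key=dist.get, reverse=True).index(tok)  # 0-based rank
        if rank >= beam_size:
            return False
    return True


positive = ("c", "c1")
all_items = [p + (t,) for p in NEXT_TOKEN_PROBS if len(p) == 1 for t in NEXT_TOKEN_PROBS[p]]
print(max(all_items, key=item_log_prob))       # ('c', 'c1'): most probable item overall
print([item for item, _ in beam_search(2)])    # [('a', 'a1'), ('a', 'a2')]: positive is missed
print(satisfies_top_b_condition(positive, 2))  # False: token 'c' ranks 3rd at step 1
```

With $B = 2$, the item `('c', 'c1')` has the highest overall probability (0.25 × 0.96 = 0.24) but is pruned at the first step because its prefix `c` ranks third, which is the failure mode the abstract describes. Conversely, `('b', 'b1')` passes the per-step check yet is still not retrieved, which is why the condition is only necessary rather than sufficient.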
Related papers
- APAO: Adaptive Prefix-Aware Optimization for Generative Recommendation [26.371939617653084]
Generative recommendation is an autoregressive generation process, predicting discrete tokens of the next item conditioned on user interaction histories. Existing generative recommendation models are typically trained with token-level likelihood objectives, such as cross-entropy loss. This leads to a training-inference inconsistency: standard training assumes ground-truth history is always available, ignoring the fact that beam search prunes low-probability branches during inference.
Date: 2026-03-03T08:29:15Z
- BiXSE: Improving Dense Retrieval via Probabilistic Graded Relevance Distillation [6.272555849379284]
BiXSE is a pointwise training method that optimizes binary cross-entropy over graded relevance scores. It achieves strong performance with reduced annotation and compute costs. BiXSE offers a robust, scalable alternative for training dense retrieval models.
Date: 2025-08-09T02:15:17Z
- Supervised Optimism Correction: Be Confident When LLMs Are Sure [91.7459076316849]
We establish a novel theoretical connection between supervised fine-tuning and offline reinforcement learning. We show that the widely used beam search method suffers from unacceptable over-optimism. We propose Supervised Optimism Correction, which introduces a simple yet effective auxiliary loss for token-level $Q$-value estimation.
Date: 2025-04-10T07:50:03Z
- The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models [69.798277882245]
We introduce Unsupervised Prefix Fine-Tuning (UPFT) to enhance large language models' reasoning efficiency. UPFT removes the need for labeled data or exhaustive sampling. Experiments show that UPFT matches the performance of supervised methods.
Date: 2025-03-04T18:56:03Z
- Self-Evaluation Guided Beam Search for Reasoning [61.523627290397556]
We introduce a stepwise self-evaluation mechanism to guide and calibrate the reasoning process of Large Language Models (LLMs).
We propose a decoding algorithm that integrates the self-evaluation guidance via beam search.
Our approach surpasses the corresponding Codex-backboned baselines in few-shot accuracy by 6.34%, 9.56%, and 5.46% on the GSM8K, AQuA, and StrategyQA benchmarks, respectively.
Date: 2023-05-01T02:37:59Z
- Conditional Poisson Stochastic Beam Search [35.60062127942947]
Conditional Poisson stochastic beam search (CPSBS) is a more natural alternative to the stochastic beam search (SBS) of Kool et al. (2019).
CPSBS produces lower-variance and more efficient estimators than SBS, even showing improvements in high-entropy settings.
Date: 2021-09-22T20:49:16Z
- Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how iteration under a more standard notion of low inherent Bellman error, typically employed in least-squares value iteration-style algorithms, can provide strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near-optimal policy for any (linear) reward function.
Date: 2020-08-18T04:34:21Z
- Best-First Beam Search [78.71330480725668]
We show that the standard implementation of beam search can be made up to 10x faster in practice.
We propose a memory-reduced variant of Best-First Beam Search, which has a similar beneficial search bias in terms of downstream performance.
Date: 2020-07-08T05:56:01Z
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
Date: 2020-04-29T10:54:40Z