SUMFORU: An LLM-Based Review Summarization Framework for Personalized Purchase Decision Support
- URL: http://arxiv.org/abs/2512.11755v1
- Date: Fri, 12 Dec 2025 18:05:52 GMT
- Title: SUMFORU: An LLM-Based Review Summarization Framework for Personalized Purchase Decision Support
- Authors: Yuming Feng, Xinrui Jiang
- Abstract summary: Online product reviews contain rich but noisy signals that overwhelm users and hinder effective decision-making. We propose a steerable review summarization framework that aligns outputs with explicit user personas to support personalized purchase decisions. Our results highlight the promise of steerable pluralistic alignment for building next-generation personalized decision-support systems.
- Score: 3.755588097509539
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online product reviews contain rich but noisy signals that overwhelm users and hinder effective decision-making. Existing LLM-based summarizers remain generic and fail to account for individual preferences, limiting their practical utility. We propose SUMFORU, a steerable review summarization framework that aligns outputs with explicit user personas to support personalized purchase decisions. Our approach integrates a high-quality data pipeline built from the Amazon 2023 Review Dataset with a two-stage alignment procedure: (1) persona-aware Supervised Fine-Tuning (SFT) via asymmetric knowledge distillation, and (2) Reinforcement Learning with AI Feedback (RLAIF) using a preference estimator to capture fine-grained, persona-relevant signals. We evaluate the model across rule-based, LLM-based, and human-centered metrics, demonstrating consistent improvements in consistency, grounding, and preference alignment. Our framework achieves the highest performance across all evaluation settings and generalizes effectively to unseen product categories. Our results highlight the promise of steerable pluralistic alignment for building next-generation personalized decision-support systems.
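As a rough sketch of the two-stage recipe the abstract describes (persona-aware SFT distilled from a stronger teacher, then RLAIF driven by a preference estimator), the control flow might look like the following; every function name, data field, and the PPO-style update are illustrative assumptions, not the paper's released code.

```python
# Hypothetical sketch of SUMFORU's two-stage alignment recipe as described
# in the abstract; all function names and data fields are assumptions.

def build_sft_corpus(reviews, personas, teacher_llm):
    """Stage 0: asymmetric knowledge distillation -- a stronger teacher
    writes persona-conditioned reference summaries for the student."""
    corpus = []
    for product_reviews, persona in zip(reviews, personas):
        prompt = (
            f"Persona: {persona}\n"
            f"Reviews:\n{product_reviews}\n"
            "Write a summary tailored to this persona."
        )
        corpus.append({"prompt": prompt, "target": teacher_llm(prompt)})
    return corpus

def stage1_sft(student, corpus):
    """Stage 1: persona-aware supervised fine-tuning on distilled targets."""
    for example in corpus:
        student.train_step(example["prompt"], example["target"])
    return student

def stage2_rlaif(student, prompts, preference_estimator, ppo_update):
    """Stage 2: RLAIF -- a preference estimator scores candidate summaries
    on persona-relevant signals, and the score drives a policy update."""
    for prompt in prompts:
        candidate = student.generate(prompt)
        reward = preference_estimator(prompt, candidate)  # scalar in [0, 1]
        ppo_update(student, prompt, candidate, reward)
    return student
```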
Related papers
- PersoDPO: Scalable Preference Optimization for Instruction-Adherent, Persona-Grounded Dialogue via Multi-LLM Evaluation [20.228114552545772]
PersoDPO is a scalable preference optimization framework. It integrates evaluation metrics targeting coherence and personalization, along with a length-format compliance feature. Experiments on the FoCus dataset show that an open-source language model fine-tuned with the PersoDPO framework consistently outperforms strong open-source baselines.
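A minimal sketch of how such multi-metric evaluation could produce DPO training pairs, assuming a weighted combination of coherence, personalization, and a length-format compliance feature; the weights, scoring stubs, and function names are illustrative, not PersoDPO's actual formulation.

```python
# Hypothetical sketch of turning multi-LLM evaluation scores into DPO
# preference pairs; weights and scorers are assumptions.

def length_format_compliance(response: str, max_words: int = 80) -> float:
    """Toy compliance feature: 1.0 if within the word budget, else decays."""
    n = len(response.split())
    return 1.0 if n <= max_words else max_words / n

def aggregate_score(coherence: float, personalization: float,
                    compliance: float, w=(0.4, 0.4, 0.2)) -> float:
    return w[0] * coherence + w[1] * personalization + w[2] * compliance

def make_preference_pair(candidates, judge):
    """`judge` stands in for the multi-LLM evaluators; it returns
    (coherence, personalization) scores for a candidate response."""
    scored = []
    for resp in candidates:
        coh, per = judge(resp)
        scored.append((aggregate_score(coh, per,
                                       length_format_compliance(resp)), resp))
    scored.sort(reverse=True)
    # Best-scoring response becomes "chosen", worst becomes "rejected".
    return {"chosen": scored[0][1], "rejected": scored[-1][1]}
```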
arXiv Detail & Related papers (2026-02-04T12:34:55Z)
- SRLF: An Agent-Driven Set-Wise Reflective Learning Framework for Sequential Recommendation [16.741106736240603]
Our framework operationalizes a closed-loop "assess-validate-reflect" cycle that harnesses the powerful in-context learning capabilities of LLMs. Our method allows our model to capture complex patterns essential to user behavior, making it significantly more adept at sequential recommendation tasks.
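A minimal sketch of an assess-validate-reflect loop, assuming a generic `llm` completion callable and invented prompts; SRLF's actual templates and validation signal are not reproduced here.

```python
# Illustrative assess-validate-reflect cycle; `llm` is any text-in,
# text-out completion function, and the prompts are assumptions.

def assess_validate_reflect(llm, user_history, candidate_set, max_rounds=3):
    notes = ""
    ranking = ""
    for _ in range(max_rounds):
        # Assess: rank the candidate item set against the user's history.
        ranking = llm(f"History: {user_history}\nCandidates: {candidate_set}\n"
                      f"Prior reflections: {notes}\nRank the candidates.")
        # Validate: check the ranking against observed behavior.
        verdict = llm(f"Ranking: {ranking}\nDoes this match the user's "
                      "observed behavior? Answer VALID or INVALID with reasons.")
        if verdict.startswith("VALID"):
            return ranking
        # Reflect: distill the failure into guidance for the next round.
        notes = llm(f"Critique: {verdict}\nSummarize what to fix next time.")
    return ranking
```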
arXiv Detail & Related papers (2025-11-14T14:50:33Z)
- Beyond Majority Voting: LLM Aggregation by Leveraging Higher-Order Information [57.397381631496906]
We develop two new aggregation algorithms called Optimal Weight (OW) and Inverse Surprising Popularity (ISP). Our theoretical analysis shows these methods provably mitigate inherent limitations of majority voting under mild assumptions. We empirically validate our algorithms on synthetic datasets, popular LLM fine-tuning benchmarks such as UltraFeedback and MMLU, and a real-world healthcare setting, ARMMAN.
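The paper's OW and ISP updates are not reproduced here, but the higher-order signal they build on can be illustrated with the classic surprisingly-popular rule, assuming each model reports both its own answer and a prediction of its peers' answer shares.

```python
from collections import Counter

# Sketch of the higher-order signal behind "surprising popularity": pick
# the answer whose actual frequency most exceeds its predicted frequency.
# This is the classic rule, shown for intuition only; the paper's OW and
# ISP algorithms refine it.

def majority_vote(answers):
    return Counter(answers).most_common(1)[0][0]

def surprisingly_popular(answers, predicted_shares):
    """`predicted_shares` maps each answer to the average share of peers
    the models *expected* to choose it (values sum to ~1)."""
    n = len(answers)
    actual = {a: c / n for a, c in Counter(answers).items()}
    return max(actual, key=lambda a: actual[a] - predicted_shares.get(a, 0.0))

answers = ["A", "A", "B", "B", "B"]
predicted = {"A": 0.2, "B": 0.8}   # models expected B to dominate
print(majority_vote(answers))                    # B
print(surprisingly_popular(answers, predicted))  # A: more popular than expected
```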
arXiv Detail & Related papers (2025-10-01T22:21:50Z)
- Learning to Shop Like Humans: A Review-driven Retrieval-Augmented Recommendation Framework with LLMs [30.748667156183004]
RevBrowse is a review-driven recommendation framework inspired by the "browse-then-decide" decision process. RevBrowse integrates user reviews into the LLM-based reranking process to enhance its ability to distinguish between candidate items. PrefRAG is a retrieval-augmented module that disentangles user and item representations into structured forms.
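A toy sketch of the browse-then-decide idea: retrieve the reviews most similar to a user-interest embedding, then compose a rerank prompt. The embedding inputs and prompt wording are assumptions, not PrefRAG's actual components.

```python
import numpy as np

# Illustrative review retrieval plus LLM rerank prompt, in the spirit of
# the framework above; the embeddings are assumed to come from any text
# encoder, which is out of scope here.

def top_k_reviews(query_vec, review_vecs, reviews, k=3):
    # Cosine similarity between the user-interest vector and each review.
    sims = review_vecs @ query_vec / (
        np.linalg.norm(review_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return [reviews[i] for i in np.argsort(-sims)[:k]]

def rerank_prompt(user_profile, item, retrieved):
    evidence = "\n".join(f"- {r}" for r in retrieved)
    return (f"User profile: {user_profile}\nCandidate item: {item}\n"
            f"Relevant reviews:\n{evidence}\n"
            "Score this item's fit for the user from 1 to 10.")
```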
arXiv Detail & Related papers (2025-08-31T04:37:43Z)
- End-to-End Personalization: Unifying Recommender Systems with Large Language Models [0.0]
We propose a novel hybrid recommendation framework that combines Graph Attention Networks (GATs) with Large Language Models (LLMs). LLMs are first used to enrich user and item representations by generating semantically meaningful profiles based on metadata such as titles, genres, and overviews. We evaluate our model on benchmark datasets, including MovieLens 100k and 1M, where it consistently outperforms strong baselines.
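To make the hybrid concrete, here is a toy single-head graph-attention aggregation over embeddings that stand in for LLM-generated profiles; a real system would use a library GAT layer and an actual text encoder, both stubbed here.

```python
import torch
import torch.nn.functional as F

# Toy single-head graph-attention aggregation over LLM-derived profile
# embeddings; random tensors stand in for encoded profile text.

def attention_aggregate(node_feats, neighbor_idx, W, a):
    """node_feats: (N, d); neighbor_idx: list of neighbor id tensors per
    node; W: (d, d') projection; a: (2*d',) attention vector."""
    h = node_feats @ W                       # project every node
    out = torch.zeros_like(h)
    for i, nbrs in enumerate(neighbor_idx):
        cat = torch.cat([h[i].expand(len(nbrs), -1), h[nbrs]], dim=1)
        alpha = F.softmax(F.leaky_relu(cat @ a), dim=0)   # attention weights
        out[i] = (alpha.unsqueeze(1) * h[nbrs]).sum(dim=0)
    return out

# Pretend these rows came from embedding LLM-written user/item profiles.
profiles = torch.randn(4, 8)
W = torch.randn(8, 8)
a = torch.randn(16)
neighbors = [torch.tensor([1, 2]), torch.tensor([0]),
             torch.tensor([0, 3]), torch.tensor([2])]
print(attention_aggregate(profiles, neighbors, W, a).shape)  # torch.Size([4, 8])
```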
arXiv Detail & Related papers (2025-08-02T22:46:50Z)
- ALIGN: Prompt-based Attribute Alignment for Reliable, Responsible, and Personalized LLM-based Decision-Making [10.558361310945164]
We develop ALIGN, a system for dynamic personalization of large language models (LLMs). Key features of our system include robust configuration management, structured output generation with reasoning, and several algorithm implementations with swappable LLM backbones. The entire ALIGN framework is open source and will enable new research on reliable, responsible, and personalized LLM-based decision-makers.
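A plausible shape for such configuration-driven, prompt-based attribute alignment, with a swappable backbone field and a structured-output flag; the attribute names and prompt wording are invented for illustration and are not ALIGN's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical config-driven attribute alignment; field names are assumptions.

@dataclass
class AlignConfig:
    backbone: str = "any-chat-model"        # swappable LLM backbone
    attributes: dict = field(default_factory=lambda: {
        "risk_tolerance": "low",
        "verbosity": "concise",
    })
    require_reasoning: bool = True          # structured output with rationale

def build_prompt(cfg: AlignConfig, question: str) -> str:
    attrs = "; ".join(f"{k}={v}" for k, v in cfg.attributes.items())
    suffix = ("\nRespond as JSON with fields 'decision' and 'reasoning'."
              if cfg.require_reasoning else "")
    return f"Decision attributes: {attrs}\nQuestion: {question}{suffix}"

print(build_prompt(AlignConfig(), "Which laptop should I buy?"))
```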
arXiv Detail & Related papers (2025-07-11T21:33:38Z)
- AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset [89.37514696019484]
Preference learning is critical for aligning large language models with human values. Our work shifts preference dataset design from ad hoc scaling to component-aware optimization.
arXiv Detail & Related papers (2025-04-04T17:33:07Z)
- FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users [111.56469697145519]
We propose Few-Shot Preference Optimization (FSPO), which reframes reward modeling as a meta-learning problem. Under this framework, an LLM learns to quickly adapt to a user via a few labeled preferences from that user, constructing a personalized reward function for them. We generate over 1M synthetic personalized preferences using publicly available LLMs. We evaluate FSPO on personalized open-ended generation for up to 1,500 synthetic users across three domains: movie reviews, pedagogical adaptation based on educational background, and general question answering, along with a controlled human study.
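One way to read "a personalized reward function from a few labeled preferences" is an in-context reward prompt, sketched below; the prompt format and YES/NO scoring are assumptions for illustration, not FSPO's meta-learning procedure itself.

```python
# Sketch of a few-shot personalized reward: a user's labeled preferences
# are packed into the context so an LLM can judge new responses for them.
# Prompt wording and the binary scoring rule are assumptions.

def few_shot_reward_prompt(user_prefs, prompt, response):
    """user_prefs: list of (prompt, chosen, rejected) triples for one user."""
    shots = "\n".join(
        f"Prompt: {p}\nPreferred: {c}\nNot preferred: {r}"
        for p, c, r in user_prefs)
    return (f"Examples of this user's preferences:\n{shots}\n\n"
            f"Prompt: {prompt}\nResponse: {response}\n"
            "Would this user prefer the response? Answer YES or NO.")

def personalized_reward(llm, user_prefs, prompt, response) -> float:
    answer = llm(few_shot_reward_prompt(user_prefs, prompt, response))
    return 1.0 if answer.strip().upper().startswith("YES") else 0.0
```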
arXiv Detail & Related papers (2025-02-26T17:08:46Z)
- Reason4Rec: Large Language Models for Recommendation with Deliberative User Preference Alignment [69.11529841118671]
We propose a new Deliberative Recommendation task, which incorporates explicit reasoning about user preferences as an additional alignment goal. We then introduce the Reasoning-powered Recommender framework for deliberative user preference alignment.
arXiv Detail & Related papers (2025-02-04T07:17:54Z)
- A Systematic Examination of Preference Learning through the Lens of Instruction-Following [83.71180850955679]
We use a novel synthetic data generation pipeline to generate 48,000 unique instruction-following prompts. With our synthetic prompts, we use two preference dataset curation methods: rejection sampling (RS) and Monte Carlo Tree Search (MCTS). Experiments reveal that shared prefixes in preference pairs, as generated by MCTS, provide marginal but consistent improvements. High-contrast preference pairs generally outperform low-contrast pairs; however, combining both often yields the best performance.
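A toy version of rejection-sampling curation that yields high-contrast pairs (top-ranked versus bottom-ranked completion); the sampler and scorer below are stubs, not the paper's pipeline.

```python
import random

# Toy rejection-sampling curation: sample several completions per prompt,
# score them, and keep the best/worst as a (chosen, rejected) pair.

def curate_pair(prompt, sample, score, n=8):
    candidates = [sample(prompt) for _ in range(n)]
    ranked = sorted(candidates, key=score, reverse=True)
    # High-contrast pair: top vs. bottom of the ranked list.
    return {"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]}

# Stub demo: "responses" are numbers and the scorer prefers larger ones.
pair = curate_pair("demo", lambda _: random.random(), lambda x: x)
print(pair["chosen"] >= pair["rejected"])  # True
```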
arXiv Detail & Related papers (2024-12-18T15:38:39Z)
- RosePO: Aligning LLM-based Recommenders with Human Values [38.029251417802044]
We propose a general framework, Recommendation with smoothing personalized Preference Optimization (RosePO).
RosePO better aligns with customized human values during the post-training stage.
Evaluation on three real-world datasets demonstrates the effectiveness of our method.
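One plausible reading of "smoothing" is a label-smoothed DPO-style objective, sketched below; the smoothing weight `eps` and where it comes from per pair are assumptions, not RosePO's exact rule.

```python
import math

# Label-smoothed DPO-style loss: eps > 0 softens the hard preference
# label, hedging against noisy pairs. A toy reading of "smoothing".

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def smoothed_dpo_loss(logp_chosen, logp_rejected,
                      ref_chosen, ref_rejected,
                      beta=0.1, eps=0.1) -> float:
    """Log-probs are for the policy and a frozen reference model."""
    margin = beta * ((logp_chosen - ref_chosen)
                     - (logp_rejected - ref_rejected))
    return -((1 - eps) * math.log(sigmoid(margin))
             + eps * math.log(sigmoid(-margin)))

print(round(smoothed_dpo_loss(-1.0, -2.0, -1.5, -1.5), 4))
```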
arXiv Detail & Related papers (2024-10-16T12:54:34Z)
- Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data.
Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
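A skeleton of the described paradigm: self-generated responses serve as negatives against stored references, and an experience buffer decouples generation from updates; all components below are stubs and the exact SAPO algorithm differs.

```python
import random
from collections import deque

# Skeleton of a self-play + off-policy round: the current policy's own
# samples act as rejected responses, and updates draw from a buffer of
# past pairs rather than only fresh ones. Components are stubs.

def sapo_style_round(policy, prompts, references, buffer: deque,
                     update, batch_size=4):
    for prompt, ref in zip(prompts, references):
        negative = policy.generate(prompt)        # self-generated negative
        buffer.append({"prompt": prompt, "chosen": ref, "rejected": negative})
    # Off-policy step: train on a sample of stored pairs.
    batch = random.sample(list(buffer), min(batch_size, len(buffer)))
    update(policy, batch)
```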
arXiv Detail & Related papers (2024-05-31T14:21:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.