What Matters in LLM-Based Feature Extractor for Recommender? A Systematic Analysis of Prompts, Models, and Adaptation
- URL: http://arxiv.org/abs/2509.14979v2
- Date: Fri, 19 Sep 2025 04:12:05 GMT
- Title: What Matters in LLM-Based Feature Extractor for Recommender? A Systematic Analysis of Prompts, Models, and Adaptation
- Authors: Kainan Shi, Peilin Zhou, Ge Wang, Han Ding, Fei Wang,
- Abstract summary: We propose RecXplore, a modular framework that decomposes the LLM-as-feature-extractor pipeline into four modules. Instead of proposing new techniques, RecXplore revisits and organizes established methods, enabling systematic exploration of each module in isolation. Experiments show that simply combining the best designs from existing techniques without exhaustive search yields up to 18.7% relative improvement in NDCG@5 and 12.7% in HR@5 over strong baselines.
- Score: 14.788780469735242
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using Large Language Models (LLMs) to generate semantic features has been demonstrated as a powerful paradigm for enhancing Sequential Recommender Systems (SRS). This typically involves three stages: processing item text, extracting features with LLMs, and adapting them for downstream models. However, existing methods vary widely in prompting, architecture, and adaptation strategies, making it difficult to fairly compare design choices and identify what truly drives performance. In this work, we propose RecXplore, a modular analytical framework that decomposes the LLM-as-feature-extractor pipeline into four modules: data processing, semantic feature extraction, feature adaptation, and sequential modeling. Instead of proposing new techniques, RecXplore revisits and organizes established methods, enabling systematic exploration of each module in isolation. Experiments on four public datasets show that simply combining the best designs from existing techniques without exhaustive search yields up to 18.7% relative improvement in NDCG@5 and 12.7% in HR@5 over strong baselines. These results underscore the utility of modular benchmarking for identifying effective design patterns and promoting standardized research in LLM-enhanced recommendation.
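The abstract reports gains in NDCG@5 and HR@5 as relative improvements. As a minimal sketch of how such numbers are commonly computed, the Python below assumes a leave-one-out setting with a single held-out target item per user; the abstract does not state the paper's exact evaluation protocol, so this setting, the function names, and the sample item IDs are all illustrative assumptions rather than the authors' code.

```python
import math


def hit_rate_at_k(ranked_items, target, k=5):
    """Return 1.0 if the held-out target is among the top-k items, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items, target, k=5):
    """NDCG@k with one relevant item: 1 / log2(rank + 1), or 0.0 on a miss.

    With a single relevant item the ideal DCG is 1, so no extra
    normalisation term is needed (assumed setting, see lead-in).
    """
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position in the ranking
    return 1.0 / math.log2(rank + 1)


def relative_improvement(new_score, baseline_score):
    """Relative improvement of new_score over baseline_score, in percent."""
    return (new_score - baseline_score) / baseline_score * 100.0


# Hypothetical ranking for one user; "i1" is the held-out next item.
ranking = ["i3", "i7", "i1", "i9", "i4", "i2"]
hr = hit_rate_at_k(ranking, "i1")   # 1.0: the target is in the top 5
ndcg = ndcg_at_k(ranking, "i1")     # 0.5: rank 3 gives 1 / log2(4)
```

In practice both metrics are averaged over all test users before the relative improvement over a baseline is computed.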
Related papers
- Closed-Loop LLM Discovery of Non-Standard Channel Priors in Vision Models [48.83701310501069]
Large Language Models (LLMs) offer a transformative approach to Neural Architecture Search (NAS). We formulate the search as a sequence of conditional code generation tasks, where an LLM refines architectural specifications based on performance telemetry. We generate a vast corpus of valid, shape-consistent architectures via Abstract Syntax Tree (AST) mutations. Experimental results on CIFAR-100 validate the efficacy of this approach, demonstrating that the model yields statistically significant improvements in accuracy.
arXiv Detail & Related papers (2026-01-13T13:00:30Z)
- Improving LLM-based Ontology Matching with fine-tuning on synthetic data [0.0]
Large Language Models (LLMs) are increasingly being integrated into various components of Ontology Matching pipelines. This paper investigates the capability of LLMs to perform ontology matching directly on ontology modules and generate the corresponding alignments. A dedicated fine-tuning strategy can enhance the model's matching performance in a zero-shot setting.
arXiv Detail & Related papers (2025-11-27T16:46:45Z)
- HuggingR$^{4}$: A Progressive Reasoning Framework for Discovering Optimal Model Companions [50.61510609116118]
HuggingR$^{4}$ is a novel framework that combines Reasoning, Retrieval, Refinement, and Reflection to efficiently select models. It attains a workability rate of 92.03% and a reasonability rate of 82.46%, surpassing existing methods by 26.51% and 33.25%, respectively.
arXiv Detail & Related papers (2025-11-24T03:13:45Z)
- Can large language models assist choice modelling? Insights into prompting strategies and current models capabilities [0.0]
Large Language Models (LLMs) are widely used to support various disciplines, yet their potential in choice modelling remains relatively unexplored. This work examines the potential of LLMs as assistive agents in the specification and, where technically feasible, estimation of Multinomial Logit models.
arXiv Detail & Related papers (2025-07-29T13:24:44Z)
- Large Language Model-Driven Surrogate-Assisted Evolutionary Algorithm for Expensive Optimization [22.024630467760264]
Surrogate-assisted evolutionary algorithms (SAEAs) are a key tool for addressing costly optimization tasks. This paper proposes LLM-SAEA, a novel approach that integrates large language models (LLMs) to configure both surrogate models and infill sampling criteria online.
arXiv Detail & Related papers (2025-06-20T13:44:21Z)
- Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models [50.19188692497892]
Traditional alignment methods often require retraining large pretrained models. We propose a novel Residual Alignment Model (RAM) that formalizes the alignment process as a type of importance sampling. We develop a resampling algorithm with iterative token-level decoding to address the common first-token latency issue in comparable methods.
arXiv Detail & Related papers (2025-05-26T08:53:02Z)
- ExpertSteer: Intervening in LLMs through Expert Knowledge [71.12193680015622]
Activation steering offers a promising method to control the generation process of Large Language Models. We propose ExpertSteer, a novel approach that leverages arbitrary specialized expert models to generate steering vectors. We conduct comprehensive experiments using three LLMs on 15 popular benchmarks across four distinct domains.
arXiv Detail & Related papers (2025-05-18T08:55:46Z)
- BLADE: Benchmark suite for LLM-driven Automated Design and Evolution of iterative optimisation heuristics [2.2485774453793037]
BLADE is a framework for benchmarking LLM-driven AAD methods in a continuous black-box optimisation context. It integrates benchmark problems with instance generators and textual descriptions aimed at capability-focused testing, such as specialisation and information exploitation. BLADE provides an 'out-of-the-box' solution to systematically evaluate LLM-driven AAD approaches.
arXiv Detail & Related papers (2025-04-28T18:34:09Z)
- Efficient Model Selection for Time Series Forecasting via LLMs [52.31535714387368]
We propose to leverage Large Language Models (LLMs) as a lightweight alternative for model selection. Our method eliminates the need for explicit performance matrices by utilizing the inherent knowledge and reasoning capabilities of LLMs.
arXiv Detail & Related papers (2025-04-02T20:33:27Z)
- MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning [69.7347209018861]
We introduce MLLM-Selector, an automated approach that identifies valuable data for visual instruction tuning. We calculate necessity scores for each sample in the VIT data pool to identify samples pivotal for enhancing model performance. Our findings underscore the importance of mixing necessity and diversity in data choice, leading to the creation of MLLM-Selector.
arXiv Detail & Related papers (2025-03-26T12:42:37Z)
- IMPROVE: Iterative Model Pipeline Refinement and Optimization Leveraging LLM Experts [28.9807389592324]
Large language model (LLM) agents have emerged as a promising solution to automate the workflow of machine learning. We introduce Iterative Refinement, a novel strategy for LLM-driven ML pipeline design inspired by how human ML experts iteratively refine models. By systematically updating individual components based on real training feedback, Iterative Refinement improves overall model performance.
arXiv Detail & Related papers (2025-02-25T01:52:37Z)
- Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild [84.57103623507082]
This paper introduces Model-GLUE, a holistic Large Language Models scaling guideline. We benchmark existing scaling techniques, especially selective merging, and variants of mixture. We then formulate an optimal strategy for the selection and aggregation of a heterogeneous model zoo. Our methodology involves the clustering of mergeable models and optimal merging strategy selection, and the integration of clusters.
arXiv Detail & Related papers (2024-10-07T15:55:55Z)
- Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? [49.688233418425995]
Struc-Bench is a comprehensive benchmark featuring prominent Large Language Models (LLMs). We propose two innovative metrics, P-Score (Prompting Score) and H-Score (Heuristical Score). Our experiments show that applying our structure-aware fine-tuning to LLaMA-7B leads to substantial performance gains.
arXiv Detail & Related papers (2023-09-16T11:31:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.