Dual-Tree LLM-Enhanced Negative Sampling for Implicit Collaborative Filtering
- URL: http://arxiv.org/abs/2602.18249v1
- Date: Fri, 20 Feb 2026 14:32:41 GMT
- Title: Dual-Tree LLM-Enhanced Negative Sampling for Implicit Collaborative Filtering
- Authors: Jiayi Wu, Zhengyu Wu, Xunkai Li, Rong-Hua Li, Guoren Wang
- Abstract summary: Large language models (LLMs) have shown promise in recommender systems. Existing methods rely on textual information and task-specific fine-tuning, limiting practical applicability. We propose a text-free and fine-tuning-free Dual-Tree LLM-enhanced Negative Sampling method (DTL-NS).
- Score: 40.89512526196666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Negative sampling is a pivotal technique in implicit collaborative filtering (CF) recommendation, enabling efficient and effective training by contrasting observed interactions with sampled unobserved ones. Recently, large language models (LLMs) have shown promise in recommender systems; however, research on LLM-empowered negative sampling remains underexplored. Existing methods heavily rely on textual information and task-specific fine-tuning, limiting practical applicability. To address this limitation, we propose a text-free and fine-tuning-free Dual-Tree LLM-enhanced Negative Sampling method (DTL-NS). It consists of two modules: (i) an offline false negative identification module that leverages hierarchical index trees to transform collaborative structural and latent semantic information into structured item-ID encodings for LLM inference, enabling accurate identification of false negatives; and (ii) a multi-view hard negative sampling module that combines user-item preference scores with item-item hierarchical similarities from these encodings to mine high-quality hard negatives, thus improving models' discriminative ability. Extensive experiments demonstrate the effectiveness of DTL-NS. For example, on the Amazon-sports dataset, DTL-NS outperforms the strongest baseline by 10.64% and 19.12% in Recall@20 and NDCG@20, respectively. Moreover, DTL-NS can be integrated into various implicit CF models and negative sampling methods, consistently enhancing their performance.
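The multi-view hard negative sampling module described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual scoring formulas, so everything below is an assumption made for illustration: the preference score is a dot product of embeddings, the item-item hierarchical similarity is the fraction of shared leading levels between two tree-path codes, the two views are mixed with a hypothetical weight `alpha`, and the function and parameter names are invented here. Items flagged as false negatives by the offline module are simply excluded from the candidate pool.

```python
import numpy as np


def prefix_similarity(code_a, code_b):
    """Fraction of leading tree levels shared by two hierarchical item codes.

    Assumption: each item is encoded as a tuple of node indices, one per tree level.
    """
    shared = 0
    for a, b in zip(code_a, code_b):
        if a != b:
            break
        shared += 1
    return shared / max(len(code_a), len(code_b))


def sample_hard_negative(user_vec, item_vecs, item_codes, pos_item,
                         interacted, false_negatives,
                         alpha=0.5, n_candidates=4, rng=None):
    """Pick one hard negative for (user, pos_item) from a random candidate pool.

    alpha weights the user-item preference view against the item-item
    hierarchical view; it is a hypothetical parameter, not taken from the paper.
    """
    rng = rng or np.random.default_rng()

    # Never sample observed items or items flagged as likely false negatives.
    excluded = set(interacted) | set(false_negatives) | {pos_item}
    pool = [i for i in range(len(item_vecs)) if i not in excluded]
    candidates = rng.choice(pool, size=min(n_candidates, len(pool)), replace=False)

    # View 1: user-item preference scores, min-max normalised to [0, 1].
    pref = item_vecs[candidates] @ user_vec
    span = pref.max() - pref.min()
    pref = (pref - pref.min()) / span if span > 0 else np.zeros_like(pref)

    # View 2: hierarchical similarity between each candidate and the positive item.
    sim = np.array([prefix_similarity(item_codes[pos_item], item_codes[c])
                    for c in candidates])

    # The hardest negative is the candidate with the highest combined score.
    score = alpha * pref + (1 - alpha) * sim
    return int(candidates[np.argmax(score)])


# Toy data: five items with 2-d embeddings and 3-level tree codes.
user_vec = np.array([1.0, 0.0])
item_vecs = np.array([[1.0, 0.0], [0.9, 0.0], [0.1, 0.0], [0.8, 0.0], [0.2, 0.0]])
item_codes = [(0, 0, 0), (0, 0, 1), (1, 0, 0), (0, 1, 0), (0, 0, 2)]

negative = sample_hard_negative(user_vec, item_vecs, item_codes, pos_item=0,
                                interacted={0}, false_negatives={1})
print(negative)
```

In this toy setup item 1 would otherwise look like the hardest negative, since it scores highly for the user and sits next to the positive item in the tree, but it is skipped because it is flagged as a false negative. Raising `alpha` favors candidates the model already scores highly, while lowering it favors candidates that are close to the positive item in the hierarchy.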
Related papers
- Improving LLM-based Recommendation with Self-Hard Negatives from Intermediate Layers [80.55429742713623]
ILRec is a novel preference fine-tuning framework for LLM-based recommender systems. We introduce a lightweight collaborative filtering model to assign token-level rewards for negative signals. Experiments on three datasets demonstrate ILRec's effectiveness in enhancing the performance of LLM-based recommender systems.
arXiv Detail & Related papers (2026-02-19T14:37:43Z)
- Hard vs. Noise: Resolving Hard-Noisy Sample Confusion in Recommender Systems via Large Language Models [4.7341002297388295]
Implicit feedback, employed in training recommender systems, unavoidably confronts noise due to factors such as misclicks and position bias. Previous studies have attempted to identify noisy samples through their diverged data patterns, such as higher loss values. We observed that noisy samples and hard samples display similar patterns, leading to a hard-noisy confusion issue.
arXiv Detail & Related papers (2025-11-10T16:51:03Z)
- UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning [101.62386137855704]
We present a novel Universal Multimodal Embedding (UniME-V2) model. Our approach first constructs a potential hard negative set through global retrieval. We then introduce the MLLM-as-a-Judge mechanism, which utilizes MLLMs to assess the semantic alignment of query-candidate pairs. These scores serve as a foundation for hard negative mining, mitigating the impact of false negatives and enabling the identification of diverse, high-quality hard negatives.
arXiv Detail & Related papers (2025-10-15T13:07:00Z)
- Can LLM-Driven Hard Negative Sampling Empower Collaborative Filtering? Findings and Potentials [9.668242919588199]
Hard negative samples can accelerate model convergence and optimize decision boundaries. This paper introduces the concept of Semantic Negative Sampling. We propose a framework called HNLMRec, based on fine-tuning LLMs supervised by collaborative signals.
arXiv Detail & Related papers (2025-04-07T04:39:45Z)
- ESANS: Effective and Semantic-Aware Negative Sampling for Large-Scale Retrieval Systems [7.897183317096681]
In the retrieval stage, classic embedding-based retrieval methods depend on effective negative sampling techniques to enhance both performance and efficiency. We propose Effective and Semantic-Aware Negative Sampling (ESANS), which integrates two key components: Effective Dense Interpolation Strategy (EDIS) and Multimodal Semantic-Aware Clustering (MSAC).
arXiv Detail & Related papers (2025-02-22T04:43:20Z)
- Generating Negative Samples for Multi-Modal Recommendation [17.469360152396337]
Multi-modal recommender systems (MMRS) have gained significant attention due to their ability to leverage information from various modalities to enhance recommendation quality. Existing negative sampling techniques often struggle to effectively utilize the multi-modal data, leading to suboptimal performance. We propose NegGen, a novel framework that utilizes multi-modal large language models (MLLMs) to generate balanced and contrastive negative samples.
arXiv Detail & Related papers (2025-01-25T11:45:49Z)
- LLM-based Bi-level Multi-interest Learning Framework for Sequential Recommendation [54.396000434574454]
We propose a novel multi-interest SR framework combining implicit behavioral and explicit semantic perspectives. It includes two modules: the Implicit Behavioral Interest Module and the Explicit Semantic Interest Module. Experiments on four real-world datasets validate the framework's effectiveness and practicality.
arXiv Detail & Related papers (2024-11-14T13:00:23Z)
- A Framework for Fine-Tuning LLMs using Heterogeneous Feedback [69.51729152929413]
We present a framework for fine-tuning large language models (LLMs) using heterogeneous feedback.
First, we combine the heterogeneous feedback data into a single supervision format, compatible with methods like SFT and RLHF.
Next, given this unified feedback dataset, we extract a high-quality and diverse subset to obtain performance increases.
arXiv Detail & Related papers (2024-08-05T23:20:32Z)
- Aligning Language Models with Demonstrated Feedback [58.834937450242975]
Demonstration ITerated Task Optimization (DITTO) directly aligns language model outputs to a user's demonstrated behaviors. We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts.
arXiv Detail & Related papers (2024-06-02T23:13:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.