DLRREC: Denoising Latent Representations via Multi-Modal Knowledge Fusion in Deep Recommender Systems
- URL: http://arxiv.org/abs/2512.00596v1
- Date: Sat, 29 Nov 2025 18:57:42 GMT
- Title: DLRREC: Denoising Latent Representations via Multi-Modal Knowledge Fusion in Deep Recommender Systems
- Authors: Jiahao Tian, Zhenkai Wang
- Abstract summary: Large Language Models (LLMs) generate rich, yet high-dimensional and noisy, multi-modal features. Treating these features as static inputs decouples them from the core recommendation task. We introduce a novel framework built on a key insight: deeply fusing multi-modal and collaborative knowledge for representation denoising.
- Score: 0.6875312133832079
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern recommender systems struggle to effectively utilize the rich, yet high-dimensional and noisy, multi-modal features generated by Large Language Models (LLMs). Treating these features as static inputs decouples them from the core recommendation task. We address this limitation with a novel framework built on a key insight: deeply fusing multi-modal and collaborative knowledge for representation denoising. Our unified architecture introduces two primary technical innovations. First, we integrate dimensionality reduction directly into the recommendation model, enabling end-to-end co-training that makes the reduction process aware of the final ranking objective. Second, we introduce a contrastive learning objective that explicitly incorporates the collaborative filtering signal into the latent space. This synergistic process refines raw LLM embeddings, filtering noise while amplifying task-relevant signals. Extensive experiments confirm our method's superior discriminative power, proving that this integrated fusion and denoising strategy is critical for achieving state-of-the-art performance. Our work provides a foundational paradigm for effectively harnessing LLMs in recommender systems.
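The two ideas in the abstract, a dimensionality reduction trained together with the ranking objective and a contrastive term that injects the collaborative filtering signal, can be illustrated with a small forward-pass sketch. The abstract does not specify the exact losses, so everything here is an assumption made for illustration: a linear projection stands in for the reduction, a BPR-style pairwise loss stands in for the ranking objective, and InfoNCE stands in for the contrastive objective. All function names, the weight `alpha`, and the `temperature` value are hypothetical and are not taken from the paper.

```python
import math
import random


def project(W, x):
    """Linear dimensionality reduction: W is (d_out x d_in), x has length d_in."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def normalize(v):
    """Scale a vector to unit L2 norm (guarding against a zero vector)."""
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / norm for a in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(content, collab, temperature=0.1):
    """Assumed contrastive term: item i's reduced content vector should match
    item i's collaborative vector and mismatch the other items in the batch."""
    content = [normalize(v) for v in content]
    collab = [normalize(v) for v in collab]
    loss = 0.0
    for i, c in enumerate(content):
        logits = [dot(c, k) / temperature for k in collab]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(content)


def bpr_loss(user, pos_item, neg_item):
    """Assumed ranking term: -log sigmoid(score(pos) - score(neg)),
    written as a numerically stable softplus of the negated margin."""
    margin = dot(user, pos_item) - dot(user, neg_item)
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def joint_loss(W, llm_feats, collab_embs, user, pos, neg, alpha=0.5):
    """Ranking loss plus alpha-weighted contrastive loss, both computed on the
    reduced item vectors so that one projection W serves both terms."""
    reduced = [project(W, x) for x in llm_feats]
    rank = bpr_loss(user, reduced[pos], reduced[neg])
    contrast = info_nce(reduced, collab_embs)
    return rank + alpha * contrast


# Toy usage with random data (hypothetical sizes).
random.seed(0)
d_in, d_out, n_items = 16, 4, 5
W = [[random.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_out)]
llm_feats = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(n_items)]
collab_embs = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(n_items)]
user = [random.gauss(0, 1) for _ in range(d_out)]
print(joint_loss(W, llm_feats, collab_embs, user, pos=0, neg=1))
```

Because both terms depend on the same projection `W`, a gradient step on the summed loss would push the reduction toward directions that help ranking and agree with collaborative structure. That is the sense in which a reduction can be "aware" of the final objective, in contrast to a projection fitted offline and then frozen.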
Related papers
- Reconstructing Content via Collaborative Attention to Improve Multimodal Embedding Quality [59.651410243721045]
CoCoA is a Content reconstruction pre-training paradigm based on Collaborative Attention for multimodal embedding optimization. We introduce an EOS-based reconstruction task, encouraging the model to reconstruct input from the corresponding <EOS> embeddings. Experiments on MMEB-V1 demonstrate that CoCoA built upon Qwen2-VL and Qwen2.5-VL significantly improves embedding quality.
arXiv Detail & Related papers (2026-03-02T05:34:45Z)
- Frozen LVLMs for Micro-Video Recommendation: A Systematic Study of Feature Extraction and Fusion [12.729411315533786]
We propose a lightweight and plug-and-play approach that adaptively fuses multi-layer representations from frozen LVLMs with item ID embeddings. DFF achieves state-of-the-art performance on two real-world micro-video recommendation benchmarks.
arXiv Detail & Related papers (2025-12-26T04:56:28Z)
- Language Ranker: A Lightweight Ranking framework for LLM Decoding [70.01564145836129]
This paper conceptualizes the decoding process as analogous to the ranking stage in recommendation pipelines. Motivated by this insight, we propose Language Ranker, a novel framework that introduces a lightweight module to rerank candidate responses. Experiments show that Language Ranker achieves performance comparable to large-scale reward models, while requiring only 0.5M additional parameters.
arXiv Detail & Related papers (2025-10-23T17:56:46Z)
- Empowering Denoising Sequential Recommendation with Large Language Model Embeddings [18.84444501128626]
Sequential recommendation aims to capture user preferences by modeling sequential patterns in user-item interactions. To reduce the effect of noise, some works propose explicitly identifying and removing noisy items. We propose a novel framework, Interest Alignment for Denoising Sequential Recommendation (IADSR), which integrates both collaborative and semantic information.
arXiv Detail & Related papers (2025-10-05T15:10:51Z)
- SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution [46.311223206965934]
We study key design principles for latter cascaded video super-resolution models, which are underexplored currently. First, we propose two strategies to generate training pairs that better mimic the output characteristics of the base model, ensuring alignment between the VSR model and its upstream generator. Second, we provide critical insights into VSR model behavior through systematic analysis of (1) timestep sampling strategies, (2) noise augmentation effects on low-resolution (LR) inputs.
arXiv Detail & Related papers (2025-06-24T17:57:26Z)
- EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration [60.47645731801866]
Large language models (LLMs) are increasingly leveraged as foundational backbones in advanced recommender systems. LLMs are pre-trained on linguistic semantics but learn collaborative semantics from scratch via the LLM backbone. We propose EAGER-LLM, a decoder-only generative recommendation framework that integrates endogenous and exogenous behavioral and semantic information in a non-intrusive manner.
arXiv Detail & Related papers (2025-02-20T17:01:57Z)
- Semantic Convergence: Harmonizing Recommender Systems via Two-Stage Alignment and Behavioral Semantic Tokenization [10.47505806629852]
Large language models (LLMs) are adept at discerning profound user interests from historical behaviors. We propose a novel framework that harmoniously merges traditional recommendation models with the prowess of LLMs. We design a series of specialized supervised learning tasks aimed at aligning collaborative signals with the subtleties of natural language semantics.
arXiv Detail & Related papers (2024-12-18T12:07:58Z)
- The Silent Assistant: NoiseQuery as Implicit Guidance for Goal-Driven Image Generation [31.599902235859687]
We propose to leverage an aligned Gaussian noise as implicit guidance to complement explicit user-defined inputs, such as text prompts. NoiseQuery enables fine-grained control and yields significant performance boosts over high-level semantics and over low-level visual attributes.
arXiv Detail & Related papers (2024-12-06T14:59:00Z)
- Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design [59.00758127310582]
We propose a novel framework Read-ME that transforms pre-trained dense LLMs into smaller MoE models.
Our approach employs activation sparsity to extract experts.
Read-ME outperforms other popular open-source dense models of similar scales.
arXiv Detail & Related papers (2024-10-24T19:48:51Z)
- NoteLLM-2: Multimodal Large Representation Models for Recommendation [71.87790090964734]
Large Language Models (LLMs) have demonstrated exceptional proficiency in text understanding and embedding tasks. Their potential in multimodal representation, particularly for item-to-item (I2I) recommendations, remains underexplored. We propose an end-to-end fine-tuning method that customizes the integration of any existing LLMs and vision encoders for efficient multimodal representation.
arXiv Detail & Related papers (2024-05-27T03:24:01Z)
- Collaborative Filtering Based on Diffusion Models: Unveiling the Potential of High-Order Connectivity [10.683635786183894]
CF-Diff is a new diffusion model-based collaborative filtering method.
It is capable of making full use of collaborative signals along with multi-hop neighbors.
It achieves remarkable gains up to 7.29% compared to the best competitor.
arXiv Detail & Related papers (2024-04-22T14:49:46Z)
- T-REX: Mixture-of-Rank-One-Experts with Semantic-aware Intuition for Multi-task Large Language Model Finetuning [31.276142111455847]
Large language models (LLMs) encounter significant adaptation challenges in diverse multitask finetuning. We design a novel framework, mixTure-of-Rank-onE-eXperts (T-REX). Rank-1 experts enable a mix-and-match mechanism to quadratically expand the vector subspace of experts with linear parameter overheads, achieving approximate error reduction with optimal
arXiv Detail & Related papers (2024-04-13T12:14:58Z)
- When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)