Related papers: DLRREC: Denoising Latent Representations via Multi-Modal Knowledge Fusion in Deep Recommender Systems

DLRREC: Denoising Latent Representations via Multi-Modal Knowledge Fusion in Deep Recommender Systems

URL: http://arxiv.org/abs/2512.00596v1
Date: Sat, 29 Nov 2025 18:57:42 GMT
Title: DLRREC: Denoising Latent Representations via Multi-Modal Knowledge Fusion in Deep Recommender Systems
Authors: Jiahao Tian, Zhenkai Wang,
Abstract summary: Large Language Models (LLMs) generate rich, yet high-dimensional and noisy, multi-modal features.<n>Treating these features as static inputs decouples them from the core recommendation task.<n>We introduce a novel framework built on a key insight: deeply fusing multi-modal and collaborative knowledge for representation denoising.
Score: 0.6875312133832079
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern recommender systems struggle to effectively utilize the rich, yet high-dimensional and noisy, multi-modal features generated by Large Language Models (LLMs). Treating these features as static inputs decouples them from the core recommendation task. We address this limitation with a novel framework built on a key insight: deeply fusing multi-modal and collaborative knowledge for representation denoising. Our unified architecture introduces two primary technical innovations. First, we integrate dimensionality reduction directly into the recommendation model, enabling end-to-end co-training that makes the reduction process aware of the final ranking objective. Second, we introduce a contrastive learning objective that explicitly incorporates the collaborative filtering signal into the latent space. This synergistic process refines raw LLM embeddings, filtering noise while amplifying task-relevant signals. Extensive experiments confirm our method's superior discriminative power, proving that this integrated fusion and denoising strategy is critical for achieving state-of-the-art performance. Our work provides a foundational paradigm for effectively harnessing LLMs in recommender systems.

Related papers

Reconstructing Content via Collaborative Attention to Improve Multimodal Embedding Quality [59.651410243721045]
CoCoA is a Content reconstruction pre-training paradigm based on Collaborative Attention for multimodal embedding optimization.<n>We introduce an EOS-based reconstruction task, encouraging the model to reconstruct input from the corresponding EOS> embeddings.<n>Experiments on MMEB-V1 demonstrate that CoCoA built upon Qwen2-VL and Qwen2.5-VL significantly improves embedding quality.
arXiv Detail & Related papers (2026-03-02T05:34:45Z)
Frozen LVLMs for Micro-Video Recommendation: A Systematic Study of Feature Extraction and Fusion [12.729411315533786]
We propose a lightweight and plug-and-play approach that adaptively fuses multi-layer representations from frozen LVLMs with item ID embeddings.<n>DFF achieves state-of-the-art performance on two real-world micro-video recommendation benchmarks.
arXiv Detail & Related papers (2025-12-26T04:56:28Z)
Language Ranker: A Lightweight Ranking framework for LLM Decoding [70.01564145836129]
This paper conceptualizes the decoding process as analogous to the ranking stage in recommendation pipelines.<n>Motivated by this insight, we propose Language Ranker, a novel framework that introduces a lightweight module to rerank candidate responses.<n> Experiments show that Language Ranker achieves performance comparable to large-scale reward models, while requiring only 0.5M additional parameters.
arXiv Detail & Related papers (2025-10-23T17:56:46Z)
Empowering Denoising Sequential Recommendation with Large Language Model Embeddings [18.84444501128626]
Sequential recommendation aims to capture user preferences by modeling sequential patterns in user-item interactions.<n>To reduce the effect of noise, some works propose explicitly identifying and removing noisy items.<n>We propose a novel framework: Interest Alignment for Denoising Sequential Recommendation (IADSR) which integrates both collaborative and semantic information.
arXiv Detail & Related papers (2025-10-05T15:10:51Z)
SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution [46.311223206965934]
We study key design principles for latter cascaded video super-resolution models, which are underexplored currently.<n>First, we propose two strategies to generate training pairs that better mimic the output characteristics of the base model, ensuring alignment between the VSR model and its upstream generator.<n>Second, we provide critical insights into VSR model behavior through systematic analysis of (1) timestep sampling strategies, (2) noise augmentation effects on low-resolution (LR) inputs.
arXiv Detail & Related papers (2025-06-24T17:57:26Z)
EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration [60.47645731801866]
Large language models (LLMs) are increasingly leveraged as foundational backbones in advanced recommender systems.<n>LLMs are pre-trained linguistic semantics but learn collaborative semantics from scratch via the llm-Backbone.<n>We propose EAGER-LLM, a decoder-only generative recommendation framework that integrates endogenous and endogenous behavioral and semantic information in a non-intrusive manner.
arXiv Detail & Related papers (2025-02-20T17:01:57Z)
Semantic Convergence: Harmonizing Recommender Systems via Two-Stage Alignment and Behavioral Semantic Tokenization [10.47505806629852]
Large language models (LLMs) are adept at discerning profound user interests from historical behaviors.<n>We propose a novel framework that harmoniously merges traditional recommendation models with the prowess of LLMs.<n>We design a series of specialized supervised learning tasks aimed at aligning collaborative signals with the subtleties of natural language semantics.
arXiv Detail & Related papers (2024-12-18T12:07:58Z)
The Silent Assistant: NoiseQuery as Implicit Guidance for Goal-Driven Image Generation [31.599902235859687]
We propose to leverage an aligned Gaussian noise as implicit guidance to complement explicit user-defined inputs, such as text prompts.<n>NoiseQuery enables fine-grained control and yields significant performance boosts over high-level semantics and over low-level visual attributes.
arXiv Detail & Related papers (2024-12-06T14:59:00Z)
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design [59.00758127310582]
We propose a novel framework Read-ME that transforms pre-trained dense LLMs into smaller MoE models. Our approach employs activation sparsity to extract experts. Read-ME outperforms other popular open-source dense models of similar scales.
arXiv Detail & Related papers (2024-10-24T19:48:51Z)
NoteLLM-2: Multimodal Large Representation Models for Recommendation [71.87790090964734]
Large Language Models (LLMs) have demonstrated exceptional proficiency in text understanding and embedding tasks.<n>Their potential in multimodal representation, particularly for item-to-item (I2I) recommendations, remains underexplored.<n>We propose an end-to-end fine-tuning method that customizes the integration of any existing LLMs and vision encoders for efficient multimodal representation.
arXiv Detail & Related papers (2024-05-27T03:24:01Z)
Collaborative Filtering Based on Diffusion Models: Unveiling the Potential of High-Order Connectivity [10.683635786183894]
CF-Diff is a new diffusion model-based collaborative filtering method. It is capable of making full use of collaborative signals along with multi-hop neighbors. It achieves remarkable gains up to 7.29% compared to the best competitor.
arXiv Detail & Related papers (2024-04-22T14:49:46Z)
T-REX: Mixture-of-Rank-One-Experts with Semantic-aware Intuition for Multi-task Large Language Model Finetuning [31.276142111455847]
Large language models (LLMs) encounter significant adaptation challenges in diverse multitask finetuning.<n>We design a novel framework, mixunderlinetextbfTureunderlinetextbf-of-underlinetextbfRank-onunderlinetextbfE-eunderlinetextbfXper ts (textttT-REX)<n>Rank-1 experts enable a mix-and-match mechanism to quadratically expand the vector subspace of experts with linear parameter overheads, achieving approximate error reduction with optimal
arXiv Detail & Related papers (2024-04-13T12:14:58Z)
When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique. Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.