Bridging Textual-Collaborative Gap through Semantic Codes for Sequential Recommendation
- URL: http://arxiv.org/abs/2503.12183v1
- Date: Sat, 15 Mar 2025 15:54:44 GMT
- Title: Bridging Textual-Collaborative Gap through Semantic Codes for Sequential Recommendation
- Authors: Enze Liu, Bowen Zheng, Wayne Xin Zhao, Ji-Rong Wen
- Abstract summary: CoCoRec is a novel Code-based textual and Collaborative semantic fusion method for sequential Recommendation. We generate fine-grained semantic codes from multi-view text embeddings through vector quantization techniques. In order to further enhance the fusion of textual and collaborative semantics, we introduce an optimization strategy.
- Score: 91.13055384151897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, substantial research efforts have been devoted to enhancing sequential recommender systems by integrating abundant side information with ID-based collaborative information. This study specifically focuses on leveraging the textual metadata (e.g., titles and brands) associated with items. While existing methods have achieved notable success by combining text and ID representations, they often struggle to strike a balance between textual information embedded in text representations and collaborative information from sequential patterns of user behavior. In light of this, we propose CoCoRec, a novel Code-based textual and Collaborative semantic fusion method for sequential Recommendation. The key idea behind our approach is to bridge the gap between textual and collaborative information using semantic codes. Specifically, we generate fine-grained semantic codes from multi-view text embeddings through vector quantization techniques. Subsequently, we develop a code-guided semantic-fusion module based on the cross-attention mechanism to flexibly extract and integrate relevant information from text representations. In order to further enhance the fusion of textual and collaborative semantics, we introduce an optimization strategy that employs code masking with two specific objectives: masked code modeling and masked sequence alignment. The merit of these objectives lies in leveraging mask prediction tasks and augmented item representations to capture code correlations within individual items and enhance the sequence modeling of the recommendation backbone. Extensive experiments conducted on four public datasets demonstrate the superiority of CoCoRec, showing significant improvements over various sequential recommendation models. Our code is available at https://anonymous.4open.science/r/CoCoRec-6E41.
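The abstract describes generating fine-grained semantic codes from text embeddings via vector quantization. As an illustration of the general idea (not the paper's exact quantizer), the following is a minimal residual-quantization sketch: each level picks the nearest codeword and passes the residual to the next level, so one embedding maps to a short sequence of discrete codes. The function name and codebook shapes are hypothetical.

```python
import numpy as np

def residual_quantize(embedding, codebooks):
    """Map one text embedding to a sequence of discrete codes by
    residual vector quantization: each level quantizes the residual
    left over from the previous level (illustrative sketch)."""
    residual = np.asarray(embedding, dtype=float).copy()
    codes = []
    for codebook in codebooks:                    # codebook: (K, d) array
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))               # nearest codeword at this level
        codes.append(idx)
        residual = residual - codebook[idx]       # quantization error carried forward
    return codes

# toy usage: 3 levels with 8 codewords each over 4-dim embeddings
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(8, 4)) for _ in range(3)]
codes = residual_quantize(rng.normal(size=4), codebooks)
print(len(codes))  # 3 — one code index per quantization level
```

In practice such codebooks are learned (e.g., RQ-VAE style) rather than random, and the resulting code sequence serves as the item's fine-grained semantic identifier.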
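The abstract also mentions a code-guided semantic-fusion module based on cross-attention. The sketch below shows the core mechanism only, with code embeddings as queries attending over token-level text representations; the paper's actual module presumably adds learned projections, multiple heads, and residual connections. All names and shapes here are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def code_guided_fusion(code_emb, text_tokens):
    """Single-head cross-attention sketch: semantic-code embeddings
    (queries) extract relevant information from text-token
    representations (keys/values)."""
    d = code_emb.shape[-1]
    scores = code_emb @ text_tokens.T / np.sqrt(d)   # (n_codes, n_tokens)
    attn = softmax(scores, axis=-1)                  # each row sums to 1
    return attn @ text_tokens                        # (n_codes, d) fused output

# toy usage: 4 code embeddings attending over 10 text tokens, dim 16
rng = np.random.default_rng(1)
fused = code_guided_fusion(rng.normal(size=(4, 16)), rng.normal(size=(10, 16)))
print(fused.shape)  # (4, 16)
```

Because the queries are discrete-code embeddings rather than the text itself, the fused output stays anchored to the quantized semantic space while still drawing fine-grained detail from the text representations.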
Related papers
- TextSplat: Text-Guided Semantic Fusion for Generalizable Gaussian Splatting [46.753153357441505]
Generalizable Gaussian Splatting has enabled robust 3D reconstruction from sparse input views.
We propose TextSplat--the first text-driven Generalizable Gaussian Splatting framework.
arXiv Detail & Related papers (2025-04-13T14:14:10Z)
- Universal Item Tokenization for Transferable Generative Recommendation [89.42584009980676]
We propose UTGRec, a universal item tokenization approach for transferable Generative Recommendation.
By devising tree-structured codebooks, we discretize content representations into corresponding codes for item tokenization.
For raw content reconstruction, we employ dual lightweight decoders to reconstruct item text and images from discrete representations.
For collaborative knowledge integration, we assume that co-occurring items are similar and integrate collaborative signals through co-occurrence alignment and reconstruction.
arXiv Detail & Related papers (2025-04-06T08:07:49Z)
- Contextualized Data-Wrangling Code Generation in Computational Notebooks [131.26365849822932]
We propose an automated approach, CoCoMine, to mine data-wrangling code generation examples with clear multi-modal contextual dependency.
We construct CoCoNote, a dataset containing 58,221 examples for Contextualized Data-wrangling Code generation in Notebooks.
Experiment results demonstrate the significance of incorporating data context in data-wrangling code generation.
arXiv Detail & Related papers (2024-09-20T14:49:51Z)
- ESALE: Enhancing Code-Summary Alignment Learning for Source Code Summarization [21.886950861445122]
Code summarization aims to automatically generate succinct natural language summaries for given code snippets.
This paper proposes a novel approach to improve code summarization based on summary-focused tasks.
arXiv Detail & Related papers (2024-07-01T03:06:51Z)
- Encoding Version History Context for Better Code Representation [13.045078976464307]
This paper presents preliminary evidence of the potential benefit of encoding contextual information from the version history to predict code clones and perform code classification.
To ensure the technique performs consistently, we need to conduct a holistic investigation on a larger code base using different combinations of contexts, aggregation, and models.
arXiv Detail & Related papers (2024-02-06T07:35:36Z)
- Soft-Labeled Contrastive Pre-training for Function-level Code Representation [127.71430696347174]
We present SCodeR, a Soft-labeled contrastive pre-training framework with two positive sample construction methods.
Considering the relevance between codes in a large-scale code corpus, the soft-labeled contrastive pre-training can obtain fine-grained soft-labels.
SCodeR achieves new state-of-the-art performance on four code-related tasks over seven datasets.
arXiv Detail & Related papers (2022-10-18T05:17:37Z)
- CSSAM: Code Search via Attention Matching of Code Semantics and Structures [8.547332796736107]
This paper introduces a code search model named CSSAM (Code Semantics and Structures Attention Matching).
By introducing semantic and structural matching mechanisms, CSSAM effectively extracts and fuses multidimensional code features.
By leveraging the residual interaction, a matching module is designed to preserve more code semantics and descriptive features.
arXiv Detail & Related papers (2022-08-08T05:45:40Z)
- Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
- A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that despite its simplicity, the approach outperforms state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.