Integrating Vision-Centric Text Understanding for Conversational Recommender Systems
- URL: http://arxiv.org/abs/2601.13505v1
- Date: Tue, 20 Jan 2026 01:41:54 GMT
- Title: Integrating Vision-Centric Text Understanding for Conversational Recommender Systems
- Authors: Wei Yuan, Shutong Qiao, Tong Chen, Quoc Viet Hung Nguyen, Zi Huang, Hongzhi Yin
- Abstract summary: STARCRS is a Screen-Text-AwaRe Conversational Recommender System. We propose a knowledge-anchored fusion framework that combines contrastive alignment, cross-attention interaction, and adaptive gating. Experiments on two widely used benchmarks demonstrate that STARCRS consistently improves both recommendation accuracy and generated response quality.
- Score: 61.731947296510164
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Conversational Recommender Systems (CRSs) have attracted growing attention for their ability to deliver personalized recommendations through natural language interactions. To more accurately infer user preferences from multi-turn conversations, recent works increasingly expand conversational context (e.g., by incorporating diverse entity information or retrieving related dialogues). While such context enrichment can assist preference modeling, it also introduces longer and more heterogeneous inputs, leading to practical issues such as input length constraints, text style inconsistency, and irrelevant textual noise, thereby raising the demand for stronger language understanding ability. In this paper, we propose STARCRS, a Screen-Text-AwaRe Conversational Recommender System that integrates two complementary text understanding modes: (1) a screen-reading pathway that encodes auxiliary textual information as visual tokens, mimicking skim reading on a screen, and (2) an LLM-based textual pathway that focuses on a limited set of critical content for fine-grained reasoning. We design a knowledge-anchored fusion framework that combines contrastive alignment, cross-attention interaction, and adaptive gating to integrate the two modes for improved preference modeling and response generation. Extensive experiments on two widely used benchmarks demonstrate that STARCRS consistently improves both recommendation accuracy and generated response quality.
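The abstract names three fusion components (contrastive alignment, cross-attention interaction, adaptive gating) but does not define them. As a rough illustration only, the sketch below shows generic textbook versions of each in plain Python: an InfoNCE-style loss that pulls each text vector toward its paired visual vector, cross-attention in which text tokens attend over visual tokens, and a per-token sigmoid gate that mixes the two. This is not the STARCRS implementation. Mean pooling, the temperature `tau`, identity Q/K/V projections, the scalar gate, and all function names are assumptions, and the "knowledge-anchored" part of the framework is not modeled.

```python
import math

# Hypothetical sketch of generic fusion components; NOT the STARCRS code.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mean_pool(tokens):
    """Average a token sequence into one vector (assumed pooling choice)."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def info_nce(text_vecs, vis_vecs, tau=0.1):
    """Contrastive alignment (text-to-visual direction only): the i-th text
    vector should score its paired i-th visual vector above all others."""
    loss = 0.0
    for i, t in enumerate(text_vecs):
        logits = [cosine(t, v) / tau for v in vis_vecs]
        loss -= math.log(softmax(logits)[i])
    return loss / len(text_vecs)

def cross_attention(queries, keys_values):
    """Each text token attends over visual tokens; Q/K/V projections are
    identity here for brevity (a real model would learn them)."""
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys_values])
        out.append([sum(wt * kv[j] for wt, kv in zip(weights, keys_values))
                    for j in range(d)])
    return out

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(text_tokens, attended, w, b=0.0):
    """Adaptive gate per token: g = sigmoid(w . [t; a] + b),
    fused = g * t + (1 - g) * a."""
    fused = []
    for t, a in zip(text_tokens, attended):
        g = sigmoid(dot(w, t + a) + b)  # t + a concatenates the two lists
        fused.append([g * ti + (1 - g) * ai for ti, ai in zip(t, a)])
    return fused

# Toy usage with 2-dimensional token embeddings.
text = [[1.0, 0.0], [0.0, 1.0]]               # "textual pathway" tokens
visual = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]  # "screen-reading" visual tokens
attended = cross_attention(text, visual)
fused = gated_fusion(text, attended, w=[0.0] * 4)  # zero weights give g = 0.5

# Contrastive loss over a batch of two pooled (text, visual) pairs.
align_loss = info_nce(
    [mean_pool([[1.0, 0.0], [0.8, 0.2]]), mean_pool([[0.0, 1.0]])],
    [mean_pool([[0.9, 0.1]]), mean_pool([[0.1, 0.9], [0.2, 0.8]])],
)
```

The gate lets the model decide, token by token, how much to trust the fine-grained textual representation versus the skimmed visual context; with zero gate weights the two are simply averaged.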
Related papers
- STEP: Stepwise Curriculum Learning for Context-Knowledge Fusion in Conversational Recommendation [18.833994388759326]
We introduce STEP, a conversational recommender centered on pre-trained language models. STEP combines curriculum-guided context-knowledge fusion with lightweight task-specific prompt tuning. Experimental results show that STEP outperforms mainstream methods in recommendation precision and dialogue quality on two public datasets.
(2025-08-14T14:08:21Z)
- On Mitigating Data Sparsity in Conversational Recommender Systems [69.70761335240738]
Conversational recommender systems (CRSs) capture user preference through textual information in dialogues. They suffer from data sparsity on two fronts: the dialogue space is vast and linguistically diverse, while the item space exhibits long-tail and sparse distributions. Existing methods struggle with (1) generalizing to varied dialogue expressions due to underutilization of rich textual cues, and (2) learning informative item representations under severe sparsity.
(2025-07-01T06:54:51Z)
- Beyond Whole Dialogue Modeling: Contextual Disentanglement for Conversational Recommendation [22.213312621287482]
This paper proposes DisenCRS, a model that introduces contextual disentanglement to improve conversational recommender systems. DisenCRS employs a dual disentanglement framework, including self-supervised contrastive disentanglement and counterfactual inference disentanglement. Experimental results on two widely used public datasets demonstrate that DisenCRS significantly outperforms existing conversational recommendation models.
(2025-04-24T10:33:26Z)
- MSCRS: Multi-modal Semantic Graph Prompt Learning Framework for Conversational Recommender Systems [15.792566559456422]
Conversational Recommender Systems (CRS) aim to provide personalized recommendations by interacting with users through conversations. We propose a multi-modal semantic graph prompt learning framework for CRS, named MSCRS. Our proposed method significantly improves item recommendation accuracy and generates more natural and contextually relevant responses.
(2025-04-15T07:05:22Z)
- Parameter-Efficient Conversational Recommender System as a Language Processing Task [52.47087212618396]
Conversational recommender systems (CRS) aim to recommend relevant items to users by eliciting user preference through natural language conversation.
Prior work often utilizes external knowledge graphs for items' semantic information, a language model for dialogue generation, and a recommendation module for ranking relevant items.
In this paper, we represent items in natural language and formulate CRS as a natural language processing task.
(2024-01-25T14:07:34Z)
- Towards Unified Conversational Recommender Systems via Knowledge-Enhanced Prompt Learning [89.64215566478931]
Conversational recommender systems (CRS) aim to proactively elicit user preference and recommend high-quality items through natural language conversations.
To develop an effective CRS, it is essential to seamlessly integrate its two modules, the recommender module and the conversation module.
We propose a unified CRS model named UniCRS based on knowledge-enhanced prompt learning.
(2022-06-19T09:21:27Z)
- CR-Walker: Tree-Structured Graph Reasoning and Dialog Acts for Conversational Recommendation [62.13413129518165]
CR-Walker is a model that performs tree-structured reasoning on a knowledge graph.
It generates informative dialog acts to guide language generation.
Automatic and human evaluations show that CR-Walker arrives at more accurate recommendations.
(2020-10-20T14:53:22Z)
- Improving Conversational Recommender Systems via Knowledge Graph based Semantic Fusion [77.21442487537139]
Conversational recommender systems (CRS) aim to recommend high-quality items to users through interactive conversations.
First, the conversation data itself lacks sufficient contextual information for accurately understanding users' preferences.
Second, there is a semantic gap between natural language expression and item-level user preference.
(2020-07-08T11:14:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.