Intent Representation Learning with Large Language Model for Recommendation
- URL: http://arxiv.org/abs/2502.03307v4
- Date: Wed, 09 Apr 2025 07:21:18 GMT
- Title: Intent Representation Learning with Large Language Model for Recommendation
- Authors: Yu Wang, Lei Sang, Yi Zhang, Yiwen Zhang
- Abstract summary: We propose a model-agnostic framework, Intent Representation Learning with Large Language Model (IRLLRec), to construct multimodal intents and enhance recommendations. Specifically, IRLLRec employs a dual-tower architecture to learn multimodal intent representations. To better match textual and interaction-based intents, we employ momentum distillation to perform teacher-student learning on fused intent representations.
- Score: 11.118517297006894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intent-based recommender systems have garnered significant attention for uncovering latent fine-grained preferences. Intents, as underlying factors of interactions, are crucial for improving recommendation interpretability. Most methods define intents as learnable parameters updated alongside interactions. However, existing frameworks often overlook textual information (e.g., user reviews, item descriptions), which is crucial for alleviating the sparsity of interaction intents. Exploring these multimodal intents, especially the inherent differences in representation spaces, poses two key challenges: i) How to align multimodal intents and effectively mitigate noise issues; ii) How to extract and match latent key intents across modalities. To tackle these challenges, we propose a model-agnostic framework, Intent Representation Learning with Large Language Model (IRLLRec), which leverages large language models (LLMs) to construct multimodal intents and enhance recommendations. Specifically, IRLLRec employs a dual-tower architecture to learn multimodal intent representations. Next, we propose pairwise and translation alignment to eliminate inter-modal differences and enhance robustness against noisy input features. Finally, to better match textual and interaction-based intents, we employ momentum distillation to perform teacher-student learning on fused intent representations. Empirical evaluations on three datasets show that our IRLLRec framework outperforms baselines. Code is available at https://github.com/wangyu0627/IRLLRec.
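The abstract names three mechanisms: a dual-tower encoder, inter-modal alignment, and momentum distillation. Below is a minimal PyTorch sketch of the dual-tower and momentum-distillation pieces; the class names, dimensions, and loss form are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualTowerIntent(nn.Module):
    """Two towers project LLM-derived text embeddings and interaction
    embeddings into a shared intent space, then fuse them."""
    def __init__(self, text_dim, inter_dim, intent_dim):
        super().__init__()
        self.text_tower = nn.Linear(text_dim, intent_dim)
        self.inter_tower = nn.Linear(inter_dim, intent_dim)

    def forward(self, text_emb, inter_emb):
        z_txt = F.normalize(self.text_tower(text_emb), dim=-1)
        z_int = F.normalize(self.inter_tower(inter_emb), dim=-1)
        return z_txt, z_int, F.normalize(z_txt + z_int, dim=-1)  # fused intent

student = DualTowerIntent(768, 64, 128)   # dimensions are assumptions
teacher = copy.deepcopy(student)          # momentum (EMA) teacher
for p in teacher.parameters():
    p.requires_grad_(False)

@torch.no_grad()
def ema_update(m=0.995):
    # Teacher parameters track a slow exponential moving average of the student.
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(m).add_(ps, alpha=1 - m)

def distill_loss(z_student, z_teacher, tau=0.07):
    # Student's similarity distribution is pulled toward the teacher's
    # softened targets over the fused intent representations.
    logits = z_student @ z_teacher.t() / tau
    targets = F.softmax(z_teacher @ z_teacher.t() / tau, dim=-1)
    return F.kl_div(F.log_softmax(logits, dim=-1), targets, reduction="batchmean")
```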
Related papers
- Intent-Aware Dialogue Generation and Multi-Task Contrastive Learning for Multi-Turn Intent Classification [6.459396785817196]
Chain-of-Intent generates intent-driven conversations through self-play.
MINT-CL is a framework for multi-turn intent classification using multi-task contrastive learning.
We release MINT-E, a multilingual, intent-aware multi-turn e-commerce dialogue corpus.
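As a hedged sketch of what multi-task contrastive learning for intent classification can look like, the snippet below combines a cross-entropy intent classifier with a supervised contrastive term over utterance representations; the weighting and loss details are assumptions, not MINT-CL's actual objective.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive(z, labels, tau=0.1):
    # Pull together utterances sharing an intent label; push apart the rest.
    z = F.normalize(z, dim=-1)
    sim = z @ z.t() / tau
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)            # exclude self-pairs
    pos = (labels[:, None] == labels[None, :]).float()
    pos = pos.masked_fill(self_mask, 0.0)
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    return -(pos * log_prob).sum(1).div(pos.sum(1).clamp(min=1)).mean()

def multitask_loss(logits, z, labels, alpha=0.5):
    # Joint objective: intent classification plus the contrastive term.
    return F.cross_entropy(logits, labels) + alpha * supervised_contrastive(z, labels)
```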
arXiv Detail & Related papers (2024-11-21T15:59:29Z)
- Unified Dual-Intent Translation for Joint Modeling of Search and Recommendation [44.59113848489519]
We propose a novel model named Unified Dual-Intent Translation for joint modeling of Search and Recommendation (UDITSR).
To accurately simulate users' demand intents in recommendation, we use real queries from search data as supervision to guide intent generation.
Extensive experiments demonstrate that UDITSR outperforms SOTA baselines both in search and recommendation tasks.
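A minimal sketch of the query-as-supervision idea, assuming a generator that maps user/item embeddings to a demand-intent vector aligned with the real query's embedding; the architecture and loss are illustrative, not UDITSR's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DemandIntentGenerator(nn.Module):
    """Generates a demand-intent vector from user and item embeddings
    (a hypothetical stand-in for the paper's intent generation module)."""
    def __init__(self, dim=64):
        super().__init__()
        self.gen = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, user_emb, item_emb):
        return self.gen(torch.cat([user_emb, item_emb], dim=-1))

def query_supervision_loss(pred_intent, query_emb):
    # Pull the generated intent toward the embedding of the user's real query.
    return 1 - F.cosine_similarity(pred_intent, query_emb, dim=-1).mean()
```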
arXiv Detail & Related papers (2024-07-01T02:36:03Z)
- Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning [50.1035273069458]
Spoken language understanding (SLU) is a core task in task-oriented dialogue systems.
We propose a multi-level, multi-grained contrastive learning (MMCL) framework that applies contrastive learning at three levels: utterance, slot, and word.
Our framework achieves new state-of-the-art results on two public multi-intent SLU datasets.
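A minimal sketch of such a three-level contrastive objective, assuming paired (augmented) views at each granularity; the InfoNCE form and level weights are assumptions, not the paper's exact losses.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, tau=0.1):
    # Standard InfoNCE: the positive for row i is row i of `positive`.
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / tau
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)

def mmcl_loss(utt, utt_pos, slot, slot_pos, word, word_pos, w=(1.0, 0.5, 0.5)):
    # Contrastive objectives applied at utterance, slot, and word granularity.
    return (w[0] * info_nce(utt, utt_pos)
            + w[1] * info_nce(slot, slot_pos)
            + w[2] * info_nce(word, word_pos))
```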
arXiv Detail & Related papers (2024-05-31T14:34:23Z)
- A Two-Stage Prediction-Aware Contrastive Learning Framework for Multi-Intent NLU [41.45522079026888]
Multi-intent natural language understanding (NLU) presents a formidable challenge due to the model confusion arising from multiple intents within a single utterance.
Previous works train the model contrastively to increase the margin between different multi-intent labels.
We introduce a two-stage Prediction-Aware Contrastive Learning framework for multi-intent NLU.
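One way to make a contrastive loss aware of multi-intent label overlap is to weight pairs by label-set similarity; the Jaccard-weighted sketch below illustrates that general idea and is an assumption, not the paper's two-stage objective.

```python
import torch
import torch.nn.functional as F

def multi_intent_contrastive(z, labels, tau=0.1):
    """Multi-intent contrastive loss: each pair is weighted by the Jaccard
    overlap of its multi-hot label sets, so utterances sharing more intents
    are pulled closer. `labels` is a (B, num_intents) multi-hot matrix."""
    z = F.normalize(z, dim=-1)
    sim = z @ z.t() / tau
    inter = labels.float() @ labels.float().t()             # shared intents
    union = (labels.sum(1, keepdim=True) + labels.sum(1) - inter).clamp(min=1)
    w = inter / union                                       # Jaccard weights
    w.fill_diagonal_(0)
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    return -(w * log_prob).sum(1).div(w.sum(1).clamp(min=1e-8)).mean()
```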
arXiv Detail & Related papers (2024-05-05T13:09:55Z)
- Visual Commonsense based Heterogeneous Graph Contrastive Learning [79.22206720896664]
We propose a heterogeneous graph contrastive learning method to improve performance on the visual reasoning task.
Our method is designed in a plug-and-play manner, so it can be quickly and easily combined with a wide range of representative methods.
arXiv Detail & Related papers (2023-11-11T12:01:18Z)
- A Unified Framework for Multi-intent Spoken Language Understanding with prompting [14.17726194025463]
We describe a Prompt-based Spoken Language Understanding (PromptSLU) framework that intuitively unifies the two sub-tasks, intent detection (ID) and slot filling (SF), into the same form.
In detail, ID and SF are completed by concisely filling the utterance into task-specific prompt templates as input, while sharing an output format of key-value pair sequences.
Experiment results show that our framework outperforms several state-of-the-art baselines on two public datasets.
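A hedged sketch of what such prompt templates might look like; the wording and the key-value output shown are illustrative, not the paper's exact prompts.

```python
def build_prompt(utterance: str, task: str) -> str:
    # Hypothetical task-specific templates; both sub-tasks take the filled
    # utterance as input and share a key-value-pair output format.
    templates = {
        "intent": f"Utterance: {utterance}\nIntents as key-value pairs:",
        "slot": f"Utterance: {utterance}\nSlots as key-value pairs:",
    }
    return templates[task]

print(build_prompt("play some jazz for my workout", "slot"))
# An illustrative target output might be: "music_genre = jazz; activity = workout"
```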
arXiv Detail & Related papers (2022-10-07T05:58:05Z)
- MaPLe: Multi-modal Prompt Learning [54.96069171726668]
We propose Multi-modal Prompt Learning (MaPLe) for both vision and language branches to improve alignment between the vision and language representations.
Compared with the state-of-the-art method Co-CoOp, MaPLe exhibits favorable performance and achieves an absolute gain of 3.45% on novel classes.
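A minimal sketch of coupled prompting in the MaPLe spirit: learnable language prompts plus a linear coupling function that generates the corresponding vision prompts; dimensions and the coupling choice here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CoupledPrompts(nn.Module):
    """Learnable language prompts; the vision prompts are generated from them
    through a linear coupling function so the two branches stay aligned."""
    def __init__(self, n_ctx=2, txt_dim=512, vis_dim=768):
        super().__init__()
        self.text_prompts = nn.Parameter(torch.randn(n_ctx, txt_dim) * 0.02)
        self.couple = nn.Linear(txt_dim, vis_dim)  # text-to-vision projection

    def forward(self):
        # Returns (language prompts, vision prompts derived from them).
        return self.text_prompts, self.couple(self.text_prompts)
```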
arXiv Detail & Related papers (2022-10-06T17:59:56Z)
- MIntRec: A New Dataset for Multimodal Intent Recognition [18.45381778273715]
Multimodal intent recognition is a significant task for understanding human language in real-world multimodal scenes.
This paper introduces a novel dataset for multimodal intent recognition (MIntRec) to address this issue.
It formulates coarse-grained and fine-grained intent taxonomies based on data collected from the TV series Superstore.
arXiv Detail & Related papers (2022-09-09T15:37:39Z)
- Intent Contrastive Learning for Sequential Recommendation [86.54439927038968]
We introduce a latent variable to represent users' intents and learn its distribution function via clustering.
We propose to inject the learned intents into sequential recommendation (SR) models via contrastive self-supervised learning, which maximizes the agreement between a view of a sequence and its corresponding intent.
Experiments conducted on four real-world datasets demonstrate the superiority of the proposed learning paradigm.
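A minimal sketch of this recipe, assuming k-means over sequence embeddings to obtain intent prototypes, followed by a contrastive loss that pulls each sequence toward its assigned prototype; the details differ from the paper's actual EM-style training.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def intent_contrastive_loss(seq_emb, centroids, assignments, tau=0.1):
    # Maximize agreement between each sequence view and its intent prototype.
    z = F.normalize(seq_emb, dim=-1)
    c = F.normalize(centroids, dim=-1)
    logits = z @ c.t() / tau     # similarity of each sequence to every intent
    return F.cross_entropy(logits, assignments)

# Illustrative E-step: cluster sequence embeddings into latent intents.
emb = torch.randn(1000, 64)      # stand-in for encoded user sequences
km = KMeans(n_clusters=32, n_init=10).fit(emb.numpy())
centroids = torch.tensor(km.cluster_centers_, dtype=torch.float32)
assignments = torch.tensor(km.labels_, dtype=torch.long)
loss = intent_contrastive_loss(emb, centroids, assignments)
```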
arXiv Detail & Related papers (2022-02-05T09:24:13Z)
- AGIF: An Adaptive Graph-Interactive Framework for Joint Multiple Intent Detection and Slot Filling [69.59096090788125]
In this paper, we propose an Adaptive Graph-Interactive Framework (AGIF) for joint multiple intent detection and slot filling.
We introduce an intent-slot graph interaction layer to model the strong correlation between slots and intents.
This interaction layer is applied to each token adaptively, which makes it possible to automatically extract the relevant intent information.
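A hedged sketch of per-token intent-slot interaction, using cross-attention as a stand-in for the paper's adaptive graph attention layer; names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class IntentSlotInteraction(nn.Module):
    """Per-token interaction between slot states and predicted intent
    embeddings, with cross-attention standing in for graph attention."""
    def __init__(self, dim=128, heads=4):  # dim must be divisible by heads
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, slot_states, intent_embs):
        # slot_states: (B, T, D) token states; intent_embs: (B, K, D) intents.
        ctx, _ = self.attn(slot_states, intent_embs, intent_embs)
        return slot_states + ctx   # each token adaptively absorbs intent info
```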
arXiv Detail & Related papers (2020-04-21T15:07:34Z)
- Object Relational Graph with Teacher-Recommended Learning for Video Captioning [92.48299156867664]
We propose a complete video captioning system including both a novel model and an effective training strategy.
Specifically, we propose an object relational graph (ORG) based encoder, which captures more detailed interaction features to enrich visual representation.
Meanwhile, we design a teacher-recommended learning (TRL) method to make full use of the successful external language model (ELM) to integrate the abundant linguistic knowledge into the caption model.
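A minimal sketch of teacher-recommended learning as knowledge distillation: the caption model's loss blends ground-truth cross-entropy with soft targets from the external language model; the blending weight and temperature are assumptions.

```python
import torch.nn.functional as F

def trl_loss(caption_logits, elm_logits, hard_targets, alpha=0.7, tau=2.0):
    # Blend ground-truth supervision with soft targets "recommended" by the
    # external language model (the teacher).
    ce = F.cross_entropy(caption_logits, hard_targets)
    soft = F.kl_div(F.log_softmax(caption_logits / tau, dim=-1),
                    F.softmax(elm_logits / tau, dim=-1),
                    reduction="batchmean") * tau ** 2
    return alpha * ce + (1 - alpha) * soft
```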
arXiv Detail & Related papers (2020-02-26T15:34:52Z)