Learning to Respond with Your Favorite Stickers: A Framework of Unifying
Multi-Modality and User Preference in Multi-Turn Dialog
- URL: http://arxiv.org/abs/2011.03322v1
- Date: Thu, 5 Nov 2020 03:31:17 GMT
- Title: Learning to Respond with Your Favorite Stickers: A Framework of Unifying
Multi-Modality and User Preference in Multi-Turn Dialog
- Authors: Shen Gao, Xiuying Chen, Li Liu, Dongyan Zhao and Rui Yan
- Abstract summary: Stickers with vivid and engaging expressions are becoming increasingly popular in online messaging apps.
Some works are dedicated to automatically selecting a sticker response by matching the sticker image with previous utterances.
We propose to recommend an appropriate sticker to the user based on the multi-turn dialog context and the user's sticker-usage history.
- Score: 67.91114640314004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stickers with vivid and engaging expressions are becoming increasingly
popular in online messaging apps, and some works are dedicated to automatically
selecting a sticker response by matching the sticker image with previous
utterances. However, existing methods usually focus on measuring the matching
degree between the dialog context and the sticker image, ignoring the user's
preference for using stickers. Hence, in this paper, we propose to recommend an
appropriate sticker to the user based on the multi-turn dialog context and the
user's sticker-usage history. Two main challenges are confronted in this task. One is
to model the user's sticker preference based on their previous sticker selection
history. The other is to jointly fuse the user preference and the matching
between the dialog context and the candidate sticker into the final prediction.
To tackle these challenges, we propose a \emph{Preference Enhanced
Sticker Response Selector} (PESRS) model. Specifically, PESRS first employs a
convolution-based sticker image encoder and a self-attention-based multi-turn
dialog encoder to obtain the representations of stickers and utterances. Next, a
deep interaction network is proposed to conduct deep matching between the
sticker and each utterance. Then, we model the user preference by using the
recently selected stickers as input, and use a key-value memory network to
store the preference representation. PESRS then learns the short-term and
long-term dependencies among all interaction results with a fusion network, and
dynamically fuses the user preference representation into the final sticker
selection prediction. Extensive experiments conducted on a large-scale
real-world dialog dataset show that our model achieves state-of-the-art
performance on all commonly used metrics. Experiments also verify the
effectiveness of each component of PESRS.
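The abstract's two distinctive steps, reading a key-value memory of recently selected stickers and dynamically fusing the preference read-out with the context-sticker matching result, can be made concrete with a short sketch. The following is a minimal, hypothetical PyTorch rendering based only on the abstract, not the authors' released implementation: the class name PreferenceMemory, the projection layers, the gating mechanism, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreferenceMemory(nn.Module):
    """Sketch of a key-value memory for user sticker preference.

    Assumption: keys and values are projected from the user's recently
    selected stickers; a dialog-context query reads the memory with
    attention, and a sigmoid gate fuses the preference read-out with
    the context/candidate matching representation.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.key_proj = nn.Linear(dim, dim)    # sticker reps -> memory keys
        self.value_proj = nn.Linear(dim, dim)  # sticker reps -> memory values
        self.gate = nn.Linear(2 * dim, 1)      # weighs preference vs. matching

    def forward(self, history: torch.Tensor, query: torch.Tensor,
                match: torch.Tensor) -> torch.Tensor:
        # history: (batch, n_recent, dim) -- recently selected stickers
        # query:   (batch, dim)           -- multi-turn dialog context
        # match:   (batch, dim)           -- context/candidate interaction result
        keys = self.key_proj(history)           # (batch, n_recent, dim)
        values = self.value_proj(history)       # (batch, n_recent, dim)
        scores = keys @ query.unsqueeze(-1) / keys.size(-1) ** 0.5
        attn = F.softmax(scores, dim=1)         # (batch, n_recent, 1)
        preference = (attn * values).sum(dim=1) # (batch, dim) read-out
        g = torch.sigmoid(self.gate(torch.cat([preference, match], dim=-1)))
        return g * preference + (1 - g) * match # feed to a final scoring layer

# Toy usage: two users, five recently selected stickers, 64-dim features.
mem = PreferenceMemory(dim=64)
fused = mem(torch.randn(2, 5, 64), torch.randn(2, 64), torch.randn(2, 64))
print(fused.shape)  # torch.Size([2, 64])
```

The gate makes the fusion dynamic, as the abstract describes: when the dialog context is decisive the matching signal dominates, and when several candidates fit equally well the user's preference read-out can tip the selection.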
Related papers
- PerSRV: Personalized Sticker Retrieval with Vision-Language Model [21.279568613306573]
We propose the Personalized Sticker Retrieval with Vision-Language Model framework, namely PerSRV, structured into offline calculations and online processing modules.
For sticker-level semantic understanding, we apply supervised fine-tuning to LLaVA-1.5-7B to generate human-like sticker semantics.
Thirdly, we cluster style centroids based on users' historical interactions to achieve personal preference modeling.
arXiv Detail & Related papers (2024-10-29T07:13:47Z) - Impact of Stickers on Multimodal Chat Sentiment Analysis and Intent Recognition: A New Task, Dataset and Baseline [4.375392069380812]
We propose a new task: Multimodal chat Sentiment Analysis and Intent Recognition involving Stickers (MSAIRS).
We introduce a novel multimodal dataset containing Chinese chat records and stickers excerpted from several mainstream social media platforms.
Our dataset and code will be publicly available.
arXiv Detail & Related papers (2024-05-14T08:42:49Z) - Sticker820K: Empowering Interactive Retrieval with Stickers [34.67442172774095]
We propose a large-scale Chinese sticker dataset, namely Sticker820K, which consists of 820k image-text pairs.
Each sticker has rich and high-quality textual annotations, including descriptions, optical characters, emotional labels, and style classifications.
For the text-to-image retrieval task, our StickerCLIP demonstrates clear superiority over CLIP, achieving an absolute gain of 66.0% in mean recall.
arXiv Detail & Related papers (2023-06-12T05:06:53Z) - Selecting Stickers in Open-Domain Dialogue through Multitask Learning [51.67855506570727]
We propose a multitask learning method comprised of three auxiliary tasks to enhance the understanding of dialogue history, emotion and semantic meaning of stickers.
Our model better combines the multimodal information and achieves significantly higher accuracy than strong baselines.
arXiv Detail & Related papers (2022-09-16T03:45:22Z) - User-Centric Conversational Recommendation with Multi-Aspect User
Modeling [47.310579802092384]
We propose a User-Centric Conversational Recommendation (UCCR) model, which returns to the essence of user preference learning in CRS tasks.
A multi-view preference mapper is designed to learn the intrinsic correlations among different views in current and historical sessions.
The learned multi-aspect multi-view user preferences are then used for the recommendation and dialogue generation.
arXiv Detail & Related papers (2022-04-20T07:08:46Z) - Knowledge-Enhanced Hierarchical Graph Transformer Network for
Multi-Behavior Recommendation [56.12499090935242]
This work proposes a Knowledge-Enhanced Hierarchical Graph Transformer Network (KHGT) to investigate multi-typed interactive patterns between users and items in recommender systems.
KHGT is built upon a graph-structured neural architecture to capture type-specific behavior characteristics.
We show that KHGT consistently outperforms many state-of-the-art recommendation methods across various evaluation settings.
arXiv Detail & Related papers (2021-10-08T09:44:00Z) - Dialogue History Matters! Personalized Response Selection in Multi-turn
Retrieval-based Chatbots [62.295373408415365]
We propose a personalized hybrid matching network (PHMN) for context-response matching.
Our contributions are two-fold: 1) our model extracts personalized wording behaviors from user-specific dialogue history as extra matching information.
We evaluate our model on two large datasets with user identification, i.e., the personalized dialogue corpus Ubuntu (P-Ubuntu) and the personalized Weibo dataset (P-Weibo).
arXiv Detail & Related papers (2021-03-17T09:42:11Z) - Learning to Respond with Stickers: A Framework of Unifying
Multi-Modality in Multi-Turn Dialog [65.7021675527543]
Stickers with vivid and engaging expressions are becoming increasingly popular in online messaging apps.
Some works are dedicated to automatically selecting a sticker response by matching the text labels of stickers with previous utterances.
We propose to recommend an appropriate sticker to the user based on the multi-turn dialog context history without any external labels.
arXiv Detail & Related papers (2020-03-10T13:10:26Z)