Learning to Respond with Stickers: A Framework of Unifying
Multi-Modality in Multi-Turn Dialog
- URL: http://arxiv.org/abs/2003.04679v1
- Date: Tue, 10 Mar 2020 13:10:26 GMT
- Title: Learning to Respond with Stickers: A Framework of Unifying
Multi-Modality in Multi-Turn Dialog
- Authors: Shen Gao, Xiuying Chen, Chang Liu, Li Liu, Dongyan Zhao and Rui Yan
- Abstract summary: Stickers with vivid and engaging expressions are becoming increasingly popular in online messaging apps.
Some works are dedicated to automatically selecting sticker responses by matching the text labels of stickers with previous utterances.
We propose to recommend an appropriate sticker to the user based on the multi-turn dialog context history without any external labels.
- Score: 65.7021675527543
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stickers with vivid and engaging expressions are becoming
increasingly popular in online messaging apps, and some works are dedicated
to automatically selecting sticker responses by matching the text labels of
stickers with previous utterances. However, due to the large number of
stickers, it is impractical to require text labels for all of them. Hence,
in this paper, we propose to recommend an appropriate sticker to the user
based on the multi-turn dialog context history without any external labels.
Two main challenges are confronted in this task. One is to learn the
semantic meaning of stickers without corresponding text labels. The other
is to jointly model the candidate sticker with the multi-turn dialog
context. To tackle these challenges, we propose a sticker response selector
(SRS) model. Specifically, SRS first employs a convolution-based sticker
image encoder and a self-attention-based multi-turn dialog encoder to
obtain the representations of stickers and utterances. Next, a deep
interaction network conducts deep matching between the sticker and each
utterance in the dialog history. SRS then learns the short-term and
long-term dependencies among all interaction results with a fusion network
to output the final matching score. To evaluate our proposed method, we
collect a large-scale real-world dialog dataset with stickers from one of
the most popular online chatting platforms. Extensive experiments conducted
on this dataset show that our model achieves state-of-the-art performance
on all commonly used metrics. Experiments also verify the effectiveness of
each component of SRS. To facilitate further research in the field of
sticker selection, we release this dataset of 340K multi-turn dialog and
sticker pairs.
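To make the pipeline concrete, here is a minimal PyTorch sketch of the four stages the abstract describes: a convolutional sticker encoder, a self-attention utterance encoder, sticker-utterance interaction, and a fusion network that outputs a matching score. All layer choices, dimensions, and the GRU-based fusion are illustrative assumptions, not the paper's exact architecture.

```python
# A minimal sketch of the SRS pipeline; layer sizes and the cross-attention
# interaction and GRU fusion are stand-ins, not the paper's exact modules.
import torch
import torch.nn as nn

class SRSSketch(nn.Module):
    def __init__(self, vocab_size, dim=256):
        super().__init__()
        # Convolution-based sticker image encoder.
        self.img_enc = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
        )
        # Self-attention-based utterance encoder.
        self.embed = nn.Embedding(vocab_size, dim)
        self.utt_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        # Interaction between the sticker and each utterance
        # (cross-attention here as a simple stand-in for the paper's
        # deep interaction network).
        self.interact = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        # Fusion over per-utterance interaction results; a GRU captures
        # short- and long-term dependencies across turns before scoring.
        self.fusion = nn.GRU(dim, dim, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, sticker, dialog):
        # sticker: (B, 3, H, W); dialog: (B, T, L) token ids for T utterances.
        B, T, L = dialog.shape
        s = self.img_enc(sticker).unsqueeze(1)               # (B, 1, dim)
        u = self.utt_enc(self.embed(dialog.view(B * T, L)))  # (B*T, L, dim)
        s_rep = s.repeat_interleave(T, dim=0)                # (B*T, 1, dim)
        inter, _ = self.interact(s_rep, u, u)                # sticker attends to words
        inter = inter.view(B, T, -1)                         # (B, T, dim)
        _, h = self.fusion(inter)                            # fuse across turns
        return self.score(h[-1]).squeeze(-1)                 # one score per pair
```

In use, the model would be run once per candidate sticker and the candidates ranked by score, which matches the response-selection framing of the abstract.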
Related papers
- PerSRV: Personalized Sticker Retrieval with Vision-Language Model [21.279568613306573]
We propose the Personalized Sticker Retrieval with Vision-Language Model framework, namely PerSRV, structured into offline calculations and online processing modules.
For sticker-level semantic understanding, we apply supervised fine-tuning to LLaVA-1.5-7B to generate human-like sticker semantics.
Finally, we cluster style centroids based on users' historical interactions to model personal preferences.
arXiv Detail & Related papers (2024-10-29T07:13:47Z)
- Impact of Stickers on Multimodal Chat Sentiment Analysis and Intent Recognition: A New Task, Dataset and Baseline [4.375392069380812]
We propose a new task: Multimodal chat Sentiment Analysis and Intent Recognition involving Stickers (MSAIRS).
We introduce a novel multimodal dataset containing Chinese chat records and stickers excerpted from several mainstream social media platforms.
Our dataset and code will be publicly available.
arXiv Detail & Related papers (2024-05-14T08:42:49Z)
- Sticker820K: Empowering Interactive Retrieval with Stickers [34.67442172774095]
We propose a large-scale Chinese sticker dataset, namely Sticker820K, which consists of 820k image-text pairs.
Each sticker has rich and high-quality textual annotations, including descriptions, optical characters, emotional labels, and style classifications.
For the text-to-image retrieval task, our StickerCLIP demonstrates clear superiority over CLIP, achieving an absolute gain of 66.0% in mean recall (a minimal sketch of this metric follows the related-papers list below).
arXiv Detail & Related papers (2023-06-12T05:06:53Z)
- Selecting Stickers in Open-Domain Dialogue through Multitask Learning [51.67855506570727]
We propose a multitask learning method comprising three auxiliary tasks to enhance the understanding of dialogue history, emotion, and the semantic meaning of stickers.
Our model can better combine the multimodal information and achieve significantly higher accuracy over strong baselines.
arXiv Detail & Related papers (2022-09-16T03:45:22Z)
- SPACE-2: Tree-Structured Semi-Supervised Contrastive Pre-training for Task-Oriented Dialog Understanding [68.94808536012371]
We propose a tree-structured pre-trained conversation model, which learns dialog representations from limited labeled dialogs and large-scale unlabeled dialog corpora.
Our method achieves new state-of-the-art results on the DialoGLUE benchmark, which consists of seven datasets and four popular dialog understanding tasks.
arXiv Detail & Related papers (2022-09-14T13:42:50Z)
- Dialogue History Matters! Personalized Response Selection in Multi-turn Retrieval-based Chatbots [62.295373408415365]
We propose a personalized hybrid matching network (PHMN) for context-response matching.
Our contributions are two-fold: 1) our model extracts personalized wording behaviors from user-specific dialogue history as extra matching information.
We evaluate our model on two large datasets with user identification, i.e., the personalized Ubuntu dialogue corpus (P-Ubuntu) and the personalized Weibo dataset (P-Weibo).
arXiv Detail & Related papers (2021-03-17T09:42:11Z)
- Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension [49.92173751203827]
In multi-turn dialog, utterances do not always take the full form of sentences.
We propose to improve the response generation performance by examining the model's ability to answer a reading comprehension question.
arXiv Detail & Related papers (2020-12-14T10:58:01Z)
- Learning to Respond with Your Favorite Stickers: A Framework of Unifying Multi-Modality and User Preference in Multi-Turn Dialog [67.91114640314004]
Stickers with vivid and engaging expressions are becoming increasingly popular in online messaging apps.
Some works are dedicated to automatically selecting sticker responses by matching the sticker image with previous utterances.
We propose to recommend an appropriate sticker to the user based on the multi-turn dialog context and the user's sticker usage history.
arXiv Detail & Related papers (2020-11-05T03:31:17Z)
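As referenced in the Sticker820K entry above, mean recall is a standard text-to-image retrieval metric. Below is a minimal sketch under the common convention of averaging Recall@k over several cutoffs; the paper's exact cutoffs and evaluation protocol may differ.

```python
# A minimal mean-recall sketch for text-to-image retrieval, assuming the
# common convention of averaging Recall@1/@5/@10 over paired data.
import numpy as np

def mean_recall(sim, ks=(1, 5, 10)):
    """sim: (N, N) similarity matrix; sim[i, j] scores text query i against
    image j. The ground-truth image for query i is assumed to be image i."""
    ranks = (-sim).argsort(axis=1)                   # best match first
    # Position of the correct image within each query's ranking.
    pos = (ranks == np.arange(len(sim))[:, None]).argmax(axis=1)
    recalls = [float((pos < k).mean()) for k in ks]  # Recall@k values
    return sum(recalls) / len(recalls)

# Toy example with three queries; every query ranks its own image first.
sim = np.array([[0.9, 0.1, 0.0],
                [0.2, 0.8, 0.1],
                [0.3, 0.2, 0.7]])
print(mean_recall(sim, ks=(1,)))  # 1.0
```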
This list is automatically generated from the titles and abstracts of the papers on this site.