Sticker820K: Empowering Interactive Retrieval with Stickers
- URL: http://arxiv.org/abs/2306.06870v1
- Date: Mon, 12 Jun 2023 05:06:53 GMT
- Title: Sticker820K: Empowering Interactive Retrieval with Stickers
- Authors: Sijie Zhao, Yixiao Ge, Zhongang Qi, Lin Song, Xiaohan Ding, Zehua Xie,
Ying Shan
- Abstract summary: We propose a large-scale Chinese sticker dataset, namely Sticker820K, which consists of 820k image-text pairs.
Each sticker has rich and high-quality textual annotations, including descriptions, optical characters, emotional labels, and style classifications.
For the text-to-image retrieval task, our StickerCLIP demonstrates clear superiority over CLIP, achieving an absolute gain of 66.0% in mean recall.
- Score: 34.67442172774095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stickers have become a ubiquitous part of modern-day communication, conveying
complex emotions through visual imagery. To facilitate the development of more
powerful algorithms for analyzing stickers, we propose a large-scale Chinese
sticker dataset, namely Sticker820K, which consists of 820k image-text pairs.
Each sticker has rich and high-quality textual annotations, including
descriptions, optical characters, emotional labels, and style classifications.
Although vision-language tasks in the domain of natural images have been well
studied, directly applying those models, such as CLIP, to sticker data is not
an optimal solution due to the discrepancy between natural and emotive image
data. Therefore, we propose StickerCLIP as a benchmark model on the Sticker820K
dataset. For the text-to-image retrieval task, our StickerCLIP demonstrates
clear superiority over CLIP, achieving an absolute gain of 66.0% in mean recall
on the Sticker820K test set. Additionally, we endeavor to extend a recently
popularized LLM by means of prompt tuning, integrating sticker retrieval into
its abilities and allowing users to retrieve stickers through instructions. We
validate the feasibility of this method, demonstrating the immense potential of
prompt tuning in expanding LLM abilities without affecting the quality of
upstream tasks.
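The retrieval comparison above hinges on mean recall over a text-to-image benchmark. Below is a minimal sketch of how such a metric can be computed from paired text and image embeddings produced by a CLIP-style dual encoder; it is not the authors' evaluation code, and the Recall@1/5/10 cutoffs and tensor names are assumptions for illustration.

```python
# Sketch: mean recall (average of Recall@1/5/10) for text-to-image retrieval.
# Assumes paired embeddings: row i of text_emb matches row i of image_emb.
import torch
import torch.nn.functional as F

def mean_recall(text_emb: torch.Tensor, image_emb: torch.Tensor,
                ks=(1, 5, 10)) -> float:
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    sims = text_emb @ image_emb.T               # [N, N] cosine similarities
    gold = sims.diagonal()                      # similarity of each query to its true image
    ranks = (sims > gold[:, None]).sum(dim=-1)  # how many images outrank the true one
    recalls = [(ranks < k).float().mean().item() for k in ks]
    return sum(recalls) / len(recalls)

if __name__ == "__main__":
    # Random placeholders standing in for CLIP / StickerCLIP outputs.
    torch.manual_seed(0)
    texts, images = torch.randn(512, 512), torch.randn(512, 512)
    print(f"mean recall: {mean_recall(texts, images):.3f}")
```

In this framing, the reported 66.0% absolute gain would be the difference between the mean recall obtained with StickerCLIP embeddings and with the original CLIP embeddings on the same test set.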
Related papers
- PerSRV: Personalized Sticker Retrieval with Vision-Language Model [21.279568613306573]
We propose the Personalized Sticker Retrieval with Vision-Language Model framework, namely PerSRV, structured into offline calculations and online processing modules.
For sticker-level semantic understanding, we apply supervised fine-tuning to LLaVA-1.5-7B to generate human-like sticker semantics.
We also cluster style centroids based on users' historical interactions to model personal preference.
arXiv Detail & Related papers (2024-10-29T07:13:47Z) - Impact of Stickers on Multimodal Chat Sentiment Analysis and Intent Recognition: A New Task, Dataset and Baseline [4.375392069380812]
We propose a new task: Multimodal chat Sentiment Analysis and Intent Recognition involving Stickers (MSAIRS).
We introduce a novel multimodal dataset containing Chinese chat records and stickers excerpted from several mainstream social media platforms.
Our dataset and code will be publicly available.
arXiv Detail & Related papers (2024-05-14T08:42:49Z) - Distilling Self-Supervised Vision Transformers for Weakly-Supervised
Few-Shot Classification & Segmentation [58.03255076119459]
We address the task of weakly-supervised few-shot image classification and segmentation by leveraging a Vision Transformer (ViT).
Our proposed method takes token representations from the self-supervised ViT and leverages their correlations, via self-attention, to produce classification and segmentation predictions.
Experiments on Pascal-5i and COCO-20i demonstrate significant performance gains in a variety of supervision settings.
arXiv Detail & Related papers (2023-07-07T06:16:43Z) - Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models [52.3032592038514]
We propose a class-aware text prompt to enrich generated prompts with label-related image information.
We achieve an average improvement of 4.03% on new classes and 3.19% on harmonic-mean over eleven classification benchmarks.
arXiv Detail & Related papers (2023-03-30T06:02:40Z) - Selecting Stickers in Open-Domain Dialogue through Multitask Learning [51.67855506570727]
We propose a multitask learning method comprised of three auxiliary tasks to enhance the understanding of dialogue history, emotion and semantic meaning of stickers.
Our model better combines the multimodal information and achieves significantly higher accuracy than strong baselines.
arXiv Detail & Related papers (2022-09-16T03:45:22Z) - Learning to Respond with Your Favorite Stickers: A Framework of Unifying
Multi-Modality and User Preference in Multi-Turn Dialog [67.91114640314004]
Stickers with vivid and engaging expressions are becoming increasingly popular in online messaging apps.
Some works are dedicated to automatically selecting a sticker response by matching the sticker image with previous utterances.
We propose to recommend an appropriate sticker to the user based on the multi-turn dialog context and the user's sticker usage history.
arXiv Detail & Related papers (2020-11-05T03:31:17Z) - Learning to Respond with Stickers: A Framework of Unifying
Multi-Modality in Multi-Turn Dialog [65.7021675527543]
Stickers with vivid and engaging expressions are becoming increasingly popular in online messaging apps.
Some works are dedicated to automatically selecting a sticker response by matching text labels of stickers with previous utterances.
We propose to recommend an appropriate sticker to the user based on the multi-turn dialog context history without any external labels.
arXiv Detail & Related papers (2020-03-10T13:10:26Z)