Unleashing the Power of Emojis in Texts via Self-supervised Graph Pre-Training
- URL: http://arxiv.org/abs/2409.14552v2
- Date: Thu, 26 Sep 2024 02:02:13 GMT
- Title: Unleashing the Power of Emojis in Texts via Self-supervised Graph Pre-Training
- Authors: Zhou Zhang, Dongzeng Tan, Jiaan Wang, Yilong Chen, Jiarong Xu
- Abstract summary: We unleash the power of emojis in social media data mining.
We propose a graph pre-training framework for text and emoji co-modeling.
- Score: 22.452853652070413
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emojis have gained immense popularity on social platforms, serving as a common means to supplement or replace text. However, existing data mining approaches generally either ignore emojis entirely or treat them as ordinary Unicode characters, which may limit a model's ability to grasp the rich semantic information in emojis and the interaction between emojis and text. It is therefore necessary to unleash the power of emojis in social media data mining. To this end, we first construct a heterogeneous graph consisting of three types of nodes, i.e., post, word, and emoji nodes, to improve the representation of the different elements in posts. The edges are also well defined to model how these three elements interact with one another. To facilitate the sharing of information among post, word, and emoji nodes, we propose a graph pre-training framework for text and emoji co-modeling, which contains two graph pre-training tasks: node-level graph contrastive learning and edge-level link reconstruction learning. Extensive experiments on the Xiaohongshu and Twitter datasets with two types of downstream tasks demonstrate that our approach achieves significant improvements over previous strong baselines.
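Below is a minimal sketch (not the authors' released code) of the two pre-training objectives named in the abstract: node-level graph contrastive learning and edge-level link reconstruction. All tensors, dimensions, and the toy post-word graph are illustrative assumptions.

```python
# Minimal sketch of the two pre-training tasks; toy data throughout.
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, tau=0.5):
    """InfoNCE between two augmented views of the same nodes."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau            # (N, N) pairwise similarities
    labels = torch.arange(z1.size(0))     # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

def link_reconstruction_loss(z_src, z_dst, pos_edges, neg_edges):
    """Binary cross-entropy on dot-product scores of real vs. sampled edges."""
    def score(edges):
        return (z_src[edges[0]] * z_dst[edges[1]]).sum(dim=1)
    pos, neg = score(pos_edges), score(neg_edges)
    return (F.binary_cross_entropy_with_logits(pos, torch.ones_like(pos))
            + F.binary_cross_entropy_with_logits(neg, torch.zeros_like(neg)))

post_z, word_z = torch.randn(4, 64), torch.randn(6, 64)  # toy node embeddings
view1 = post_z + 0.1 * torch.randn_like(post_z)  # e.g. feature-noise augmentation
pos = torch.tensor([[0, 1, 2], [3, 0, 5]])       # observed post-word edges
neg = torch.tensor([[0, 1, 2], [4, 2, 1]])       # sampled negative edges
loss = contrastive_loss(view1, post_z) + link_reconstruction_loss(post_z, word_z, pos, neg)
print(loss.item())
```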
Related papers
- Semantics Preserving Emoji Recommendation with Large Language Models [47.94761630160614]
Existing emoji recommendation methods are primarily evaluated based on their ability to match the exact emoji a user chooses in the original text.
We propose a new semantics-preserving evaluation framework for emoji recommendation, which measures a model's ability to recommend emojis that maintain semantic consistency with the user's text.
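As a rough illustration of such a framework (the sentence encoder and similarity criterion below are assumptions, not the paper's exact protocol), semantic consistency can be scored as the embedding similarity between the original text and the text with the candidate emoji appended:

```python
# Hypothetical consistency score; the model choice is an assumption.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_consistency(text: str, emoji: str) -> float:
    """Cosine similarity between the text and the text plus emoji."""
    emb = model.encode([text, f"{text} {emoji}"], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

print(semantic_consistency("Just finished my first marathon!", "🏃"))
```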
arXiv Detail & Related papers (2024-09-16T22:27:46Z)
- EmojiLM: Modeling the New Emoji Language [44.23076273155259]
We develop a text-emoji parallel corpus, Text2Emoji, from a large language model.
Based on the parallel corpus, we distill a sequence-to-sequence model, EmojiLM, which is specialized in bidirectional text-emoji translation.
Our proposed model outperforms strong baselines and the parallel corpus benefits emoji-related downstream tasks.
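Below is a hedged sketch of what such text-to-emoji "translation" with a sequence-to-sequence model looks like; the checkpoint is a generic placeholder, not the released EmojiLM weights.

```python
# Placeholder checkpoint; swap in actual EmojiLM weights if available.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

inputs = tok("I love pizza night with friends", return_tensors="pt")
ids = model.generate(**inputs, max_new_tokens=8)   # text -> emoji direction
print(tok.decode(ids[0], skip_special_tokens=True))
```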
arXiv Detail & Related papers (2023-11-03T07:06:51Z)
- A Federated Approach to Predicting Emojis in Hindi Tweets [1.979158763744267]
We introduce a new dataset of 118k tweets (augmented from 25k unique tweets) for emoji prediction in Hindi.
We propose a modification to the federated learning algorithm, CausalFedGSD, which aims to strike a balance between model performance and user privacy.
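The CausalFedGSD modification itself is not reproduced here; the sketch below only shows the generic federated-averaging step such algorithms build on (weighting clients by local dataset size, as in FedAvg).

```python
# Generic FedAvg aggregation; not the paper's CausalFedGSD variant.
import torch.nn as nn

def fed_avg(client_states, client_sizes):
    """Average client weights, weighted by each client's dataset size."""
    total = sum(client_sizes)
    return {k: sum(s[k] * (n / total) for s, n in zip(client_states, client_sizes))
            for k in client_states[0]}

clients = [nn.Linear(4, 2).state_dict() for _ in range(3)]  # toy local models
global_state = fed_avg(clients, client_sizes=[100, 50, 25])
print(global_state["weight"].shape)  # torch.Size([2, 4])
```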
arXiv Detail & Related papers (2022-11-11T18:37:33Z)
- Emojich -- zero-shot emoji generation using Russian language: a technical report [52.77024349608834]
"Emojich" is a text-to-image neural network that generates emojis using captions in Russian language as a condition.
We aim to keep the generalization ability of a pretrained big model ruDALL-E Malevich (XL) 1.3B parameters at the fine-tuning stage.
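A common recipe for preserving a pretrained model's generalization during fine-tuning, sketched on a stand-in module below, is to freeze early layers and train the rest with a small learning rate; this is a generic assumption, not necessarily the report's exact procedure.

```python
# Stand-in module; ruDALL-E itself is far larger and structured differently.
import torch.nn as nn
from torch.optim import Adam

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
for p in model[0].parameters():
    p.requires_grad = False                 # freeze early layers
optimizer = Adam((p for p in model.parameters() if p.requires_grad), lr=1e-5)
```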
arXiv Detail & Related papers (2021-12-04T23:37:32Z)
- Emoji-aware Co-attention Network with EmoGraph2vec Model for Sentiment Analysis [9.447106020795292]
We propose a method to learn emoji representations, called EmoGraph2vec, and design an emoji-aware co-attention network.
The model uses a co-attention mechanism to integrate text and emojis, and incorporates a squeeze-and-excitation block into a convolutional neural network.
Experimental results show that the proposed model can outperform several baselines for sentiment analysis on benchmark datasets.
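For reference, a generic squeeze-and-excitation block (Hu et al., 2018) of the kind the summary mentions looks like the sketch below; channel counts are illustrative.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: rescale channels by learned importance."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):              # x: (batch, channels, length)
        w = self.fc(x.mean(dim=2))     # squeeze: global average pooling
        return x * w.unsqueeze(2)      # excite: channel-wise rescaling

print(SEBlock(64)(torch.randn(2, 64, 32)).shape)  # torch.Size([2, 64, 32])
```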
arXiv Detail & Related papers (2021-10-27T08:01:10Z)
- GraphFormers: GNN-nested Transformers for Representation Learning on Textual Graph [53.70520466556453]
We propose GraphFormers, where layerwise GNN components are nested alongside the transformer blocks of language models.
With the proposed architecture, the text encoding and the graph aggregation are fused into an iterative workflow.
In addition, a progressive learning strategy is introduced, where the model is successively trained on manipulated data and original data to reinforce its capability of integrating information on graphs.
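A hedged sketch of the nesting idea: each layer first encodes a node's text with a transformer block, then lets per-node summaries attend to graph neighbors before the next layer. The modules below are simplifications, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class GraphFormerLayer(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.text_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.graph_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens, adj):
        # tokens: (num_nodes, seq_len, dim); adj: (num_nodes, num_nodes), 1 = edge
        tokens = self.text_layer(tokens)              # per-node text encoding
        cls = tokens[:, 0].unsqueeze(0)               # node summaries: (1, N, dim)
        agg, _ = self.graph_attn(cls, cls, cls, attn_mask=(adj == 0))
        tokens = tokens.clone()
        tokens[:, 0] = tokens[:, 0] + agg.squeeze(0)  # fuse neighbor info back in
        return tokens

adj = torch.eye(3)                                    # toy graph with self-loops
out = GraphFormerLayer()(torch.randn(3, 5, 64), adj)
print(out.shape)                                      # torch.Size([3, 5, 64])
```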
arXiv Detail & Related papers (2021-05-06T12:20:41Z)
- Semantic Journeys: Quantifying Change in Emoji Meaning from 2012-2018 [66.28665205489845]
We offer the first longitudinal study of how emoji semantics change over time, applying techniques from computational linguistics to six years of Twitter data.
We identify five patterns in emoji semantic development and find evidence that the less abstract an emoji is, the more likely it is to undergo semantic change.
To aid future work on emoji and semantics, we make our data publicly available along with a web-based interface that anyone can use to explore semantic change in emoji.
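One simple way to quantify such change (a toy illustration; the vectors below are random stand-ins, and real studies must first align embedding spaces across time slices) is the cosine distance between an emoji's embeddings from different years:

```python
import numpy as np

def semantic_shift(v_early, v_late):
    """1 - cosine similarity between time-sliced embeddings."""
    cos = v_early @ v_late / (np.linalg.norm(v_early) * np.linalg.norm(v_late))
    return 1.0 - cos

print(semantic_shift(np.random.rand(100), np.random.rand(100)))
```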
arXiv Detail & Related papers (2021-05-03T13:35:10Z)
- A `Sourceful' Twist: Emoji Prediction Based on Sentiment, Hashtags and Application Source [1.6818451361240172]
We showcase the importance of using Twitter features to help the model understand the sentiment involved and thus predict the most suitable emoji for the text.
Our data analysis and neural network evaluations show that using hashtags and application sources as features encodes complementary information and is effective for emoji prediction.
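A toy sketch of the feature idea: concatenate text, hashtag, and application-source representations before the emoji classifier (all dimensions and fusion by concatenation are assumptions):

```python
import torch
import torch.nn as nn

text_e, hashtag_e, source_e = torch.randn(2, 64), torch.randn(2, 16), torch.randn(2, 8)
fused = torch.cat([text_e, hashtag_e, source_e], dim=1)  # (2, 88) joint features
classifier = nn.Linear(88, 20)    # 20 candidate emojis, illustrative
print(classifier(fused).shape)    # torch.Size([2, 20])
```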
arXiv Detail & Related papers (2021-03-14T03:05:04Z)
- Emoji Prediction: Extensions and Benchmarking [30.642840676899734]
The emoji prediction task aims at predicting the proper set of emojis associated with a piece of text.
We extend the existing setting of the emoji prediction task to include a richer set of emojis and to allow multi-label classification.
We propose novel models for multi-class and multi-label emoji prediction based on Transformer networks.
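A minimal sketch of multi-label emoji prediction with a Transformer encoder: one sigmoid logit per candidate emoji, trained with binary cross-entropy (the encoder size and label count are illustrative, not the paper's configuration).

```python
import torch
import torch.nn as nn

class MultiLabelEmojiClassifier(nn.Module):
    def __init__(self, vocab=30522, dim=64, num_emojis=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, 4, batch_first=True), num_layers=2)
        self.head = nn.Linear(dim, num_emojis)

    def forward(self, ids):               # ids: (batch, seq_len)
        h = self.encoder(self.embed(ids)) # contextual token states
        return self.head(h.mean(dim=1))   # one logit per candidate emoji

model = MultiLabelEmojiClassifier()
logits = model(torch.randint(0, 30522, (2, 16)))
targets = torch.zeros(2, 64)
targets[0, 3] = 1                                # multi-hot targets
print(nn.BCEWithLogitsLoss()(logits, targets).item())
```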
arXiv Detail & Related papers (2020-07-14T22:41:20Z)
- Structure-Augmented Text Representation Learning for Efficient Knowledge Graph Completion [53.31911669146451]
Human-curated knowledge graphs provide critical supportive information to various natural language processing tasks.
These graphs are usually incomplete, motivating automatic completion.
Graph embedding approaches, e.g., TransE, learn structured knowledge by representing graph elements as dense embeddings.
Textual encoding approaches, e.g., KG-BERT, resort to the text of graph triples and triple-level contextualized representations.
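For concreteness, the TransE scoring function mentioned above treats a triple (head, relation, tail) as plausible when head + relation is close to tail in embedding space; the embeddings below are random stand-ins.

```python
import torch

def transe_score(h, r, t, p=1):
    """Lower distance means a more plausible (h, r, t) triple."""
    return torch.norm(h + r - t, p=p, dim=-1)

entities, relations = torch.randn(5, 32), torch.randn(3, 32)  # toy embeddings
print(transe_score(entities[0], relations[1], entities[2]).item())
```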
arXiv Detail & Related papers (2020-04-30T13:50:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.