Related papers: A Federated Approach to Predicting Emojis in Hindi Tweets

URL: http://arxiv.org/abs/2211.06401v1
Date: Fri, 11 Nov 2022 18:37:33 GMT
Title: A Federated Approach to Predicting Emojis in Hindi Tweets
Authors: Deep Gandhi and Jash Mehta and Nirali Parekh and Karan Waghela and Lynette D'Mello and Zeerak Talat
Abstract summary: We introduce a new dataset of 118k tweets (augmented from 25k unique tweets) for emoji prediction in Hindi. We propose a modification to the federated learning algorithm, CausalFedGSD, which aims to strike a balance between model performance and user privacy.
Score: 1.979158763744267
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The use of emojis affords a visual modality to, often private, textual communication. The task of predicting emojis however provides a challenge for machine learning as emoji use tends to cluster into the frequently used and the rarely used emojis. Much of the machine learning research on emoji use has focused on high resource languages and has conceptualised the task of predicting emojis around traditional server-side machine learning approaches. However, traditional machine learning approaches for private communication can introduce privacy concerns, as these approaches require all data to be transmitted to a central storage. In this paper, we seek to address the dual concerns of emphasising high resource languages for emoji prediction and risking the privacy of people's data. We introduce a new dataset of 118k tweets (augmented from 25k unique tweets) for emoji prediction in Hindi, and propose a modification to the federated learning algorithm, CausalFedGSD, which aims to strike a balance between model performance and user privacy. We show that our approach obtains comparative scores with more complex centralised models while reducing the amount of data required to optimise the models and minimising risks to user privacy.

Related papers

The Prosody of Emojis [73.70220975424597]
This study examines how emojis influence prosodic realisation in speech and how listeners interpret prosodic cues to recover emoji meanings. Unlike previous work, we directly link prosody and emoji by analysing actual human speech data, collected through structured but open-ended production and perception tasks. Results show that speakers adapt their prosody based on emoji cues, listeners can often identify the intended emoji from prosodic variation alone, and greater semantic differences between emojis correspond to increased prosodic divergence.
arXiv Detail & Related papers (2025-08-01T11:24:12Z)
Unleashing the Power of Emojis in Texts via Self-supervised Graph Pre-Training [22.452853652070413]
We unleash the power of emojis in social media data mining. We propose a graph pre-training framework for jointly modeling text and emojis.
arXiv Detail & Related papers (2024-09-22T18:29:10Z)
Semantics Preserving Emoji Recommendation with Large Language Models [47.94761630160614]
Existing emoji recommendation methods are primarily evaluated based on their ability to match the exact emoji a user chooses in the original text. We propose a new semantics preserving evaluation framework for emoji recommendation, which measures a model's ability to recommend emojis that maintain the semantic consistency with the user's text.
arXiv Detail & Related papers (2024-09-16T22:27:46Z)
NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human [56.46355425175232]
We suggest sanitizing sensitive text using two common strategies used by humans. We curate the first corpus, coined NAP^2, through both crowdsourcing and the use of large language models. Compared to prior work on anonymization, the human-inspired approaches result in more natural rewrites.
arXiv Detail & Related papers (2024-06-06T05:07:44Z)
EmojiLM: Modeling the New Emoji Language [44.23076273155259]
We develop a text-emoji parallel corpus, Text2Emoji, from a large language model. Based on the parallel corpus, we distill a sequence-to-sequence model, EmojiLM, which is specialized in the text-emoji bidirectional translation. Our proposed model outperforms strong baselines and the parallel corpus benefits emoji-related downstream tasks.
arXiv Detail & Related papers (2023-11-03T07:06:51Z)
Emoji Prediction in Tweets using BERT [0.0]
We propose a transformer-based approach for emoji prediction using BERT, a widely-used pre-trained language model. We fine-tuned BERT on a large corpus of text (tweets) containing both text and emojis to predict the most appropriate emoji for a given text. Our experimental results demonstrate that our approach outperforms several state-of-the-art models in predicting emojis with an accuracy of over 75 percent.
arXiv Detail & Related papers (2023-07-05T06:38:52Z)
Emojich -- zero-shot emoji generation using Russian language: a technical report [52.77024349608834]
"Emojich" is a text-to-image neural network that generates emojis conditioned on captions in Russian. We aim to keep the generalization ability of the large pretrained model ruDALL-E Malevich (XL, 1.3B parameters) at the fine-tuning stage.
arXiv Detail & Related papers (2021-12-04T23:37:32Z)
Learning Language and Multimodal Privacy-Preserving Markers of Mood from Mobile Data [74.60507696087966]
Mental health conditions remain underdiagnosed even in countries with common access to advanced medical care. One promising data source to help monitor human behavior is daily smartphone usage. We study behavioral markers of daily mood using a recent dataset of mobile behaviors from adolescent populations at high risk of suicidal behaviors.
arXiv Detail & Related papers (2021-06-24T17:46:03Z)
Black or White but never neutral: How readers perceive identity from yellow or skin-toned emoji [90.14874935843544]
Recent work established a connection between expression of identity and emoji usage on social media. This work asks if, as with language, readers are sensitive to such acts of self-expression and use them to understand the identity of authors.
arXiv Detail & Related papers (2021-05-12T18:23:51Z)
Semantic Journeys: Quantifying Change in Emoji Meaning from 2012-2018 [66.28665205489845]
We offer the first longitudinal study of how emoji semantics changes over time, applying techniques from computational linguistics to six years of Twitter data. We identify five patterns in emoji semantic development and find evidence that the less abstract an emoji is, the more likely it is to undergo semantic change. To aid future work on emoji and semantics, we make our data publicly available along with a web-based interface that anyone can use to explore semantic change in emoji.
arXiv Detail & Related papers (2021-05-03T13:35:10Z)
A `Sourceful' Twist: Emoji Prediction Based on Sentiment, Hashtags and Application Source [1.6818451361240172]
We showcase the importance of using Twitter features to help the model understand the sentiment involved and hence predict the most suitable emoji for the text. Our data analysis and neural network model performance evaluations show that using hashtags and application sources as features allows the model to encode different information and is effective in emoji prediction.
arXiv Detail & Related papers (2021-03-14T03:05:04Z)
Emoji Prediction: Extensions and Benchmarking [30.642840676899734]
The emoji prediction task aims at predicting the proper set of emojis associated with a piece of text. We extend the existing setting of the emoji prediction task to include a richer set of emojis and to allow multi-label classification. We propose novel models for multi-class and multi-label emoji prediction based on Transformer networks.
arXiv Detail & Related papers (2020-07-14T22:41:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.