Hashtag-Guided Low-Resource Tweet Classification
- URL: http://arxiv.org/abs/2302.10143v1
- Date: Mon, 20 Feb 2023 18:21:02 GMT
- Title: Hashtag-Guided Low-Resource Tweet Classification
- Authors: Shizhe Diao, Sedrick Scott Keh, Liangming Pan, Zhiliang Tian, Yan Song, Tong Zhang
- Abstract summary: We propose a novel Hashtag-guided Tweet Classification model (HashTation).
HashTation automatically generates meaningful hashtags for the input tweet to provide useful auxiliary signals for tweet classification.
Experiments show that HashTation achieves significant improvements on seven low-resource tweet classification tasks.
- Score: 31.810562621519804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social media classification tasks (e.g., tweet sentiment analysis, tweet
stance detection) are challenging because social media posts are typically
short, informal, and ambiguous. Thus, training on tweets is challenging and
demands large-scale human-annotated labels, which are time-consuming and costly
to obtain. In this paper, we find that providing hashtags to social media
tweets can help alleviate this issue because hashtags can enrich short and
ambiguous tweets in terms of various information, such as topic, sentiment, and
stance. This motivates us to propose a novel Hashtag-guided Tweet
Classification model (HashTation), which automatically generates meaningful
hashtags for the input tweet to provide useful auxiliary signals for tweet
classification. To generate high-quality and insightful hashtags, our hashtag
generation model retrieves and encodes the post-level and entity-level
information across the whole corpus. Experiments show that HashTation achieves
significant improvements on seven low-resource tweet classification tasks, in
which only a limited amount of training data is provided, showing that
automatically enriching tweets with model-generated hashtags could
significantly reduce the demand for large-scale human-labeled data. Further
analysis demonstrates that HashTation is able to generate high-quality hashtags
that are consistent with the tweets and their labels. The code is available at
https://github.com/shizhediao/HashTation.
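The enrichment idea described in the abstract (generate hashtags for a tweet, then hand the enriched tweet to a classifier) can be illustrated with a minimal sketch. This is not the authors' model: the learned hashtag generator is replaced here by a simple word-overlap retrieval over a small corpus, and the names `split_tweet`, `generate_hashtags`, and `enrich` are hypothetical helpers introduced only for illustration. The classifier itself is left out.

```python
from collections import Counter


def split_tweet(text):
    """Split a tweet into a set of plain lowercase words and a list of hashtags."""
    tokens = text.lower().split()
    words = {t for t in tokens if not t.startswith("#")}
    tags = [t for t in tokens if t.startswith("#")]
    return words, tags


def generate_hashtags(tweet, corpus, top_k=2):
    """Toy stand-in for a learned generator (an assumption, not HashTation itself):
    borrow hashtags from corpus tweets that share words with the input tweet."""
    words, _ = split_tweet(tweet)
    votes = Counter()
    for other in corpus:
        other_words, other_tags = split_tweet(other)
        overlap = len(words & other_words)
        if overlap == 0:
            continue  # unrelated tweet, contributes no hashtags
        for tag in other_tags:
            votes[tag] += overlap  # weight each hashtag by lexical similarity
    return [tag for tag, _ in votes.most_common(top_k)]


def enrich(tweet, corpus, top_k=2):
    """Append the generated hashtags to the tweet before classification."""
    tags = generate_hashtags(tweet, corpus, top_k)
    return " ".join([tweet] + tags)


# Hypothetical example data.
corpus = [
    "the new phone battery is amazing #tech #happy",
    "battery died after one hour #fail",
    "great match tonight #football",
]
enriched = enrich("this battery is amazing", corpus, top_k=2)
print(enriched)
```

In a real low-resource setup, the enriched string would be the classifier's input in place of the raw tweet, so the added hashtags act as auxiliary topic or sentiment cues.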
Related papers
- RIGHT: Retrieval-augmented Generation for Mainstream Hashtag
Recommendation [76.24205422163169]
We propose RetrIeval-augmented Generative Mainstream HashTag Recommender (RIGHT).
RIGHT consists of three components: 1) a retriever seeks relevant hashtags from the entire tweet-hashtags set; 2) a selector enhances mainstream identification by introducing global signals; and 3) a generator incorporates input tweets and selected hashtags to directly generate the desired hashtags.
Our method achieves significant improvements over state-of-the-art baselines. Moreover, RIGHT can be easily integrated into large language models, improving the performance of ChatGPT by more than 10%.
arXiv Detail & Related papers (2023-12-16T14:47:03Z)
- Analyzing Trendy Twitter Hashtags in the 2022 French Election [0.0]
We propose a method for using semantic networks as user-level features for machine learning tasks.
We conducted an experiment using a semantic network of 1037 Twitter hashtags from a corpus of 3.7 million tweets related to the 2022 French presidential election.
Our semantic feature performs well in the regression, with most emotions having $R^2$ above 0.5.
arXiv Detail & Related papers (2023-10-11T15:17:55Z)
- Manipulating Twitter Through Deletions [64.33261764633504]
Research into influence campaigns on Twitter has mostly relied on identifying malicious activities from tweets obtained via public APIs.
Here, we provide the first exhaustive, large-scale analysis of anomalous deletion patterns involving more than a billion deletions by over 11 million accounts.
We find that a small fraction of accounts delete a large number of tweets daily.
First, limits on tweet volume are circumvented, allowing certain accounts to flood the network with over 26 thousand daily tweets.
Second, coordinated networks of accounts engage in repetitive likes and unlikes of content that is eventually deleted, which can manipulate ranking algorithms.
arXiv Detail & Related papers (2022-03-25T20:07:08Z)
- HashSet -- A Dataset For Hashtag Segmentation [19.016545782774003]
We argue that model performance should be assessed on a wider variety of hashtags.
We propose HashSet, a dataset comprising: a) a 1.9k manually annotated dataset; b) a 3.3M loosely supervised dataset.
We show that the performance of SOTA models for hashtag segmentation drops substantially on the proposed dataset.
arXiv Detail & Related papers (2022-01-18T04:40:45Z)
- Attend and Select: A Segment Attention based Selection Mechanism for
Microblog Hashtag Generation [69.73215951112452]
A hashtag is formed by tokens or phrases that may originate from various fragmentary segments of the original text.
We propose an end-to-end Transformer-based generation model which consists of three phases: encoding, segments-selection, and decoding.
We introduce two large-scale hashtag generation datasets, which are newly collected from Chinese Weibo and English Twitter.
arXiv Detail & Related papers (2021-06-06T15:13:58Z)
- Sentiment analysis in tweets: an assessment study from classical to
modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as the informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study assesses how well existing language models distinguish the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Hit ratio: An Evaluation Metric for Hashtag Recommendation [6.746400031322727]
We propose a new metric which we call hit ratio for hashtag recommendation.
Most of the research in the area of hashtag recommendation has used classical metrics such as hit rate, precision, recall, and F1-score.
A comparison of hit ratio with the classical evaluation metrics reveals their limitations.
arXiv Detail & Related papers (2020-10-03T02:07:41Z)
- On Identifying Hashtags in Disaster Twitter Data [55.17975121160699]
We construct a unique dataset of disaster-related tweets annotated with hashtags useful for filtering actionable information.
Using this dataset, we investigate Long Short Term Memory-based models within a Multi-Task Learning framework.
The best performing model achieves an F1-score as high as 92.22%.
arXiv Detail & Related papers (2020-01-05T22:37:17Z)