News Meets Microblog: Hashtag Annotation via Retriever-Generator
- URL: http://arxiv.org/abs/2104.08723v1
- Date: Sun, 18 Apr 2021 05:28:13 GMT
- Title: News Meets Microblog: Hashtag Annotation via Retriever-Generator
- Authors: Xiuwen Zheng, Dheeraj Mekala, Amarnath Gupta, Jingbo Shang
- Abstract summary: We propose to leverage news articles published before the microblog post to generate hashtags following a Retriever-Generator framework.
Experiments on English Twitter datasets demonstrate superior performance and significant advantages of leveraging news articles to generate hashtags.
- Score: 15.558878116343585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hashtag annotation for microblog posts has been recently formulated as a
sequence generation problem to handle emerging hashtags that are unseen in the
training set. The state-of-the-art method leverages conversations initiated by
posts to enrich contextual information for the short posts. However, it is
unrealistic to assume the existence of conversations before the hashtag
annotation itself. Therefore, we propose to leverage news articles published
before the microblog post to generate hashtags following a Retriever-Generator
framework. Extensive experiments on English Twitter datasets demonstrate
superior performance and significant advantages of leveraging news articles to
generate hashtags.
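The retrieve-then-generate idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Article` class, the bag-of-words cosine scoring, the `[SEP]` separator, and every function name below are assumptions made for illustration. It shows only the constraint the abstract states, that candidate news articles must be published before the post, followed by a hypothetical step that assembles the input a hashtag generator would consume.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
import math
import re


@dataclass
class Article:
    # Hypothetical container for a news article; not from the paper.
    title: str
    text: str
    published: datetime


def tokenize(text):
    # Lowercase alphanumeric tokens; a stand-in for a real tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(post, posted_at, articles, k=2):
    # Keep only articles published strictly before the post, then rank
    # them by lexical similarity (an assumed scoring function).
    query = Counter(tokenize(post))
    scored = []
    for article in articles:
        if article.published >= posted_at:
            continue
        doc = Counter(tokenize(article.title + " " + article.text))
        score = cosine(query, doc)
        if score > 0:
            scored.append((score, article))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article for _, article in scored[:k]]


def build_generator_input(post, retrieved):
    # Concatenate the post with retrieved titles; a sequence generator
    # (not shown) would take this string and emit hashtags.
    return " [SEP] ".join([post] + [a.title for a in retrieved])
```

Calling `retrieve(post, posted_at, articles)` and passing the result to `build_generator_input` yields a single string; a trained sequence-to-sequence model would be needed for the generation step itself.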
Related papers
- RIGHT: Retrieval-augmented Generation for Mainstream Hashtag Recommendation [76.24205422163169]
We propose RetrIeval-augmented Generative Mainstream HashTag Recommender (RIGHT).
RIGHT consists of three components: 1) a retriever seeks relevant hashtags from the entire tweet-hashtags set; 2) a selector enhances mainstream identification by introducing global signals; and 3) a generator incorporates input tweets and selected hashtags to directly generate the desired hashtags.
Our method achieves significant improvements over state-of-the-art baselines. Moreover, RIGHT can be easily integrated into large language models, improving the performance of ChatGPT by more than 10%.
arXiv Detail & Related papers (2023-12-16T14:47:03Z)
- Data Augmentation for Low-Resource Keyphrase Generation [46.52115499306222]
Keyphrase generation is the task of summarizing the contents of any given article into a few salient phrases (or keyphrases).
Existing works for the task mostly rely on large-scale annotated datasets, which are not easy to acquire.
We present data augmentation strategies specifically to address keyphrase generation in purely resource-constrained domains.
arXiv Detail & Related papers (2023-05-29T09:20:34Z)
- Hashtag-Guided Low-Resource Tweet Classification [31.810562621519804]
We propose a novel Hashtag-guided Tweet Classification model (HashTation).
HashTation automatically generates meaningful hashtags for the input tweet to provide useful auxiliary signals for tweet classification.
Experiments show that HashTation achieves significant improvements on seven low-resource tweet classification tasks.
arXiv Detail & Related papers (2023-02-20T18:21:02Z)
- Retrieval-Augmented Multilingual Keyphrase Generation with Retriever-Generator Iterative Training [66.64843711515341]
Keyphrase generation is the task of automatically predicting keyphrases given a piece of long text.
We call attention to a new setting named multilingual keyphrase generation.
We propose a retrieval-augmented method for multilingual keyphrase generation to mitigate the data shortage problem in non-English languages.
arXiv Detail & Related papers (2022-05-21T00:45:21Z)
- HashSet -- A Dataset For Hashtag Segmentation [19.016545782774003]
We argue that model performance should be assessed on a wider variety of hashtags.
We propose HashSet, a dataset comprising: a) a 1.9k manually annotated dataset; b) a 3.3M loosely supervised dataset.
We show that the performance of SOTA models for hashtag segmentation drops substantially on the proposed dataset.
arXiv Detail & Related papers (2022-01-18T04:40:45Z)
- Attend and Select: A Segment Attention based Selection Mechanism for Microblog Hashtag Generation [69.73215951112452]
A hashtag is formed by tokens or phrases that may originate from various fragmentary segments of the original text.
We propose an end-to-end Transformer-based generation model which consists of three phases: encoding, segments-selection, and decoding.
We introduce two large-scale hashtag generation datasets, which are newly collected from Chinese Weibo and English Twitter.
arXiv Detail & Related papers (2021-06-06T15:13:58Z)
- Diverse, Controllable, and Keyphrase-Aware: A Corpus and Method for News Multi-Headline Generation [98.98411895250774]
We propose generating multiple headlines with keyphrases of user interests.
The proposed method achieves state-of-the-art results in terms of quality and diversity.
arXiv Detail & Related papers (2020-04-08T08:30:05Z)
- On Identifying Hashtags in Disaster Twitter Data [55.17975121160699]
We construct a unique dataset of disaster-related tweets annotated with hashtags useful for filtering actionable information.
Using this dataset, we investigate Long Short Term Memory-based models within a Multi-Task Learning framework.
The best performing model achieves an F1-score as high as 92.22%.
arXiv Detail & Related papers (2020-01-05T22:37:17Z)