Attend and Select: A Segment Attention based Selection Mechanism for
Microblog Hashtag Generation
- URL: http://arxiv.org/abs/2106.03151v1
- Date: Sun, 6 Jun 2021 15:13:58 GMT
- Title: Attend and Select: A Segment Attention based Selection Mechanism for
Microblog Hashtag Generation
- Authors: Qianren Mao, Xi Li, Hao Peng, Bang Liu, Shu Guo, Jianxin Li, Lihong
Wang, Philip S. Yu
- Abstract summary: A hashtag is formed by tokens or phrases that may originate from various fragmentary segments of the original text.
We propose an end-to-end Transformer-based generation model which consists of three phases: encoding, segments-selection, and decoding.
We introduce two large-scale hashtag generation datasets, which are newly collected from Chinese Weibo and English Twitter.
- Score: 69.73215951112452
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic microblog hashtag generation can help us better and faster
understand or process the critical content of microblog posts.
Conventional sequence-to-sequence generation methods can produce phrase-level
hashtags and have achieved remarkable performance on this task. However, they
cannot filter out secondary information and are poor at capturing
the discontinuous semantics among crucial tokens.
A hashtag is formed by tokens or phrases that may originate from various
fragmentary segments of the original text.
In this work, we propose an end-to-end Transformer-based generation model
which consists of three phases: encoding, segments-selection, and decoding. The
model transforms discontinuous semantic segments from the source text into a
sequence of hashtags.
Specifically, we introduce a novel Segments Selection Mechanism (SSM) for
Transformer to obtain segmental representations tailored to phrase-level
hashtag generation.
Besides, we introduce two large-scale hashtag generation datasets, which are
newly collected from Chinese Weibo and English Twitter.
Extensive evaluations on the two datasets show that our approach significantly
outperforms extraction and generation baselines. The code
and datasets are available at https://github.com/OpenSUM/HashtagGen.
Related papers
- Scribbles for All: Benchmarking Scribble Supervised Segmentation Across Datasets [51.74296438621836]
We introduce Scribbles for All, a label and training data generation algorithm for semantic segmentation trained on scribble labels.
The main limitation of scribbles as a source of weak supervision is the lack of challenging datasets for scribble segmentation.
Scribbles for All provides scribble labels for several popular segmentation datasets and provides an algorithm to automatically generate scribble labels for any dataset with dense annotations.
arXiv Detail & Related papers (2024-08-22T15:29:08Z)
- RIGHT: Retrieval-augmented Generation for Mainstream Hashtag Recommendation [76.24205422163169]
We propose RetrIeval-augmented Generative Mainstream HashTag Recommender (RIGHT).
RIGHT consists of three components: 1) a retriever seeks relevant hashtags from the entire tweet-hashtags set; 2) a selector enhances mainstream identification by introducing global signals; and 3) a generator incorporates input tweets and selected hashtags to directly generate the desired hashtags.
Our method achieves significant improvements over state-of-the-art baselines. Moreover, RIGHT can be easily integrated into large language models, improving the performance of ChatGPT by more than 10%.
arXiv Detail & Related papers (2023-12-16T14:47:03Z)
- Copy Is All You Need [66.00852205068327]
We formulate text generation as progressively copying text segments from an existing text collection.
Our approach achieves better generation quality according to both automatic and human evaluations.
Our approach attains additional performance gains by simply scaling up to larger text collections.
arXiv Detail & Related papers (2023-07-13T05:03:26Z)
- Hashtag-Guided Low-Resource Tweet Classification [31.810562621519804]
We propose a novel Hashtag-guided Tweet Classification model (HashTation).
HashTation automatically generates meaningful hashtags for the input tweet to provide useful auxiliary signals for tweet classification.
Experiments show that HashTation achieves significant improvements on seven low-resource tweet classification tasks.
arXiv Detail & Related papers (2023-02-20T18:21:02Z)
- HashSet -- A Dataset For Hashtag Segmentation [19.016545782774003]
We argue that model performance should be assessed on a wider variety of hashtags.
We propose HashSet, a dataset comprising: a) a 1.9k manually annotated set; b) a 3.3M loosely supervised set.
We show that the performance of SOTA models for hashtag segmentation drops substantially on the proposed dataset.
arXiv Detail & Related papers (2022-01-18T04:40:45Z)
- Towards Document-Level Paraphrase Generation with Sentence Rewriting and Reordering [88.08581016329398]
We propose CoRPG (Coherence Relationship guided Paraphrase Generation) for document-level paraphrase generation.
We use graph GRU to encode the coherence relationship graph and get the coherence-aware representation for each sentence.
Our model can generate document paraphrases with greater diversity and better semantic preservation.
arXiv Detail & Related papers (2021-09-15T05:53:40Z)
- News Meets Microblog: Hashtag Annotation via Retriever-Generator [15.558878116343585]
We propose to leverage news articles published before the microblog post to generate hashtags following a Retriever-Generator framework.
Experiments on English Twitter datasets demonstrate superior performance and significant advantages of leveraging news articles to generate hashtags.
arXiv Detail & Related papers (2021-04-18T05:28:13Z)
- MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning [128.36951818335046]
We propose a new approach called Memory-Augmented Recurrent Transformer (MART).
MART uses a memory module to augment the transformer architecture.
MART generates more coherent and less repetitive paragraph captions than baseline methods.
arXiv Detail & Related papers (2020-05-11T20:01:41Z)
- Keyphrase Extraction with Span-based Feature Representations [13.790461555410747]
Keyphrases are capable of providing semantic metadata characterizing documents.
There are three approaches to keyphrase extraction: (i) the traditional two-step ranking method, (ii) sequence labeling, and (iii) generation using neural networks.
In this paper, we propose a novel Span Keyphrase Extraction model that extracts span-based feature representations of keyphrases directly from all the content tokens.
arXiv Detail & Related papers (2020-02-13T09:48:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.