Text-to-hashtag Generation using Seq2seq Learning
- URL: http://arxiv.org/abs/2102.00904v1
- Date: Mon, 1 Feb 2021 15:28:27 GMT
- Title: Text-to-hashtag Generation using Seq2seq Learning
- Authors: Augusto Camargo, Wesley Carvalho, Felipe Peressim
- Abstract summary: We studied whether models based on BiLSTM and BERT can generate hashtags in Brazilian Portuguese for use on e-commerce websites.
We used a corpus of product reviews and titles as inputs and generated hashtags as outputs.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we studied whether models based on BiLSTM and BERT can generate
hashtags in Brazilian Portuguese that can be used on e-commerce websites. We
used a corpus of e-commerce reviews and product titles as inputs and
generated hashtags as outputs. We evaluated the results using four quantitative
metrics: NIST, BLEU, METEOR and a crowdsourced score. A word cloud was used as a
qualitative measure. Although all the automatic metrics (NIST, BLEU and
METEOR) showed poor results, the crowdsourced evaluation gave very high scores. We
concluded that the texts generated by the neural networks are very promising for
use as product hashtags on e-commerce websites [1]. The code for this
work is available at https://github.com/augustocamargo/text-to-hashtag
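To make the automatic metrics more concrete, here is a minimal sketch of a simplified sentence-level BLEU score for one generated hashtag against one reference. It is not the authors' evaluation code (their setup is in the repository linked above); the function names and the example hashtags are hypothetical. It does illustrate a general property of unsmoothed BLEU: a candidate shorter than four tokens has no 4-grams, so its score is 0 even when it matches the reference exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4, smooth=False):
    """Simplified sentence-level BLEU of one candidate against one reference.

    Uses clipped (modified) n-gram precision, a uniform geometric mean over
    n = 1..max_n and the standard brevity penalty. With smooth=True, add-one
    smoothing is applied to every n-gram order (one simple choice among
    several smoothing methods).
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if smooth:
            matches, total = matches + 1, total + 1
        if matches == 0 or total == 0:
            # A zero precision (or no n-grams at all) makes the geometric mean zero.
            return 0.0
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: penalise candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical three-token hashtag used as both candidate and reference.
ref = ["tenis", "corrida", "confortavel"]
print(sentence_bleu(ref, ref))               # 0.0: no 4-grams to match
print(sentence_bleu(ref, ref, smooth=True))  # 1.0
```

With `smooth=True`, the two-token candidate `["tenis", "corrida"]` scores exp(-0.5), roughly 0.61, and the whole reduction comes from the brevity penalty. Library implementations such as NLTK or SacreBLEU offer their own smoothing options and may give different numbers.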
Related papers
- Batching BPE Tokenization Merges [55.2480439325792]
BatchBPE is an open-source, pure Python implementation of the Byte Pair Encoding (BPE) algorithm.
It can train a high-quality tokenizer on a basic laptop.
arXiv (2024-08-05T09:37:21Z)
- RIGHT: Retrieval-augmented Generation for Mainstream Hashtag Recommendation [76.24205422163169]
We propose the RetrIeval-augmented Generative Mainstream HashTag Recommender (RIGHT).
RIGHT consists of three components: 1) a retriever seeks relevant hashtags from the entire tweet-hashtags set; 2) a selector enhances mainstream identification by introducing global signals; and 3) a generator incorporates input tweets and selected hashtags to directly generate the desired hashtags.
Our method achieves significant improvements over state-of-the-art baselines. Moreover, RIGHT can be easily integrated into large language models, improving the performance of ChatGPT by more than 10%.
arXiv (2023-12-16T14:47:03Z)
- Offensive Language Identification in Transliterated and Code-Mixed Bangla [29.30985521838655]
In this paper, we explore offensive language identification in texts with transliterations and code-mixing.
We introduce TB-OLID, a transliterated Bangla offensive language dataset containing 5,000 manually annotated comments.
We train and fine-tune machine learning models on TB-OLID, and we evaluate their results on this dataset.
arXiv (2023-11-25T13:27:22Z)
- Learning Multiplex Representations on Text-Attributed Graphs with One Language Model Encoder [55.24276913049635]
We propose METAG, a new framework for learning Multiplex rEpresentations on Text-Attributed Graphs.
In contrast to existing methods, METAG uses one text encoder to model the shared knowledge across relations.
We conduct experiments on nine downstream tasks in five graphs from both academic and e-commerce domains.
arXiv (2023-10-10T14:59:22Z)
- Towards Codable Watermarking for Injecting Multi-bits Information to LLMs [86.86436777626959]
Large language models (LLMs) generate texts with increasing fluency and realism.
Existing watermarking methods are encoding-inefficient and cannot flexibly meet the diverse information encoding needs.
We propose Codable Text Watermarking for LLMs (CTWL) that allows text watermarks to carry multi-bit customizable information.
arXiv (2023-07-29T14:11:15Z)
- #REVAL: a semantic evaluation framework for hashtag recommendation [6.746400031322727]
We propose a novel semantic evaluation framework for hashtag recommendation, called #REval.
#REval includes an internal module referred to as BERTag, which automatically learns the hashtag embeddings.
Our experiments on three large datasets show that #REval gave more meaningful hashtag synonyms for hashtag recommendation evaluation.
arXiv (2023-05-24T07:10:56Z)
- T5Score: Discriminative Fine-tuning of Generative Evaluation Metrics [94.69907794006826]
We present a framework that combines the best of both worlds, using both supervised and unsupervised signals from whatever data we have available.
We operationalize this idea by training T5Score, a metric that uses these training signals with mT5 as the backbone.
T5Score achieves the best performance on all datasets against existing top-scoring metrics at the segment level.
arXiv (2022-12-12T06:29:04Z)
- L3Cube-HingCorpus and HingBERT: A Code Mixed Hindi-English Dataset and BERT Language Models [1.14219428942199]
We present L3Cube-HingCorpus, the first large-scale, real Hindi-English code-mixed dataset in Roman script.
We show the effectiveness of these BERT models on downstream tasks such as code-mixed sentiment analysis, POS tagging, NER, and LID from the GLUECoS benchmark.
arXiv (2022-04-18T16:49:59Z)
- Product Market Demand Analysis Using NLP in Banglish Text with Sentiment Analysis and Named Entity Recognition [0.0]
There are roughly 228 million native Bengali speakers.
Consumers are buying and evaluating items on social media with Banglish text.
People use social media to find preferred smartphone brands and models.
arXiv (2022-04-04T20:21:31Z)
- Attend and Select: A Segment Attention based Selection Mechanism for Microblog Hashtag Generation [69.73215951112452]
A hashtag is formed by tokens or phrases that may originate from various fragmentary segments of the original text.
We propose an end-to-end Transformer-based generation model which consists of three phases: encoding, segments-selection, and decoding.
We introduce two large-scale hashtag generation datasets, which are newly collected from Chinese Weibo and English Twitter.
arXiv (2021-06-06T15:13:58Z)
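The merge loop at the core of byte pair encoding, which the BatchBPE entry above implements in pure Python, can be sketched as follows. This is a minimal, hypothetical illustration rather than BatchBPE's code or API: it works on characters instead of bytes, applies a single merge per iteration (BatchBPE's batching of merges is not reproduced), and the function names are our own.

```python
from collections import Counter


def pair_counts(words):
    """Count adjacent symbol pairs across a {symbol_tuple: frequency} vocabulary."""
    counts = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts


def merge_pair(words, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges from whitespace-separated text."""
    # Each word starts as a tuple of single characters, weighted by frequency.
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        counts = pair_counts(words)
        if not counts:
            break  # every word is already a single symbol
        # Most frequent pair; ties go to the pair that was counted first.
        best = max(counts, key=counts.get)
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words


merges, words = train_bpe("low low low lower lowest", 3)
print(merges)  # [('l', 'o'), ('lo', 'w'), ('low', 'e')]
```

Each iteration recounts all pairs, which is simple but slow on large corpora; avoiding that repeated work is the kind of cost that batching several merges per pass is meant to reduce.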