A Case Study to Reveal if an Area of Interest has a Trend in Ongoing
Tweets Using Word and Sentence Embeddings
- URL: http://arxiv.org/abs/2110.00866v1
- Date: Sat, 2 Oct 2021 18:44:55 GMT
- Title: A Case Study to Reveal if an Area of Interest has a Trend in Ongoing
Tweets Using Word and Sentence Embeddings
- Authors: \.Ismail Aslan and Y\"ucel Top\c{c}u
- Abstract summary: We have proposed an easily applicable automated methodology in which the Daily Mean Similarity Scores show the similarity between the daily tweet corpus and the target words.
The Daily Mean Similarity Scores have mainly based on cosine similarity and word/sentence embeddings.
We have also compared the effectiveness of using word versus sentence embeddings while applying our methodology and realized that both give almost the same results.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the field of Natural Language Processing, information extraction from
texts has been the objective of many researchers for years. Many different
techniques have been applied in order to reveal the opinion that a tweet might
have, thus understanding the sentiment of the small writing up to 280
characters. Other than figuring out the sentiment of a tweet, a study can also
focus on finding the correlation of the tweets with a certain area of interest,
which constitutes the purpose of this study. In order to reveal if an area of
interest has a trend in ongoing tweets, we have proposed an easily applicable
automated methodology in which the Daily Mean Similarity Scores that show the
similarity between the daily tweet corpus and the target words representing our
area of interest is calculated by using a na\"ive correlation-based technique
without training any Machine Learning Model. The Daily Mean Similarity Scores
have mainly based on cosine similarity and word/sentence embeddings computed by
Multilanguage Universal Sentence Encoder and showed main opinion stream of the
tweets with respect to a certain area of interest, which proves that an ongoing
trend of a specific subject on Twitter can easily be captured in almost real
time by using the proposed methodology in this study. We have also compared the
effectiveness of using word versus sentence embeddings while applying our
methodology and realized that both give almost the same results, whereas using
word embeddings requires less computational time than sentence embeddings, thus
being more effective. This paper will start with an introduction followed by
the background information about the basics, then continue with the explanation
of the proposed methodology and later on finish by interpreting the results and
concluding the findings.
Related papers
- CAST: Corpus-Aware Self-similarity Enhanced Topic modelling [16.562349140796115]
We introduce CAST: Corpus-Aware Self-similarity Enhanced Topic modelling, a novel topic modelling method.
We find self-similarity to be an effective metric to prevent functional words from acting as candidate topic words.
Our approach significantly enhances the coherence and diversity of generated topics, as well as the topic model's ability to handle noisy data.
arXiv Detail & Related papers (2024-10-19T15:27:11Z) - Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Sentence Embedding (RSE), a new paradigm to discover further the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z) - DEIM: An effective deep encoding and interaction model for sentence
matching [0.0]
We propose a sentence matching method based on deep encoding and interaction to extract deep semantic information.
In the encoder layer,we refer to the information of another sentence in the process of encoding a single sentence, and later use a algorithm to fuse the information.
In the interaction layer, we use a bidirectional attention mechanism and a self-attention mechanism to obtain deep semantic information.
arXiv Detail & Related papers (2022-03-20T07:59:42Z) - Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs
for Semantic Sentence Embeddings [3.8073142980733]
We propose a language-independent approach to build large datasets of pairs of informal texts weakly similar.
We exploit Twitter's intrinsic powerful signals of relatedness: replies and quotes of tweets.
Our model learns classical Semantic Textual Similarity, but also excels on tasks where pairs of sentences are not exact paraphrases.
arXiv Detail & Related papers (2021-10-05T13:21:40Z) - Sentiment analysis in tweets: an assessment study from classical to
modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as the informal, and noisy linguistic style, remain challenging to many natural language processing (NLP) tasks.
This study fulfils an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z) - Match-Ignition: Plugging PageRank into Transformer for Long-form Text
Matching [66.71886789848472]
We propose a novel hierarchical noise filtering model, namely Match-Ignition, to tackle the effectiveness and efficiency problem.
The basic idea is to plug the well-known PageRank algorithm into the Transformer, to identify and filter both sentence and word level noisy information.
Noisy sentences are usually easy to detect because the sentence is the basic unit of a long-form text, so we directly use PageRank to filter such information.
arXiv Detail & Related papers (2021-01-16T10:34:03Z) - Narrative Incoherence Detection [76.43894977558811]
We propose the task of narrative incoherence detection as a new arena for inter-sentential semantic understanding.
Given a multi-sentence narrative, decide whether there exist any semantic discrepancies in the narrative flow.
arXiv Detail & Related papers (2020-12-21T07:18:08Z) - Be More with Less: Hypergraph Attention Networks for Inductive Text
Classification [56.98218530073927]
Graph neural networks (GNNs) have received increasing attention in the research community and demonstrated their promising results on this canonical task.
Despite the success, their performance could be largely jeopardized in practice since they are unable to capture high-order interaction between words.
We propose a principled model -- hypergraph attention networks (HyperGAT) which can obtain more expressive power with less computational consumption for text representation learning.
arXiv Detail & Related papers (2020-11-01T00:21:59Z) - MuSeM: Detecting Incongruent News Headlines using Mutual Attentive
Semantic Matching [7.608480381965392]
Measuring the congruence between two texts has several useful applications, such as detecting deceptive and misleading news headlines on the web.
This paper proposes a method that uses inter-mutual attention-based semantic matching between the original and synthetically generated headlines.
We observe that the proposed method outperforms prior arts significantly for two publicly available datasets.
arXiv Detail & Related papers (2020-10-07T19:19:42Z) - A Comparative Study on Structural and Semantic Properties of Sentence
Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.