Efficacy of BERT embeddings on predicting disaster from Twitter data
- URL: http://arxiv.org/abs/2108.10698v1
- Date: Sun, 8 Aug 2021 17:44:29 GMT
- Title: Efficacy of BERT embeddings on predicting disaster from Twitter data
- Authors: Ashis Kumar Chanda
- Abstract summary: Rescue agencies monitor social media to identify disasters and reduce risks to lives.
It is impossible for humans to manually check the massive amount of data and identify disasters in real time.
The advanced contextual embedding method BERT constructs different vectors for the same word in different contexts.
BERT embeddings outperform traditional word embeddings on the disaster-prediction task.
- Score: 0.548253258922555
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Social media like Twitter provide a common platform to share and communicate
personal experiences with other people. People often post their life
experiences, local news, and events on social media to inform others. Many
rescue agencies monitor this type of data regularly to identify disasters and
reduce risks to lives. However, it is impossible for humans to manually check
the massive amount of data and identify disasters in real time. For this
purpose, many studies have proposed representing words in
machine-understandable form and applying machine learning methods to those
representations to identify the sentiment of a text. Previous
research methods provide a single representation or embedding of a word from a
given document. However, the recent advanced contextual embedding method (BERT)
constructs different vectors for the same word in different contexts. BERT
embeddings have been successfully used in different natural language processing
(NLP) tasks, yet there is no concrete analysis of how these representations are
helpful in disaster-type tweet analysis. In this research work, we explore the
efficacy of BERT embeddings on predicting disaster from Twitter data and
compare these to traditional context-free word embedding methods (GloVe,
Skip-gram, and FastText). We use both traditional machine learning methods and
deep learning methods for this purpose. We provide both quantitative and
qualitative results for this study. The results show that BERT embeddings
outperform the traditional word embeddings on the disaster-prediction task.
Our code is freely accessible to the research community.
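As a toy illustration of the distinction the abstract draws, a context-free embedding (GloVe, Skip-gram, FastText) assigns one fixed vector per word, while a contextual encoder like BERT produces a different vector for the same word in each sentence. The sketch below stands in for BERT with a hypothetical window-averaging encoder; the vocabulary, vectors, and `contextual_embed` function are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Hypothetical toy vocabulary with fixed (context-free) vectors,
# standing in for GloVe/Skip-gram/FastText lookups.
rng = np.random.default_rng(0)
STATIC = {w: rng.normal(size=4) for w in
          ["fire", "spread", "fast", "the", "camp", "was", "fun"]}

def static_embed(word, sentence):
    # Context-free: the sentence is ignored, so "fire" always
    # maps to the same vector.
    return STATIC[word]

def contextual_embed(word, sentence):
    # Toy stand-in for BERT: mix the word's vector with the mean of
    # its sentence context, so the same word gets different vectors
    # in different sentences. (Real BERT uses transformer attention.)
    context = np.mean([STATIC[w] for w in sentence], axis=0)
    return 0.5 * STATIC[word] + 0.5 * context

s1 = ["the", "fire", "spread", "fast"]      # disaster usage
s2 = ["the", "camp", "fire", "was", "fun"]  # benign usage

same = np.allclose(static_embed("fire", s1), static_embed("fire", s2))
diff = not np.allclose(contextual_embed("fire", s1),
                       contextual_embed("fire", s2))
print(same, diff)  # static identical across contexts, contextual differs
```

Either kind of vector can then be fed to a traditional classifier or a deep network, which is the comparison the paper carries out.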
Related papers
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
- Word Sense Induction with Knowledge Distillation from BERT [6.88247391730482]
This paper proposes a method to distill multiple word senses from a pre-trained language model (BERT) by using attention over the senses of a word in a context.
Experiments on the contextual word similarity and sense induction tasks show that this method is superior to or competitive with state-of-the-art multi-sense embeddings.
arXiv Detail & Related papers (2023-04-20T21:05:35Z)
- Harnessing the Power of Text-image Contrastive Models for Automatic Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore contrastive learning in the domain of misinformation identification.
Our model shows superior performance in detecting non-matched image-text pairs when training data is insufficient.
arXiv Detail & Related papers (2023-04-19T02:53:59Z)
- Paraphrase Identification with Deep Learning: A Review of Datasets and Methods [1.4325734372991794]
We investigate how the under-representation of certain paraphrase types in popular datasets affects the ability to detect plagiarism.
We introduce and validate a new refined typology for paraphrases.
We propose new directions for future research and dataset development to enhance AI-based paraphrase detection.
arXiv Detail & Related papers (2022-12-13T23:06:20Z)
- A Case Study to Reveal if an Area of Interest has a Trend in Ongoing Tweets Using Word and Sentence Embeddings [0.0]
We have proposed an easily applicable automated methodology in which the Daily Mean Similarity Scores show the similarity between the daily tweet corpus and the target words.
The Daily Mean Similarity Scores are mainly based on cosine similarity and word/sentence embeddings.
We have also compared the effectiveness of using word versus sentence embeddings while applying our methodology and realized that both give almost the same results.
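The blurb above describes scoring each day's tweet corpus against target words via cosine similarity over embeddings. A minimal sketch of such a score, assuming the embeddings are already available as vectors (the toy vectors and the `daily_mean_similarity` helper are illustrative assumptions, not that paper's code):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def daily_mean_similarity(tweet_vecs, target_vec):
    # Hypothetical Daily Mean Similarity Score: average cosine
    # similarity between each tweet embedding of the day and the
    # embedding of the target word/sentence.
    return float(np.mean([cosine(v, target_vec) for v in tweet_vecs]))

# Toy example: an identical vector scores 1.0, an orthogonal one 0.0.
target = np.array([1.0, 0.0])
day = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(daily_mean_similarity(day, target))  # 0.5
```

Whether `tweet_vecs` come from word or sentence embeddings only changes how the vectors are produced, which matches that study's observation that both give similar results.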
arXiv Detail & Related papers (2021-10-02T18:44:55Z)
- TF-IDF vs Word Embeddings for Morbidity Identification in Clinical Notes: An Initial Study [3.9424051088220518]
We propose the use of Deep Learning and Word Embeddings for identifying sixteen morbidity types within textual descriptions of clinical records.
We have employed pre-trained Word Embeddings namely GloVe and Word2Vec, and our own Word Embeddings trained on the target domain.
arXiv Detail & Related papers (2021-05-20T09:57:45Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- MuSeM: Detecting Incongruent News Headlines using Mutual Attentive Semantic Matching [7.608480381965392]
Measuring the congruence between two texts has several useful applications, such as detecting deceptive and misleading news headlines on the web.
This paper proposes a method that uses inter-mutual attention-based semantic matching between the original and synthetically generated headlines.
We observe that the proposed method significantly outperforms prior art on two publicly available datasets.
arXiv Detail & Related papers (2020-10-07T19:19:42Z)
- Intrinsic Probing through Dimension Selection [69.52439198455438]
Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks.
Such high performance should not be possible unless some form of linguistic structure inheres in these representations, and a wealth of research has sprung up on probing for it.
In this paper, we draw a distinction between intrinsic probing, which examines how linguistic information is structured within a representation, and the extrinsic probing popular in prior work, which only argues for the presence of such information by showing that it can be successfully extracted.
arXiv Detail & Related papers (2020-10-06T15:21:08Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.