A Named Entity Recognition and Topic Modeling-based Solution for Locating and Better Assessment of Natural Disasters in Social Media
- URL: http://arxiv.org/abs/2405.00903v1
- Date: Wed, 1 May 2024 23:19:49 GMT
- Title: A Named Entity Recognition and Topic Modeling-based Solution for Locating and Better Assessment of Natural Disasters in Social Media
- Authors: Ayaz Mehmood, Muhammad Tayyab Zamir, Muhammad Asif Ayub, Nasir Ahmad, Kashif Ahmad
- Abstract summary: Social media content has proven very effective in disaster informatics.
However, due to the unstructured nature of the data, several challenges are associated with disaster analysis in social media content.
To fully explore the potential of social media content in disaster informatics, access to relevant content and correct geo-location information is critical.
- Score: 1.9739821076317217
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over the last decade, as in other application domains, social media content has proven very effective in disaster informatics. However, due to the unstructured nature of the data, several challenges are associated with disaster analysis in social media content. To fully explore the potential of social media content in disaster informatics, access to relevant content and correct geo-location information is critical. In this paper, we propose a three-step solution to tackle these challenges. First, the proposed solution classifies social media posts into relevant and irrelevant posts. Second, it automatically extracts location information from the posts' text through Named Entity Recognition (NER) analysis. Finally, to quickly analyze the topics covered in large volumes of social media posts, we perform topic modeling, resulting in a list of top keywords that highlight the issues discussed in the tweets. For the Relevant Classification of Twitter Posts (RCTP), we propose a merit-based fusion framework combining the capabilities of four different models, namely BERT, RoBERTa, DistilBERT, and ALBERT, obtaining the highest F1-score of 0.933 on a benchmark dataset. For the Location Extraction from Twitter Text (LETT), we evaluate four models, namely BERT, RoBERTa, DistilBERT, and ELECTRA, in an NER framework, obtaining the highest F1-score of 0.960. For topic modeling, we use the BERTopic library to discover the hidden topic patterns in the relevant tweets. The experimental results of all the components of the proposed end-to-end solution are very encouraging and hint at the potential of social media content and NLP in disaster management.
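The abstract mentions a merit-based fusion of four classifiers but does not spell out the fusion rule. As a minimal sketch, assuming "merit" means each model's validation F1-score and that fusion is a weighted average of class probabilities, the idea could look like the following. The model names are taken from the abstract; the scores, probabilities, and function names (`merit_weights`, `fuse`, `predict`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def merit_weights(val_f1: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-model validation F1 scores into weights that sum to 1."""
    total = sum(val_f1.values())
    if total <= 0:
        raise ValueError("at least one model needs a positive validation score")
    return {name: score / total for name, score in val_f1.items()}


def fuse(probs: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted average of the per-model class-probability vectors."""
    n_classes = len(next(iter(probs.values())))
    fused = [0.0] * n_classes
    for name, model_probs in probs.items():
        for i, value in enumerate(model_probs):
            fused[i] += weights[name] * value
    return fused


def predict(probs: Dict[str, List[float]], weights: Dict[str, float]) -> int:
    """Return the index of the class with the highest fused probability."""
    fused = fuse(probs, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical validation F1 scores (assumed values, not results from the paper).
val_f1 = {"bert": 0.92, "roberta": 0.93, "distilbert": 0.90, "albert": 0.91}

# Hypothetical [irrelevant, relevant] probabilities for a single tweet.
tweet_probs = {
    "bert": [0.40, 0.60],
    "roberta": [0.30, 0.70],
    "distilbert": [0.55, 0.45],
    "albert": [0.45, 0.55],
}

weights = merit_weights(val_f1)
label = predict(tweet_probs, weights)  # 1 means "relevant" in this sketch
```

In this sketch a better-scoring model pulls the fused decision more strongly toward its own prediction; other merit-based schemes (for example, weights tuned by search on a validation set) would fit the same interface.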
Related papers
- A Social Context-aware Graph-based Multimodal Attentive Learning Framework for Disaster Content Classification during Emergencies [0.0]
CrisisSpot is a method that captures complex relationships between textual and visual modalities.
IDEA captures both harmonious and contrasting patterns within the data to enhance multimodal interactions.
CrisisSpot achieved an average F1-score gain of 9.45% and 5.01% compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-10-11T13:51:46Z)
- Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports.
We formulate crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes.
Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
arXiv Detail & Related papers (2024-06-16T03:10:16Z)
- Large Language Models for Next Point-of-Interest Recommendation [53.93503291553005]
Location-Based Social Network (LBSN) data is often used for the next Point of Interest (POI) recommendation task.
One frequently disregarded challenge is how to effectively use the abundant contextual information present in LBSN data.
We propose a framework that uses pretrained Large Language Models (LLMs) to tackle this challenge.
arXiv Detail & Related papers (2024-04-19T13:28:36Z)
- CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster Tweet Classification [51.58605842457186]
We present a fine-grained disaster tweet classification model under the semi-supervised, few-shot learning setting.
Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using few labeled data and large amounts of unlabeled data.
arXiv Detail & Related papers (2023-10-23T07:01:09Z)
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
- IKDSumm: Incorporating Key-phrases into BERT for extractive Disaster Tweet Summarization [5.299958874647294]
We propose a disaster-specific tweet summarization framework, IKDSumm.
IKDSumm identifies the crucial and important information from each tweet related to a disaster through key-phrases of that tweet.
We utilize these key-phrases to automatically generate a summary of the tweets.
arXiv Detail & Related papers (2023-05-19T11:05:55Z)
- TopoBERT: Plug and Play Toponym Recognition Module Harnessing Fine-tuned BERT [11.446721140340575]
TopoBERT, a toponym recognition module based on a one-dimensional Convolutional Neural Network (CNN1D) and Bidirectional Encoder Representations from Transformers (BERT), is proposed and fine-tuned.
TopoBERT achieves state-of-the-art performance compared to the other five baseline models and can be applied to diverse toponym recognition tasks without additional training.
arXiv Detail & Related papers (2023-01-31T13:44:34Z)
- Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and scatteredness of available information (sparsity).
arXiv Detail & Related papers (2022-02-03T17:25:46Z)
- T-BERT -- Model for Sentiment Analysis of Micro-blogs Integrating Topic Model and BERT [0.0]
The effectiveness of BERT (Bidirectional Encoder Representations from Transformers) in sentiment classification tasks from a raw live dataset is demonstrated.
A novel T-BERT framework is proposed to show the enhanced performance obtainable by combining latent topics with contextual BERT embeddings.
arXiv Detail & Related papers (2021-06-02T12:01:47Z)
- Named Entity Recognition for Social Media Texts with Semantic Augmentation [70.44281443975554]
Existing approaches for named entity recognition suffer from data sparsity problems when conducted on short and informal texts.
We propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account.
arXiv Detail & Related papers (2020-10-29T10:06:46Z)
- I-AID: Identifying Actionable Information from Disaster-related Tweets [0.0]
Social media plays a significant role in disaster management by providing valuable data about affected people, donations and help requests.
We propose I-AID, a multimodel approach to automatically categorize tweets into multi-label information types.
Our results indicate that I-AID outperforms state-of-the-art approaches in terms of weighted average F1 score by +6% and +4% on the TREC-IS dataset and COVID-19 Tweets, respectively.
arXiv Detail & Related papers (2020-08-04T19:07:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.