QuakeBERT: Accurate Classification of Social Media Texts for Rapid Earthquake Impact Assessment
- URL: http://arxiv.org/abs/2405.06684v1
- Date: Mon, 6 May 2024 10:52:21 GMT
- Title: QuakeBERT: Accurate Classification of Social Media Texts for Rapid Earthquake Impact Assessment
- Authors: Jin Han, Zhe Zheng, Xin-Zheng Lu, Ke-Yin Chen, Jia-Rui Lin
- Abstract summary: Social media aids disaster response but suffers from noise, hindering accurate impact assessment and decision making for resilient cities.
This study proposes the first domain-specific large language model (LLM) and an integrated method for rapid earthquake impact assessment.
Results show that the proposed approach can effectively enhance the impact assessment process through accurate detection of noisy microblogs.
- Score: 7.777478408048141
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Social media aids disaster response but suffers from noise, which hinders accurate impact assessment and decision making for resilient cities, a problem that few studies have considered. To address it, this study proposes the first domain-specific LLM and an integrated method for rapid earthquake impact assessment. First, a few categories are introduced to classify and filter microblogs according to their relationship to the physical and social impacts of earthquakes, and a dataset comprising 7282 earthquake-related microblogs from twenty earthquakes in different locations is developed. Then, with a systematic analysis of various influential factors, QuakeBERT, a domain-specific large language model (LLM), is developed and fine-tuned for accurate classification and filtering of microblogs. Meanwhile, an integrated method combining public opinion trend analysis, sentiment analysis, and keyword-based physical impact quantification is introduced to assess both the physical and social impacts of earthquakes based on social media texts. Experiments show that data diversity and data volume dominate the performance of QuakeBERT and increase the macro average F1 score by 27%, while the best classification model, QuakeBERT, outperforms the CNN- or RNN-based models by improving the macro average F1 score from 60.87% to 84.33%. Finally, the proposed approach is applied to assess two earthquakes with the same magnitude and focal depth. Results show that the proposed approach can effectively enhance the impact assessment process through accurate detection of noisy microblogs, which enables effective post-disaster emergency responses to create more resilient cities.
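The abstract reports its results as macro average F1 scores. As a rough illustration of that metric only (not the authors' evaluation code), the sketch below computes per-class F1 and averages the classes with equal weight, so a rare category counts as much as a common one. The function name `macro_f1` and the category labels `damage`, `emotion`, and `noise` are hypothetical; the abstract does not list the paper's actual categories.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for six microblogs (illustrative data, not from the paper)
y_true = ["damage", "damage", "noise", "noise", "noise", "emotion"]
y_pred = ["damage", "noise", "noise", "noise", "noise", "damage"]
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

In this toy example four of the six predictions are correct, yet the macro average is pulled down because the single `emotion` post is missed entirely, which is why the metric suits imbalanced, noisy microblog data.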
Related papers
- Turkey's Earthquakes: Damage Prediction and Feature Significance Using A Multivariate Analysis [1.9461727843485295]
This research contributes to the reduction of fatalities in future seismic events in Turkey.
We tested various machine-learning architectures to forecast death tolls and fatalities per affected population.
Our findings indicate that the Random Forest model provides the most reliable predictions.
arXiv Detail & Related papers (2024-10-29T10:29:06Z)
- Causal Micro-Narratives [62.47217054314046]
We present a novel approach to classify causal micro-narratives from text.
These narratives are sentence-level explanations of the cause(s) and/or effect(s) of a target subject.
arXiv Detail & Related papers (2024-10-07T17:55:10Z)
- Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses [76.59021017301127]
We propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports.
We further formulate the crash event feature learning as a novel text reasoning problem and further fine-tune various large language models (LLMs) to predict detailed accident outcomes.
Our experimental results show that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes.
arXiv Detail & Related papers (2024-06-16T03:10:16Z)
- Data-Driven Prediction of Seismic Intensity Distributions Featuring Hybrid Classification-Regression Models [21.327960186900885]
This study develops linear regression models capable of predicting seismic intensity distributions based on earthquake parameters.
The dataset comprises seismic intensity data from earthquakes that occurred in the vicinity of Japan between 1997 and 2020.
The proposed model can predict even abnormal seismic intensity distributions, a task at which conventional GMPEs often struggle.
arXiv Detail & Related papers (2024-02-03T13:39:22Z)
- Decoding the Silent Majority: Inducing Belief Augmented Social Graph with Large Language Model for Response Forecasting [74.68371461260946]
SocialSense is a framework that induces a belief-centered graph on top of an existent social network, along with graph-based propagation to capture social dynamics.
Our method surpasses existing state-of-the-art in experimental evaluations for both zero-shot and supervised settings.
arXiv Detail & Related papers (2023-10-20T06:17:02Z)
- Measuring the Effect of Influential Messages on Varying Personas [67.1149173905004]
We present a new task, Response Forecasting on Personas for News Media, to estimate the response a persona might have upon seeing a news message.
The proposed task not only introduces personalization in the modeling but also predicts the sentiment polarity and intensity of each response.
This enables more accurate and comprehensive inference on the mental state of the persona.
arXiv Detail & Related papers (2023-05-25T21:01:00Z)
- Earthquake Impact Analysis Based on Text Mining and Social Media Analytics [5.949779668853556]
Earthquakes have a deep impact on wide areas, and emergency rescue operations may benefit from social media information about the scope and extent of the disaster.
This work presents a text mining-based approach to collect and analyze social media data for early earthquake impact analysis.
arXiv Detail & Related papers (2022-12-12T13:51:07Z)
- DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework to solve the above two challenges simultaneously.
We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z)
- A CNN-BiLSTM Model with Attention Mechanism for Earthquake Prediction [0.0]
This paper proposes a novel prediction method based on attention mechanism (AM), convolution neural network (CNN), and bi-directional long short-term memory (BiLSTM) models.
It can predict the number and maximum magnitude of earthquakes in each area of mainland China based on the earthquake catalog of the region.
arXiv Detail & Related papers (2021-12-26T20:16:20Z)
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
- Towards advancing the earthquake forecasting by machine learning of satellite data [22.87513332935679]
We develop a novel machine learning method, namely Inverse Boosting Pruning Trees (IBPT), to issue short-term forecasts based on the satellite data of 1,371 earthquakes of magnitude six or above.
Our proposed method outperforms all the six selected baselines and shows a strong capability in improving the likelihood of earthquake forecasting across different earthquake databases.
arXiv Detail & Related papers (2021-01-31T02:29:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.