Use of social media and Natural Language Processing (NLP) in natural
hazard research
- URL: http://arxiv.org/abs/2304.08341v1
- Date: Mon, 17 Apr 2023 15:03:05 GMT
- Title: Use of social media and Natural Language Processing (NLP) in natural
hazard research
- Authors: José Augusto Proença Maia Devienne
- Abstract summary: In the works of Sasaki et al. (2010) and Earle et al. (2011) the authors explored the real-time interaction on Twitter for detecting natural hazards.
An inherent challenge for such an application is natural language processing (NLP), which basically consists in converting words into numbers.
In this report we implement an NLP machine-learning pipeline with TensorFlow to process and classify events from text files.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Twitter is a microblogging service for sending short, public text messages
(tweets) that has recently received more attention in the scientific community. In
the works of Sasaki et al. (2010) and Earle et al. (2011), the authors explored
the real-time interaction on Twitter for detecting natural hazards (e.g.,
earthquakes, typhoons) based on users' tweets. An inherent challenge for such an
application is natural language processing (NLP), which basically consists
in converting words into numbers (vectors and tensors) in order to
(mathematically/computationally) make predictions and classifications.
Recently, advanced computational tools have been made available for dealing with
text computationally. In this report we implement an NLP machine-learning pipeline with
TensorFlow, an end-to-end open-source platform for machine learning
applications, to process and classify events based on files containing only
text.
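The "words into numbers" step described in the abstract can be illustrated with a minimal bag-of-words sketch in plain Python (the tiny example tweets and function names are hypothetical, and TensorFlow's own text-vectorization layers would replace this in practice):

```python
# Minimal sketch of converting words into numbers: tokenize short texts,
# build a vocabulary, and encode each text as a count vector that a
# classifier could consume.
from collections import Counter

def build_vocab(texts):
    """Map each distinct lowercase token to an integer index."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def vectorize(text, vocab):
    """Convert a text into a fixed-length vector of token counts."""
    counts = Counter(text.lower().split())
    return [counts.get(tok, 0) for tok in vocab]

tweets = ["earthquake shaking now", "big earthquake felt here"]
vocab = build_vocab(tweets)
vectors = [vectorize(t, vocab) for t in tweets]
```

Each tweet becomes a vector with one position per vocabulary word, which is the form a downstream classifier (such as a TensorFlow model) expects.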
Related papers
- A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding.
There is no publicly available NLI corpus for the Romanian language.
We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z) - Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z) - Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z) - Twitter-Demographer: A Flow-based Tool to Enrich Twitter Data [31.19059013571499]
Twitter-Demographer is a flow-based tool to enrich Twitter data with additional information about tweets and users.
We discuss our design choices, inspired by the flow-based programming paradigm, to use black-box components that can easily be chained together and extended.
arXiv Detail & Related papers (2022-01-26T14:59:17Z) - Identification of Twitter Bots based on an Explainable ML Framework: the
US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
A supervised machine learning (ML) framework is adopted, using an Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z) - Efficacy of BERT embeddings on predicting disaster from Twitter data [0.548253258922555]
Rescue agencies monitor social media to identify disasters and reduce risk to lives.
It is impossible for humans to manually check the massive amount of data and identify disasters in real time.
Advanced contextual embedding method (BERT) constructs different vectors for the same word in different contexts.
BERT embeddings achieve better results on the disaster prediction task than traditional word embeddings.
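The contrast with static embeddings can be shown with a toy sketch (the two-dimensional vectors and the neighbor-averaging rule are hypothetical stand-ins, not BERT itself, which uses transformer layers): a static lookup returns the same vector for a word everywhere, while a context-sensitive encoding mixes in neighboring words, so the same word gets different vectors in different sentences.

```python
# Toy contrast between static and contextual word vectors.
# Static: one fixed vector per word, regardless of the sentence.
# "Contextual" here is a stand-in rule: blend the word's static vector
# with the mean of its neighbors' vectors.
STATIC = {
    "river": [1.0, 0.0],
    "money": [0.0, 1.0],
    "bank":  [0.5, 0.5],
}

def contextual(word, sentence):
    """Blend a word's static vector with the mean of its neighbors."""
    vecs = [STATIC[w] for w in sentence if w != word]
    mean = [sum(c) / len(vecs) for c in zip(*vecs)]
    return [(b + m) / 2 for b, m in zip(STATIC[word], mean)]

v_river = contextual("bank", ["river", "bank"])  # leans toward "river"
v_money = contextual("bank", ["money", "bank"])  # leans toward "money"
```

The static entry for "bank" never changes, but the two contextual vectors differ, which is the property the blurb attributes to BERT.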
arXiv Detail & Related papers (2021-08-08T17:44:29Z) - Sentiment analysis in tweets: an assessment study from classical to
modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as their informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study fulfils an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z) - BERT Transformer model for Detecting Arabic GPT2 Auto-Generated Tweets [6.18447297698017]
We propose a transfer learning based model that will be able to detect if an Arabic sentence is written by humans or automatically generated by bots.
Our new transfer-learning model obtained an accuracy of up to 98%.
To the best of our knowledge, this work is the first study where ARABERT and GPT2 were combined to detect and classify the Arabic auto-generated texts.
arXiv Detail & Related papers (2021-01-22T21:50:38Z) - Deep Learning for Text Style Transfer: A Survey [71.8870854396927]
Text style transfer is an important task in natural language generation, which aims to control certain attributes in the generated text.
We present a systematic survey of the research on neural text style transfer, spanning over 100 representative articles since the first neural text style transfer work in 2017.
We discuss the task formulation, existing datasets and subtasks, evaluation, as well as the rich methodologies in the presence of parallel and non-parallel data.
arXiv Detail & Related papers (2020-11-01T04:04:43Z) - Integrating Crowdsourcing and Active Learning for Classification of
Work-Life Events from Tweets [9.137917522951277]
Social media data are unstructured and must undergo complex manipulation for research use.
We devised a crowdsourcing pipeline combined with active learning strategies.
Results show that crowdsourcing is useful to create high-quality annotations and active learning helps in reducing the number of required tweets.
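The active-learning idea mentioned above, labeling only the tweets the model is least sure about, can be sketched as a toy uncertainty-sampling step (the scores and tweet names are hypothetical, standing in for a classifier's predicted probabilities):

```python
# Toy uncertainty sampling: from an unlabeled pool, pick the item whose
# predicted probability is closest to 0.5 (most uncertain) so a human
# annotator labels it next.
def pick_most_uncertain(pool, score):
    """Return the pool item with score nearest the decision boundary 0.5."""
    return min(pool, key=lambda item: abs(score(item) - 0.5))

# Hypothetical classifier probabilities for three unlabeled tweets.
scores = {"tweet A": 0.95, "tweet B": 0.52, "tweet C": 0.10}
chosen = pick_most_uncertain(list(scores), scores.get)
```

Confident predictions ("tweet A", "tweet C") are skipped, so annotation effort concentrates where the model gains the most, which is how active learning reduces the number of tweets that need labels.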
arXiv Detail & Related papers (2020-03-26T20:19:33Z) - Deep Learning for Hindi Text Classification: A Comparison [6.8629257716723]
The research in the classification of morphologically rich and low resource Hindi language written in Devanagari script has been limited due to the absence of large labeled corpus.
In this work, we used translated versions of English datasets to evaluate models based on CNN, LSTM and Attention.
The paper also serves as a tutorial for popular text classification techniques.
arXiv Detail & Related papers (2020-01-19T09:29:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.