ArCOV-19: The First Arabic COVID-19 Twitter Dataset with Propagation
Networks
- URL: http://arxiv.org/abs/2004.05861v4
- Date: Sat, 13 Mar 2021 23:14:06 GMT
- Title: ArCOV-19: The First Arabic COVID-19 Twitter Dataset with Propagation
Networks
- Authors: Fatima Haouari, Maram Hasanain, Reem Suwaileh, Tamer Elsayed
- Abstract summary: ArCOV-19 is the first publicly available Arabic Twitter dataset covering the COVID-19 pandemic.
It includes about 2.7M tweets alongside the propagation networks of the most-popular subset of them.
- Score: 6.688963029270579
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present ArCOV-19, an Arabic COVID-19 Twitter dataset that
spans one year, covering the period from 27th of January 2020 till 31st of
January 2021. ArCOV-19 is the first publicly-available Arabic Twitter dataset
covering the COVID-19 pandemic that includes about 2.7M tweets alongside the
propagation networks of the most-popular subset of them (i.e., most-retweeted
and -liked). The propagation networks include both retweets and conversational
threads (i.e., threads of replies). ArCOV-19 is designed to enable research
in several domains, including natural language processing, information
retrieval, and social computing. Preliminary analysis shows that ArCOV-19
captures rising discussions associated with the first reported cases of the
disease as they appeared in the Arab world. In addition to the source tweets
and propagation networks, we also release the search queries and
language-independent crawler used to collect the tweets to encourage the
curation of similar datasets.
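The propagation networks described in the abstract (retweets plus reply threads) can be pictured as a directed graph rooted at each source tweet. The following is a minimal sketch only: the record fields `id`, `retweet_of`, and `reply_to` and both function names are hypothetical and do not reflect the dataset's actual schema or the authors' crawler.

```python
from collections import defaultdict

# Hypothetical records: the field names ("id", "retweet_of", "reply_to") are
# assumptions for illustration, not the dataset's actual schema.
tweets = [
    {"id": "s1", "retweet_of": None, "reply_to": None},  # source tweet
    {"id": "r1", "retweet_of": "s1", "reply_to": None},  # retweet of s1
    {"id": "c1", "retweet_of": None, "reply_to": "s1"},  # reply to s1
    {"id": "c2", "retweet_of": None, "reply_to": "c1"},  # reply to c1
]


def build_propagation_network(records):
    """Return {parent_id: [(child_id, edge_type), ...]} for retweets and replies."""
    children = defaultdict(list)
    for rec in records:
        if rec["retweet_of"] is not None:
            children[rec["retweet_of"]].append((rec["id"], "retweet"))
        elif rec["reply_to"] is not None:
            children[rec["reply_to"]].append((rec["id"], "reply"))
    return dict(children)


def thread_of(network, source_id):
    """Collect every tweet reachable from source_id through reply edges only."""
    thread, stack = [], [source_id]
    while stack:
        node = stack.pop()
        for child, kind in network.get(node, []):
            if kind == "reply":
                thread.append(child)
                stack.append(child)
    return thread


network = build_propagation_network(tweets)
conversation = thread_of(network, "s1")
```

Keeping the edge type on each link lets the same structure answer both questions the abstract mentions: who retweeted a source tweet, and which replies form its conversational thread.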
Related papers
- "COVID-19 was a FIFA conspiracy #curropt": An Investigation into the
Viral Spread of COVID-19 Misinformation [60.268682953952506]
We estimate the extent to which misinformation has influenced the course of the COVID-19 pandemic using natural language processing models.
We provide a strategy to combat social media posts that are likely to cause widespread harm.
arXiv Detail & Related papers (2022-06-12T19:41:01Z)
- Twitter Dataset on the Russo-Ukrainian War [68.713984286035]
We have initiated an ongoing dataset acquisition from the Twitter API.
The dataset has reached 57.3 million tweets, originating from 7.7 million users.
We apply an initial volume and sentiment analysis; the dataset can also support further exploratory work on topic analysis, hate speech, and propaganda recognition, or help reveal potentially malicious entities such as botnets.
arXiv Detail & Related papers (2022-04-07T12:33:06Z)
- Cross-lingual COVID-19 Fake News Detection [54.125563009333995]
We make the first attempt to detect COVID-19 misinformation in a low-resource language (Chinese) using only the fact-checked news in a high-resource language (English).
We propose a deep learning framework named CrossFake to jointly encode the cross-lingual news body texts and capture the news content.
Empirical results on our dataset demonstrate the effectiveness of CrossFake under the cross-lingual setting.
arXiv Detail & Related papers (2021-10-13T04:44:02Z)
- Extracting Feelings of People Regarding COVID-19 by Social Network
Mining [0.0]
A dataset of COVID-related tweets in the English language is collected.
More than two million tweets from March 23 to June 23 of 2020 are analyzed.
arXiv Detail & Related papers (2021-10-12T16:45:33Z)
- Evaluating the COVID-19 Identification ResNet (CIdeR) on the INTERSPEECH
COVID-19 from Audio Challenges [59.78485839636553]
CIdeR is an end-to-end deep learning neural network originally designed to classify whether an individual is COVID-positive or COVID-negative.
We demonstrate the potential of CIdeR at binary COVID-19 diagnosis from both the COVID-19 Cough and Speech Sub-Challenges of INTERSPEECH 2021, ComParE and DiCOVA.
arXiv Detail & Related papers (2021-07-30T10:59:08Z)
- CML-COVID: A Large-Scale COVID-19 Twitter Dataset with Latent Topics,
Sentiment and Location Information [0.0]
CML-COVID is a COVID-19 Twitter dataset of 19,298,967 tweets from 5,977,653 unique individuals.
These tweets were collected between March 2020 and July 2020 using the query terms coronavirus, covid and mask related to COVID-19.
arXiv Detail & Related papers (2021-01-28T18:59:10Z)
- ArCorona: Analyzing Arabic Tweets in the Early Days of Coronavirus
(COVID-19) Pandemic [3.057212947792573]
We present the largest manually annotated dataset of Arabic tweets related to COVID-19.
We describe annotation guidelines, analyze our dataset, and build effective machine learning and transformer-based models for classification.
arXiv Detail & Related papers (2020-12-02T19:05:25Z)
- ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation
Detection [6.688963029270579]
ArCOV19-Rumors is an Arabic COVID-19 Twitter dataset for misinformation detection composed of tweets containing claims from 27th January till the end of April 2020.
We collected 138 verified claims, mostly from popular fact-checking websites, and identified 9.4K tweets relevant to those claims.
Tweets were manually annotated for veracity to support research on misinformation detection, which is one of the major problems faced during a pandemic.
arXiv Detail & Related papers (2020-10-17T11:21:40Z)
- TICO-19: the Translation Initiative for Covid-19 [112.5601530395345]
The Translation Initiative for COvid-19 (TICO-19) has made test and development data available to AI and MT researchers in 35 different languages.
The same data is translated into all of the languages represented, meaning that testing or development can be done for any pairing of languages in the set.
arXiv Detail & Related papers (2020-07-03T16:26:17Z)
- CO-Search: COVID-19 Information Retrieval with Semantic Search, Question
Answering, and Abstractive Summarization [53.67205506042232]
CO-Search is a retriever-ranker semantic search engine designed to handle complex queries over the COVID-19 literature.
To account for the domain-specific and relatively limited dataset, we generate a bipartite graph of document paragraphs and citations.
We evaluate our system on the data of the TREC-COVID information retrieval challenge.
arXiv Detail & Related papers (2020-06-17T01:32:48Z)
- COVID-19 on Social Media: Analyzing Misinformation in Twitter
Conversations [22.43295864610142]
We collected streaming data related to COVID-19 using the Twitter API, starting March 1, 2020.
We identified unreliable and misleading content based on fact-checking sources.
We examined the narratives promoted in misinformation tweets, along with the distribution of engagements with these tweets.
arXiv Detail & Related papers (2020-03-26T09:48:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.