Claim Detection in Biomedical Twitter Posts
- URL: http://arxiv.org/abs/2104.11639v1
- Date: Fri, 23 Apr 2021 14:45:31 GMT
- Title: Claim Detection in Biomedical Twitter Posts
- Authors: Amelie Wührl and Roman Klinger
- Abstract summary: False information on biomedical topics can be particularly dangerous.
We aim to fill this research gap and annotate a corpus of 1200 tweets for implicit and explicit biomedical claims.
We develop baseline models which detect tweets that contain a claim automatically.
- Score: 11.335643770130238
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Social media contains unfiltered and unique information, which is potentially
of great value, but, in the case of misinformation, can also do great harm.
dangerous. With regard to biomedical topics, false information can be particularly
dangerous. Methods of automatic fact-checking and fake news detection address
this problem, but have not been applied to the biomedical domain in social
media yet. We aim to fill this research gap and annotate a corpus of 1200
tweets for implicit and explicit biomedical claims (the latter also with span
annotations for the claim phrase). With this corpus, which we sample to be
related to COVID-19, measles, cystic fibrosis, and depression, we develop
baseline models which detect tweets that contain a claim automatically. Our
analyses reveal that biomedical tweets are densely populated with claims (45 %
in a corpus sampled to contain 1200 tweets focused on the domains mentioned
above). Baseline classification experiments with embedding-based classifiers
and BERT-based transfer learning demonstrate that detection is challenging;
performance is, however, acceptable for identifying explicit expressions of
claims. Tweets with implicit claims are more difficult to detect.
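The baseline setup the abstract describes (classifiers that flag tweets containing claims) can be sketched as follows. This is a minimal stand-in, not the authors' implementation: TF-IDF with logistic regression substitutes for their embedding- and BERT-based models, and the toy tweets and labels are invented.

```python
# Hypothetical sketch of a binary claim-detection baseline:
# 1 = tweet contains a biomedical claim, 0 = no claim.
# TF-IDF + logistic regression stands in for the embedding-based
# and BERT-based classifiers used in the paper; all data is invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "Vitamin D cures depression, studies prove it",
    "New measles vaccine causes autism in children",
    "Just visited my doctor for a routine check-up",
    "Anyone know a good pulmonologist in Boston?",
    "This drug completely stops cystic fibrosis progression",
    "Feeling grateful for the nurses today",
]
labels = [1, 1, 0, 0, 1, 0]

# Word unigrams and bigrams as features, then a linear classifier.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(max_iter=1000),
)
clf.fit(tweets, labels)

preds = clf.predict(["Garlic kills coronavirus", "Off to the gym now"])
print(list(preds))
```

A real reproduction would replace the vectorizer with pretrained tweet embeddings or a fine-tuned BERT encoder and train on the 1200-tweet corpus.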
Related papers
- Missci: Reconstructing Fallacies in Misrepresented Science [84.32990746227385]
Health-related misinformation on social networks can lead to poor decision-making and real-world dangers.
Missci is a novel argumentation theoretical model for fallacious reasoning.
We present Missci as a dataset to test the critical reasoning abilities of large language models.
arXiv Detail & Related papers (2024-06-05T12:11:10Z) - What Makes Medical Claims (Un)Verifiable? Analyzing Entity and Relation Properties for Fact Verification [8.086400003948143]
The BEAR-Fact corpus is the first corpus for scientific fact verification annotated with subject-relation-object triplets, evidence documents, and fact-checking verdicts.
We show that it is possible to reliably estimate the success of evidence retrieval purely from the claim text.
The dataset is available at http://www.ims.uni-stuttgart.de/data/bioclaim.
arXiv Detail & Related papers (2024-02-02T12:27:58Z) - Verifying the Robustness of Automatic Credibility Assessment [79.08422736721764]
Text classification methods have been widely investigated as a way to detect content of low credibility.
In some cases, insignificant changes in the input text can mislead the models.
We introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
arXiv Detail & Related papers (2023-03-14T16:11:47Z) - EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data from a subset of DISNET, together with automatic association extractions from texts, have been transformed into a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z) - Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims [0.6927055673104934]
We publish a feature-rich dataset of 317k medical news articles/blogs and 3.5k fact-checked claims.
It also contains 573 manually and more than 51k automatically labelled mappings between claims and articles.
The dataset enables a number of additional tasks related to medical misinformation, such as misinformation characterisation studies or studies of misinformation diffusion between sources.
arXiv Detail & Related papers (2022-04-26T13:18:27Z) - CoVERT: A Corpus of Fact-checked Biomedical COVID-19 Tweets [10.536415845097661]
CoVERT is a fact-checked corpus of tweets with a focus on biomedicine and COVID-19-related (mis)information.
We employ a novel crowdsourcing methodology to annotate all tweets with fact-checking labels and supporting evidence, which crowdworkers search for online.
We use the retrieved evidence extracts as part of a fact-checking pipeline, finding that the real-world evidence is more useful than the knowledge indirectly available in pretrained language models.
arXiv Detail & Related papers (2022-04-26T09:05:03Z) - Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation [83.2079454464572]
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program.
We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles.
We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives.
arXiv Detail & Related papers (2021-12-16T03:37:20Z) - Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature [67.4680600632232]
Self-supervised learning has emerged as a promising direction to overcome the annotation bottleneck.
We propose a general approach for vertical search based on domain-specific pretraining.
Our system can scale to tens of millions of articles on PubMed and has been deployed as Microsoft Biomedical Search.
arXiv Detail & Related papers (2021-06-25T01:02:55Z) - TIB's Visual Analytics Group at MediaEval '20: Detecting Fake News on Corona Virus and 5G Conspiracy [9.66022279280394]
Fake news on social media has become a hot research topic, as it negatively impacts public discourse around real news.
The FakeNews task at MediaEval 2020 tackles this problem by creating a challenge to automatically detect tweets containing misinformation.
We present a simple approach that uses BERT embeddings and a shallow neural network for classifying tweets using only text.
arXiv Detail & Related papers (2021-01-10T11:52:17Z) - ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation Detection [6.688963029270579]
ArCOV19-Rumors is an Arabic COVID-19 Twitter dataset for misinformation detection, composed of tweets containing claims posted from 27 January until the end of April 2020.
We collected 138 verified claims, mostly from popular fact-checking websites, and identified 9.4K relevant tweets to those claims.
Tweets were manually annotated for veracity to support research on misinformation detection, one of the major problems faced during a pandemic.
arXiv Detail & Related papers (2020-10-17T11:21:40Z) - Misinformation Has High Perplexity [55.47422012881148]
We propose to leverage the perplexity to debunk false claims in an unsupervised manner.
First, we extract reliable evidence from scientific and news sources according to sentence similarity to the claims.
Second, we prime a language model with the extracted evidence and finally evaluate the correctness of given claims based on the perplexity scores at debunking time.
arXiv Detail & Related papers (2020-06-08T15:13:44Z)
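The perplexity-based debunking recipe above (retrieve evidence, prime a language model with it, score claims by perplexity) can be sketched at toy scale. This is an assumption-laden illustration: a Laplace-smoothed unigram model stands in for the pretrained neural LM, the `perplexity` helper is hypothetical, and all example text is invented.

```python
# Toy illustration of the perplexity heuristic: estimate a language model
# from retrieved evidence, then score claims; claims unsupported by the
# evidence tend to receive higher perplexity. A real system primes a
# pretrained neural LM; this sketch uses a smoothed unigram model instead.
import math
from collections import Counter


def train_unigram(evidence: str) -> tuple[Counter, int]:
    """Count word frequencies in the evidence text ("priming")."""
    tokens = evidence.lower().split()
    return Counter(tokens), len(tokens)


def perplexity(claim: str, counts: Counter, total: int,
               vocab_size: int = 10_000) -> float:
    """Per-token perplexity of the claim under the unigram model."""
    tokens = claim.lower().split()
    log_prob = 0.0
    for tok in tokens:
        # Laplace smoothing: unseen words get small but nonzero probability.
        p = (counts[tok] + 1) / (total + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(tokens))


evidence = (
    "masks reduce the transmission of respiratory viruses "
    "vaccines are safe and effective against measles"
)
counts, total = train_unigram(evidence)

supported = perplexity("vaccines are safe and effective", counts, total)
unsupported = perplexity("quartz crystals neutralize pathogens instantly",
                         counts, total)
print(supported, unsupported)
```

In this toy run the claim echoed by the evidence scores lower perplexity than the unrelated one, which is the signal the paper thresholds to debunk false claims.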
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences.