Comparing Knowledge Sources for Open-Domain Scientific Claim Verification
- URL: http://arxiv.org/abs/2402.02844v1
- Date: Mon, 5 Feb 2024 09:57:15 GMT
- Title: Comparing Knowledge Sources for Open-Domain Scientific Claim Verification
- Authors: Juraj Vladika, Florian Matthes
- Abstract summary: We show that PubMed works better with specialized biomedical claims, while Wikipedia is more suited for everyday health concerns.
We discuss the results, outline frequent retrieval patterns and challenges, and suggest promising future directions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing rate at which scientific knowledge is discovered and health
claims shared online has highlighted the importance of developing efficient
fact-checking systems for scientific claims. The usual setting for this task in
the literature assumes that the documents containing the evidence for claims
are already provided and annotated or contained in a limited corpus. This
renders the systems unrealistic for real-world settings where knowledge sources
with potentially millions of documents need to be queried to find relevant
evidence. In this paper, we perform an array of experiments to test the
performance of open-domain claim verification systems. We test the final
verdict prediction of systems on four datasets of biomedical and health claims
in different settings. While keeping the pipeline's evidence selection and
verdict prediction parts constant, document retrieval is performed over three
common knowledge sources (PubMed, Wikipedia, Google) and using two different
information retrieval techniques. We show that PubMed works better with
specialized biomedical claims, while Wikipedia is more suited for everyday
health concerns. Likewise, BM25 excels in retrieval precision, while semantic
search excels in recall of relevant evidence. We discuss the results, outline
frequent retrieval patterns and challenges, and suggest promising future directions.
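To make the comparison concrete, here is a minimal sketch of the retrieve-then-verify setup the abstract describes, contrasting BM25 with dense semantic search over a toy corpus. This is not the authors' code: the corpus, model name, and top-k value are illustrative assumptions.

```python
# Minimal sketch of the retrieval stage compared in the paper.
from rank_bm25 import BM25Okapi                               # pip install rank-bm25
from sentence_transformers import SentenceTransformer, util   # pip install sentence-transformers

# Stand-in for a knowledge source (PubMed / Wikipedia / Google results).
corpus = [
    "Vitamin C supplementation does not prevent the common cold in the general population.",
    "Statins reduce LDL cholesterol and the risk of cardiovascular events.",
    "Regular physical activity lowers the risk of type 2 diabetes.",
]
claim = "Vitamin C prevents the common cold."
top_k = 2

# --- Lexical retrieval: BM25 (tends toward higher precision) ---
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
bm25_scores = bm25.get_scores(claim.lower().split())
bm25_ranked = sorted(range(len(corpus)), key=lambda i: -bm25_scores[i])[:top_k]

# --- Semantic retrieval: dense embeddings (tends toward higher recall) ---
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice
doc_emb = encoder.encode(corpus, convert_to_tensor=True)
claim_emb = encoder.encode(claim, convert_to_tensor=True)
sim = util.cos_sim(claim_emb, doc_emb)[0]
dense_ranked = sim.argsort(descending=True)[:top_k].tolist()

print("BM25 evidence:    ", [corpus[i] for i in bm25_ranked])
print("Semantic evidence:", [corpus[i] for i in dense_ranked])
# In the paper's experiments only this retrieval stage varies; evidence
# selection and verdict prediction are held constant across knowledge sources.
```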
Related papers
- Identifying and Aligning Medical Claims Made on Social Media with Medical Evidence
We study three core tasks: identifying medical claims, extracting medical vocabulary from these claims, and retrieving evidence relevant to the identified claims.
We propose a novel system that can generate synthetic medical claims to aid each of these core tasks.
arXiv Detail & Related papers (2024-05-18T07:50:43Z)
- Improving Health Question Answering with Reliable and Time-Aware Evidence Retrieval
Our study focuses on the open-domain QA setting, where the key challenge is to first uncover relevant evidence in large knowledge bases.
By utilizing the common retrieve-then-read QA pipeline and PubMed as a trustworthy collection of medical research documents, we answer health questions from three diverse datasets.
Our results reveal that reducing the number of retrieved documents and favoring more recent and highly cited documents can improve the final macro F1 score by up to 10%.
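As a rough illustration of the recency- and citation-aware filtering idea above, the sketch below re-ranks retrieved documents and truncates the evidence set. The field names, weights, and score combination are assumptions for illustration, not the paper's exact method.

```python
# Hedged sketch of time- and citation-aware re-ranking (assumed scoring).
from dataclasses import dataclass

@dataclass
class RetrievedDoc:
    pmid: str
    retrieval_score: float  # e.g. BM25 or dense similarity
    year: int               # publication year
    citations: int          # citation count

def rerank(docs, current_year=2024, w_recency=0.3, w_citations=0.3, keep=3):
    """Combine the retrieval score with recency and citation signals,
    then truncate to a smaller evidence set."""
    def combined(d):
        recency = 1.0 / (1 + current_year - d.year)  # newer -> closer to 1
        cited = d.citations / (1 + d.citations)      # saturating citation signal
        return d.retrieval_score + w_recency * recency + w_citations * cited
    return sorted(docs, key=combined, reverse=True)[:keep]

docs = [
    RetrievedDoc("PMID:1", 0.82, 2009, 450),
    RetrievedDoc("PMID:2", 0.80, 2022, 120),
    RetrievedDoc("PMID:3", 0.55, 2023, 15),
    RetrievedDoc("PMID:4", 0.50, 2001, 5),
]
for d in rerank(docs):
    print(d.pmid)
```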
arXiv Detail & Related papers (2024-04-12T09:56:12Z)
- What Makes Medical Claims (Un)Verifiable? Analyzing Entity and Relation Properties for Fact Verification
The BEAR-Fact corpus is the first corpus for scientific fact verification annotated with subject-relation-object triplets, evidence documents, and fact-checking verdicts.
We show that it is possible to reliably estimate the success of evidence retrieval purely from the claim text.
The dataset is available at http://www.ims.uni-stuttgart.de/data/bioclaim.
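For illustration, a record in a triplet-annotated corpus of this kind might look like the sketch below; the exact BEAR-Fact schema and field names are assumptions.

```python
# Hypothetical record layout for a triplet-annotated claim.
from dataclasses import dataclass, field

@dataclass
class TripletClaim:
    subject: str
    relation: str
    obj: str
    claim_text: str
    evidence_docs: list[str] = field(default_factory=list)  # e.g. PubMed IDs
    verdict: str = "NOT ENOUGH INFO"  # SUPPORTED / REFUTED / NOT ENOUGH INFO

example = TripletClaim(
    subject="ibuprofen",
    relation="treats",
    obj="headache",
    claim_text="Ibuprofen relieves headaches.",
)
print(example)
```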
arXiv Detail & Related papers (2024-02-02T12:27:58Z)
- HealthFC: Verifying Health Claims with Evidence-Based Medical Fact-Checking
HealthFC is a dataset of 750 health-related claims in German and English labeled for veracity by medical experts.
We provide an analysis of the dataset, highlighting its characteristics and challenges.
We show that the dataset is a challenging test bed with a high potential for future use.
arXiv Detail & Related papers (2023-09-15T16:05:48Z)
- Give Me More Details: Improving Fact-Checking with Latent Retrieval
Evidence plays a crucial role in automated fact-checking.
Existing fact-checking systems either assume the evidence sentences are given or use the search snippets returned by the search engine.
We propose to incorporate full text from source documents as evidence and introduce two enriched datasets.
arXiv Detail & Related papers (2023-05-25T15:01:19Z)
- Complex Claim Verification with Evidence Retrieved in the Wild
We present the first fully automated pipeline to check real-world claims by retrieving raw evidence from the web.
Our pipeline includes five components: claim decomposition, raw document retrieval, fine-grained evidence retrieval, claim-focused summarization, and veracity judgment.
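The five components translate naturally into a staged pipeline. The sketch below shows the control flow with stub functions; every function body is a hypothetical placeholder for a real model or retriever.

```python
# Schematic of the five-component pipeline listed above (all stubs).
def decompose(claim: str) -> list[str]:
    """Claim decomposition: split a complex claim into sub-claims."""
    return [claim]  # stub: real systems use an LLM or rule-based splitter

def retrieve_documents(subclaim: str) -> list[str]:
    """Raw document retrieval from the open web."""
    return ["<raw web document>"]  # stub

def retrieve_evidence(subclaim: str, docs: list[str]) -> list[str]:
    """Fine-grained evidence retrieval: pick relevant passages."""
    return docs  # stub

def summarize(subclaim: str, passages: list[str]) -> str:
    """Claim-focused summarization of the evidence."""
    return " ".join(passages)  # stub

def judge(claim: str, summary: str) -> str:
    """Veracity judgment over the claim plus summarized evidence."""
    return "NOT ENOUGH INFO"  # stub

def verify(claim: str) -> str:
    summaries = []
    for sub in decompose(claim):
        docs = retrieve_documents(sub)
        passages = retrieve_evidence(sub, docs)
        summaries.append(summarize(sub, passages))
    return judge(claim, " ".join(summaries))

print(verify("Drinking green tea cures migraines."))
```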
arXiv Detail & Related papers (2023-05-19T17:49:19Z)
- SciFact-Open: Towards open-domain scientific claim verification
We present SciFact-Open, a new test collection designed to evaluate the performance of scientific claim verification systems.
We collect evidence for scientific claims by pooling and annotating the top predictions of four state-of-the-art scientific claim verification models.
We find that systems developed on smaller corpora struggle to generalize to SciFact-Open, exhibiting performance drops of at least 15 F1.
arXiv Detail & Related papers (2022-10-25T05:45:00Z)
- EBOCA: Evidences for BiOmedical Concepts Association Ontology
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data from a subset of DISNET, together with associations automatically extracted from text, have been transformed into a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z)
- GERE: Generative Evidence Retrieval for Fact Verification
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
- On the Combined Use of Extrinsic Semantic Resources for Medical Information Search
We develop a framework to highlight and expand head medical concepts in verbose medical queries.
We also build semantically enhanced inverted index documents.
To demonstrate the effectiveness of the proposed approach, we conducted several experiments over the CLEF 2014 dataset.
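As a toy example of head-concept expansion in verbose medical queries, the sketch below appends synonyms from a hand-made table standing in for an extrinsic semantic resource such as UMLS or MeSH; the table and the expansion rule are assumptions, not the paper's implementation.

```python
# Toy sketch of head-concept query expansion (hand-made synonym table).
SYNONYMS = {
    "myocardial infarction": ["heart attack", "MI"],
    "hypertension": ["high blood pressure"],
}

def expand_query(query: str) -> str:
    """Append synonyms of any recognized head concepts to the query."""
    expansions = []
    lowered = query.lower()
    for concept, alts in SYNONYMS.items():
        if concept in lowered:
            expansions.extend(alts)
    return query + (" " + " ".join(expansions) if expansions else "")

print(expand_query("treatment options after a myocardial infarction"))
# -> "treatment options after a myocardial infarction heart attack MI"
```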
arXiv Detail & Related papers (2020-05-17T14:18:04Z)