A Weakly-Supervised Iterative Graph-Based Approach to Retrieve COVID-19
Misinformation Topics
- URL: http://arxiv.org/abs/2205.09416v1
- Date: Thu, 19 May 2022 09:30:39 GMT
- Title: A Weakly-Supervised Iterative Graph-Based Approach to Retrieve COVID-19
Misinformation Topics
- Authors: Harry Wang and Sharath Chandra Guntuku
- Abstract summary: We introduce a weakly-supervised iterative graph-based approach to detect keywords, topics, and themes related to misinformation.
Our approach can successfully detect specific topics from general misinformation-related seed words in a few seed texts.
- Score: 2.1471398891979647
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The COVID-19 pandemic has been accompanied by an 'infodemic' of accurate
and inaccurate health information across social media. Detecting misinformation
amidst a dynamically changing information landscape is challenging; identifying
relevant keywords and posts is arduous due to the large amount of human effort
required to inspect the content and sources of posts. We aim to reduce the
resource cost of this process by introducing a weakly-supervised iterative
graph-based approach to detect keywords, topics, and themes related to
misinformation, with a focus on COVID-19. Our approach can successfully detect
specific topics from general misinformation-related seed words in a few seed
texts. Our approach utilizes the BERT-based Word Graph Search (BWGS) algorithm
that builds on context-based neural network embeddings for retrieving
misinformation-related posts. We utilize Latent Dirichlet Allocation (LDA)
topic modeling for obtaining misinformation-related themes from the texts
returned by BWGS. Furthermore, we propose the BERT-based Multi-directional Word
Graph Search (BMDWGS) algorithm that utilizes greater starting context
information for misinformation extraction. In addition to a qualitative
analysis of our approach, our quantitative analyses show that BWGS and BMDWGS
are effective in extracting misinformation-related content compared to common
baselines in low data resource settings. Extracting such content is useful for
uncovering prevalent misconceptions and concerns and for facilitating precision
public health messaging campaigns to improve health behaviors.
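The iterative, graph-based idea in the abstract can be illustrated with a toy sketch. This is not the BWGS or BMDWGS algorithm from the paper, whose details are not given here. It only shows one plausible reading of the general pattern: start from seed words, treat words whose embeddings are similar as connected in a graph, and expand outward over a few iterations. The function names, the similarity threshold, and the two-dimensional vectors are all hypothetical; a real system would use contextual embeddings from a BERT-style model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_keywords(embeddings, seeds, threshold=0.8, max_iters=3):
    """Breadth-first expansion over an implicit word-similarity graph.

    Two words count as neighbors when their cosine similarity is at least
    `threshold` (an assumed value). Each iteration adds the unseen neighbors
    of the words found in the previous iteration.
    """
    found = {s for s in seeds if s in embeddings}
    frontier = sorted(found)
    for _ in range(max_iters):
        next_frontier = []
        for word in frontier:
            for cand, vec in embeddings.items():
                if cand not in found and cosine(embeddings[word], vec) >= threshold:
                    found.add(cand)
                    next_frontier.append(cand)
        if not next_frontier:
            break  # nothing new was reached, so stop early
        frontier = next_frontier
    return found

# Hypothetical 2-D "embeddings", chosen by hand for illustration only.
TOY_EMBEDDINGS = {
    "hoax": (1.0, 0.0),
    "fake": (0.9, 0.3),
    "plandemic": (0.6, 0.8),
    "microchip": (0.1, 1.0),
    "weather": (-1.0, 0.1),
}

if __name__ == "__main__":
    print(sorted(expand_keywords(TOY_EMBEDDINGS, ["hoax"])))
```

In this toy data the general seed "hoax" reaches "fake" in the first iteration, "plandemic" in the second, and "microchip" in the third, while "weather" is never reached. Limiting `max_iters` to 1 keeps only the direct neighbors of the seed.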
Related papers
- Enhancing Health Information Retrieval with RAG by Prioritizing Topical Relevance and Factual Accuracy [0.7673339435080445]
This paper introduces a solution driven by Retrieval-Augmented Generation (RAG) to enhance the retrieval of health-related documents grounded in scientific evidence.
In particular, we propose a three-stage model: in the first stage, the user's query is employed to retrieve topically relevant passages with associated references from a knowledge base constituted by scientific literature.
In the second stage, these passages, alongside the initial query, are processed by LLMs to generate a contextually relevant rich text (GenText).
In the last stage, the documents to be retrieved are evaluated and ranked both from the point of
arXiv Detail & Related papers (2025-02-07T05:19:13Z)
- Graph-based Retrieval Augmented Generation for Dynamic Few-shot Text Classification [15.0627807767152]
We propose a graph-based online retrieval-augmented generation framework, namely GORAG, for dynamic few-shot text classification.
GORAG constructs and maintains a weighted graph by extracting side information across all target texts.
Empirical evaluations demonstrate that GORAG outperforms existing approaches by providing more comprehensive and precise contextual information.
arXiv Detail & Related papers (2025-01-06T08:43:31Z)
- Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework.
This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings.
Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z)
- TextSleuth: Towards Explainable Tampered Text Detection [49.88698441048043]
We propose to explain the basis of tampered text detection with natural language via large multimodal models.
To fill the data gap for this task, we propose a large-scale, comprehensive dataset, ETTD.
Elaborate queries are introduced to generate high-quality anomaly descriptions with GPT4o.
To automatically filter out low-quality annotations, we also propose to prompt GPT4o to recognize tampered texts.
arXiv Detail & Related papers (2024-12-19T13:10:03Z)
- G-RAG: Knowledge Expansion in Material Science [0.0]
Graph RAG integrates graph databases to enhance the retrieval process.
We implement an agent-based parsing technique to achieve a more detailed representation of the documents.
arXiv Detail & Related papers (2024-11-21T21:22:58Z)
- Epidemiology-informed Network for Robust Rumor Detection [59.89351792706995]
We propose a novel Epidemiology-informed Network (EIN) that integrates epidemiological knowledge to enhance performance.
To adapt epidemiology theory to rumor detection, it is expected that each user's stance toward the source information will be annotated.
Our experimental results demonstrate that the proposed EIN not only outperforms state-of-the-art methods on real-world datasets but also exhibits enhanced robustness across varying tree depths.
arXiv Detail & Related papers (2024-11-20T00:43:32Z)
- Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge [2.2814097119704058]
Large language models (LLMs) are transforming the way information is retrieved with vast amounts of knowledge being summarized and presented.
LLMs are prone to highlight the most frequently seen pieces of information from the training set and to neglect the rare ones.
We introduce a novel information-retrieval method that leverages a knowledge graph to downsample these clusters and mitigate the information overload problem.
arXiv Detail & Related papers (2024-02-19T18:31:11Z)
- Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
- Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
arXiv Detail & Related papers (2020-10-22T19:52:49Z)
- ReSCo-CC: Unsupervised Identification of Key Disinformation Sentences [3.7405995078130148]
We propose a novel unsupervised task of identifying sentences containing key disinformation within a document that is known to be untrustworthy.
We design a three-phase statistical NLP solution for the task which starts with embedding sentences within a bespoke feature space designed for the task.
We show that our method is able to identify core disinformation effectively.
arXiv Detail & Related papers (2020-10-21T08:53:36Z)
- ENT-DESC: Entity Description Generation by Exploring Knowledge Graph [53.03778194567752]
In practice, the input knowledge could be more than enough, since the output description may only cover the most significant knowledge.
We introduce a large-scale and challenging dataset to facilitate the study of such a practical scenario in KG-to-text.
We propose a multi-graph structure that is able to represent the original graph information more comprehensively.
arXiv Detail & Related papers (2020-04-30T14:16:19Z)