Related papers: A Weakly-Supervised Iterative Graph-Based Approach to Retrieve COVID-19 Misinformation Topics

A Weakly-Supervised Iterative Graph-Based Approach to Retrieve COVID-19 Misinformation Topics

URL: http://arxiv.org/abs/2205.09416v1
Date: Thu, 19 May 2022 09:30:39 GMT
Title: A Weakly-Supervised Iterative Graph-Based Approach to Retrieve COVID-19 Misinformation Topics
Authors: Harry Wang and Sharath Chandra Guntuku
Abstract summary: We introduce a weakly-supervised iterative graph-based approach to detect keywords, topics, and themes related to misinformation. Our approach can successfully detect specific topics from general misinformation-related seed words in a few seed texts.
Score: 2.1471398891979647
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The COVID-19 pandemic has been accompanied by an `infodemic' -- of accurate and inaccurate health information across social media. Detecting misinformation amidst dynamically changing information landscape is challenging; identifying relevant keywords and posts is arduous due to the large amount of human effort required to inspect the content and sources of posts. We aim to reduce the resource cost of this process by introducing a weakly-supervised iterative graph-based approach to detect keywords, topics, and themes related to misinformation, with a focus on COVID-19. Our approach can successfully detect specific topics from general misinformation-related seed words in a few seed texts. Our approach utilizes the BERT-based Word Graph Search (BWGS) algorithm that builds on context-based neural network embeddings for retrieving misinformation-related posts. We utilize Latent Dirichlet Allocation (LDA) topic modeling for obtaining misinformation-related themes from the texts returned by BWGS. Furthermore, we propose the BERT-based Multi-directional Word Graph Search (BMDWGS) algorithm that utilizes greater starting context information for misinformation extraction. In addition to a qualitative analysis of our approach, our quantitative analyses show that BWGS and BMDWGS are effective in extracting misinformation-related content compared to common baselines in low data resource settings. Extracting such content is useful for uncovering prevalent misconceptions and concerns and for facilitating precision public health messaging campaigns to improve health behaviors.

Related papers

DGP: A Dual-Granularity Prompting Framework for Fraud Detection with Graph-Enhanced LLMs [55.13817504780764]
Real-world fraud detection applications benefit from graph learning techniques that jointly exploit node features, often rich in textual data, and graph structural information.<n>Graph-Enhanced LLMs emerge as a promising graph learning approach that converts graph information into prompts.<n>We propose Dual Granularity Prompting (DGP), which mitigates information overload by preserving fine-grained textual details for the target node.
arXiv Detail & Related papers (2025-07-29T10:10:47Z)
A Tripartite Perspective on GraphRAG [0.0]
Large Language Models (LLMs) have shown remarkable capabilities across various domains, yet they struggle with knowledge-intensive tasks. Key limitations include their tendency to hallucinate, lack of source traceability (provenance), and challenges in timely knowledge updates. We propose a novel approach that combines LLMs with a tripartite knowledge graph representation.
arXiv Detail & Related papers (2025-04-28T10:43:35Z)
Contextual Embedding-based Clustering to Identify Topics for Healthcare Service Improvement [3.9726806016869936]
This study explores unsupervised methods to extract meaningful topics from 439 survey responses collected from a healthcare system in Wisconsin, USA. A keyword-based filtering approach was applied to isolate complaint-related feedback using a domain-specific lexicon. To improve coherence and interpretability where data are scarce and consist of short-texts, we propose kBERT.
arXiv Detail & Related papers (2025-04-18T20:38:24Z)
Enhancing Health Information Retrieval with RAG by Prioritizing Topical Relevance and Factual Accuracy [0.7673339435080445]
This paper introduces a solution driven by Retrieval-Augmented Generation (RAG) to enhance the retrieval of health-related documents grounded in scientific evidence. In particular, we propose a three-stage model: in the first stage, the user's query is employed to retrieve topically relevant passages with associated references from a knowledge base constituted by scientific literature. In the second stage, these passages, alongside the initial query, are processed by LLMs to generate a contextually relevant rich text (GenText) In the last stage, the documents to be retrieved are evaluated and ranked both from the point of
arXiv Detail & Related papers (2025-02-07T05:19:13Z)
Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework. This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings. Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z)
TextSleuth: Towards Explainable Tampered Text Detection [49.88698441048043]
We propose to explain the basis of tampered text detection with natural language via large multimodal models. To fill the data gap for this task, we propose a large-scale, comprehensive dataset, ETTD. Elaborate queries are introduced to generate high-quality anomaly descriptions with GPT4o. To automatically filter out low-quality annotations, we also propose to prompt GPT4o to recognize tampered texts.
arXiv Detail & Related papers (2024-12-19T13:10:03Z)
G-RAG: Knowledge Expansion in Material Science [0.0]
Graph RAG integrates graph databases to enhance the retrieval process. We implement an agent-based parsing technique to achieve a more detailed representation of the documents.
arXiv Detail & Related papers (2024-11-21T21:22:58Z)
Epidemiology-informed Network for Robust Rumor Detection [59.89351792706995]
We propose a novel Epidemiology-informed Network (EIN) that integrates epidemiological knowledge to enhance performance. To adapt epidemiology theory to rumor detection, it is expected that each users stance toward the source information will be annotated. Our experimental results demonstrate that the proposed EIN not only outperforms state-of-the-art methods on real-world datasets but also exhibits enhanced robustness across varying tree depths.
arXiv Detail & Related papers (2024-11-20T00:43:32Z)
Efficient Knowledge Infusion via KG-LLM Alignment [10.735490041033113]
Knowledge graph-retrievalaugmented method has been proven to be an effective and efficient technique for knowledge infusion. Existing approaches face two primary challenges: knowledge mismatch between public available knowledge graphs and the specific domain of the task at hand, and poor information compliance of LLMs with knowledge graphs. We propose a three-stage KG-LLM alignment strategyto enhance the LLM's capability to utilize information from knowledge graphs.
arXiv Detail & Related papers (2024-06-06T04:55:55Z)
Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge [2.2814097119704058]
Large language models (LLMs) are transforming the way information is retrieved with vast amounts of knowledge being summarized and presented. LLMs are prone to highlight the most frequently seen pieces of information from the training set and to neglect the rare ones. We introduce a novel information-retrieval method that leverages a knowledge graph to downsample these clusters and mitigate the information overload problem.
arXiv Detail & Related papers (2024-02-19T18:31:11Z)
Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining. We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data. Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
Contextual information integration for stance detection via cross-attention [59.662413798388485]
Stance detection deals with identifying an author's stance towards a target. Most existing stance detection models are limited because they do not consider relevant contextual information. We propose an approach to integrate contextual information as text.
arXiv Detail & Related papers (2022-11-03T15:04:29Z)
Deep Graph Learning for Anomalous Citation Detection [55.81334139806342]
We propose a novel deep graph learning model, namely GLAD (Graph Learning for Anomaly Detection), to identify anomalies in citation networks. Within the GLAD framework, we propose an algorithm called CPU (Citation PUrpose) to discover the purpose of citation based on citation texts.
arXiv Detail & Related papers (2022-02-23T09:05:28Z)
Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show. We then build a pipeline for systematically pre-processing the text. Our model successfully identified that Omeprazole can help treat heartburn.
arXiv Detail & Related papers (2020-10-22T19:52:49Z)
ReSCo-CC: Unsupervised Identification of Key Disinformation Sentences [3.7405995078130148]
We propose a novel unsupervised task of identifying sentences containing key disinformation within a document that is known to be untrustworthy. We design a three-phase statistical NLP solution for the task which starts with embedding sentences within a bespoke feature space designed for the task. We show that our method is able to identify core disinformation effectively.
arXiv Detail & Related papers (2020-10-21T08:53:36Z)
Visual Exploration and Knowledge Discovery from Biomedical Dark Data [0.0]
We employ a natural language processing based pipeline to discover knowledge out of the biomedical dark data. We aim to proffer a potential solution to overcome the problem of analyzing overwhelming amounts of information.
arXiv Detail & Related papers (2020-09-28T04:27:05Z)
ENT-DESC: Entity Description Generation by Exploring Knowledge Graph [53.03778194567752]
In practice, the input knowledge could be more than enough, since the output description may only cover the most significant knowledge. We introduce a large-scale and challenging dataset to facilitate the study of such a practical scenario in KG-to-text. We propose a multi-graph structure that is able to represent the original graph information more comprehensively.
arXiv Detail & Related papers (2020-04-30T14:16:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.