A Weakly-Supervised Iterative Graph-Based Approach to Retrieve COVID-19
Misinformation Topics
- URL: http://arxiv.org/abs/2205.09416v1
- Date: Thu, 19 May 2022 09:30:39 GMT
- Title: A Weakly-Supervised Iterative Graph-Based Approach to Retrieve COVID-19
Misinformation Topics
- Authors: Harry Wang and Sharath Chandra Guntuku
- Abstract summary: We introduce a weakly-supervised iterative graph-based approach to detect keywords, topics, and themes related to misinformation.
Our approach can successfully detect specific topics from general misinformation-related seed words in a few seed texts.
- Score: 2.1471398891979647
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The COVID-19 pandemic has been accompanied by an 'infodemic' of accurate
and inaccurate health information across social media. Detecting misinformation
amidst a dynamically changing information landscape is challenging; identifying
relevant keywords and posts is arduous due to the large amount of human effort
required to inspect the content and sources of posts. We aim to reduce the
resource cost of this process by introducing a weakly-supervised iterative
graph-based approach to detect keywords, topics, and themes related to
misinformation, with a focus on COVID-19. Our approach can successfully detect
specific topics from general misinformation-related seed words in a few seed
texts. Our approach utilizes the BERT-based Word Graph Search (BWGS) algorithm
that builds on context-based neural network embeddings for retrieving
misinformation-related posts. We utilize Latent Dirichlet Allocation (LDA)
topic modeling for obtaining misinformation-related themes from the texts
returned by BWGS. Furthermore, we propose the BERT-based Multi-directional Word
Graph Search (BMDWGS) algorithm that utilizes greater starting context
information for misinformation extraction. In addition to a qualitative
analysis of our approach, our quantitative analyses show that BWGS and BMDWGS
are effective in extracting misinformation-related content compared to common
baselines in low data resource settings. Extracting such content is useful for
uncovering prevalent misconceptions and concerns and for facilitating precision
public health messaging campaigns to improve health behaviors.
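The iterative, graph-based keyword expansion described in the abstract can be illustrated with a minimal sketch. This is not the authors' BWGS algorithm: the abstract does not specify its details, so the code below assumes a simple breadth-first search over a word graph whose edges connect words with high cosine similarity. The names `TOY_EMBEDDINGS`, `cosine`, and `expand_keywords`, the 3-dimensional vectors, the `threshold`, and the `max_depth` parameter are all hypothetical; in practice the vectors would come from a contextual model such as BERT.

```python
import math
from collections import deque

# Hypothetical toy vectors standing in for contextual (e.g. BERT) embeddings.
TOY_EMBEDDINGS = {
    "hoax": (1.0, 0.1, 0.0),
    "fake": (0.9, 0.2, 0.0),
    "plandemic": (0.8, 0.3, 0.1),
    "microchip": (0.7, 0.5, 0.2),
    "vaccine": (0.3, 0.9, 0.1),
    "weather": (0.0, 0.1, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_keywords(seeds, embeddings, threshold=0.9, max_depth=2):
    """Breadth-first expansion from seed words (an assumed search strategy).

    A candidate word is added when its similarity to an already-found word
    reaches `threshold`. Returns a dict mapping each found word to the
    search depth at which it was reached (seeds have depth 0).
    """
    found = {s: 0 for s in seeds if s in embeddings}
    queue = deque(found.items())
    while queue:
        word, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for cand, vec in embeddings.items():
            if cand not in found and cosine(embeddings[word], vec) >= threshold:
                found[cand] = depth + 1
                queue.append((cand, depth + 1))
    return found


if __name__ == "__main__":
    for word, depth in expand_keywords(["hoax"], TOY_EMBEDDINGS).items():
        print(depth, word)
```

With these toy vectors, a general seed such as "hoax" reaches the more specific term "microchip" only through an intermediate neighbor, which is the kind of general-to-specific drift the abstract describes. The depth limit and threshold are illustrative knobs for keeping the expansion from wandering into unrelated vocabulary.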
Related papers
- ADLM -- stega: A Universal Adaptive Token Selection Algorithm for Improving Steganographic Text Quality via Information Entropy [1.413488665073795]
Steganographic systems enhance information security by embedding confidential information into public carriers.
Existing generative text steganography methods face challenges in handling the long-tail distribution of candidate word pools.
This paper proposes a quality control theory for steganographic text generation based on information entropy constraints.
arXiv Detail & Related papers (2024-10-28T08:25:31Z)
- Efficient Knowledge Infusion via KG-LLM Alignment [10.735490041033113]
The knowledge graph retrieval-augmented method has been proven to be an effective and efficient technique for knowledge infusion.
Existing approaches face two primary challenges: knowledge mismatch between publicly available knowledge graphs and the specific domain of the task at hand, and poor information compliance of LLMs with knowledge graphs.
We propose a three-stage KG-LLM alignment strategy to enhance the LLM's capability to utilize information from knowledge graphs.
arXiv Detail & Related papers (2024-06-06T04:55:55Z)
- Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge [2.2814097119704058]
Large language models (LLMs) are transforming the way information is retrieved with vast amounts of knowledge being summarized and presented.
LLMs are prone to highlight the most frequently seen pieces of information from the training set and to neglect the rare ones.
We introduce a novel information-retrieval method that leverages a knowledge graph to downsample these clusters and mitigate the information overload problem.
arXiv Detail & Related papers (2024-02-19T18:31:11Z)
- DEMASQ: Unmasking the ChatGPT Wordsmith [63.8746084667206]
We propose an effective ChatGPT detector named DEMASQ, which accurately identifies ChatGPT-generated content.
Our method addresses two critical factors: (i) the distinct biases in text composition observed in human- and machine-generated content and (ii) the alterations made by humans to evade previous detection methods.
arXiv Detail & Related papers (2023-11-08T21:13:05Z)
- Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
- Contextual information integration for stance detection via cross-attention [59.662413798388485]
Stance detection deals with identifying an author's stance towards a target.
Most existing stance detection models are limited because they do not consider relevant contextual information.
We propose an approach to integrate contextual information as text.
arXiv Detail & Related papers (2022-11-03T15:04:29Z)
- Deep Graph Learning for Anomalous Citation Detection [55.81334139806342]
We propose a novel deep graph learning model, namely GLAD (Graph Learning for Anomaly Detection), to identify anomalies in citation networks.
Within the GLAD framework, we propose an algorithm called CPU (Citation PUrpose) to discover the purpose of citation based on citation texts.
arXiv Detail & Related papers (2022-02-23T09:05:28Z)
- Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
arXiv Detail & Related papers (2020-10-22T19:52:49Z)
- ReSCo-CC: Unsupervised Identification of Key Disinformation Sentences [3.7405995078130148]
We propose a novel unsupervised task of identifying sentences containing key disinformation within a document that is known to be untrustworthy.
We design a three-phase statistical NLP solution for the task which starts with embedding sentences within a bespoke feature space designed for the task.
We show that our method is able to identify core disinformation effectively.
arXiv Detail & Related papers (2020-10-21T08:53:36Z)
- Visual Exploration and Knowledge Discovery from Biomedical Dark Data [0.0]
We employ a natural language processing based pipeline to discover knowledge out of the biomedical dark data.
We aim to proffer a potential solution to overcome the problem of analyzing overwhelming amounts of information.
arXiv Detail & Related papers (2020-09-28T04:27:05Z)
- ENT-DESC: Entity Description Generation by Exploring Knowledge Graph [53.03778194567752]
In practice, the input knowledge could be more than enough, since the output description may only cover the most significant knowledge.
We introduce a large-scale and challenging dataset to facilitate the study of such a practical scenario in KG-to-text.
We propose a multi-graph structure that is able to represent the original graph information more comprehensively.
arXiv Detail & Related papers (2020-04-30T14:16:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.