Capturing Pertinent Symbolic Features for Enhanced Content-Based
Misinformation Detection
- URL: http://arxiv.org/abs/2401.16285v1
- Date: Mon, 29 Jan 2024 16:42:34 GMT
- Title: Capturing Pertinent Symbolic Features for Enhanced Content-Based
Misinformation Detection
- Authors: Flavio Merenda and Jos\'e Manuel G\'omez-P\'erez
- Abstract summary: The detection of misleading content presents a significant hurdle due to its extreme linguistic and domain variability.
This paper analyzes the linguistic attributes that characterize this phenomenon and how representative of such features some of the most popular misinformation datasets are.
We demonstrate that the appropriate use of pertinent symbolic knowledge in combination with neural language models is helpful in detecting misleading content.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Preventing the spread of misinformation is challenging. The detection of
misleading content presents a significant hurdle due to its extreme linguistic
and domain variability. Content-based models have managed to identify deceptive
language by learning representations from textual data such as social media
posts and web articles. However, aggregating representative samples of this
heterogeneous phenomenon and implementing effective real-world applications is
still elusive. Based on analytical work on the language of misinformation, this
paper analyzes the linguistic attributes that characterize this phenomenon and
how representative of such features some of the most popular misinformation
datasets are. We demonstrate that the appropriate use of pertinent symbolic
knowledge in combination with neural language models is helpful in detecting
misleading content. Our results achieve state-of-the-art performance in
misinformation datasets across the board, showing that our approach offers a
valid and robust alternative to multi-task transfer learning without requiring
any additional training data. Furthermore, our results show evidence that
structured knowledge can provide the extra boost required to address a complex
and unpredictable real-world problem like misinformation detection, not only in
terms of accuracy but also time efficiency and resource utilization.
Related papers
- Maintaining Informative Coherence: Migrating Hallucinations in Large Language Models via Absorbing Markov Chains [6.920249042435973]
Large Language Models (LLMs) are powerful tools for text generation, translation, and summarization.
LLMs often suffer from hallucinations-instances where they fail to maintain the fidelity and coherence of contextual information.
We propose a novel decoding strategy that leverages absorbing Markov chains to quantify the significance of contextual information.
arXiv Detail & Related papers (2024-10-27T04:51:18Z) - Beyond the Labels: Unveiling Text-Dependency in Paralinguistic Speech Recognition Datasets [0.5999777817331317]
This paper critically evaluates the prevalent assumption that machine learning models genuinely learn to identify paralinguistic traits.
By examining the lexical overlap in these datasets and testing the performance of machine learning models, we expose significant text-dependency in trait-labeling.
arXiv Detail & Related papers (2024-03-12T15:54:32Z) - Enhancing Argument Structure Extraction with Efficient Leverage of
Contextual Information [79.06082391992545]
We propose an Efficient Context-aware model (ECASE) that fully exploits contextual information.
We introduce a sequence-attention module and distance-weighted similarity loss to aggregate contextual information and argumentative information.
Our experiments on five datasets from various domains demonstrate that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-10-08T08:47:10Z) - Harnessing the Power of Text-image Contrastive Models for Automatic
Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore the constrastive learning in the domain of misinformation identification.
Our model shows the superior performance of non-matched image-text pair detection when the training data is insufficient.
arXiv Detail & Related papers (2023-04-19T02:53:59Z) - Interpretable Fake News Detection with Topic and Deep Variational Models [2.15242029196761]
We focus on fake news detection using interpretable features and methods.
We have developed a deep probabilistic model that integrates a dense representation of textual news.
Our model achieves comparable performance to state-of-the-art competing models.
arXiv Detail & Related papers (2022-09-04T05:31:00Z) - Representation Learning for the Automatic Indexing of Sound Effects
Libraries [79.68916470119743]
We show that a task-specific but dataset-independent representation can successfully address data issues such as class imbalance, inconsistent class labels, and insufficient dataset size.
Detailed experimental results show the impact of metric learning approaches and different cross-dataset training methods on representational effectiveness.
arXiv Detail & Related papers (2022-08-18T23:46:13Z) - An Empirical Investigation of Commonsense Self-Supervision with
Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z) - Information-Theoretic Odometry Learning [83.36195426897768]
We propose a unified information theoretic framework for learning-motivated methods aimed at odometry estimation.
The proposed framework provides an elegant tool for performance evaluation and understanding in information-theoretic language.
arXiv Detail & Related papers (2022-03-11T02:37:35Z) - Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z) - On the Effects of Knowledge-Augmented Data in Word Embeddings [0.6749750044497732]
We propose a novel approach for linguistic knowledge injection through data augmentation to learn word embeddings.
We show our knowledge augmentation approach improves the intrinsic characteristics of the learned embeddings while not significantly altering their results on a downstream text classification task.
arXiv Detail & Related papers (2020-10-05T02:14:13Z) - Natural language technology and query expansion: issues,
state-of-the-art and perspectives [0.0]
Linguistic characteristics that cause ambiguity and misinterpretation of queries as well as additional factors affect the users ability to accurately represent their information needs.
We lay down the anatomy of a generic linguistic based query expansion framework and propose its module-based decomposition.
For each of the modules we review the state-of-the-art solutions in the literature and categorized under the light of the techniques used.
arXiv Detail & Related papers (2020-04-23T11:39:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.