Health Misinformation Detection in Web Content via Web2Vec: A Structural-, Content-based, and Context-aware Approach based on Web2Vec
- URL: http://arxiv.org/abs/2407.07914v1
- Date: Fri, 05 Jul 2024 10:33:15 GMT
- Title: Health Misinformation Detection in Web Content via Web2Vec: A Structural-, Content-based, and Context-aware Approach based on Web2Vec
- Authors: Rishabh Upadhyay, Gabriella Pasi, Marco Viviani
- Abstract summary: We focus on Web page content, where there is still room for research to study structural-, content- and context-based features to assess the credibility of Web pages.
This work aims to study the effectiveness of such features in association with a deep learning model, starting from an embedded representation of Web pages that has been recently proposed in the context of phishing Web page detection, i.e., Web2Vec.
- Score: 3.299010876315217
- Abstract: In recent years, we have witnessed the proliferation of large amounts of online content generated directly by users with virtually no form of external control, leading to the possible spread of misinformation. The search for effective solutions to this problem is still ongoing, and covers different areas of application, from opinion spam to fake news detection. A more recently investigated scenario, despite the serious risks that incurring disinformation could entail, is that of the online dissemination of health information. Early approaches in this area focused primarily on user-based studies applied to Web page content. More recently, automated approaches have been developed for both Web pages and social media content, particularly with the advent of the COVID-19 pandemic. These approaches are primarily based on handcrafted features extracted from online content in association with Machine Learning. In this scenario, we focus on Web page content, where there is still room for research to study structural-, content- and context-based features to assess the credibility of Web pages. Therefore, this work aims to study the effectiveness of such features in association with a deep learning model, starting from an embedded representation of Web pages that has been recently proposed in the context of phishing Web page detection, i.e., Web2Vec.
Related papers
- Towards Scalable Topic Detection on Web via Simulating Levy Walks Nature of Topics in Similarity Space [55.97416108140739]
We present a novel, yet very powerful Explore-Exploit (EE) approach to group topics by simulating Levy walks nature in the similarity space.
Experiments on two public data sets demonstrate that our approach is not only comparable to the state-of-the-art methods in terms of effectiveness but also significantly outperforms the state-of-the-art methods in terms of efficiency.
arXiv Detail & Related papers (2024-07-26T07:19:46Z) - Finding Fake News Websites in the Wild [0.0860395700487494]
We propose a novel methodology for identifying websites responsible for creating and disseminating misinformation content.
We validate our approach on Twitter by examining various execution modes and contexts.
arXiv Detail & Related papers (2024-07-09T18:00:12Z) - EWEK-QA: Enhanced Web and Efficient Knowledge Graph Retrieval for Citation-based Question Answering Systems [103.91826112815384]
Citation-based QA systems suffer from two shortcomings.
They usually rely only on the web as a source of extracted knowledge, and adding other external knowledge sources can hamper the efficiency of the system.
We propose our enhanced web and efficient knowledge graph (KG) retrieval solution (EWEK-QA) to enrich the content of the extracted knowledge fed to the system.
arXiv Detail & Related papers (2024-06-14T19:40:38Z) - A Responsive Framework for Research Portals Data using Semantic Web Technology [0.6798775532273751]
The research aims to address this issue by designing a framework for the semantic organization of research portal data.
The framework focuses on the extraction of information from two specific research portals, namely Microsoft Academic and IEEE Xplore.
arXiv Detail & Related papers (2023-06-20T16:12:33Z) - Harnessing the Power of Text-image Contrastive Models for Automatic Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore contrastive learning in the domain of misinformation identification.
Our model shows the superior performance of non-matched image-text pair detection when the training data is insufficient.
arXiv Detail & Related papers (2023-04-19T02:53:59Z) - ClueWeb22: 10 Billion Web Documents with Rich Information [28.68403988636645]
ClueWeb22 provides 10 billion web pages affiliated with rich information.
Its design was influenced by the need for a high quality, large scale web corpus to support academic and industry research.
arXiv Detail & Related papers (2022-11-29T00:49:40Z) - CoVA: Context-aware Visual Attention for Webpage Information Extraction [65.11609398029783]
We propose to reformulate WIE as a context-aware Webpage Object Detection task.
We develop a Context-aware Visual Attention-based (CoVA) detection pipeline which combines appearance features with syntactical structure from the DOM tree.
We show that the proposed CoVA approach is a new challenging baseline which improves upon prior state-of-the-art methods.
arXiv Detail & Related papers (2021-10-24T00:21:46Z) - A Crawler Architecture for Harvesting the Clear, Social, and Dark Web for IoT-Related Cyber-Threat Intelligence [1.1661238776379117]
The clear, social, and dark web have lately been identified as rich sources of valuable cyber-security information.
We present a novel crawling architecture for transparently harvesting data from security websites in the clear web, security forums in the social web, and hacker forums/marketplaces in the dark web.
arXiv Detail & Related papers (2021-09-14T19:26:08Z) - Threat of Adversarial Attacks on Deep Learning in Computer Vision: Survey II [86.51135909513047]
Deep Learning is vulnerable to adversarial attacks that can manipulate its predictions.
This article reviews the contributions made by the computer vision community in adversarial attacks on deep learning.
It provides definitions of technical terminologies for non-experts in this domain.
arXiv Detail & Related papers (2021-08-01T08:54:47Z) - Inside ASCENT: Exploring a Deep Commonsense Knowledge Base and its Usage in Question Answering [25.385862319865335]
ASCENT is a fully automated methodology for extracting and consolidating commonsense assertions from web contents.
In this demo, we present a web portal that allows users to understand its construction process, explore its content, and observe its impact on the use case of question answering.
arXiv Detail & Related papers (2021-05-28T08:17:33Z) - Bringing Cognitive Augmentation to Web Browsing Accessibility [69.62988485669146]
We explore opportunities brought by cognitive augmentation to provide a more natural and accessible web browsing experience.
We develop a conceptual framework for supporting BVIP conversational web browsing needs.
We describe our early work and a prototype that considers structural and content features only.
arXiv Detail & Related papers (2020-12-07T14:40:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.