Related papers: Health Misinformation Detection in Web Content via Web2Vec: A Structural-, Content-based, and Context-aware Approach based on Web2Vec

URL: http://arxiv.org/abs/2407.07914v1
Date: Fri, 05 Jul 2024 10:33:15 GMT
Title: Health Misinformation Detection in Web Content via Web2Vec: A Structural-, Content-based, and Context-aware Approach based on Web2Vec
Authors: Rishabh Upadhyay, Gabriella Pasi, Marco Viviani
Abstract summary: We focus on Web page content, where there is still room for research to study structural-, content- and context-based features to assess the credibility of Web pages. This work aims to study the effectiveness of such features in association with a deep learning model, starting from an embedded representation of Web pages that has been recently proposed in the context of phishing Web page detection, i.e., Web2Vec.
Score: 3.299010876315217
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In recent years, we have witnessed the proliferation of large amounts of online content generated directly by users with virtually no form of external control, leading to the possible spread of misinformation. The search for effective solutions to this problem is still ongoing, and covers different areas of application, from opinion spam to fake news detection. A more recently investigated scenario, despite the serious risks that incurring disinformation could entail, is that of the online dissemination of health information. Early approaches in this area focused primarily on user-based studies applied to Web page content. More recently, automated approaches have been developed for both Web pages and social media content, particularly with the advent of the COVID-19 pandemic. These approaches are primarily based on handcrafted features extracted from online content in association with Machine Learning. In this scenario, we focus on Web page content, where there is still room for research to study structural-, content- and context-based features to assess the credibility of Web pages. Therefore, this work aims to study the effectiveness of such features in association with a deep learning model, starting from an embedded representation of Web pages that has been recently proposed in the context of phishing Web page detection, i.e., Web2Vec.

Related papers

SoK: Advances and Open Problems in Web Tracking [71.54586748169943]
Web tracking is a pervasive and opaque practice that enables personalized advertising and conversion tracking. Web tracking is undergoing a once-in-a-generation transformation driven by shifts in the advertising industry, the adoption of anti-tracking countermeasures by browsers, and the growing enforcement of emerging privacy regulations. This Systematization of Knowledge (SoK) aims to consolidate and synthesize this wide-ranging research, offering a comprehensive overview of the technical mechanisms, countermeasures, and regulations that shape the modern and rapidly evolving web tracking landscape.
arXiv Detail & Related papers (2025-06-16T23:30:54Z)
A Platform for Investigating Public Health Content with Efficient Concern Classification [9.523478337036588]
We present ConcernScope, a platform that uses a teacher-student framework for knowledge transfer between large language models and light-weight classifiers. ConcernScope is built on top of a taxonomy of public health concerns and allows uploading massive files directly, automatically scraping specific URLs, and direct text editing. We demonstrate several applications of this platform: guided data exploration to find useful examples of common concerns found in online community datasets, identification of trends in concerns through an example time series analysis of 186,000 samples, and finding trends in topic frequency before and after significant events.
arXiv Detail & Related papers (2025-06-02T04:36:13Z)
Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges [52.96987928118327]
We find that embedding models for retrieval, rerankers, and large language model (LLM) relevance judges are vulnerable to content injection attacks. We identify two primary threats: (1) inserting unrelated or harmful content within passages that still appear deceptively "relevant", and (2) inserting entire queries or key query terms into passages to boost their perceived relevance. Our study systematically examines the factors that influence an attack's success, such as the placement of injected content and the balance between relevant and non-relevant material.
arXiv Detail & Related papers (2025-01-30T18:02:15Z)
Web Privacy based on Contextual Integrity: Measuring the Collapse of Online Contexts [0.0]
We operationalize the theory of Privacy as Contextual Integrity and measure persistent user identification within and between Web contexts. We crawl the top-700 popular websites across the contexts of health, finance, news & media, LGBTQ, eCommerce, adult, and education websites, for 27 days. This is a first modest step in measuring Web privacy as Contextual Integrity, opening new avenues for contextual Web privacy research.
arXiv Detail & Related papers (2024-12-19T23:30:29Z)
Towards Scalable Topic Detection on Web via Simulating Levy Walks Nature of Topics in Similarity Space [55.97416108140739]
We present a novel yet very powerful Explore-Exploit (EE) approach that groups topics by simulating the Levy walk nature of topics in the similarity space. Experiments on two public data sets demonstrate that our approach is not only comparable to the state-of-the-art methods in terms of effectiveness but also significantly outperforms them in terms of efficiency.
arXiv Detail & Related papers (2024-07-26T07:19:46Z)
Finding Fake News Websites in the Wild [0.0860395700487494]
We propose a novel methodology for identifying websites responsible for creating and disseminating misinformation content. We validate our approach on Twitter by examining various execution modes and contexts.
arXiv Detail & Related papers (2024-07-09T18:00:12Z)
EWEK-QA: Enhanced Web and Efficient Knowledge Graph Retrieval for Citation-based Question Answering Systems [103.91826112815384]
Citation-based QA systems suffer from two shortcomings. They usually rely only on the web as a source of extracted knowledge, and adding other external knowledge sources can hamper the efficiency of the system. We propose our enhanced web and efficient knowledge graph (KG) retrieval solution (EWEK-QA) to enrich the content of the extracted knowledge fed to the system.
arXiv Detail & Related papers (2024-06-14T19:40:38Z)
A Responsive Framework for Research Portals Data using Semantic Web Technology [0.6798775532273751]
The research aims to address this issue by designing a framework for the semantic organization of research portal data. The framework focuses on the extraction of information from two specific research portals, namely Microsoft Academic and IEEE Xplore.
arXiv Detail & Related papers (2023-06-20T16:12:33Z)
Harnessing the Power of Text-image Contrastive Models for Automatic Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore contrastive learning in the domain of misinformation identification. Our model shows superior performance in detecting non-matched image-text pairs when the training data is insufficient.
arXiv Detail & Related papers (2023-04-19T02:53:59Z)
ClueWeb22: 10 Billion Web Documents with Rich Information [28.68403988636645]
ClueWeb22 provides 10 billion web pages affiliated with rich information. Its design was influenced by the need for a high quality, large scale web corpus to support academic and industry research.
arXiv Detail & Related papers (2022-11-29T00:49:40Z)
CoVA: Context-aware Visual Attention for Webpage Information Extraction [65.11609398029783]
We propose to reformulate Webpage Information Extraction (WIE) as a context-aware Webpage Object Detection task. We develop a Context-aware Visual Attention-based (CoVA) detection pipeline which combines appearance features with syntactical structure from the DOM tree. We show that the proposed CoVA approach is a new challenging baseline which improves upon prior state-of-the-art methods.
arXiv Detail & Related papers (2021-10-24T00:21:46Z)
A Crawler Architecture for Harvesting the Clear, Social, and Dark Web for IoT-Related Cyber-Threat Intelligence [1.1661238776379117]
The clear, social, and dark web have lately been identified as rich sources of valuable cyber-security information. We present a novel crawling architecture for transparently harvesting data from security websites in the clear web, security forums in the social web, and hacker forums/marketplaces in the dark web.
arXiv Detail & Related papers (2021-09-14T19:26:08Z)
Threat of Adversarial Attacks on Deep Learning in Computer Vision: Survey II [86.51135909513047]
Deep Learning is vulnerable to adversarial attacks that can manipulate its predictions. This article reviews the contributions made by the computer vision community in adversarial attacks on deep learning. It provides definitions of technical terminologies for non-experts in this domain.
arXiv Detail & Related papers (2021-08-01T08:54:47Z)
Inside ASCENT: Exploring a Deep Commonsense Knowledge Base and its Usage in Question Answering [25.385862319865335]
ASCENT is a fully automated methodology for extracting and consolidating commonsense assertions from web contents. In this demo, we present a web portal that allows users to understand its construction process, explore its content, and observe its impact on the use case of question answering.
arXiv Detail & Related papers (2021-05-28T08:17:33Z)
Bringing Cognitive Augmentation to Web Browsing Accessibility [69.62988485669146]
We explore opportunities brought by cognitive augmentation to provide a more natural and accessible web browsing experience. We develop a conceptual framework for supporting the conversational web browsing needs of blind and visually impaired people (BVIP). We describe our early work and a prototype that considers structural and content features only.
arXiv Detail & Related papers (2020-12-07T14:40:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.