Related papers: Bridging Social Media and Search Engines: Dredge Words and the Detection of Unreliable Domains

URL: http://arxiv.org/abs/2406.11423v2
Date: Tue, 17 Sep 2024 16:20:53 GMT
Title: Bridging Social Media and Search Engines: Dredge Words and the Detection of Unreliable Domains
Authors: Evan M. Williams, Peter Carragher, Kathleen M. Carley
Abstract summary: We develop a website credibility classification and discovery system that integrates webgraph and social media contexts. We introduce the concept of dredge words, terms or phrases for which unreliable domains rank highly on search engines. We release a novel dataset of dredge words, highlighting their strong connections to both social media and online commerce platforms.
Score: 3.659498819753633
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Proactive content moderation requires platforms to rapidly and continuously evaluate the credibility of websites. Leveraging the direct and indirect paths users follow to unreliable websites, we develop a website credibility classification and discovery system that integrates both webgraph and large-scale social media contexts. We additionally introduce the concept of dredge words, terms or phrases for which unreliable domains rank highly on search engines, and provide the first exploration of their usage on social media. Our graph neural networks that combine webgraph and social media contexts achieve state-of-the-art results in website credibility classification and significantly improve the top-k identification of unreliable domains. Additionally, we release a novel dataset of dredge words, highlighting their strong connections to both social media and online commerce platforms.

Related papers

An Illusion of Progress? Assessing the Current State of Web Agents [49.76769323750729]
We conduct a comprehensive and rigorous assessment of the current state of web agents. Results depict a very different picture of the competency of current agents, suggesting over-optimism in previously reported results. We introduce Online-Mind2Web, an online evaluation benchmark consisting of 300 diverse and realistic tasks spanning 136 websites.
arXiv Detail & Related papers (2025-04-02T05:51:29Z)
Detection and Discovery of Misinformation Sources using Attributed Webgraphs [3.659498819753633]
We introduce a novel attributed webgraph dataset with labeled news domains and their connections to outlinking and backlinking domains. We demonstrate the success of graph neural networks in detecting news site reliability using these attributed webgraphs. We also introduce and evaluate a novel graph-based algorithm for discovering previously unknown misinformation news sources.
arXiv Detail & Related papers (2024-01-04T17:47:36Z)
Exploring Embeddings for Measuring Text Relatedness: Unveiling Sentiments and Relationships in Online Comments [1.7230140898679147]
This paper investigates sentiment and semantic relationships among comments across various social media platforms. It uses word embeddings to analyze components in sentences and documents. Our analysis will enable a deeper understanding of the interconnectedness of online comments and will investigate the notion of the internet functioning as a large interconnected brain.
arXiv Detail & Related papers (2023-09-15T04:57:23Z)
ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information. To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles. Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models. We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks. Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
Nothing Stands Alone: Relational Fake News Detection with Hypergraph Neural Networks [49.29141811578359]
We propose to leverage a hypergraph to represent group-wise interaction among news, while focusing on important news relations with its dual-level attention mechanism. Our approach yields remarkable performance and maintains the high performance even with a small subset of labeled news data.
arXiv Detail & Related papers (2022-12-24T00:19:32Z)
A Review of Web Infodemic Analysis and Detection Trends across Multi-modalities using Deep Neural Networks [3.42658286826597]
Fake news detection is one of the most analyzed and prominent areas of research. Facebook, Reddit, WhatsApp, YouTube, and other social applications are gradually gaining attention in this emerging field. This review primarily deals with multi-modal fake news detection techniques that include images, videos, and their combinations with text.
arXiv Detail & Related papers (2021-11-23T16:02:28Z)
FR-Detect: A Multi-Modal Framework for Early Fake News Detection on Social Media Using Publishers Features [0.0]
Despite the advantages of social media in the news field, the lack of any control and verification mechanism has led to the spread of fake news. We suggest a highly accurate multi-modal framework, namely FR-Detect, using user-related and content-related features with early detection capability. Experiments have shown that the publishers' features can improve the performance of content-based models by up to 13% and 29% in accuracy and F1-score, respectively.
arXiv Detail & Related papers (2021-09-10T12:39:00Z)
Named Entity Recognition for Social Media Texts with Semantic Augmentation [70.44281443975554]
Existing approaches for named entity recognition suffer from data sparsity problems when conducted on short and informal texts. We propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account.
arXiv Detail & Related papers (2020-10-29T10:06:46Z)
Political audience diversity and news reliability in algorithmic ranking [54.23273310155137]
We propose using the political diversity of a website's audience as a quality signal. Using news source reliability ratings from domain experts and web browsing data from a diverse sample of 6,890 U.S. citizens, we first show that websites with more extreme and less politically diverse audiences have lower journalistic standards.
arXiv Detail & Related papers (2020-07-16T02:13:55Z)
A Framework for Pre-processing of Social Media Feeds based on Integrated Local Knowledge Base [1.5749416770494706]
This paper proposes an improved framework for pre-processing of social media feeds for better performance. The framework had an accuracy of 94.07% on a standardized dataset, and 99.78% on a localised dataset, when used to extract sentiments from tweets.
arXiv Detail & Related papers (2020-06-29T07:56:22Z)
Leveraging Multi-Source Weak Social Supervision for Early Detection of Fake News [67.53424807783414]
Social media has greatly enabled people to participate in online activities at an unprecedented rate. This unrestricted access also exacerbates the spread of misinformation and fake news online, which might cause confusion and chaos unless detected early and mitigated. We jointly leverage the limited amount of clean data along with weak signals from social engagements to train deep neural networks in a meta-learning framework to estimate the quality of different weak instances. Experiments on real-world datasets demonstrate that the proposed framework outperforms state-of-the-art baselines for early detection of fake news without using any user engagements at prediction time.
arXiv Detail & Related papers (2020-04-03T18:26:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.