LOKI: Proactively Discovering Online Scam Websites by Mining Toxic Search Queries
- URL: http://arxiv.org/abs/2509.12181v1
- Date: Mon, 15 Sep 2025 17:44:08 GMT
- Title: LOKI: Proactively Discovering Online Scam Websites by Mining Toxic Search Queries
- Authors: Pujan Paudel, Gianluca Stringhini
- Abstract summary: We present LOKI, a system designed to identify search engine queries likely to return a high fraction of fraudulent websites. We rigorously validate LOKI across 10 major scam categories and demonstrate a 20.58 times improvement in discovery. We show that LOKI generalizes to previously-unseen scam categories, highlighting its utility in surfacing emerging threats.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online e-commerce scams, ranging from shopping scams to pet scams, globally cause millions of dollars in financial damage every year. In response, the security community has developed highly accurate detection systems able to determine if a website is fraudulent. However, finding candidate scam websites that can be passed as input to these downstream detection systems is challenging: relying on user reports is inherently reactive and slow, and proactive systems issuing search engine queries to return candidate websites suffer from low coverage and do not generalize to new scam types. In this paper, we present LOKI, a system designed to identify search engine queries likely to return a high fraction of fraudulent websites. LOKI implements a keyword scoring model grounded in Learning Under Privileged Information (LUPI) and feature distillation from Search Engine Result Pages (SERPs). We rigorously validate LOKI across 10 major scam categories and demonstrate a 20.58 times improvement in discovery over both heuristic and data-driven baselines across all categories. Leveraging a small seed set of only 1,663 known scam sites, we use the keywords identified by our method to discover 52,493 previously unreported scams in the wild. Finally, we show that LOKI generalizes to previously-unseen scam categories, highlighting its utility in surfacing emerging threats.
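The abstract's target quantity, the fraction of fraudulent websites a query returns, can be illustrated with a minimal sketch. This is not LOKI's scoring model, which the abstract says is built on LUPI and SERP feature distillation. It only shows how queries could be ranked once their results have been labeled by a downstream detector. The function names, queries, URLs, and the stand-in detector below are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def scam_fraction(urls: List[str], is_scam: Callable[[str], bool]) -> float:
    """Fraction of a query's result URLs that the detector flags as scams."""
    if not urls:
        return 0.0
    return sum(1 for u in urls if is_scam(u)) / len(urls)


def rank_queries(
    serps: Dict[str, List[str]], is_scam: Callable[[str], bool]
) -> List[Tuple[str, float]]:
    """Sort queries from highest to lowest scam fraction."""
    scored = [(q, scam_fraction(urls, is_scam)) for q, urls in serps.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data (assumption): made-up queries, made-up result URLs, and a
# set-membership check standing in for a real scam-website detector.
KNOWN_SCAMS = {"cheap-pups.example", "deal-outlet.example"}
SERPS = {
    "adopt a dog near me": ["shelter.example", "kennel.example"],
    "buy puppy cheap free shipping": [
        "cheap-pups.example",
        "deal-outlet.example",
        "shelter.example",
        "kennel.example",
    ],
}
RANKED = rank_queries(SERPS, lambda url: url in KNOWN_SCAMS)
```

In this toy setup the second query ranks first because half of its results are flagged. A real system would have to predict this fraction before issuing the query, which is the problem the keyword scoring model addresses.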
Related papers
- SAFE-QAQ: End-to-End Slow-Thinking Audio-Text Fraud Detection via Reinforcement Learning [52.29460857893198]
Existing fraud detection methods rely on transcribed text, suffering from ASR errors and missing crucial acoustic cues like vocal tone and environmental context. We propose SAFE-QAQ, an end-to-end comprehensive framework for audio-based slow-thinking fraud detection. Our framework introduces a dynamic risk assessment framework during live calls, enabling early detection and prevention of fraud.
arXiv Detail & Related papers (2026-01-04T06:09:07Z) - SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents [63.70653857721785]
We conduct two in-the-wild experiments to demonstrate the prevalence of low-quality search results and their potential to misguide agent behaviors. To counter this threat, we introduce an automated red-teaming framework that is systematic, scalable, and cost-efficient.
arXiv Detail & Related papers (2025-09-28T07:05:17Z) - Send to which account? Evaluation of an LLM-based Scambaiting System [0.0]
This paper presents the first large-scale, real-world evaluation of a scambaiting system powered by large language models (LLMs). Over a five-month deployment, the system initiated over 2,600 engagements with actual scammers, resulting in a dataset of more than 18,700 messages. It achieved an Information Disclosure Rate (IDR) of approximately 32%, successfully extracting sensitive financial information such as mule accounts.
arXiv Detail & Related papers (2025-09-10T11:08:52Z) - PsyScam: A Benchmark for Psychological Techniques in Real-World Scams [38.57446009573742]
PsyScam is a benchmark designed to systematically capture the psychological techniques employed in real-world scam reports. We show that PsyScam presents significant challenges to existing models in both detecting and generating scam content based on the PTs used by real-world scammers.
arXiv Detail & Related papers (2025-05-21T01:55:04Z) - ScamFerret: Detecting Scam Websites Autonomously with Large Language Models [2.6217304977339473]
ScamFerret is an innovative agent system employing a large language model (LLM) to autonomously collect and analyze data from a given URL to determine whether it is a scam. Our evaluation demonstrated that ScamFerret achieves 0.972 accuracy in classifying four scam types in English and 0.993 accuracy in classifying online shopping websites across three different languages.
arXiv Detail & Related papers (2025-02-14T12:16:38Z) - Exposing LLM Vulnerabilities: Adversarial Scam Detection and Performance [16.9071617169937]
This paper investigates the vulnerabilities of Large Language Models (LLMs) when facing adversarial scam messages for the task of scam detection. We created a comprehensive dataset with fine-grained labels of scam messages, including both original and adversarial scam messages. Our analysis showed how adversarial examples took advantage of vulnerabilities of an LLM, leading to a high misclassification rate.
arXiv Detail & Related papers (2024-12-01T00:13:28Z) - Evaluating Robustness of Generative Search Engine on Adversarial Factual Questions [89.35345649303451]
Generative search engines have the potential to transform how people seek information online.
But responses generated by existing large language model (LLM)-backed generative search engines may not always be accurate.
Retrieval-augmented generation exacerbates safety concerns, since adversaries may successfully evade the entire system.
arXiv Detail & Related papers (2024-02-25T11:22:19Z) - Detecting Scams Using Large Language Models [19.7220607313348]
Large Language Models (LLMs) have gained prominence in various applications, including security.
This paper explores the utility of LLMs in scam detection, a critical aspect of cybersecurity.
We propose a novel use case for LLMs to identify scams, such as phishing, advance fee fraud, and romance scams.
arXiv Detail & Related papers (2024-02-05T16:13:54Z) - User Attitudes to Content Moderation in Web Search [49.1574468325115]
We examine the levels of support for different moderation practices applied to potentially misleading and/or potentially offensive content in web search.
We find that the most supported practice is informing users about potentially misleading or offensive content, and the least supported one is the complete removal of search results.
More conservative users and users with lower levels of trust in web search results are more likely to be against content moderation in web search.
arXiv Detail & Related papers (2023-10-05T10:57:15Z) - Evaluating Verifiability in Generative Search Engines [70.59477647085387]
Generative search engines directly generate responses to user queries, along with in-line citations.
We conduct human evaluation to audit four popular generative search engines.
We find that responses from existing generative search engines are fluent and appear informative, but frequently contain unsupported statements and inaccurate citations.
arXiv Detail & Related papers (2023-04-19T17:56:12Z) - Tainted Love: A Systematic Review of Online Romance Fraud [68.8204255655161]
Romance fraud involves cybercriminals engineering a romantic relationship on online dating platforms.
We characterise the literary landscape on romance fraud, advancing the understanding of researchers and practitioners.
Three main contributions were identified: profiles of romance scams, countermeasures for mitigating romance scams, and factors that predispose an individual to become a scammer or a victim.
arXiv Detail & Related papers (2023-02-28T20:34:07Z) - Recent trends in Social Engineering Scams and Case study of Gift Card Scam [4.345672405192058]
Social engineering scams (SES) have existed since humankind adopted telecommunications.
The paper surveys recent trends in social engineering scams targeting people all over the world.
It also presents a case study of a real-time gift card scam targeting customers of various enterprise organizations.
arXiv Detail & Related papers (2021-10-13T04:17:02Z) - DFraud3- Multi-Component Fraud Detection free of Cold-start [50.779498955162644]
Cold-start is a significant problem referring to the failure of a detection system to recognize the authenticity of a new user.
In this paper, we model a review system as a Heterogeneous Information Network (HIN), which enables a unique representation for every component.
HIN with graph induction helps to address the camouflage issue (fraudsters with genuine reviews), which has been shown to be more severe when it is coupled with cold-start, i.e., new fraudsters with genuine first reviews.
arXiv Detail & Related papers (2020-06-10T08:20:13Z)