Related papers: A Multi-modal Neural Embeddings Approach for Detecting Mobile Counterfeit Apps: A Case Study on Google Play Store

URL: http://arxiv.org/abs/2006.02231v1
Date: Tue, 2 Jun 2020 07:10:21 GMT
Title: A Multi-modal Neural Embeddings Approach for Detecting Mobile Counterfeit Apps: A Case Study on Google Play Store
Authors: Naveen Karunanayake, Jathushan Rajasegaran, Ashanie Gunathillake, Suranga Seneviratne, Guillaume Jourjon
Abstract summary: This paper proposes to leverage the recent advances in deep learning methods to create image and text embeddings. We show that a novel approach of combining content embeddings and style embeddings outperforms the baseline methods for image similarity. We present an analysis of approximately 1.2 million apps from Google Play Store and identify a set of potential counterfeits for the top-10,000 popular apps.
Score: 4.5170827242233145
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Counterfeit apps impersonate existing popular apps in attempts to mislead users into installing them for various reasons, such as collecting personal information or spreading malware. Many counterfeits can be identified once installed; however, even a tech-savvy user may struggle to detect them before installation. To this end, this paper proposes to leverage the recent advances in deep learning methods to create image and text embeddings so that counterfeit apps can be efficiently identified when they are submitted for publication. We show that a novel approach of combining content embeddings and style embeddings outperforms the baseline methods for image similarity such as SIFT, SURF, and various image hashing methods. We first evaluate the performance of the proposed method on two well-known datasets for evaluating image similarity methods and show that content, style, and combined embeddings increase precision@k and recall@k by 10%-15% and 12%-25%, respectively, when retrieving five nearest neighbours. Second, specifically for the app counterfeit detection problem, combined content and style embeddings achieve 12% and 14% increases in precision@k and recall@k, respectively, compared to the baseline methods. Third, we present an analysis of approximately 1.2 million apps from Google Play Store and identify a set of potential counterfeits for the top-10,000 popular apps. Under a conservative assumption, we were able to find 2,040 potential counterfeits that contain malware in a set of 49,608 apps that showed high similarity to one of the top-10,000 popular apps in Google Play Store. We also find 1,565 potential counterfeits requesting at least five more dangerous permissions than the original app and 1,407 potential counterfeits having at least five extra third-party advertisement libraries.
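
The retrieval metrics named in the abstract (precision@k and recall@k over nearest neighbours) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy vectors, the app names, and in particular the choice to combine content and style embeddings by concatenating L2-normalized vectors, which is one plausible way to do it and is not claimed to be the authors' method.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def combine(content, style):
    """Hypothetical combination: concatenate the two normalized embeddings."""
    return l2_normalize(content) + l2_normalize(style)

def cosine_distance(a, b):
    a, b = l2_normalize(a), l2_normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a, b))

def top_k(query, gallery, k):
    """Return the k gallery keys closest to the query embedding."""
    ranked = sorted(gallery, key=lambda name: cosine_distance(query, gallery[name]))
    return ranked[:k]

def precision_recall_at_k(retrieved, relevant, k):
    """precision@k = hits / k; recall@k = hits / number of relevant items."""
    hits = sum(1 for name in retrieved[:k] if name in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy gallery of app-icon embeddings (made-up 2-D content and style vectors).
gallery = {
    "fake_a": combine([0.9, 0.1], [0.8, 0.2]),
    "fake_b": combine([0.8, 0.2], [0.9, 0.1]),
    "other_c": combine([0.1, 0.9], [0.2, 0.8]),
    "other_d": combine([0.0, 1.0], [0.1, 0.9]),
}
query = combine([1.0, 0.0], [1.0, 0.0])   # embedding of the original app's icon
relevant = {"fake_a", "fake_b"}           # ground-truth counterfeits of the query

retrieved = top_k(query, gallery, k=3)
precision, recall = precision_recall_at_k(retrieved, relevant, k=3)
print(retrieved, precision, recall)
```

With k=3 the two known counterfeits are retrieved along with one unrelated app, so recall@3 is higher than precision@3; this is the trade-off the two metrics are meant to expose.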

Related papers

Scaling Up: Revisiting Mining Android Sandboxes at Scale for Malware Classification [1.6445041519310273]
The Mining Android Sandbox (MAS) approach aims to identify malicious behavior in repackaged Android applications (apps). Previous empirical studies evaluated the MAS approach using a small dataset consisting of only 102 pairs of original and repackaged apps. This paper presents the results of a replication study focused on evaluating the performance of the MAS approach regarding its capabilities of correctly classifying malware from different families.
arXiv Detail & Related papers (2025-05-14T15:52:09Z)
Enhancing Malware Fingerprinting through Analysis of Evasive Techniques [15.037069167445846]
We analyze 4 million Windows Portable Executable (PE) files, 21 million sections, and 48 million resources. We find up to 80% deep structural similarities, including common APIs and executable sections. Our analysis reveals non-functional mutations, such as altered section numbers, virtual sizes, and section names, as primary evasion tactics.
arXiv Detail & Related papers (2025-03-09T07:41:49Z)
Document Screenshot Retrievers are Vulnerable to Pixel Poisoning Attacks [72.4498910775871]
Vision-language model (VLM)-based retrievers leverage document screenshots embedded as vectors to enable effective search and offer a simplified pipeline over traditional text-only methods. In this study, we propose three pixel poisoning attack methods designed to compromise VLM-based retrievers.
arXiv Detail & Related papers (2025-01-28T12:40:37Z)
LLM App Squatting and Cloning [12.626589260776404]
Impersonation tactics, such as app squatting and app cloning, have posed longstanding challenges in mobile app stores. We present the first large-scale analysis of LLM app squatting and cloning using our custom-built tool, LLMappCrazy.
arXiv Detail & Related papers (2024-11-12T03:32:30Z)
Detecting and Characterising Mobile App Metamorphosis in Google Play Store [0.0]
We propose a novel and efficient multi-modal search methodology to identify apps undergoing metamorphosis. Our methodology uncovers various metamorphosis scenarios, including re-births, re-branding, re-purposing, and others. We shed light on the concealed security and privacy risks that lurk within, potentially impacting even tech-savvy end-users.
arXiv Detail & Related papers (2024-07-19T03:26:40Z)
A Risk Estimation Study of Native Code Vulnerabilities in Android Applications [1.6078134198754157]
We propose a fast risk-based approach that provides a risk score related to the native part of an Android application. We show that many applications contain well-known vulnerabilities that miscreants can potentially exploit.
arXiv Detail & Related papers (2024-06-04T06:44:07Z)
Zero-Shot Detection of Machine-Generated Codes [83.0342513054389]
This work proposes a training-free approach for the detection of LLMs-generated codes. We find that existing training-based or zero-shot text detectors are ineffective in detecting code. Our method exhibits robustness against revision attacks and generalizes well to Java codes.
arXiv Detail & Related papers (2023-10-08T10:08:21Z)
Pattern Spotting and Image Retrieval in Historical Documents using Deep Hashing [60.67014034968582]
This paper presents a deep learning approach for image retrieval and pattern spotting in digital collections of historical documents. Deep learning models are used for feature extraction, considering two distinct variants, which provide either real-valued or binary code representations. The proposed approach also reduces the search time by up to 200x and the storage cost up to 6,000x when compared to related works.
arXiv Detail & Related papers (2022-08-04T01:39:37Z)
Towards a Fair Comparison and Realistic Design and Evaluation Framework of Android Malware Detectors [63.75363908696257]
We analyze 10 influential research works on Android malware detection using a common evaluation framework. We identify five factors that, if not taken into account when creating datasets and designing detectors, significantly affect the trained ML models. We conclude that the studied ML-based detectors have been evaluated optimistically, which justifies the good published results.
arXiv Detail & Related papers (2022-05-25T08:28:08Z)
Erasing Labor with Labor: Dark Patterns and Lockstep Behaviors on Google Play [13.658284581863839]
Google Play's policy forbids the use of incentivized installs, ratings, and reviews to manipulate the placement of apps. We examine install-incentivizing apps through a socio-technical lens and perform a mixed-methods analysis of their reviews and permissions. Our dataset contains 319K reviews collected daily over five months from 60 such apps that cumulatively account for over 160.5M installs. We find evidence of fraudulent reviews on install-incentivizing apps, following which we model them as an edge stream in a dynamic bipartite graph of apps and reviewers.
arXiv Detail & Related papers (2022-02-09T16:54:27Z)
A high performance fingerprint liveness detection method based on quality related features [66.41574316136379]
The system is tested on a highly challenging database comprising over 10,500 real and fake images. The proposed solution proves to be robust to the multi-scenario dataset, and presents an overall rate of 90% correctly classified samples.
arXiv Detail & Related papers (2021-11-02T21:09:39Z)
An Effective and Robust Detector for Logo Detection [58.448716977297565]
Some attackers fool well-trained logo detection models to commit infringement. A novel logo detector based on the mechanism of looking and thinking twice is proposed in this paper. We extend the DetectoRS algorithm to a cascade schema with an equalization loss function, multi-scale transformations, and adversarial data augmentation.
arXiv Detail & Related papers (2021-08-01T10:17:53Z)
Emerging App Issue Identification via Online Joint Sentiment-Topic Tracing [66.57888248681303]
We propose a novel emerging issue detection approach named MERIT. Based on the AOBST model, we infer the topics negatively reflected in user reviews for one app version. Experiments on popular apps from Google Play and Apple's App Store demonstrate the effectiveness of MERIT.
arXiv Detail & Related papers (2020-08-23T06:34:05Z)
A Note on Deepfake Detection with Low-Resources [0.0]
Deepfakes are videos that have been altered, most often by using neural networks to substitute the face of a portrayed individual with a different face. We present two methods that allow a user without significant computational power to detect Deepfakes.
arXiv Detail & Related papers (2020-06-09T11:07:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.