A Multi-modal Neural Embeddings Approach for Detecting Mobile Counterfeit Apps: A Case Study on Google Play Store
- URL: http://arxiv.org/abs/2006.02231v1
- Date: Tue, 2 Jun 2020 07:10:21 GMT
- Authors: Naveen Karunanayake, Jathushan Rajasegaran, Ashanie Gunathillake, Suranga Seneviratne, Guillaume Jourjon
- Abstract summary: This paper proposes to leverage the recent advances in deep learning methods to create image and text embeddings.
We show that a novel approach of combining content embeddings and style embeddings outperforms the baseline methods for image similarity.
We present an analysis of approximately 1.2 million apps from Google Play Store and identify a set of potential counterfeits for top-10,000 popular apps.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Counterfeit apps impersonate existing popular apps in attempts to misguide
users to install them for various reasons such as collecting personal
information or spreading malware. Many counterfeits can be identified once
installed; however, even a tech-savvy user may struggle to detect them before
installation. To this end, this paper proposes to leverage the recent advances
in deep learning methods to create image and text embeddings so that
counterfeit apps can be efficiently identified when they are submitted for
publication. We show that a novel approach of combining content embeddings and
style embeddings outperforms the baseline methods for image similarity such as
SIFT, SURF, and various image hashing methods. We first evaluate the
performance of the proposed method on two well-known datasets for evaluating
image similarity methods and show that content, style, and combined embeddings
increase precision@k and recall@k by 10%-15% and 12%-25%, respectively, when
retrieving five nearest neighbours. Second, specifically for the app
counterfeit detection problem, combined content and style embeddings achieve
a 12% and 14% increase in precision@k and recall@k, respectively, compared to the
baseline methods. Third, we present an analysis of approximately 1.2 million
apps from Google Play Store and identify a set of potential counterfeits for
top-10,000 popular apps. Under a conservative assumption, we were able to find
2,040 potential counterfeits that contain malware in a set of 49,608 apps that
showed high similarity to one of the top-10,000 popular apps in Google Play
Store. We also find 1,565 potential counterfeits asking for at least five
more dangerous permissions than the original app and 1,407 potential
counterfeits having at least five extra third-party advertisement libraries.
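The abstract does not state how the content and style embeddings are built or combined. The following is a minimal sketch under stated assumptions, not the authors' implementation: a content embedding is taken as the global average of a deep CNN feature map, a style embedding as the upper triangle of the Gram matrix of a shallower feature map (the construction familiar from neural style transfer), and the two are L2-normalised and concatenated. The function names, the `style_weight` parameter, and the layer shapes are hypothetical; random arrays stand in for real CNN activations.

```python
import numpy as np


def l2_normalize(v: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale a vector to unit Euclidean length (eps guards against division by zero)."""
    return v / (np.linalg.norm(v) + eps)


def content_embedding(fmap: np.ndarray) -> np.ndarray:
    """Global average pooling of a (C, H, W) feature map into a length-C vector."""
    return fmap.mean(axis=(1, 2))


def style_embedding(fmap: np.ndarray) -> np.ndarray:
    """Gram matrix of a (C, H, W) feature map, flattened to its upper triangle."""
    c, h, w = fmap.shape
    flat = fmap.reshape(c, h * w)
    gram = flat @ flat.T / (h * w)   # (C, C) channel-wise feature correlations
    rows, cols = np.triu_indices(c)  # the Gram matrix is symmetric: keep one half
    return gram[rows, cols]


def combined_embedding(content_fmap: np.ndarray, style_fmap: np.ndarray,
                       style_weight: float = 1.0) -> np.ndarray:
    """Concatenate the L2-normalised content embedding with the weighted,
    L2-normalised style embedding (hypothetical combination rule)."""
    content = l2_normalize(content_embedding(content_fmap))
    style = l2_normalize(style_embedding(style_fmap))
    return np.concatenate([content, style_weight * style])


# Random arrays stand in for activations of a deep and a shallow CNN layer.
rng = np.random.default_rng(0)
deep_fmap = rng.random((512, 7, 7))
shallow_fmap = rng.random((64, 56, 56))
emb = combined_embedding(deep_fmap, shallow_fmap)
print(emb.shape)  # (2592,): 512 content dims + 64*65/2 = 2080 style dims
```

Normalising each part before concatenation keeps one component from dominating a distance computation merely because it has more dimensions or larger activations; `style_weight` is one illustrative way to trade off the two.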
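The evaluation retrieves the five nearest neighbours of a query and scores them with precision@k and recall@k. A small sketch of both steps follows, assuming cosine similarity as the ranking measure (the abstract does not name the distance used). precision@k is the fraction of the top-k results that are relevant; recall@k is the fraction of all relevant items that appear in the top k. The toy gallery and relevance set are made up for illustration.

```python
import numpy as np


def top_k_neighbours(query: np.ndarray, gallery: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k gallery rows most cosine-similar to `query`, best first."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q  # cosine similarity of every gallery row to the query
    return np.argsort(-sims, kind="stable")[:k]


def precision_recall_at_k(retrieved, relevant: set, k: int = 5):
    """precision@k = hits / k; recall@k = hits / |relevant|."""
    top = [int(i) for i in retrieved[:k]]
    hits = sum(1 for i in top if i in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy 2-D "embeddings"; items 0, 1 and 4 are assumed relevant to the query.
gallery = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0],
                    [0.7, 0.7], [-1.0, 0.0], [0.95, -0.05]])
ranked = top_k_neighbours(np.array([1.0, 0.0]), gallery, k=5)
print(ranked.tolist())                                # [0, 5, 1, 3, 2]
print(precision_recall_at_k(ranked, {0, 1, 4}, k=5))  # (0.4, 0.6666666666666666)
```

Here two of the five retrieved items are relevant, giving precision@5 = 2/5, and two of the three relevant items were found, giving recall@5 = 2/3.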
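The last two findings rest on a simple comparison between a suspected counterfeit and the app it resembles: counting dangerous permissions and advertisement libraries present in the candidate but absent from the original, with a threshold of five. The sketch below shows that comparison with set differences. The permission subset, the ad-library names, and the function names are illustrative assumptions; the paper's exact permission list and library detection method are not given in the abstract.

```python
# Illustrative subset of Android permissions with protection level "dangerous"
# (assumption: not the complete list used in the paper).
DANGEROUS_PERMISSIONS = {
    "android.permission.READ_CONTACTS",
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.CAMERA",
    "android.permission.RECORD_AUDIO",
    "android.permission.READ_SMS",
    "android.permission.SEND_SMS",
    "android.permission.READ_PHONE_STATE",
    "android.permission.READ_CALL_LOG",
}


def extra_items(candidate, original, restrict_to=None):
    """Items in `candidate` but not in `original`, optionally limited to `restrict_to`."""
    extra = set(candidate) - set(original)
    if restrict_to is not None:
        extra &= set(restrict_to)
    return extra


def flag_potential_counterfeit(cand_perms, orig_perms, cand_ads, orig_ads, threshold=5):
    """Flag a candidate that adds at least `threshold` dangerous permissions
    or at least `threshold` ad libraries relative to the original app."""
    extra_perms = extra_items(cand_perms, orig_perms, DANGEROUS_PERMISSIONS)
    extra_ads = extra_items(cand_ads, orig_ads)
    return {
        "extra_dangerous_permissions": len(extra_perms) >= threshold,
        "extra_ad_libraries": len(extra_ads) >= threshold,
    }


# Hypothetical apps: INTERNET is a "normal" permission, so it is never counted.
original = {"android.permission.INTERNET", "android.permission.CAMERA"}
candidate = original | {
    "android.permission.READ_CONTACTS", "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.RECORD_AUDIO", "android.permission.READ_SMS",
    "android.permission.SEND_SMS",
}
print(flag_potential_counterfeit(candidate, original,
                                 {"ads.lib.a", "ads.lib.b", "ads.lib.c"}, {"ads.lib.a"}))
# {'extra_dangerous_permissions': True, 'extra_ad_libraries': False}
```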
Related papers
- Detecting and Characterising Mobile App Metamorphosis in Google Play Store
We propose a novel and efficient multi-modal search methodology to identify apps undergoing metamorphosis.
Our methodology uncovers various metamorphosis scenarios, including re-births, re-branding, re-purposing, and others.
We shed light on the concealed security and privacy risks that lurk within, potentially impacting even tech-savvy end-users.
(arXiv, 2024-07-19)
- A Risk Estimation Study of Native Code Vulnerabilities in Android Applications
We propose a fast risk-based approach that provides a risk score related to the native part of an Android application.
We show that many applications contain well-known vulnerabilities that miscreants can potentially exploit.
(arXiv, 2024-06-04)
- Zero-Shot Detection of Machine-Generated Codes
This work proposes a training-free approach for detecting code generated by LLMs.
We find that existing training-based or zero-shot text detectors are ineffective in detecting code.
Our method exhibits robustness against revision attacks and generalizes well to Java code.
(arXiv, 2023-10-08)
- Pattern Spotting and Image Retrieval in Historical Documents using Deep Hashing
This paper presents a deep learning approach for image retrieval and pattern spotting in digital collections of historical documents.
Deep learning models are used for feature extraction, considering two distinct variants, which provide either real-valued or binary code representations.
The proposed approach also reduces the search time by up to 200x and the storage cost by up to 6,000x when compared to related works.
(arXiv, 2022-08-04)
- Towards a Fair Comparison and Realistic Design and Evaluation Framework of Android Malware Detectors
We analyze 10 influential research works on Android malware detection using a common evaluation framework.
We identify five factors that, if not taken into account when creating datasets and designing detectors, significantly affect the trained ML models.
We conclude that the studied ML-based detectors have been evaluated optimistically, which explains their good published results.
(arXiv, 2022-05-25)
- Erasing Labor with Labor: Dark Patterns and Lockstep Behaviors on Google Play
Google Play's policy forbids the use of incentivized installs, ratings, and reviews to manipulate the placement of apps.
We examine install-incentivizing apps through a socio-technical lens and perform a mixed-methods analysis of their reviews and permissions.
Our dataset contains 319K reviews collected daily over five months from 60 such apps that cumulatively account for over 160.5M installs.
We find evidence of fraudulent reviews on install-incentivizing apps, following which we model them as an edge stream in a dynamic bipartite graph of apps and reviewers.
(arXiv, 2022-02-09)
- A high performance fingerprint liveness detection method based on quality related features
The system is tested on a highly challenging database comprising over 10,500 real and fake images.
The proposed solution proves to be robust to the multi-scenario dataset, and presents an overall rate of 90% correctly classified samples.
(arXiv, 2021-11-02)
- An Effective and Robust Detector for Logo Detection
Some attackers fool well-trained logo detection models in order to commit infringement.
A novel logo detector based on the mechanism of looking and thinking twice is proposed in this paper.
We extend the DetectoRS algorithm to a cascade schema with an equalization loss function, multi-scale transformations, and adversarial data augmentation.
(arXiv, 2021-08-01)
- Emerging App Issue Identification via Online Joint Sentiment-Topic Tracing
We propose a novel emerging issue detection approach named MERIT.
Based on the AOBST model, we infer the topics negatively reflected in user reviews for one app version.
Experiments on popular apps from Google Play and Apple's App Store demonstrate the effectiveness of MERIT.
(arXiv, 2020-08-23)
- A Note on Deepfake Detection with Low-Resources
Deepfakes are videos that include changes, quite often substituting the face of a portrayed individual with a different face using neural networks.
We present two methods that allow detecting Deepfakes for a user without significant computational power.
(arXiv, 2020-06-09)
- Source Printer Identification from Document Images Acquired using Smartphone
We propose to learn a single CNN model from the fusion of letter images and their printer-specific noise residuals.
The proposed method achieves 98.42% document classification accuracy using images of letter 'e' under a 5x2 cross-validation approach.
(arXiv, 2020-03-27)
This list is automatically generated from the titles and abstracts of the papers on this site.