Detecting Content Rating Violations in Android Applications: A Vision-Language Approach

URL: http://arxiv.org/abs/2502.15739v1
Date: Fri, 07 Feb 2025 06:21:43 GMT
Title: Detecting Content Rating Violations in Android Applications: A Vision-Language Approach
Authors: D. Denipitiyage, B. Silva, S. Seneviratne, A. Seneviratne, S. Chawla,
Abstract summary: We propose and evaluate a vision-language approach to predict the content ratings of mobile game applications and detect content rating violations. Our method achieves ~6% better relative accuracy than the state-of-the-art CLIP-fine-tuned model in a multi-modal setting. Applying our classifier in the wild, we detected more than 70 possible cases of content rating violations, including nine instances with the 'Teacher Approved' badge.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite regulatory efforts to establish reliable content-rating guidelines for mobile apps, the process of assigning content ratings in the Google Play Store remains self-regulated by the app developers. There is no straightforward way to verify developer-assigned content ratings, either manually, given the overwhelming scale, or automatically, given the difficulty of interpreting textual and visual data and correlating them with content ratings. We propose and evaluate a vision-language approach to predict the content ratings of mobile game applications and detect content rating violations, using a dataset of metadata of popular Android games. Our method achieves ~6% better relative accuracy than the state-of-the-art CLIP-fine-tuned model in a multi-modal setting. Applying our classifier in the wild, we detected more than 70 possible cases of content rating violations, including nine instances with the 'Teacher Approved' badge. Additionally, our findings indicate that 34.5% of the apps identified by our classifier as violating content ratings were removed from the Play Store, whereas the removal rate for correctly classified apps was only 27%. This discrepancy highlights the practical effectiveness of our classifier in identifying apps that are likely to be removed based on user complaints.
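The abstract does not spell out how a predicted rating becomes a "possible violation" or how the two removal rates are compared. The sketch below shows one plausible reading under stated assumptions: an ordinal ESRB-style label set (the paper's actual labels may differ), a violation defined as a predicted rating more mature than the developer-assigned one, and "correctly classified" meaning the prediction equals the assigned rating. All names (`AppRecord`, `is_possible_violation`, `removal_rate`, `compare_removal_rates`) are hypothetical, this is not the authors' implementation, and the toy records are invented rather than taken from the paper's dataset.

```python
from __future__ import annotations

from dataclasses import dataclass

# ASSUMPTION: an ordinal, ESRB-style label set; the paper's actual labels may differ.
RATING_ORDER = ["Everyone", "Everyone 10+", "Teen", "Mature 17+"]
RANK = {label: i for i, label in enumerate(RATING_ORDER)}


@dataclass
class AppRecord:
    """Hypothetical per-app record (not the paper's data schema)."""
    app_id: str
    assigned: str   # developer-assigned content rating
    predicted: str  # rating predicted by the classifier
    removed: bool   # whether the app later disappeared from the store


def is_possible_violation(app: AppRecord) -> bool:
    # ASSUMPTION: flag apps predicted to need a more mature rating than assigned.
    return RANK[app.predicted] > RANK[app.assigned]


def removal_rate(apps: list[AppRecord]) -> float:
    # Fraction of apps in the group that were removed; 0.0 for an empty group.
    return sum(a.removed for a in apps) / len(apps) if apps else 0.0


def compare_removal_rates(apps: list[AppRecord]) -> tuple[float, float]:
    """Return (removal rate of flagged apps, removal rate of correctly classified apps)."""
    flagged = [a for a in apps if is_possible_violation(a)]
    correct = [a for a in apps if a.predicted == a.assigned]
    return removal_rate(flagged), removal_rate(correct)


# Toy, invented records for illustration only.
apps = [
    AppRecord("a", "Everyone", "Teen", removed=True),
    AppRecord("b", "Everyone", "Mature 17+", removed=False),
    AppRecord("c", "Teen", "Teen", removed=False),
    AppRecord("d", "Everyone", "Everyone", removed=False),
    AppRecord("e", "Teen", "Everyone", removed=False),  # under-predicted: neither group
]
print(compare_removal_rates(apps))  # (0.5, 0.0)
```

In this reading, apps whose prediction is less mature than their assigned rating (app `e`) fall into neither group; the abstract does not say how the paper treats such apps.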

Related papers

Moderately Mighty: To What Extent Can Internal Software Metrics Predict App Popularity at Launch? [8.230804322052956]
This study explores the extent to which internal software metrics, measurable from source code before deployment, can predict an app's popularity. We constructed a rigorously filtered dataset of 446 open-source Java-based Android apps available on both F-Droid and Google Play Store. We conclude that, although internal code metrics alone are insufficient for accurately predicting an app's future popularity, they do exhibit meaningful correlations with it.
arXiv (2025-07-02T19:47:41Z)
Long-Form Information Alignment Evaluation Beyond Atomic Facts [60.25969380388974]
We introduce MontageLie, a benchmark that constructs deceptive narratives by "montaging" truthful statements without introducing explicit hallucinations. We propose DoveScore, a novel framework that jointly verifies factual accuracy and event-order consistency.
arXiv (2025-05-21T17:46:38Z)
Fairness Concerns in App Reviews: A Study on AI-based Mobile Apps [9.948068408730654]
This research investigates fairness concerns raised in mobile app reviews. It focuses on reviews of AI-based mobile apps, because unfair behaviors and outcomes may be more likely in AI-based apps than in non-AI-based apps.
arXiv (2024-01-16T03:43:33Z)
Revisiting Android App Categorization [5.805764439228492]
We present a comprehensive evaluation of existing Android app categorization approaches using our new ground-truth dataset. We propose two new approaches that outperform existing methods in both description-based and APK-based settings.
arXiv (2023-10-11T08:25:34Z)
Neural Embeddings for Web Testing [49.66745368789056]
Existing crawlers rely on app-specific, threshold-based algorithms to assess state equivalence. We propose WEBEMBED, a novel abstraction function based on neural network embeddings and threshold-free classifiers. Our evaluation on nine web apps shows that WEBEMBED outperforms state-of-the-art techniques by detecting near-duplicates more accurately.
arXiv (2023-06-12T19:59:36Z)
Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models. We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks. Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv (2023-03-14T16:11:47Z)
Content Rating Classification for Fan Fiction [35.539218522504605]
Content ratings for fan fiction are assigned either voluntarily or as required by regulation. The task is to take fan fiction text and determine the appropriate content rating. We propose natural language processing techniques to determine the content rating automatically.
arXiv (2022-12-23T17:40:03Z)
Integrating Rankings into Quantized Scores in Peer Review [61.27794774537103]
In peer review, reviewers are usually asked to provide scores for the papers. These scores are elicited in quantized form, which produces many ties and loses information. To mitigate this issue, conferences have started to ask reviewers to additionally provide a ranking of the papers they have reviewed. There is no standard procedure for using this ranking information, and Area Chairs may use it in different ways. We take a principled approach to integrate the ranking information into the scores.
arXiv (2022-04-05T19:39:13Z)
Erasing Labor with Labor: Dark Patterns and Lockstep Behaviors on Google Play [13.658284581863839]
Google Play's policy forbids the use of incentivized installs, ratings, and reviews to manipulate the placement of apps. We examine install-incentivizing apps through a socio-technical lens and perform a mixed-methods analysis of their reviews and permissions. Our dataset contains 319K reviews collected daily over five months from 60 such apps that cumulatively account for over 160.5M installs. We find evidence of fraudulent reviews on install-incentivizing apps, following which we model them as an edge stream in a dynamic bipartite graph of apps and reviewers.
arXiv (2022-02-09T16:54:27Z)
Android Malware Detection using Feature Ranking of Permissions [0.0]
We use Android permissions as a vehicle for quick and effective differentiation between benign and malware apps. Our analysis indicates that this approach can achieve better accuracy and F-score than other reported approaches.
arXiv (2022-01-20T22:08:20Z)
Instance-Conditional Knowledge Distillation for Object Detection [59.56780046291835]
We propose an instance-conditional distillation framework to find desired knowledge. We use observed instances as condition information and formulate the retrieval process as an instance-conditional decoding process.
arXiv (2021-10-25T08:23:29Z)
Revisiting Mahalanobis Distance for Transformer-Based Out-of-Domain Detection [60.88952532574564]
This paper conducts a thorough comparison of out-of-domain intent detection methods. We evaluate multiple contextual encoders and methods, proven to be efficient, on three standard datasets for intent classification. Our main findings show that fine-tuning Transformer-based encoders on in-domain data leads to superior results.
arXiv (2021-01-11T09:10:58Z)
ScoreGAN: A Fraud Review Detector based on Multi Task Learning of Regulated GAN with Data Augmentation [50.779498955162644]
We propose ScoreGAN for fraud review detection, which uses both review text and review rating scores in the generation and detection process. Results show that the proposed framework outperforms the existing state-of-the-art framework, FakeGAN, in AP by 7% and 5% on the Yelp and TripAdvisor datasets, respectively.
arXiv (2020-06-11T16:15:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.