More Insight from Being More Focused: Analysis of Clustered Market Apps
- URL: http://arxiv.org/abs/2405.15737v1
- Date: Fri, 24 May 2024 17:34:06 GMT
- Title: More Insight from Being More Focused: Analysis of Clustered Market Apps
- Authors: Maleknaz Nayebi, Homayoon Farrahi, Ada Lee, Henry Cho, Guenther Ruhe
- Abstract summary: Current research studies mostly consider sampling across all apps.
Similar to proprietary software and web-based services, more specific results can be expected from looking at more homogeneous samples.
In this paper, we target homogeneous samples of apps to increase the degree of insight gained from analytics.
- Score: 2.66221280030096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The growing appeal of mobile apps has inspired researchers to analyze apps from different perspectives. As with any software product, apps have different attributes such as size, content maturity, rating, category, or number of downloads. Current research studies mostly sample across all apps. This often results in comparisons of apps that differ considerably in nature and category (for example, games compared with weather and calendar apps) as well as in size and complexity. As with proprietary software and web-based services, more specific results can be expected from more homogeneous samples, such as those obtained by applying clustering. In this paper, we target homogeneous samples of apps to increase the degree of insight gained from analytics. As a proof of concept, we applied the clustering technique DBSCAN, followed by correlation analysis between app attributes, to a set of 940 open-source mobile apps from F-Droid. We showed that (i) clusters of apps with similar characteristics provided more insight than applying the same analysis to the whole data set and (ii) defining the similarity of apps based on the similarity of topics produced by the topic modeling technique Latent Dirichlet Allocation (LDA) does not significantly improve clustering results.
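The idea in the abstract (cluster first, then correlate attributes within each cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the two attributes, the toy values, and the `eps` and `min_pts` settings below are hypothetical and chosen only to make the effect visible. The DBSCAN routine is a simplified textbook version written with the Python standard library.

```python
import math
from statistics import mean


def dbscan(points, eps, min_pts):
    """Simplified DBSCAN. Returns one label per point; -1 marks noise."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)

    def neighbors(i):
        # All points within eps of point i (the point itself included).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else float("nan")


# Hypothetical, already-normalized (attribute_a, attribute_b) pairs for 8 apps.
apps = [
    (0.0, 0.0), (0.1, 0.1), (0.2, 0.2), (0.3, 0.3),  # b rises with a
    (2.0, 0.3), (2.1, 0.2), (2.2, 0.1), (2.3, 0.0),  # b falls with a
]

labels = dbscan(apps, eps=0.2, min_pts=2)

# Correlation over the whole sample versus within each cluster.
pooled = pearson([a for a, _ in apps], [b for _, b in apps])
per_cluster = {}
for c in sorted(set(labels) - {-1}):
    members = [p for p, lab in zip(apps, labels) if lab == c]
    per_cluster[c] = pearson([a for a, _ in members], [b for _, b in members])

print("pooled:", round(pooled, 3))
print("per cluster:", {c: round(r, 3) for c, r in per_cluster.items()})
```

In this toy data the pooled correlation is close to zero, while the two clusters show a strong positive and a strong negative correlation respectively. That is the kind of relationship that sampling across all apps can hide.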
Related papers
- The Impact of Train-Test Leakage on Machine Learning-based Android Malware Detection [6.9053043489744015]
We identify distinct Android apps that have identical or nearly identical app representations.
This will lead to a data leakage problem that inflates a machine learning model's performance.
We propose a leak-aware scheme to construct a machine learning-based Android malware detector.
arXiv Detail & Related papers (2024-10-25T08:04:01Z)
- Bayesian Joint Additive Factor Models for Multiview Learning [7.254731344123118]
A motivating application arises in the context of precision medicine where multi-omics data are collected to correlate with clinical outcomes.
We propose a joint additive factor regression model (JAFAR) with a structured additive design, accounting for shared and view-specific components.
Prediction of time-to-labor onset from immunome, metabolome, and proteome data illustrates performance gains against state-of-the-art competitors.
arXiv Detail & Related papers (2024-06-02T15:35:45Z)
- DealMVC: Dual Contrastive Calibration for Multi-view Clustering [78.54355167448614]
We propose a novel Dual contrastive calibration network for Multi-View Clustering (DealMVC).
We first design a fusion mechanism to obtain a global cross-view feature. Then, a global contrastive calibration loss is proposed by aligning the view feature similarity graph and the high-confidence pseudo-label graph.
During the training procedure, the interacted cross-view feature is jointly optimized at both local and global levels.
arXiv Detail & Related papers (2023-08-17T14:14:28Z)
- Automatically Classifying Emotions based on Text: A Comparative Exploration of Different Datasets [0.0]
We focus on three datasets that were recently presented in the related literature.
We explore the performance of traditional as well as state-of-the-art deep learning models in the presence of different characteristics in the data.
Our experimental work shows that state-of-the-art models such as RoBERTa perform the best for all cases.
arXiv Detail & Related papers (2023-02-28T16:34:55Z)
- When to Use What: An In-Depth Comparative Empirical Analysis of OpenIE Systems for Downstream Applications [0.0]
We present an application-focused empirical survey of neural OpenIE models, training sets, and benchmarks.
We find that the different assumptions made by different models and datasets have a statistically significant effect on performance.
arXiv Detail & Related papers (2022-11-15T15:48:27Z)
- OpenMixup: Open Mixup Toolbox and Benchmark for Visual Representation Learning [53.57075147367114]
We introduce OpenMixup, the first mixup augmentation and benchmark for visual representation learning.
We train 18 representative mixup baselines from scratch and rigorously evaluate them across 11 image datasets.
We also open-source our modular backbones, including a collection of popular vision backbones, optimization strategies, and analysis toolkits.
arXiv Detail & Related papers (2022-09-11T12:46:01Z)
- Multi-Domain Joint Training for Person Re-Identification [51.73921349603597]
Deep learning-based person Re-IDentification (ReID) often requires a large amount of training data to achieve good performance.
It appears that collecting more training data from diverse environments tends to improve the ReID performance.
We propose an approach called Domain-Camera-Sample Dynamic network (DCSD) whose parameters can be adaptive to various factors.
arXiv Detail & Related papers (2022-01-06T09:20:59Z)
- Deep Relational Metric Learning [84.95793654872399]
This paper presents a deep relational metric learning framework for image clustering and retrieval.
We learn an ensemble of features that characterizes an image from different aspects to model both interclass and intraclass distributions.
Experiments on the widely-used CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate that our framework improves existing deep metric learning methods and achieves very competitive results.
arXiv Detail & Related papers (2021-08-23T09:31:18Z)
- Closing the Generalization Gap in One-Shot Object Detection [92.82028853413516]
We show that the key to strong few-shot detection models may not lie in sophisticated metric learning approaches, but instead in scaling the number of categories.
Future data annotation efforts should therefore focus on wider datasets and annotate a larger number of categories.
arXiv Detail & Related papers (2020-11-09T09:31:17Z)
- Dataset Bias in Few-shot Image Recognition [57.25445414402398]
We first investigate the impact of transferable capabilities learned from base categories.
Second, we investigate performance differences on different datasets from dataset structures and different few-shot learning methods.
arXiv Detail & Related papers (2020-08-18T14:46:23Z)
- The OARF Benchmark Suite: Characterization and Implications for Federated Learning Systems [41.90546696412147]
Open Application Repository for Federated Learning (OARF) is a benchmark suite for federated machine learning systems.
OARF mimics more realistic application scenarios with publicly available data sets as different data silos in image, text and structured data.
arXiv Detail & Related papers (2020-06-14T10:11:36Z)