Graph-Based Model-Agnostic Data Subsampling for Recommendation Systems
- URL: http://arxiv.org/abs/2305.16391v2
- Date: Fri, 16 Jun 2023 05:38:39 GMT
- Title: Graph-Based Model-Agnostic Data Subsampling for Recommendation Systems
- Authors: Xiaohui Chen, Jiankai Sun, Taiqing Wang, Ruocheng Guo, Li-Ping Liu, Aonan Zhang
- Abstract summary: Data subsampling is widely used to speed up the training of recommendation systems.
Most subsampling methods are model-based and often require a pre-trained pilot model to measure data importance.
We propose model-agnostic data subsampling methods by only exploring input data structure represented by graphs.
- Score: 29.713557081485995
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data subsampling is widely used to speed up the training of large-scale
recommendation systems. Most subsampling methods are model-based and often
require a pre-trained pilot model to measure data importance via, e.g., sample
hardness. However, when the pilot model is misspecified, model-based
subsampling methods deteriorate. Since model misspecification is persistent in
real recommendation systems, we instead propose model-agnostic data subsampling
methods by only exploring input data structure represented by graphs.
Specifically, we study the topology of the user-item graph to estimate the
importance of each user-item interaction (an edge in the user-item graph) via
graph conductance, followed by a propagation step on the network to smooth out
the estimated importance value.
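The two steps above (score each user-item edge from graph topology, then smooth the scores over the network) can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation. It assumes an unweighted, connected bipartite graph and uses effective resistance between an interaction's user and item (the reciprocal of their effective conductance) as a stand-in for the paper's conductance-based score. Effective resistance is a standard electrical-network quantity and is the edge score behind Spielman and Srivastava's spectral sparsification; whether it matches the paper's exact score is an assumption. The propagation rule (mix each edge score with the mean score of edges sharing an endpoint, controlled by the hypothetical parameters `alpha` and `steps`) is likewise hypothetical. Dense Gauss-Jordan inversion is only suitable for toy graphs.

```python
from collections import defaultdict


def build_graph(interactions):
    """Index users and items as nodes of an undirected bipartite graph."""
    nodes = {}
    for user, item in interactions:
        nodes.setdefault(("user", user), len(nodes))
        nodes.setdefault(("item", item), len(nodes))
    edges = [(nodes[("user", u)], nodes[("item", i)]) for u, i in interactions]
    return len(nodes), edges


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(matrix)
    aug = [row[:] + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]  # a zero pivot means the graph is disconnected
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def effective_resistance(n, edges):
    """Assumption: edge importance = effective resistance (connected graph only).

    R(a, b) = G[a][a] + G[b][b] - 2 G[a][b], where G is the inverse of the
    Laplacian with node 0 grounded (its row/column removed, potential 0).
    """
    lap = [[0.0] * n for _ in range(n)]
    for a, b in edges:  # unit-weight Laplacian L = D - A
        lap[a][a] += 1.0
        lap[b][b] += 1.0
        lap[a][b] -= 1.0
        lap[b][a] -= 1.0
    inv = invert([row[1:] for row in lap[1:]])
    g = [[0.0] * n] + [[0.0] + row for row in inv]  # pad node 0 back in
    return [g[a][a] + g[b][b] - 2.0 * g[a][b] for a, b in edges]


def propagate(edges, scores, alpha=0.5, steps=2):
    """Hypothetical smoothing: blend each edge score with its neighbours' mean."""
    incident = defaultdict(list)
    for k, (a, b) in enumerate(edges):
        incident[a].append(k)
        incident[b].append(k)
    for _ in range(steps):
        smoothed = []
        for k, (a, b) in enumerate(edges):
            neigh = [j for j in incident[a] + incident[b] if j != k]
            avg = sum(scores[j] for j in neigh) / len(neigh) if neigh else scores[k]
            smoothed.append((1 - alpha) * scores[k] + alpha * avg)
        scores = smoothed
    return scores


if __name__ == "__main__":
    logs = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u2", "i2"), ("u2", "i3")]
    n, edges = build_graph(logs)
    raw = effective_resistance(n, edges)
    print([round(r, 3) for r in raw])
    print([round(s, 3) for s in propagate(edges, raw)])
```

On this toy log the four interactions on the user-item cycle each get resistance 0.75, while the pendant interaction (u2, i3), a bridge with no alternative path, gets 1.0; smoothing then pulls it toward its neighbours' scores.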
Because our proposed method is model-agnostic, it can be paired with
model-based subsampling methods to marry the merits of both. Empirically, we
show that combining the two consistently improves over any single method on
the datasets used.
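The abstract states that the graph-based score can be combined with a model-based one but does not give the combination rule here. A minimal sketch, assuming a convex combination of normalized scores (`weight` is a hypothetical knob) and Poisson sampling with inclusion probabilities proportional to the combined score, capped at 1. Kept interactions carry inverse-probability weights 1/p so that a weighted training loss remains unbiased. `model_scores` stands in for any pilot-model signal, such as per-sample loss or hardness; all names here are hypothetical.

```python
import random


def normalize(scores):
    """Rescale positive scores so they sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]


def combine(graph_scores, model_scores, weight=0.5):
    """Hypothetical convex mix of graph-based and model-based importance."""
    g, m = normalize(graph_scores), normalize(model_scores)
    return [weight * a + (1.0 - weight) * b for a, b in zip(g, m)]


def inclusion_probs(scores, budget):
    """p_k proportional to score, capped at 1, summing to budget (0 < budget <= len)."""
    probs = [0.0] * len(scores)
    free = set(range(len(scores)))
    remaining = float(budget)
    while free:
        total = sum(scores[k] for k in free)
        capped = {k for k in free if remaining * scores[k] / total >= 1.0}
        if not capped:
            for k in free:
                probs[k] = remaining * scores[k] / total
            break
        for k in capped:  # certain inclusions; redistribute the rest of the budget
            probs[k] = 1.0
        free -= capped
        remaining -= len(capped)
    return probs


def subsample(scores, budget, seed=0):
    """Poisson sampling; returns (index, weight) pairs with weight = 1 / p."""
    rng = random.Random(seed)
    probs = inclusion_probs(scores, budget)
    return [(k, 1.0 / p) for k, p in enumerate(probs) if rng.random() < p]


if __name__ == "__main__":
    graph = [0.75, 0.75, 0.80, 0.80, 0.83]  # toy graph-based scores
    model = [0.2, 0.1, 0.4, 0.3, 2.0]       # toy pilot-model losses (made up)
    print(subsample(combine(graph, model), budget=3))
```

Poisson sampling keeps the expected sample size, not the exact size, equal to `budget`; a fixed-size design would need a different sampler.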
Experimental results on the KuaiRec and MIND datasets demonstrate that our
proposed methods outperform the baseline approaches.
Related papers
- A Federated Data Fusion-Based Prognostic Model for Applications with Multi-Stream Incomplete Signals [1.2277343096128712]
This article proposes a federated prognostic model that allows multiple users to jointly construct a failure time prediction model.
Numerical studies indicate that the performance of the proposed model is the same as that of classic non-federated prognostic models.
(2023-11-13T17:08:34Z)
- Dual Student Networks for Data-Free Model Stealing [79.67498803845059]
Two main challenges are estimating gradients of the target model without access to its parameters, and generating a diverse set of training samples.
We propose a Dual Student method where two students are symmetrically trained in order to provide the generator a criterion to generate samples that the two students disagree on.
We show that our new optimization framework provides more accurate gradient estimation of the target model and better accuracies on benchmark classification datasets.
(2023-09-18T18:11:31Z)
- Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis [50.972595036856035]
We present code that successfully replicates results from six popular and recent graph recommendation models.
We compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations.
By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure.
(2023-08-01T09:31:44Z)
- MissDiff: Training Diffusion Models on Tabular Data with Missing Values [29.894691645801597]
This work presents a unified and principled diffusion-based framework for learning from data with missing values.
We first observe that the widely adopted "impute-then-generate" pipeline may lead to a biased learning objective.
We prove the proposed method is consistent in learning the score of data distributions, and the proposed training objective serves as an upper bound for the negative likelihood in certain cases.
(2023-07-02T03:49:47Z)
- Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We conduct empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
(2023-05-18T16:28:29Z)
- Deep Explainable Learning with Graph Based Data Assessing and Rule Reasoning [4.369058206183195]
We propose an end-to-end deep explainable learning approach that combines the noise-handling strength of deep models with the interpretability of expert rules.
The proposed method is tested in an industry production system, showing comparable prediction accuracy, much higher generalization stability and better interpretability.
(2022-11-09T05:58:56Z)
- Model-free Subsampling Method Based on Uniform Designs [5.661822729320697]
We develop a low-GEFD data-driven subsampling method based on the existing uniform designs.
Our method remains robust under diverse model specifications, while other popular subsampling methods underperform.
(2022-09-08T07:47:56Z)
- Distilling Interpretable Models into Human-Readable Code [71.11328360614479]
Human-readability is an important and desirable standard for machine-learned model interpretability.
We propose to train interpretable models using conventional methods, and then distill them into concise, human-readable code.
We describe a piecewise-linear curve-fitting algorithm that produces high-quality results efficiently and reliably across a broad range of use cases.
(2021-01-21T01:46:36Z)
- A Bayesian Approach with Type-2 Student-t Membership Function for T-S Model Identification [47.25472624305589]
Fuzzy c-regression clustering based on type-2 fuzzy sets has shown remarkable results on non-sparse data.
An innovative architecture for the fuzzy c-regression model is presented, and a novel Student-t distribution-based membership function is designed for sparse data modelling.
(2020-09-02T05:10:13Z)
- S^3-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization [104.87483578308526]
We propose the model S3-Rec, which stands for Self-Supervised learning for Sequential Recommendation.
For our task, we devise four auxiliary self-supervised objectives to learn the correlations among attribute, item, subsequence, and sequence.
Extensive experiments conducted on six real-world datasets demonstrate the superiority of our proposed method over existing state-of-the-art methods.
(2020-08-18T11:44:10Z)
- Evaluating the Disentanglement of Deep Generative Models through Manifold Topology [66.06153115971732]
We present a method for quantifying disentanglement that only uses the generative model.
We empirically evaluate several state-of-the-art models across multiple datasets.
(2020-06-05T20:54:11Z)