Data augmentation and refinement for recommender system: A
semi-supervised approach using maximum margin matrix factorization
- URL: http://arxiv.org/abs/2306.13050v3
- Date: Sat, 30 Sep 2023 07:30:10 GMT
- Authors: Shamal Shaikh, Venkateswara Rao Kagita, Vikas Kumar, Arun K Pujari
- Abstract summary: We explore the data augmentation and refinement aspects of Maximum Margin Matrix Factorization (MMMF) for rating predictions.
We exploit the inherent characteristics of CF algorithms to assess the confidence level of individual ratings.
We propose a semi-supervised approach for rating augmentation based on self-training.
- Score: 3.3525248693617207
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Collaborative filtering (CF) has become a popular method for developing
recommender systems (RSs) where ratings of a user for new items are predicted
based on her past preferences and available preference information of other
users. Despite the popularity of CF-based methods, their performance is often
greatly limited by the sparsity of observed entries. In this study, we explore
the data augmentation and refinement aspects of Maximum Margin Matrix
Factorization (MMMF), a widely accepted CF technique for rating predictions,
which has not been investigated before. We exploit the inherent characteristics
of CF algorithms to assess the confidence level of individual ratings and
propose a semi-supervised approach for rating augmentation based on
self-training. We hypothesize that any CF algorithm's predictions with low
confidence are due to some deficiency in the training data and hence, the
performance of the algorithm can be improved by adopting a systematic data
augmentation strategy. We iteratively use some of the ratings predicted with
high confidence to augment the training data and remove low-confidence entries
through a refinement process. By repeating this process, the system learns to
improve prediction accuracy. Our method is experimentally evaluated on several
state-of-the-art CF algorithms and leads to informative rating augmentation,
improving the performance of the baseline approaches.
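The augment-and-refine loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and it makes several assumptions: a plain L2-regularised matrix factorization trained by gradient descent stands in for MMMF; confidence is taken as the distance of a real-valued prediction from the nearest boundary between two rating levels; the thresholds `hi` and `lo`, the function names, and the toy data are hypothetical; and originally observed ratings are never discarded, which is a conservative choice made for this example.

```python
import numpy as np

def factorize(R, mask, rank=2, steps=500, lr=0.01, reg=0.05, seed=0):
    """Plain regularised matrix factorization (an assumed stand-in for MMMF)."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((R.shape[0], rank))
    V = 0.1 * rng.standard_normal((R.shape[1], rank))
    for _ in range(steps):
        E = mask * (U @ V.T - R)  # error on entries currently in the training set
        U, V = U - lr * (E @ V + reg * U), V - lr * (E.T @ U + reg * V)
    return U @ V.T

def self_train(R, mask, rounds=3, hi=0.4, lo=0.1, r_min=1, r_max=5):
    """Self-training sketch: augment with confident predictions, refine the rest."""
    R = R.astype(float).copy()
    mask = mask.astype(bool).copy()
    original = mask.copy()  # originally observed ratings are never dropped here
    for _ in range(rounds):
        P = np.clip(factorize(R, mask), r_min, r_max)
        label = np.rint(P)               # nearest rating level
        conf = 0.5 - np.abs(P - label)   # assumed confidence proxy, in [0, 0.5]
        # Augmentation: add unobserved cells predicted with high confidence.
        add = ~mask & (conf >= hi)
        R[add] = label[add]
        # Refinement: remove earlier augmented cells whose confidence is now low.
        drop = mask & ~original & (conf < lo)
        mask = (mask | add) & ~drop
    return R, mask

# Toy 6x5 rating matrix; 0 marks an unobserved entry.
R0 = np.array([[5, 4, 0, 1, 0],
               [4, 0, 5, 0, 1],
               [0, 5, 4, 2, 0],
               [1, 0, 2, 5, 4],
               [0, 2, 0, 4, 5],
               [2, 1, 0, 0, 4]])
mask0 = R0 > 0
R_aug, mask_aug = self_train(R0, mask0)
print("observed:", int(mask0.sum()), "after augmentation:", int(mask_aug.sum()))
```

How many entries are added depends heavily on the thresholds and on the base model, so `hi` and `lo` would need tuning on held-out ratings in any real use.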
Related papers
- Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data.
We propose a method called Stratified Prediction-Powered Inference (StratPPI).
We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
arXiv Detail & Related papers (2024-06-06T17:37:39Z)
- Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks [58.469818546042696]
We study the sample efficiency of OPE with human preference and establish a statistical guarantee for it.
By appropriately selecting the size of a ReLU network, we show that one can leverage any low-dimensional manifold structure in the Markov decision process.
arXiv Detail & Related papers (2023-10-16T16:27:06Z)
- Personalized Federated Learning under Mixture of Distributions [98.25444470990107]
We propose a novel approach to Personalized Federated Learning (PFL), which utilizes Gaussian mixture models (GMM) to fit the input data distributions across diverse clients.
FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification.
Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.
arXiv Detail & Related papers (2023-05-01T20:04:46Z)
- Experimenting with an Evaluation Framework for Imbalanced Data Learning (EFIDL) [9.010643838773477]
Data imbalance is one of the crucial issues in big data analysis with fewer labels.
Many data balance methods were introduced to improve machine learning algorithms' performance.
We propose a new evaluation framework for imbalanced data learning methods.
arXiv Detail & Related papers (2023-01-26T01:16:02Z)
- User-Specific Bicluster-based Collaborative Filtering: Handling Preference Locality, Sparsity and Subjectivity [1.0398909602421018]
Collaborative Filtering (CF) is the most common approach to build Recommender Systems.
We propose USBFC, a Biclustering-based CF approach that creates user-specific models from strongly coherent and statistically significant rating patterns.
USBFC achieves competitive predictive accuracy against state-of-the-art CF methods.
arXiv Detail & Related papers (2022-11-15T18:10:52Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
- Unsupervised learning of disentangled representations in deep restricted kernel machines with orthogonality constraints [15.296955630621566]
Constr-DRKM is a deep kernel method for the unsupervised learning of disentangled data representations.
We quantitatively evaluate the proposed method's effectiveness in disentangled feature learning.
arXiv Detail & Related papers (2020-11-25T11:40:10Z)
- Providing reliability in Recommender Systems through Bernoulli Matrix Factorization [63.732639864601914]
This paper proposes Bernoulli Matrix Factorization (BeMF) to provide both prediction values and reliability values.
BeMF acts on model-based collaborative filtering rather than on memory-based filtering.
The more reliable a prediction is, the less liable it is to be wrong.
arXiv Detail & Related papers (2020-06-05T14:24:27Z)
- Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions [48.91284724066349]
Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education.
Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding.
We develop a method that could serve as a hybrid human-AI system, to enable human experts to analyze the validity of policy evaluation estimates.
arXiv Detail & Related papers (2020-02-10T00:26:43Z)