Related papers: Data augmentation and refinement for recommender system: A semi-supervised approach using maximum margin matrix factorization

URL: http://arxiv.org/abs/2306.13050v3
Date: Sat, 30 Sep 2023 07:30:10 GMT
Title: Data augmentation and refinement for recommender system: A semi-supervised approach using maximum margin matrix factorization
Authors: Shamal Shaikh, Venkateswara Rao Kagita, Vikas Kumar, Arun K Pujari
Abstract summary: We explore the data augmentation and refinement aspects of Maximum Margin Matrix Factorization (MMMF) for rating predictions. We exploit the inherent characteristics of CF algorithms to assess the confidence level of individual ratings. We propose a semi-supervised approach for rating augmentation based on self-training.
Score: 3.3525248693617207
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Collaborative filtering (CF) has become a popular method for developing recommender systems (RSs) where ratings of a user for new items are predicted based on her past preferences and available preference information of other users. Despite the popularity of CF-based methods, their performance is often greatly limited by the sparsity of observed entries. In this study, we explore the data augmentation and refinement aspects of Maximum Margin Matrix Factorization (MMMF), a widely accepted CF technique for rating predictions, which has not been investigated before. We exploit the inherent characteristics of CF algorithms to assess the confidence level of individual ratings and propose a semi-supervised approach for rating augmentation based on self-training. We hypothesize that any CF algorithm's predictions with low confidence are due to some deficiency in the training data and hence, the performance of the algorithm can be improved by adopting a systematic data augmentation strategy. We iteratively use some of the ratings predicted with high confidence to augment the training data and remove low-confidence entries through a refinement process. By repeating this process, the system learns to improve prediction accuracy. Our method is experimentally evaluated on several state-of-the-art CF algorithms and leads to informative rating augmentation, improving the performance of the baseline approaches.
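Illustrative sketch: the augment-and-refine loop described in the abstract can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. A plain SGD matrix factorization stands in for MMMF, the confidence measure (closeness of the real-valued score to the nearest rating level) is a hypothetical proxy, and refinement is applied only to previously added pseudo-ratings. All function names, thresholds, and the toy data are made up for illustration.

```python
import random

def train_mf(ratings, n_users, n_items, rank=2, epochs=200, lr=0.02, reg=0.05, seed=0):
    """Plain SGD matrix factorization with a global mean (a stand-in, not MMMF)."""
    rng = random.Random(seed)
    mu = sum(ratings.values()) / len(ratings)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    entries = list(ratings.items())
    for _ in range(epochs):
        rng.shuffle(entries)
        for (u, i), r in entries:
            err = r - (mu + sum(a * b for a, b in zip(U[u], V[i])))
            for k in range(rank):
                uk, vk = U[u][k], V[i][k]
                U[u][k] += lr * (err * vk - reg * uk)
                V[i][k] += lr * (err * uk - reg * vk)
    return mu, U, V

def predict(model, u, i):
    mu, U, V = model
    return mu + sum(a * b for a, b in zip(U[u], V[i]))

def confidence(score, levels=(1, 2, 3, 4, 5)):
    """Hypothetical proxy: 1.0 on a rating level, 0.0 halfway between two levels."""
    clipped = min(max(score, levels[0]), levels[-1])
    return 1.0 - 2.0 * min(abs(clipped - level) for level in levels)

def self_train(observed, n_users, n_items, rounds=3, add_per_round=2, hi=0.8, lo=0.2):
    """Self-training: add confident predictions, drop pseudo-ratings that lose confidence."""
    observed = dict(observed)
    pseudo = {}
    for _ in range(rounds):
        model = train_mf({**pseudo, **observed}, n_users, n_items)
        # Refinement (assumption: only pseudo-ratings are eligible for removal).
        pseudo = {k: v for k, v in pseudo.items()
                  if confidence(predict(model, *k)) >= lo}
        # Augmentation: rank unobserved cells by confidence, keep the best few.
        candidates = []
        for u in range(n_users):
            for i in range(n_items):
                if (u, i) in observed or (u, i) in pseudo:
                    continue
                score = predict(model, u, i)
                conf = confidence(score)
                if conf >= hi:
                    candidates.append((conf, (u, i), round(min(max(score, 1), 5))))
        candidates.sort(reverse=True)
        for _, cell, rating in candidates[:add_per_round]:
            pseudo[cell] = rating
    return train_mf({**pseudo, **observed}, n_users, n_items), pseudo

# Toy 4x4 rating matrix with ten observed entries, keyed by (user, item).
observed = {(0, 0): 5, (0, 1): 4, (1, 0): 5, (1, 2): 1, (2, 1): 2,
            (2, 3): 5, (3, 2): 4, (3, 3): 4, (0, 3): 1, (1, 1): 4}
model, pseudo = self_train(observed, n_users=4, n_items=4)
print("pseudo-ratings added:", pseudo)
```

The two thresholds separate the decisions: `hi` gates which predictions may enter the training set, while the looser `lo` lets an accepted pseudo-rating survive small fluctuations between retraining rounds before it is discarded.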

Related papers

Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation [6.4212082894269535]
We compare existing leakage detection techniques, namely permutation and n-gram-based methods. Our analysis shows that the n-gram method consistently achieves the highest F1-score. We create cleaned versions of MMLU and HellaSwag, and re-evaluate several LLMs.
arXiv Detail & Related papers (2025-05-30T06:37:39Z)
Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF [67.48004037550064]
We propose an active learning approach to efficiently select prompt and preference pairs. Our method evaluates the gradients of all potential preference annotations to assess their impact on model updates. Experimental results demonstrate that our method outperforms the baseline by up to 5% in win rates against the chosen completion.
arXiv Detail & Related papers (2025-03-28T04:22:53Z)
Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data. We propose a method called Stratified Prediction-Powered Inference (StratPPI). We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
arXiv Detail & Related papers (2024-06-06T17:37:39Z)
Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks [58.469818546042696]
We study the sample efficiency of OPE with human preference and establish a statistical guarantee for it. By appropriately selecting the size of a ReLU network, we show that one can leverage any low-dimensional manifold structure in the Markov decision process.
arXiv Detail & Related papers (2023-10-16T16:27:06Z)
Personalized Federated Learning under Mixture of Distributions [98.25444470990107]
We propose a novel approach to Personalized Federated Learning (PFL), which utilizes Gaussian mixture models (GMM) to fit the input data distributions across diverse clients. FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification. Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.
arXiv Detail & Related papers (2023-05-01T20:04:46Z)
Experimenting with an Evaluation Framework for Imbalanced Data Learning (EFIDL) [9.010643838773477]
Data imbalance is one of the crucial issues in big data analysis with fewer labels. Many data balance methods were introduced to improve machine learning algorithms' performance. We proposed a new evaluation framework for imbalanced data learning methods.
arXiv Detail & Related papers (2023-01-26T01:16:02Z)
User-Specific Bicluster-based Collaborative Filtering: Handling Preference Locality, Sparsity and Subjectivity [1.0398909602421018]
Collaborative Filtering (CF) is the most common approach to build Recommender Systems. We propose USBFC, a Biclustering-based CF approach that creates user-specific models from strongly coherent and statistically significant rating patterns. USBFC achieves competitive predictive accuracy against state-of-the-art CF methods.
arXiv Detail & Related papers (2022-11-15T18:10:52Z)
Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
Unsupervised learning of disentangled representations in deep restricted kernel machines with orthogonality constraints [15.296955630621566]
Constr-DRKM is a deep kernel method for the unsupervised learning of disentangled data representations. We quantitatively evaluate the proposed method's effectiveness in disentangled feature learning.
arXiv Detail & Related papers (2020-11-25T11:40:10Z)
Providing reliability in Recommender Systems through Bernoulli Matrix Factorization [63.732639864601914]
This paper proposes Bernoulli Matrix Factorization (BeMF) to provide both prediction values and reliability values. BeMF acts on model-based collaborative filtering rather than on memory-based filtering. The more reliable a prediction is, the less liable it is to be wrong.
arXiv Detail & Related papers (2020-06-05T14:24:27Z)
Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions [48.91284724066349]
Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education. Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding. We develop a method that could serve as a hybrid human-AI system, to enable human experts to analyze the validity of policy evaluation estimates.
arXiv Detail & Related papers (2020-02-10T00:26:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.