Online and Offline Evaluations of Collaborative Filtering and Content Based Recommender Systems
- URL: http://arxiv.org/abs/2411.01354v1
- Date: Sat, 02 Nov 2024 20:05:31 GMT
- Title: Online and Offline Evaluations of Collaborative Filtering and Content Based Recommender Systems
- Authors: Ali Elahi, Armin Zirak
- Abstract summary: This study provides a comparative analysis of a large-scale recommender system operating in Iran.
The system employs user-based and item-based recommendations using content-based, collaborative filtering, trend-based methods, and hybrid approaches.
Our evaluation methods include manual evaluation, offline tests using accuracy and ranking metrics such as hit-rate@k and nDCG, and online tests based on click-through rate (CTR).
- Abstract: Recommender systems are widely used AI applications designed to help users efficiently discover relevant items. The effectiveness of such systems is tied to the satisfaction of both users and providers. However, user satisfaction is complex and cannot be easily framed mathematically using information retrieval and accuracy metrics. While many studies evaluate accuracy through offline tests, a growing number of researchers argue that online evaluation methods such as A/B testing are better suited for this purpose. We have employed a variety of algorithms on datasets that differ in size and subject, producing recommendations on various platforms, including media streaming services, digital publishing websites, e-commerce systems, and news broadcasting networks. Notably, our target websites and datasets are in the Persian (Farsi) language. This study provides a comparative analysis of a large-scale recommender system that has been operating for the past year across about 70 websites in Iran, processing roughly 300 requests per second collectively. The system employs user-based and item-based recommendations using content-based, collaborative filtering, trend-based methods, and hybrid approaches. Through both offline and online evaluations, we aim to identify where these algorithms perform most efficiently and determine the best method for our specific needs, considering the dataset and system scale. Our evaluation methods include manual evaluation, offline tests using accuracy and ranking metrics such as hit-rate@k and nDCG, and online tests based on click-through rate (CTR). Additionally, we analyzed and proposed methods to address the cold-start problem and popularity bias.
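For context on the offline ranking metrics named in the abstract, the sketch below shows one common way hit-rate@k and nDCG@k are computed for a single user's recommendation list under binary relevance. The function names and example items are illustrative assumptions, not code from the paper.

```python
# Minimal sketch of hit-rate@k and nDCG@k with binary relevance.
# Names and example items are illustrative, not taken from the paper.
import math
from typing import Iterable, Sequence


def hit_rate_at_k(recommended: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """1.0 if any relevant item appears in the top-k recommendations, else 0.0."""
    return 1.0 if set(recommended[:k]) & set(relevant) else 0.0


def ndcg_at_k(recommended: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """Binary-relevance nDCG@k: DCG of the ranked list divided by the ideal DCG."""
    relevant = set(relevant)
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so the discount uses rank + 2
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Per-user scores like these would typically be averaged over a held-out test set.
    recs = ["item_a", "item_b", "item_c", "item_d", "item_e"]
    held_out = {"item_c", "item_x"}
    print(hit_rate_at_k(recs, held_out, k=5))  # 1.0
    print(ndcg_at_k(recs, held_out, k=5))      # ≈ 0.31
```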
Related papers
- Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings [77.20838441870151]
Commit message generation is a crucial task in software engineering that is challenging to evaluate correctly.
We use an online metric - the number of edits users introduce before committing the generated messages to the VCS - to select metrics for offline experiments.
Our results indicate that edit distance exhibits the highest correlation, whereas commonly used similarity metrics such as BLEU and METEOR demonstrate low correlation.
arXiv Detail & Related papers (2024-10-15T20:32:07Z)
- Quantifying User Coherence: A Unified Framework for Cross-Domain Recommendation Analysis [69.37718774071793]
This paper introduces novel information-theoretic measures for understanding recommender systems.
We evaluate 7 recommendation algorithms across 9 datasets, revealing the relationships between our measures and standard performance metrics.
arXiv Detail & Related papers (2024-10-03T13:02:07Z)
- A Comprehensive Survey of Evaluation Techniques for Recommendation Systems [0.0]
This paper introduces a comprehensive suite of metrics, each tailored to capture a distinct aspect of system performance.
We identify the strengths and limitations of current evaluation practices and highlight the nuanced trade-offs that emerge when optimizing recommendation systems across different metrics.
arXiv Detail & Related papers (2023-12-26T11:57:01Z)
- Embedding in Recommender Systems: A Survey [67.67966158305603]
A crucial aspect is embedding techniques that convert high-dimensional discrete features, such as user and item IDs, into low-dimensional continuous vectors.
Applying embedding techniques captures complex entity relationships and has spurred substantial research.
This survey covers embedding methods like collaborative filtering, self-supervised learning, and graph-based techniques.
arXiv Detail & Related papers (2023-10-28T06:31:06Z)
- Bridging Offline-Online Evaluation with a Time-dependent and Popularity Bias-free Offline Metric for Recommenders [3.130722489512822]
We show that penalizing popular items and considering the time of transactions significantly improves our ability to choose the best recommendation model for a live recommender system.
Our results aim to help the academic community to understand better offline evaluation and optimization criteria that are more relevant for real applications of recommender systems.
arXiv Detail & Related papers (2023-08-14T01:37:02Z)
- Offline Evaluation of Reward-Optimizing Recommender Systems: The Case of Simulation [11.940733431087102]
In academic and industry-based research, online evaluation methods are seen as the gold standard for interactive applications like recommendation systems.
Online evaluation methods are costly for a number of reasons, and a clear need remains for reliable offline evaluation procedures.
In academic work, limited access to online systems makes offline metrics the de facto approach to validating novel methods.
arXiv Detail & Related papers (2022-09-18T20:03:32Z)
- A Comprehensive Review on Non-Neural Networks Collaborative Filtering Recommendation Systems [1.3124513975412255]
Collaborative filtering (CF) uses the known preference of a group of users to make predictions and recommendations about the unknown preferences of other users.
Since collaborative filtering was first introduced in the 1990s, a wide variety of increasingly successful models have been proposed.
Due to the success of machine learning techniques in many areas, there has been a growing emphasis on the application of such algorithms in recommendation systems.
arXiv Detail & Related papers (2021-06-20T11:13:33Z)
- Benchmarks for Deep Off-Policy Evaluation [152.28569758144022]
We present a collection of policies that can be used for benchmarking off-policy evaluation.
The goal of our benchmark is to provide a standardized measure of progress that is motivated from a set of principles.
We provide open-source access to our data and code to foster future research in this area.
arXiv Detail & Related papers (2021-03-30T18:09:33Z)
- Do Offline Metrics Predict Online Performance in Recommender Systems? [79.48653445643865]
We investigate the extent to which offline metrics predict online performance by evaluating recommenders across six simulated environments.
We observe that offline metrics are correlated with online performance over a range of environments.
We study the impact of adding exploration strategies, and observe that their effectiveness, when compared to greedy recommendation, is highly dependent on the recommendation algorithm.
arXiv Detail & Related papers (2020-11-07T01:41:13Z)
- Recommendation system using a deep learning and graph analysis approach [1.2183405753834562]
We propose a novel recommendation method based on Matrix Factorization and graph analysis methods.
In addition, we leverage deep autoencoders to initialize user and item latent factors, and a deep embedding method gathers users' latent factors from the user trust graph.
arXiv Detail & Related papers (2020-04-17T08:05:33Z)
- PONE: A Novel Automatic Evaluation Metric for Open-Domain Generative Dialogue Systems [48.99561874529323]
There are three kinds of automatic methods for evaluating open-domain generative dialogue systems.
Due to the lack of systematic comparison, it is not clear which kind of metric is more effective.
We propose a novel and feasible learning-based metric that can significantly improve the correlation with human judgments.
arXiv Detail & Related papers (2020-04-06T04:36:33Z)