Related papers: Auditing Yelp's Business Ranking and Review Recommendation Through the Lens of Fairness

URL: http://arxiv.org/abs/2308.02129v1
Date: Fri, 4 Aug 2023 04:12:33 GMT
Title: Auditing Yelp's Business Ranking and Review Recommendation Through the Lens of Fairness
Authors: Mohit Singhal, Javier Pacheco, Tanushree Debi, Seyyed Mohammad Sadegh Moosavi Khorzooghi, Abolfazl Asudeh, Gautam Das, Shirin Nilizadeh
Abstract summary: This study investigates Yelp's business ranking and review recommendation system through the lens of fairness. We find that reviews of female and less-established users are disproportionately categorized as recommended. We also find a positive association between restaurants being located in hotspot regions and their average exposure.
Score: 10.957942355264093
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Web 2.0 recommendation systems, such as Yelp, connect users and businesses so that users can identify new businesses and simultaneously express their experiences in the form of reviews. Yelp recommendation software moderates user-provided content by categorizing it into recommended and not-recommended sections. Due to Yelp's substantial popularity and its high impact on local businesses' success, understanding the fairness of its algorithms is crucial. However, with no access to the training data and the algorithms used by such black-box systems, studying their fairness is not trivial, requiring a tremendous effort to minimize bias in data collection and consider the confounding factors in the analysis. This large-scale data-driven study, for the first time, investigates Yelp's business ranking and review recommendation system through the lens of fairness. We define and test four hypotheses to examine whether Yelp's recommendation software shows bias and whether Yelp's business ranking algorithm shows bias against restaurants located in specific neighborhoods. Our findings show that reviews of female and less-established users are disproportionately categorized as recommended. We also find a positive association between restaurants being located in hotspot regions and their average exposure. Furthermore, we observed some cases of severe disparity bias in cities where the hotspots are in neighborhoods with less demographic diversity or areas with higher affluence and education levels. Indeed, biases introduced by data-driven systems, including our findings in this paper, are (almost) always implicit and arise through proxy attributes. Still, the authors believe such implicit biases should be detected and resolved, as they can create cycles of discrimination that keep widening the social gaps between different groups.

Related papers

Re-evaluating Open-ended Evaluation of Large Language Models [50.23008729038318]
We show that the current Elo-based rating systems can be susceptible to and even reinforce biases in data, intentional or accidental. We propose evaluation as a 3-player game, and introduce novel game-theoretic solution concepts to ensure robustness to redundancy.
arXiv Detail & Related papers (2025-02-27T15:07:47Z)
Reducing Popularity Influence by Addressing Position Bias [0.0]
We show that position debiasing can effectively reduce a skew in the popularity of items induced by the position bias through a feedback loop. We show that position debiasing can significantly improve assortment utilization, without any degradation in user engagement or financial metrics.
arXiv Detail & Related papers (2024-12-11T21:16:37Z)
Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs [57.16442740983528]
In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback. The role of user feedback in annotators' assessment of turns in a conversational perception has been little studied. We focus on how the evaluation of task-oriented dialogue systems (TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of a turn being evaluated.
arXiv Detail & Related papers (2024-04-19T16:45:50Z)
GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community. The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability. We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z)
Metrics for popularity bias in dynamic recommender systems [0.0]
Biased recommendations may lead to decisions that can potentially have adverse effects on individuals, sensitive user groups, and society. This paper focuses on quantifying popularity bias that stems directly from the output of RecSys models. Four metrics are proposed to quantify popularity bias in RecSys over time in a dynamic setting across different sensitive user groups.
arXiv Detail & Related papers (2023-10-12T16:15:30Z)
Whole Page Unbiased Learning to Rank [59.52040055543542]
Unbiased Learning to Rank (ULTR) algorithms are proposed to learn an unbiased ranking model with biased click data. We propose a Bias Agnostic whole-page unbiased Learning to rank algorithm, named BAL, to automatically find the user behavior model. Experimental results on a real-world dataset verify the effectiveness of BAL.
arXiv Detail & Related papers (2022-10-19T16:53:08Z)
Cross Pairwise Ranking for Unbiased Item Recommendation [57.71258289870123]
We develop a new learning paradigm named Cross Pairwise Ranking (CPR). CPR achieves unbiased recommendation without knowing the exposure mechanism. We prove in theory that this way offsets the influence of user/item propensity on the learning.
arXiv Detail & Related papers (2022-04-26T09:20:27Z)
The Unfairness of Active Users and Popularity Bias in Point-of-Interest Recommendation [4.578469978594752]
This paper studies the interplay between (i) the unfairness of active users, (ii) the unfairness of popular items, and (iii) the accuracy of recommendation as three angles of our study triangle. For item fairness, we divide items into short-head, mid-tail, and long-tail groups and study the exposure of these item groups in the top-k recommendation list of users. Our study shows that most recommendation models cannot satisfy both consumer and producer fairness, indicating a trade-off between these variables possibly due to natural biases in data.
arXiv Detail & Related papers (2022-02-27T08:02:19Z)
Reviews in motion: a large scale, longitudinal study of review recommendations on Yelp [24.34131115451651]
We focus on "reclassification," wherein a platform changes its filtering decision for a review. We compile over 12.5M reviews--more than 2M unique--across over 10k businesses. Our data suggests demographic disparities in reclassifications, with more changes in lower density and low-middle income areas.
arXiv Detail & Related papers (2022-02-18T03:27:53Z)
Spatio-Temporal Graph Representation Learning for Fraudster Group Detection [50.779498955162644]
Companies may hire fraudster groups to write fake reviews to either demote competitors or promote their own businesses. To detect such groups, a common model is to represent fraudster groups' static networks. We propose to first capitalize on the effectiveness of the HIN-RNN in both reviewers' representation learning.
arXiv Detail & Related papers (2022-01-07T08:01:38Z)
Unbiased Pairwise Learning to Rank in Recommender Systems [4.058828240864671]
Unbiased learning to rank algorithms are appealing candidates and have already been applied in many applications with single categorical labels. We propose a novel unbiased LTR algorithm to tackle the challenges, which innovatively models position bias in a pairwise fashion. Experimental results on public benchmark datasets and internal live traffic show the superior results of the proposed method for both categorical and continuous labels.
arXiv Detail & Related papers (2021-11-25T06:04:59Z)
Incentives for Item Duplication under Fair Ranking Policies [69.14168955766847]
We study the behaviour of different fair ranking policies in the presence of duplicates. We find that fairness-aware ranking policies may conflict with diversity, due to their potential to incentivize duplication more than policies solely focused on relevance.
arXiv Detail & Related papers (2021-10-29T11:11:15Z)
"You eat with your eyes first": Optimizing Yelp Image Advertising [0.8594140167290099]
We use Yelp's image dataset and star-based review system as a measurement of an image's effectiveness in promoting a business. We achieve 90-98% accuracy in classifying simplified star ratings for various image categories and observe that images containing blue skies, open surroundings, and many windows are correlated with higher Yelp reviews.
arXiv Detail & Related papers (2020-11-03T02:49:40Z)
DeepFair: Deep Learning for Improving Fairness in Recommender Systems [63.732639864601914]
The lack of bias management in Recommender Systems leads to minority groups receiving unfair recommendations. We propose a Deep Learning based Collaborative Filtering algorithm that provides recommendations with an optimum balance between fairness and accuracy without knowing demographic information about the users.
arXiv Detail & Related papers (2020-06-09T13:39:38Z)
Fairness-Aware Explainable Recommendation over Knowledge Graphs [73.81994676695346]
We analyze different groups of users according to their level of activity, and find that bias exists in recommendation performance between different groups. We show that inactive users may be more susceptible to receiving unsatisfactory recommendations, due to insufficient training data for the inactive users. We propose a fairness constrained approach via re-ranking to mitigate this problem in the context of explainable recommendation over knowledge graphs.
arXiv Detail & Related papers (2020-06-03T05:04:38Z)
Quarantine Deceiving Yelp's Users by Detecting Unreliable Rating Reviews [1.3999481573773074]
We focus on quarantining Yelp's users that employ both review spike detection (RSD) algorithm and spam detection technique in bridging review networks (BRN). We found that more than 80% of Yelp's accounts are unreliable, and more than 80% of highly-rated businesses are subject to spamming.
arXiv Detail & Related papers (2020-04-21T02:44:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.