Towards LLM-Based Usability Analysis for Recommender User Interfaces
- URL: http://arxiv.org/abs/2511.14359v1
- Date: Tue, 18 Nov 2025 11:05:13 GMT
- Title: Towards LLM-Based Usability Analysis for Recommender User Interfaces
- Authors: Sebastian Lubos, Alexander Felfernig, Damian Garber, Viet-Man Le, Thi Ngoc Trang Tran
- Abstract summary: We explore the potential of multimodal large language models to assess the usability of recommender system interfaces. We take user interface screenshots from multiple recommender platforms to cover both preference elicitation and recommendation presentation scenarios.
- Score: 41.966962052550656
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Usability is a key factor in the effectiveness of recommender systems. However, the analysis of user interfaces is a time-consuming process that requires expertise. Recent advances in multimodal large language models (LLMs) offer promising opportunities to automate such evaluations. In this work, we explore the potential of multimodal LLMs to assess the usability of recommender system interfaces, considering a variety of publicly available systems as examples. We take user interface screenshots from several of these platforms to cover both preference elicitation and recommendation presentation scenarios. An LLM is instructed to analyze these interfaces against different usability criteria and to provide explanatory feedback. Our evaluation demonstrates how LLMs can support heuristic-style usability assessments at scale, helping to improve user experience.
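The paper's prompts and tooling are not reproduced here, but the described setup (screenshot in, criterion-by-criterion feedback out) can be approximated with any multimodal LLM API. Below is a minimal sketch using the OpenAI Python SDK; the model name, the Nielsen-style heuristic list, and the prompt wording are illustrative assumptions, not the authors' actual configuration.

```python
import base64
from openai import OpenAI  # pip install openai

# Nielsen-style heuristics as evaluation criteria -- an assumed list for
# illustration; the paper's actual criteria may differ.
HEURISTICS = [
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Consistency and standards",
    "Aesthetic and minimalist design",
]

def encode_image(path: str) -> str:
    """Return a screenshot base64-encoded for inline transfer to the API."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def assess_usability(screenshot_path: str, scenario: str) -> str:
    """Ask a multimodal LLM to rate a recommender UI screenshot against
    each heuristic and justify every rating."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    prompt = (
        "You are a usability expert. The screenshot shows a recommender "
        f"system UI in a '{scenario}' scenario. For each criterion below, "
        "give a 1-5 rating and a one-sentence justification:\n- "
        + "\n- ".join(HEURISTICS)
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed; any multimodal model would do
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {
                    "url": "data:image/png;base64,"
                           + encode_image(screenshot_path)
                }},
            ],
        }],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # Preference elicitation and recommendation presentation screens are
    # assessed the same way, just with a different scenario label.
    print(assess_usability("ui_screenshot.png", "recommendation presentation"))
```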
Related papers
- MLLM as a UI Judge: Benchmarking Multimodal LLMs for Predicting Human Perception of User Interfaces [97.62557395494962]
We use crowdsourcing to benchmark GPT-4o, Claude, and Llama across 30 interfaces. Our results show that MLLMs approximate human preferences on some dimensions but diverge on others.
arXiv Detail & Related papers (2025-10-09T20:00:41Z)
- Towards Recommending Usability Improvements with Multimodal Large Language Models [40.77787659104315]
Common evaluation methods, such as usability testing and inspection, are effective but resource-intensive and require expert involvement. Recent advances in multimodal LLMs offer promising opportunities to automate usability evaluation processes. Our findings indicate the potential of LLMs to enable faster and more cost-effective usability evaluation.
arXiv Detail & Related papers (2025-08-22T07:38:37Z)
- RecSys Arena: Pair-wise Recommender System Evaluation with Large Language Models [40.74293642666989]
We present the idea of RecSys Arena, where the recommendation results of two different recommender systems are evaluated by an LLM judge to obtain fine-grained evaluation feedback. We demonstrate that many different LLMs provide general evaluation results that are highly consistent with canonical offline metrics. The approach can better distinguish algorithms with comparable performance in terms of AUC and nDCG.
arXiv Detail & Related papers (2024-12-15T05:57:36Z)
- Leveraging LLMs for Influence Path Planning in Proactive Recommendation [34.5820082133773]
Proactive recommender systems aim to gradually guide user interest toward a target item beyond historical interests. IRS designs a sequential model for influence path planning but struggles with target item inclusion and path coherence. We propose an LLM-based Influence Path Planning (LLM-IPP) method to generate coherent and effective influence paths.
arXiv Detail & Related papers (2024-09-07T13:41:37Z)
- LANE: Logic Alignment of Non-tuning Large Language Models and Online Recommendation Systems for Explainable Reason Generation [15.972926854420619]
Leveraging large language models (LLMs) offers new opportunities for comprehensive recommendation logic generation.
However, fine-tuning LLMs for recommendation tasks incurs high computational costs and alignment issues with existing systems.
Our proposed strategy, LANE, aligns LLMs with online recommendation systems without additional LLM tuning.
arXiv Detail & Related papers (2024-07-03T06:20:31Z)
- Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate [74.06294042304415]
We propose ScaleEval, an agent-debate-assisted meta-evaluation framework.
We release the code for our framework, which is publicly available on GitHub.
arXiv Detail & Related papers (2024-01-30T07:03:32Z)
- Tapping the Potential of Large Language Models as Recommender Systems: A Comprehensive Framework and Empirical Analysis [91.5632751731927]
Large Language Models such as ChatGPT have showcased remarkable abilities in solving general tasks. We propose a general framework for utilizing LLMs in recommendation tasks, focusing on the capabilities of LLMs as recommenders. We analyze the impact of public availability, tuning strategies, model architecture, parameter scale, and context length on recommendation results.
arXiv Detail & Related papers (2024-01-10T08:28:56Z)
- Empowering Few-Shot Recommender Systems with Large Language Models -- Enhanced Representations [0.0]
Large language models (LLMs) offer novel insights into tackling the few-shot scenarios encountered by explicit feedback-based recommender systems.
Our study can inspire researchers to delve deeper into the multifaceted dimensions of LLMs' involvement in recommender systems.
arXiv Detail & Related papers (2023-12-21T03:50:09Z)
- Unlocking the Potential of User Feedback: Leveraging Large Language Model as User Simulator to Enhance Dialogue System [65.93577256431125]
We propose an alternative approach called User-Guided Response Optimization (UGRO) that combines an LLM with a smaller task-oriented dialogue (TOD) model.
The LLM serves as an annotation-free user simulator to assess dialogue responses produced by the smaller fine-tuned end-to-end TOD model.
Our approach outperforms previous state-of-the-art (SOTA) results.
arXiv Detail & Related papers (2023-06-16T13:04:56Z)
- Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models [115.7508325840751]
The recent success of large language models (LLMs) has shown great potential for developing more powerful conversational recommender systems (CRSs).
In this paper, we investigate the use of ChatGPT for conversational recommendation, revealing the inadequacy of the existing evaluation protocol.
We propose an interactive evaluation approach based on LLMs, named iEvaLM, that harnesses LLM-based user simulators (a minimal sketch of this simulator-based setup appears below).
arXiv Detail & Related papers (2023-05-22T15:12:43Z)
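To make the simulator idea concrete, here is a minimal sketch of an iEvaLM-style interactive evaluation loop: an LLM role-plays a user with a hidden target item, and the conversational recommender under test is scored on whether it surfaces that item within a turn budget. The prompts, the model name, and the `crs_respond` callback are illustrative assumptions; the actual iEvaLM protocol is specified in the cited paper.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def simulate_user_turn(target_item: str, dialogue: list[dict]) -> str:
    """Have an LLM role-play a user who secretly wants `target_item`."""
    system = (
        "You are a user of a conversational movie recommender. You are "
        f"secretly looking for '{target_item}'. Express related preferences "
        "and react to suggestions, but never name the item yourself."
    )
    # Swap roles: from the simulator's perspective, the recommender's
    # messages are the "user" side and its own past turns are "assistant".
    swapped = [
        {"role": "assistant" if m["role"] == "user" else "user",
         "content": m["content"]}
        for m in dialogue
    ]
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed simulator model
        messages=[{"role": "system", "content": system}] + swapped,
    )
    return response.choices[0].message.content

def evaluate_crs(crs_respond, target_item: str, max_turns: int = 5) -> bool:
    """Interactive evaluation: success if the recommender under test
    (`crs_respond`, a hypothetical dialogue-history -> reply callback)
    mentions the hidden target item within `max_turns` exchanges."""
    dialogue: list[dict] = []
    for _ in range(max_turns):
        user_msg = simulate_user_turn(target_item, dialogue)
        dialogue.append({"role": "user", "content": user_msg})
        crs_msg = crs_respond(dialogue)
        dialogue.append({"role": "assistant", "content": crs_msg})
        if target_item.lower() in crs_msg.lower():  # crude success check
            return True
    return False
```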