Related papers: Large Language Models for Relevance Judgment in Product Search

Large Language Models for Relevance Judgment in Product Search

URL: http://arxiv.org/abs/2406.00247v2
Date: Tue, 16 Jul 2024 18:01:55 GMT
Title: Large Language Models for Relevance Judgment in Product Search
Authors: Navid Mehrdad, Hrushikesh Mohapatra, Mossaab Bagdouri, Prijith Chandran, Alessandro Magnani, Xunfan Cai, Ajit Puthenputhussery, Sachin Yadav, Tony Lee, ChengXiang Zhai, Ciya Liao,
Abstract summary: High relevance of retrieved and re-ranked items to the search query is the cornerstone of successful product search. We present an array of techniques for leveraging Large Language Models (LLMs) for automating the relevance judgment of query-item pairs (QIPs) at scale. Our findings have immediate implications for the growing field of relevance judgment automation in product search.
Score: 48.56992980315751
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: High relevance of retrieved and re-ranked items to the search query is the cornerstone of successful product search, yet measuring relevance of items to queries is one of the most challenging tasks in product information retrieval, and quality of product search is highly influenced by the precision and scale of available relevance-labelled data. In this paper, we present an array of techniques for leveraging Large Language Models (LLMs) for automating the relevance judgment of query-item pairs (QIPs) at scale. Using a unique dataset of multi-million QIPs, annotated by human evaluators, we test and optimize hyper parameters for finetuning billion-parameter LLMs with and without Low Rank Adaption (LoRA), as well as various modes of item attribute concatenation and prompting in LLM finetuning, and consider trade offs in item attribute inclusion for quality of relevance predictions. We demonstrate considerable improvement over baselines of prior generations of LLMs, as well as off-the-shelf models, towards relevance annotations on par with the human relevance evaluators. Our findings have immediate implications for the growing field of relevance judgment automation in product search.

Related papers

Generative Product Recommendations for Implicit Superlative Queries [21.750990820244983]
In Recommender Systems, users often seek the best products through indirect, vague, or under-specified queries, such as "best shoes for trail running" We investigate how Large Language Models can generate implicit attributes for ranking as well as reason over them to improve product recommendations for such queries.
arXiv Detail & Related papers (2025-04-26T00:05:47Z)
LLMs as Data Annotators: How Close Are We to Human Performance [47.61698665650761]
Manual annotation of data is labor-intensive, time-consuming, and costly. In-context learning (ICL) in which some examples related to the task are given in the prompt can lead to inefficiencies and suboptimal model performance. This paper presents experiments comparing several LLMs, considering different embedding models, across various datasets for the Named Entity Recognition (NER) task.
arXiv Detail & Related papers (2025-04-21T11:11:07Z)
Automated Query-Product Relevance Labeling using Large Language Models for E-commerce Search [3.392843594990172]
Traditional approaches for annotating query-product pairs rely on human-based labeling services. We show that Large Language Models (LLMs) can approach human-level accuracy on this task in a fraction of the time and cost required by human-labelers. This scalable alternative to human-annotation has significant implications for information retrieval domains.
arXiv Detail & Related papers (2025-02-21T22:59:36Z)
Multimodal semantic retrieval for product search [6.185573921868495]
We build a multimodal representation for product items in e-commerce search in contrast to pure-text representation of products. We demonstrate that a multimodal representation scheme for a product can show improvement on purchase recall or relevance accuracy in semantic retrieval.
arXiv Detail & Related papers (2025-01-13T14:34:26Z)
Re-ranking the Context for Multimodal Retrieval Augmented Generation [28.63893944806149]
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge to generate a response within a context. RAG systems face unique challenges: (i) the retrieval process may select irrelevant entries to user query (e.g., images, documents), and (ii) vision-language models or multi-modal language models like GPT-4o may hallucinate when processing these entries to generate RAG output. We show that by using a more advanced relevancy measure, one can enhance the retrieval process by selecting more relevant pieces from the knowledge-base and eliminate
arXiv Detail & Related papers (2025-01-08T18:58:22Z)
Generative Retrieval with Preference Optimization for E-commerce Search [16.78829577915103]
We develop an innovative framework for E-commerce search, called generative retrieval with preference optimization. We employ multi-span identifiers to represent raw item titles and transform the task of generating titles from queries into the task of generating multi-span identifiers from queries. Our experiments show that this framework achieves competitive performance on a real-world dataset, and online A/B tests demonstrate the superiority and effectiveness in improving conversion gains.
arXiv Detail & Related papers (2024-07-29T09:31:19Z)
LLM-Select: Feature Selection with Large Language Models [64.5099482021597]
Large language models (LLMs) are capable of selecting the most predictive features, with performance rivaling the standard tools of data science. Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place.
arXiv Detail & Related papers (2024-07-02T22:23:40Z)
Optimizing Novelty of Top-k Recommendations using Large Language Models and Reinforcement Learning [16.287067991245962]
In real-world systems, an important consideration for a new model is novelty of its top-k recommendations. We propose a reinforcement learning (RL) formulation where large language models provide feedback for the novel items. We evaluate the proposed algorithm on improving novelty for a query-ad recommendation task on a large-scale search engine.
arXiv Detail & Related papers (2024-06-20T10:20:02Z)
Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books. Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z)
Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
We propose a new evaluation method, SQC-Score. Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score. Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA) We propose a novel adaptive QA framework, that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity. We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z)
Enhanced E-Commerce Attribute Extraction: Innovating with Decorative Relation Correction and LLAMA 2.0-Based Annotation [4.81846973621209]
We propose a pioneering framework that integrates BERT for classification, a Conditional Random Fields (CRFs) layer for attribute value extraction, and Large Language Models (LLMs) for data annotation. Our approach capitalizes on the robust representation learning of BERT, synergized with the sequence decoding prowess of CRFs, to adeptly identify and extract attribute values. Our methodology is rigorously validated on various datasets, including Walmart, BestBuy's e-commerce NER dataset, and the CoNLL dataset.
arXiv Detail & Related papers (2023-12-09T08:26:30Z)
AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks. We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate. We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z)
Multi-Label Learning to Rank through Multi-Objective Optimization [9.099663022952496]
Learning to Rank technique is ubiquitous in the Information Retrieval system nowadays. To resolve ambiguity, it is desirable to train a model using many relevance criteria. We propose a general framework where the information from labels can be combined in a variety of ways to characterize the trade-off among the goals.
arXiv Detail & Related papers (2022-07-07T03:02:11Z)
Shopping Queries Dataset: A Large-Scale ESCI Benchmark for Improving Product Search [26.772851310517954]
This paper introduces the "Shopping Queries dataset", a large dataset of difficult Amazon search queries and results. The dataset contains around 130 thousand unique queries and 2.6 million manually labeled (product) relevance judgements. The dataset is being used in one of the KDDCup'22 challenges.
arXiv Detail & Related papers (2022-06-14T04:25:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.