Multi-Aspect Reviewed-Item Retrieval via LLM Query Decomposition and Aspect Fusion
- URL: http://arxiv.org/abs/2408.00878v1
- Date: Thu, 01 Aug 2024 19:04:10 GMT
- Title: Multi-Aspect Reviewed-Item Retrieval via LLM Query Decomposition and Aspect Fusion
- Authors: Anton Korikov, George Saad, Ethan Baron, Mustafa Khan, Manav Shah, Scott Sanner
- Abstract summary: We propose several novel aspect fusion (AF) strategies to address natural language product queries.
On imbalanced review corpora, AF improves over late fusion (LF), raising MAP@10 from 0.36 to 0.52, while matching LF on balanced review corpora.
- Score: 15.630734768499826
- Abstract: While user-generated product reviews often contain large quantities of information, their utility in addressing natural language product queries has been limited, with a key challenge being the need to aggregate information from multiple low-level sources (reviews) to a higher item level during retrieval. Existing methods for reviewed-item retrieval (RIR) typically take a late fusion (LF) approach which computes query-item scores by simply averaging the top-K query-review similarity scores for an item. However, we demonstrate that for multi-aspect queries and multi-aspect items, LF is highly sensitive to the distribution of aspects covered by reviews, in terms of both aspect frequency and the degree of aspect separation across reviews. To address these LF failures, we propose several novel aspect fusion (AF) strategies which include Large Language Model (LLM) query extraction and generative reranking. Our experiments show that on imbalanced review corpora, AF improves over LF, raising MAP@10 from 0.36 to 0.52, while achieving equivalent performance on balanced review corpora.
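The LF/AF contrast can be made concrete with a minimal sketch. It assumes L2-normalized embeddings and a hypothetical upstream LLM call that decomposes a query such as "quiet hotel with a great pool" into aspect strings ("quiet", "great pool"), each embedded separately; the paper's concrete AF strategies (including generative reranking) are more elaborate than this simple averaging variant.
```python
import numpy as np

def late_fusion_score(q_emb, review_embs, k=10):
    """Late fusion (LF): average the top-K query-review similarities for one item.
    `review_embs` is an (N, d) matrix of the item's review embeddings."""
    sims = review_embs @ q_emb  # cosine similarity for L2-normalized vectors
    return float(np.sort(sims)[-k:].mean())

def aspect_fusion_score(aspect_embs, review_embs, k=10):
    """One simple aspect fusion (AF) variant: score each LLM-extracted aspect
    with LF independently, then average across aspects, so a rarely reviewed
    aspect cannot be drowned out by a frequently reviewed one."""
    return float(np.mean([late_fusion_score(a, review_embs, k)
                          for a in aspect_embs]))
```
Intuitively, under LF an item whose reviews overwhelmingly cover a single query aspect can still dominate the top-K average; averaging per-aspect scores instead requires every aspect of the query to find support in some reviews.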
Related papers
- Data Fusion of Synthetic Query Variants With Generative Large Language Models [1.864807003137943]
This work explores the feasibility of using synthetic query variants generated by instruction-tuned Large Language Models in data fusion experiments.
We introduce a lightweight, unsupervised, and cost-efficient approach that exploits principled prompting and data fusion techniques.
Our analysis shows that data fusion based on synthetic query variants is significantly better than baselines with single queries and also outperforms pseudo-relevance feedback methods.
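As a rough illustration of the fusion step, the sketch below uses reciprocal rank fusion (RRF), a standard unsupervised rank-fusion method; this summary does not say which fusion technique the paper adopts, so RRF is an assumption, and `generate_variants`/`retrieve` are hypothetical helpers.
```python
def reciprocal_rank_fusion(runs, k=60):
    """Fuse ranked lists retrieved for different synthetic query variants.
    `runs` is a list of ranked document-id lists, one per variant."""
    scores = {}
    for run in runs:
        for rank, doc_id in enumerate(run, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical usage: prompt an instruction-tuned LLM for paraphrases of the
# query, retrieve a ranking per variant, then fuse the rankings.
# variants = generate_variants("side effects of ACE inhibitors", n=5)
# fused = reciprocal_rank_fusion([retrieve(v) for v in variants])
```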
arXiv Detail & Related papers (2024-11-06T12:54:27Z)
- LFOSum: Summarizing Long-form Opinions with Large Language Models [7.839083566878183]
This paper introduces (1) a new dataset of long-form user reviews, each entity comprising over a thousand reviews, (2) two training-free LLM-based summarization approaches that scale to long inputs, and (3) automatic evaluation metrics.
Our dataset of user reviews is paired with in-depth and unbiased critical summaries by domain experts, serving as a reference for evaluation.
Our evaluation reveals that LLMs still face challenges in balancing sentiment and format adherence in long-form summaries, though open-source models can narrow the gap when relevant information is retrieved in a focused manner.
arXiv Detail & Related papers (2024-10-16T20:52:39Z)
- LINKAGE: Listwise Ranking among Varied-Quality References for Non-Factoid QA Evaluation via LLMs [61.57691505683534]
Non-Factoid (NF) Question Answering (QA) is challenging to evaluate because potential answers are diverse and no objective criterion exists.
Large Language Models (LLMs) have been adopted for NFQA evaluation due to their compelling performance on various NLP tasks.
We propose a novel listwise NFQA evaluation approach that uses LLMs to rank each candidate answer within a list of reference answers sorted by descending quality.
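A minimal sketch of the listwise idea follows, with a prompt format of our own devising; the paper's actual template and answer parsing will differ.
```python
def listwise_eval_prompt(question, references, candidate):
    """Ask an LLM where a candidate answer ranks within reference answers
    that are already sorted from best to worst; a lower inserted position
    means a better candidate."""
    ref_block = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(references))
    return (
        f"Question: {question}\n"
        f"Reference answers, sorted from best to worst:\n{ref_block}\n\n"
        f"Candidate answer: {candidate}\n"
        "At which position (1 = best) should the candidate be inserted into "
        "the sorted list? Reply with a single integer."
    )
```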
arXiv Detail & Related papers (2024-09-23T06:42:21Z)
- CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation [76.31621715032558]
Grounded generation aims to equip language models (LMs) with the ability to produce more credible and accountable responses.
We introduce CaLM, a novel verification framework.
Our framework empowers smaller LMs, which rely less on parametric memory, to validate the output of larger LMs.
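A rough sketch of the verification idea, under the assumption that the small LM is restricted to the passages the large LM cited; `small_lm` and `agree` are hypothetical callables, not the paper's API.
```python
def verify_grounded_answer(question, large_answer, cited_passages, small_lm, agree):
    """The small LM, which relies less on parametric memory, re-answers the
    question from the cited passages alone; agreement with the large LM's
    answer is taken as evidence that the generation is grounded."""
    context = "\n".join(cited_passages)
    small_answer = small_lm(
        f"Answer using ONLY these passages.\n{context}\n\nQ: {question}"
    )
    return agree(large_answer, small_answer)  # e.g., an entailment check
```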
arXiv Detail & Related papers (2024-06-08T06:04:55Z)
- Large Language Models for Relevance Judgment in Product Search [48.56992980315751]
High relevance of retrieved and re-ranked items to the search query is the cornerstone of successful product search.
We present an array of techniques for leveraging Large Language Models (LLMs) for automating the relevance judgment of query-item pairs (QIPs) at scale.
Our findings have immediate implications for the growing field of relevance judgment automation in product search.
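A minimal LLM-as-judge sketch for one query-item pair (QIP); the four-level label set is borrowed from common product-search practice (ESCI) and is an assumption, not necessarily this paper's scheme.
```python
def qip_judgment_prompt(query, item_title, item_description):
    """Prompt an LLM to grade the relevance of an item to a product query."""
    return (
        f"Query: {query}\n"
        f"Item: {item_title}\n"
        f"Description: {item_description}\n"
        "Rate the item's relevance to the query as exactly one of: "
        "Exact, Substitute, Complement, Irrelevant. Reply with one word."
    )
```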
arXiv Detail & Related papers (2024-06-01T00:52:41Z)
- CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models [49.16989035566899]
Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources.
This paper constructs a large-scale and more comprehensive benchmark, and evaluates all the components of RAG systems in various RAG application scenarios.
arXiv Detail & Related papers (2024-01-30T14:25:32Z)
- Revisiting Large Language Models as Zero-shot Relation Extractors [8.953462875381888]
Relation extraction (RE) consistently involves some amount of labeled or unlabeled data, even under the zero-shot setting.
Recent studies have shown that large language models (LLMs) transfer well to new tasks out-of-the-box simply given a natural language prompt.
This work explores LLMs as zero-shot relation extractors.
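One small sketch of relation extraction via zero-shot prompting; casting RE as label selection and the prompt wording are illustrative, not the paper's exact protocol.
```python
def zero_shot_re_prompt(sentence, head, tail, relation_labels):
    """Cast relation extraction as zero-shot label selection: the LLM picks
    which relation (if any) holds between two marked entities."""
    labels = ", ".join(relation_labels)
    return (
        f"Sentence: {sentence}\n"
        f"Head entity: {head}\nTail entity: {tail}\n"
        f"Which relation holds between the head and tail entity? "
        f"Choose one of: {labels}, or answer 'none'."
    )
```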
arXiv Detail & Related papers (2023-10-08T06:17:39Z)
- Self-Supervised Contrastive BERT Fine-tuning for Fusion-based Reviewed-Item Retrieval [12.850360384298712]
We extend Neural Information Retrieval (IR) methods for matching queries to documents to the task of reviewed-item retrieval (RIR).
We use self-supervised methods for contrastive learning of BERT embeddings for both queries and reviews.
For contrastive learning in a Late Fusion scenario, we investigate the use of positive review samples from the same item and/or with the same rating.
For a more end-to-end Early Fusion approach, we introduce contrastive item embedding learning to fuse reviews into single item embeddings.
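As a sketch of the training signal, the in-batch InfoNCE loss below treats row i of the two tensors as a positive pair, e.g. two reviews of the same item and/or with the same rating, with the rest of the batch acting as negatives; this is a generic contrastive formulation, not the paper's exact recipe.
```python
import torch
import torch.nn.functional as F

def in_batch_infonce(anchor_embs, positive_embs, temperature=0.07):
    """Contrastive loss over BERT embeddings: pull each anchor toward its
    positive (same item and/or same rating) and away from other batch rows."""
    a = F.normalize(anchor_embs, dim=-1)
    p = F.normalize(positive_embs, dim=-1)
    logits = a @ p.T / temperature                      # (B, B) similarities
    targets = torch.arange(a.size(0), device=a.device)  # positives on diagonal
    return F.cross_entropy(logits, targets)
```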
arXiv Detail & Related papers (2023-08-01T18:01:21Z)
- LoL: A Comparative Regularization Loss over Query Reformulation Losses for Pseudo-Relevance Feedback [70.44530794897861]
Pseudo-relevance feedback (PRF) has proven to be an effective query reformulation technique to improve retrieval accuracy.
Existing PRF methods independently treat revised queries originating from the same query but using different numbers of feedback documents.
We propose the Loss-over-Loss (LoL) framework to compare the reformulation losses between different revisions of the same query during training.
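One way such a comparative regularizer could look, assuming the revisions of a query are ordered by increasing number of feedback documents and that more feedback should yield a lower reformulation loss; the paper's exact pairing and margin may differ.
```python
import torch.nn.functional as F

def lol_regularizer(losses, margin=0.0):
    """`losses` is a 1-D tensor where `losses[i]` is the reformulation loss
    of the revision built from the i-th smallest number of feedback
    documents. Each pair (i, j) with j > i is penalized when the
    better-informed revision j fails to achieve a lower loss than i."""
    reg = losses.new_zeros(())
    for i in range(len(losses)):
        for j in range(i + 1, len(losses)):
            reg = reg + F.relu(losses[j] - losses[i] + margin)
    return reg
```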
arXiv Detail & Related papers (2022-04-25T10:42:50Z)
- Query Focused Multi-Document Summarization with Distant Supervision [88.39032981994535]
Existing work relies heavily on retrieval-style methods for estimating the relevance between queries and text segments.
We propose a coarse-to-fine modeling framework which introduces separate modules for estimating whether segments are relevant to the query.
We demonstrate that our framework outperforms strong comparison systems on standard QFS benchmarks.
arXiv Detail & Related papers (2020-04-06T22:35:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.