Sequencing Matters: A Generate-Retrieve-Generate Model for Building
Conversational Agents
- URL: http://arxiv.org/abs/2311.09513v1
- Date: Thu, 16 Nov 2023 02:37:58 GMT
- Title: Sequencing Matters: A Generate-Retrieve-Generate Model for Building
Conversational Agents
- Authors: Quinn Patwardhan, Grace Hui Yang
- Abstract summary: This paper describes the Georgetown InfoSense group's work on the challenges presented by TREC iKAT 2023.
Our submitted runs outperform the median runs by a significant margin, exhibiting superior performance in nDCG at various cutoffs and in overall success rate.
Our solution involves the use of Large Language Models (LLMs) for initial answers, answer grounding by BM25, passage quality filtering by logistic regression, and answer generation by LLMs again.
- Score: 9.191944519634111
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes what the Georgetown InfoSense group did to
address the challenges presented by TREC iKAT 2023. Our submitted runs
outperform the median runs by a significant margin, exhibiting superior
performance in nDCG at various cutoffs and in overall success rate. Our
approach uses a Generate-Retrieve-Generate method, which we've found to greatly
outpace Retrieve-Then-Generate approaches for the purposes of iKAT. Our
solution involves the use of Large Language Models (LLMs) for initial answers,
answer grounding by BM25, passage quality filtering by logistic regression, and
answer generation by LLMs again. We leverage several purpose-built language
models, including BERT, chat-based, and text-to-text transfer-based models,
for text
understanding, classification, generation, and summarization. The official
results of the TREC evaluation contradict our initial self-evaluation, which
may suggest that reducing reliance on our retrieval and classification methods
would be better. Nonetheless, our findings suggest that the order in which
these components are invoked matters: using LLMs before search engines proves
essential.
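As a rough illustration of the Generate-Retrieve-Generate sequence described above, the sketch below wires an LLM draft, BM25 grounding, and a logistic-regression quality filter into one pipeline. It assumes the `rank_bm25` and scikit-learn packages; `llm_generate`, the prompts, and the passage features are hypothetical placeholders, not the authors' implementation.

```python
# A minimal Generate-Retrieve-Generate sketch, under the assumptions above.
import numpy as np
from rank_bm25 import BM25Okapi
from sklearn.linear_model import LogisticRegression

def llm_generate(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real chat/completion API."""
    return "placeholder answer"

def grg_answer(question: str, corpus: list[str],
               quality_clf: LogisticRegression) -> str:
    # Stage 1 (Generate): draft an initial answer with an LLM.
    draft = llm_generate(f"Answer concisely: {question}")

    # Stage 2 (Retrieve): ground the draft with BM25, querying with the
    # draft answer rather than the bare question.
    tokenized = [doc.lower().split() for doc in corpus]
    bm25 = BM25Okapi(tokenized)
    scores = bm25.get_scores(draft.lower().split())
    top_idx = np.argsort(scores)[::-1][:20]
    candidates = [corpus[i] for i in top_idx]

    # Filter candidates with a fitted logistic-regression passage-quality
    # classifier (these two features are illustrative only).
    feats = np.array([[len(p.split()), scores[i]]
                      for i, p in zip(top_idx, candidates)])
    keep = quality_clf.predict_proba(feats)[:, 1] > 0.5
    passages = [p for p, k in zip(candidates, keep) if k]

    # Stage 3 (Generate): produce the final answer from filtered passages.
    context = "\n".join(passages[:5])
    return llm_generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```

The design choice mirrored here is querying BM25 with the LLM draft instead of the raw question, which is what separates Generate-Retrieve-Generate from a Retrieve-Then-Generate pipeline.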
Related papers
- An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking [50.81324768683995]
FIRST is a novel approach that integrates a learning-to-rank objective and leverages the logits of only the first generated token.
We extend the evaluation of FIRST to the TREC Deep Learning datasets (DL19-22), validating its robustness across diverse domains.
Our experiments confirm that fast reranking with single-token logits does not compromise out-of-domain reranking quality.
arXiv Detail & Related papers (2024-11-08T12:08:17Z)
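A hedged sketch of single-token listwise reranking in the spirit of FIRST: each candidate is scored by the logit its identifier receives as the first generated token, so no full autoregressive decoding is needed. It assumes a HuggingFace causal LM; the prompt template, default model, and single-digit identifiers are illustrative, not the paper's exact setup.

```python
# Single-token listwise reranking sketch, under the assumptions above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def rerank_first_token(query: str, passages: list[str],
                       model_name: str = "gpt2") -> list[int]:
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Listwise prompt: label each candidate [1], [2], ... and ask for a ranking.
    body = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (f"Query: {query}\n{body}\n"
              f"Rank the passages by relevance. Best first: [")

    with torch.no_grad():
        out = model(**tok(prompt, return_tensors="pt"))
    next_logits = out.logits[0, -1]  # logits of the FIRST generated token

    # Score each passage by the logit of its identifier token
    # (single-digit identifiers assumed here) and sort, best first.
    ids = [tok.encode(str(i + 1), add_special_tokens=False)[0]
           for i in range(len(passages))]
    scores = [next_logits[t].item() for t in ids]
    return sorted(range(len(passages)), key=lambda i: -scores[i])
```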
- Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation [51.8188846284153]
Retrieval-Augmented Generation (RAG) has been widely adopted to enhance Large Language Models (LLMs).
Attributed Text Generation (ATG), which provides citations to support the model's responses in RAG, has attracted growing attention.
This paper proposes a fine-grained ATG method called ReClaim (Refer & Claim), which alternates the generation of references and answers step by step.
arXiv Detail & Related papers (2024-07-01T20:47:47Z)
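A sketch of interleaved reference-claim generation in the style of ReClaim: the loop alternates between quoting a supporting reference sentence and writing the claim it supports. `llm_generate`, the prompts, and the stopping rule are hypothetical placeholders, not the paper's exact procedure.

```python
# Interleaved reference-claim generation sketch, under the assumptions above.
def llm_generate(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API."""
    return "DONE"

def reclaim_answer(question: str, passages: list[str],
                   max_steps: int = 5) -> str:
    context = "\n".join(f"<p{i}> {p}" for i, p in enumerate(passages))
    answer_parts: list[str] = []
    for _ in range(max_steps):
        so_far = " ".join(answer_parts)
        # Step A: quote ONE reference sentence from the passages.
        ref = llm_generate(
            f"Passages:\n{context}\nQuestion: {question}\n"
            f"Answer so far: {so_far}\n"
            f"Quote one sentence from the passages that supports the next "
            f"claim, or say DONE.")
        if ref.strip() == "DONE":
            break
        # Step B: generate the claim grounded in that exact reference.
        claim = llm_generate(
            f"Reference: {ref}\nQuestion: {question}\n"
            f"Write one answer sentence supported only by the reference.")
        answer_parts.append(f"{claim} [{ref}]")  # inline citation
    return " ".join(answer_parts)
```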
- A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation [14.064465097974836]
This paper proposes a novel approach to evaluate Counter Narrative (CN) generation using a Large Language Model (LLM) as an evaluator.
We show that traditional automatic metrics correlate poorly with human judgements and fail to capture the nuanced relationship between generated CNs and human perception.
arXiv Detail & Related papers (2024-06-21T15:11:33Z)
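A minimal sketch of LLM-as-evaluator ranking for counter-narratives: pairwise comparisons aggregated by win count. `llm_judge` is a hypothetical judge call, and the prompt and aggregation are illustrative, not necessarily the paper's exact protocol.

```python
# Pairwise LLM-judge ranking sketch, under the assumptions above.
from itertools import combinations

def llm_judge(prompt: str) -> str:
    """Hypothetical LLM judge; replace with a real API. Returns 'A' or 'B'."""
    return "A"

def rank_counter_narratives(hate_speech: str, cns: list[str]) -> list[int]:
    wins = [0] * len(cns)
    for i, j in combinations(range(len(cns)), 2):
        verdict = llm_judge(
            f"Hate speech: {hate_speech}\n"
            f"Counter-narrative A: {cns[i]}\n"
            f"Counter-narrative B: {cns[j]}\n"
            f"Which counter-narrative is better? Answer A or B.")
        wins[i if verdict.strip().upper().startswith("A") else j] += 1
    # Indices of CNs, best first, by pairwise win count.
    return sorted(range(len(cns)), key=lambda k: -wins[k])
```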
- Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
Inspired by the principles of subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that human annotators prefer SQC-Score over the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
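A loose sketch of subjective-question-correction style scoring: decompose the reference answer into atomic facts and let an LLM award partial credit per fact, like a teacher grading an essay question. The decomposition and grading prompts are assumptions, not SQC-Score's exact procedure.

```python
# Partial-credit grading sketch, under the assumptions above.
def llm_generate(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API."""
    return "yes"

def sqc_style_score(prediction: str, reference: str) -> float:
    # Decompose the reference into gradable facts (one per line).
    facts = [f for f in llm_generate(
        f"List the atomic facts in this answer, one per line:\n{reference}"
    ).splitlines() if f.strip()]
    if not facts:
        return 0.0
    # Award one point per fact the prediction correctly covers.
    points = sum(
        llm_generate(
            f"Fact: {fact}\nAnswer: {prediction}\n"
            f"Does the answer correctly state this fact? yes/no"
        ).strip().lower().startswith("yes")
        for fact in facts)
    return points / len(facts)
```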
- BLESS: Benchmarking Large Language Models on Sentence Simplification [55.461555829492866]
We present BLESS, a performance benchmark of the most recent state-of-the-art large language models (LLMs) on the task of text simplification (TS).
We assess a total of 44 models, differing in size, architecture, pre-training methods, and accessibility, on three test sets from different domains (Wikipedia, news, and medical) under a few-shot setting.
Our evaluation indicates that the best LLMs, despite not being trained on TS, perform comparably with state-of-the-art TS baselines.
arXiv Detail & Related papers (2023-10-24T12:18:17Z)
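A small sketch of few-shot simplification evaluation of the kind BLESS performs, assuming the HuggingFace `evaluate` package for the SARI metric. The demo pair and `llm_generate` stub are illustrative placeholders, not the benchmark's data.

```python
# Few-shot text-simplification evaluation sketch, under the assumptions above.
import evaluate

def llm_generate(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API."""
    return "The cat lay on the rug."

# Hypothetical few-shot demonstrations (complex -> simple).
FEW_SHOT = [("The feline reclined upon the rug.", "The cat lay on the rug.")]

def simplify(sentence: str) -> str:
    shots = "\n".join(f"Complex: {c}\nSimple: {s}" for c, s in FEW_SHOT)
    return llm_generate(f"{shots}\nComplex: {sentence}\nSimple:")

# Score predictions with SARI against source sentences and references.
sari = evaluate.load("sari")
sources = ["The feline reclined upon the rug."]
refs = [["The cat lay on the rug."]]
preds = [simplify(s) for s in sources]
print(sari.compute(sources=sources, predictions=preds, references=refs))
```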
- Benchmarking Large Language Models in Retrieval-Augmented Generation [53.504471079548]
We systematically investigate the impact of Retrieval-Augmented Generation on large language models.
We analyze the performance of different large language models in 4 fundamental abilities required for RAG.
We establish Retrieval-Augmented Generation Benchmark (RGB), a new corpus for RAG evaluation in both English and Chinese.
arXiv Detail & Related papers (2023-09-04T08:28:44Z)
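A hedged sketch of one RGB-style test, noise robustness: mix relevant and noisy documents at increasing ratios and check whether the answer survives. `llm_generate`, the ratios, and the containment check are illustrative placeholders; the real benchmark defines its own data and metrics.

```python
# Noise-robustness evaluation sketch, under the assumptions above.
import random

def llm_generate(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API."""
    return "placeholder"

def noise_robustness(question: str, answer: str,
                     relevant: list[str], noise: list[str],
                     k: int = 5) -> dict[float, bool]:
    results = {}
    for ratio in (0.0, 0.2, 0.4, 0.6, 0.8):
        n_noise = int(k * ratio)
        docs = relevant[:k - n_noise] + random.sample(noise, n_noise)
        random.shuffle(docs)
        pred = llm_generate(
            "Documents:\n" + "\n".join(docs) +
            f"\nQuestion: {question}\nAnswer:")
        # Crude containment check standing in for a proper metric.
        results[ratio] = answer.lower() in pred.lower()
    return results
```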
- Learning to Rank in Generative Retrieval [62.91492903161522]
Generative retrieval aims to generate identifier strings of relevant passages as the retrieval target.
We propose a learning-to-rank framework for generative retrieval, dubbed LTRGR.
This framework only requires an additional learning-to-rank training phase to enhance current generative retrieval systems.
arXiv Detail & Related papers (2023-06-27T05:48:14Z)
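A minimal sketch of a learning-to-rank phase in the spirit of LTRGR: score each passage identifier by its sequence log-probability under the generative retriever, then apply a margin ranking loss so positives outscore negatives. The model wiring is assumed, not the paper's exact implementation.

```python
# Margin-based learning-to-rank loss sketch, under the assumptions above.
import torch
import torch.nn.functional as F

def sequence_score(logits: torch.Tensor,
                   target_ids: torch.Tensor) -> torch.Tensor:
    # Sum of token log-probs of the identifier string; returns shape (batch,).
    logp = F.log_softmax(logits, dim=-1)
    return logp.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1).sum(dim=-1)

def ltr_loss(pos_logits: torch.Tensor, pos_ids: torch.Tensor,
             neg_logits: torch.Tensor, neg_ids: torch.Tensor,
             margin: float = 1.0) -> torch.Tensor:
    s_pos = sequence_score(pos_logits, pos_ids)
    s_neg = sequence_score(neg_logits, neg_ids)
    target = torch.ones_like(s_pos)  # "first input should rank higher"
    return F.margin_ranking_loss(s_pos, s_neg, target, margin=margin)
```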
- Making a (Counterfactual) Difference One Rationale at a Time [5.97507595130844]
We investigate whether counterfactual data augmentation (CDA), without human assistance, can improve the performance of the rationale selector.
Our results show that CDA produces rationales that better capture the signal of interest.
arXiv Detail & Related papers (2022-01-13T19:05:02Z)
- SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval [11.38022203865326]
The SPLADE model provides highly sparse representations and competitive results with respect to state-of-the-art dense and sparse approaches.
We modify the pooling mechanism, benchmark a model solely based on document expansion, and introduce models trained with distillation.
Overall, SPLADE is considerably improved, with more than 9% gains on NDCG@10 on TREC DL 2019, leading to state-of-the-art results on the BEIR benchmark.
arXiv Detail & Related papers (2021-09-21T10:43:42Z)
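A compact sketch of how a SPLADE-style sparse representation is computed from an MLM head: a log-saturated ReLU of the logits, max-pooled over the sequence (the pooling SPLADE v2 adopts). The base checkpoint here is an illustrative placeholder; SPLADE ships its own trained models.

```python
# SPLADE-style sparse vector sketch, under the assumptions above.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "distilbert-base-uncased"  # placeholder base model, not a SPLADE checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

def splade_vector(text: str) -> torch.Tensor:
    batch = tok(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits            # (1, seq_len, vocab)
    sat = torch.log1p(torch.relu(logits))         # log(1 + ReLU(w_ij))
    mask = batch["attention_mask"].unsqueeze(-1)  # zero out padding positions
    return (sat * mask).max(dim=1).values[0]      # max-pool -> (vocab,)

vec = splade_vector("sparse lexical expansion for retrieval")
print((vec > 0).sum().item(), "non-zero vocabulary weights")
```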