Related papers: Structured List-Grounded Question Answering

Structured List-Grounded Question Answering

URL: http://arxiv.org/abs/2410.03950v1
Date: Fri, 4 Oct 2024 22:21:43 GMT
Title: Structured List-Grounded Question Answering
Authors: Mujeen Sung, Song Feng, James Gung, Raphael Shu, Yi Zhang, Saab Mansour,
Abstract summary: Document-grounded dialogue systems aim to answer user queries by leveraging external information. Previous studies have mainly focused on handling free-form documents, often overlooking structured data such as lists. This paper aims to enhance question answering systems for better interpretation and use of structured lists.
Score: 11.109829342410265
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Document-grounded dialogue systems aim to answer user queries by leveraging external information. Previous studies have mainly focused on handling free-form documents, often overlooking structured data such as lists, which can represent a range of nuanced semantic relations. Motivated by the observation that even advanced language models like GPT-3.5 often miss semantic cues from lists, this paper aims to enhance question answering (QA) systems for better interpretation and use of structured lists. To this end, we introduce the LIST2QA dataset, a novel benchmark to evaluate the ability of QA systems to respond effectively using list information. This dataset is created from unlabeled customer service documents using language models and model-based filtering processes to enhance data quality, and can be used to fine-tune and evaluate QA models. Apart from directly generating responses through fine-tuned models, we further explore the explicit use of Intermediate Steps for Lists (ISL), aligning list items with user backgrounds to better reflect how humans interpret list items before generating responses. Our experimental results demonstrate that models trained on LIST2QA with our ISL approach outperform baselines across various metrics. Specifically, our fine-tuned Flan-T5-XL model shows increases of 3.1% in ROUGE-L, 4.6% in correctness, 4.5% in faithfulness, and 20.6% in completeness compared to models without applying filtering and the proposed ISL method.

Related papers

PPT: A Process-based Preference Learning Framework for Self Improving Table Question Answering Models [16.790216473975146]
We propose Process-based Preference learning framework for table question answering.<n>It decomposes reasoning chains into discrete states, assigns scores to each state, and samples contrastive steps for preference learning.
arXiv Detail & Related papers (2025-05-23T07:24:53Z)
A RAG-Based Institutional Assistant [0.1499944454332829]
We design and evaluate a RAG-based virtual assistant specifically tailored for the University of Sao Paulo. Our optimal retriever model achieves a Top-5 accuracy of 30%, while our most effective generative model scores 22.04% against ground truth answers.
arXiv Detail & Related papers (2025-01-23T17:54:19Z)
RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance [28.63893944806149]
Retrieval-augmented generation (RAG) improves large language models by using external knowledge to guide response generation, reducing hallucinations. RAG can introduce new hallucination sources: (i) the retrieval process may select irrelevant pieces as raw context from the database, and (ii) retrieved images are processed into text-based context. We propose a novel framework to evaluate the reliability of multi-modal RAG using two performance measures: (i) the relevancy score (RS), assessing the relevance of retrieved entries to the query, and (ii) the correctness score (CS), evaluating the accuracy of the generated response
arXiv Detail & Related papers (2025-01-07T18:52:05Z)
Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books. Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z)
KET-QA: A Dataset for Knowledge Enhanced Table Question Answering [63.56707527868466]
We propose to use a knowledge base (KB) as the external knowledge source for TableQA. Every question requires the integration of information from both the table and the sub-graph to be answered. We design a retriever-reasoner structured pipeline model to extract pertinent information from the vast knowledge sub-graph.
arXiv Detail & Related papers (2024-05-13T18:26:32Z)
Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
We propose a new evaluation method, SQC-Score. Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score. Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
Improving Attributed Text Generation of Large Language Models via Preference Learning [28.09715554543885]
We model the attribution task as preference learning and introduce an Automatic Preference Optimization framework. APO achieves state-of-the-art citation F1 with higher answer quality.
arXiv Detail & Related papers (2024-03-27T09:19:13Z)
NewsQs: Multi-Source Question Generation for the Inquiring Mind [59.79288644158271]
We present NewsQs, a dataset that provides question-answer pairs for multiple news documents. To create NewsQs, we augment a traditional multi-document summarization dataset with questions automatically generated by a T5-Large model fine-tuned on FAQ-style news articles.
arXiv Detail & Related papers (2024-02-28T16:59:35Z)
AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks. We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate. We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z)
Retrieval Enhanced Data Augmentation for Question Answering on Privacy Policies [74.01792675564218]
We develop a data augmentation framework based on ensembling retriever models that captures relevant text segments from unlabeled policy documents. To improve the diversity and quality of the augmented data, we leverage multiple pre-trained language models (LMs) and cascade them with noise reduction filter models. Using our augmented data on the PrivacyQA benchmark, we elevate the existing baseline by a large margin (10% F1) and achieve a new state-of-the-art F1 score of 50%.
arXiv Detail & Related papers (2022-04-19T15:45:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.