Natural Answer Generation: From Factoid Answer to Full-length Answer
using Grammar Correction
- URL: http://arxiv.org/abs/2112.03849v1
- Date: Tue, 7 Dec 2021 17:39:21 GMT
- Title: Natural Answer Generation: From Factoid Answer to Full-length Answer
using Grammar Correction
- Authors: Manas Jain, Sriparna Saha, Pushpak Bhattacharyya, Gladvin Chinnadurai,
Manish Kumar Vatsa
- Abstract summary: This paper proposes a system that outputs a full-length answer given a question and the extracted factoid answer as the input.
A transformer-based Grammar Error Correction model, GECToR (2020), is used as a post-processing step for better fluency.
We compare our system with (i) Modified Pointer Generator (SOTA) and (ii) Fine-tuned DialoGPT for factoid questions.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Question Answering systems these days typically use template-based language
generation. Though adequate for a domain-specific task, these systems are too
restrictive and predefined for domain-independent systems. This paper proposes
a system that outputs a full-length answer given a question and the extracted
factoid answer (short spans such as named entities) as the input. Our system
uses constituency and dependency parse trees of questions. A transformer-based
Grammar Error Correction model, GECToR (2020), is used as a post-processing step
for better fluency. We compare our system with (i) Modified Pointer Generator
(SOTA) and (ii) Fine-tuned DialoGPT for factoid questions. We also test our
approach on existential (yes-no) questions, with better results. Our model
generates more accurate and fluent answers than the state-of-the-art (SOTA)
approaches. The evaluation is done on the NewsQA and SQuAD datasets, with
increments of 0.4 and 0.9 percentage points in ROUGE-1 score, respectively.
The inference time is also reduced by 85% compared to the SOTA. The improved
datasets used for our evaluation will be released as part of the research
contribution.
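For orientation only, here is a minimal Python sketch of the pipeline the abstract describes: a question and its extracted factoid answer go in, a full-length declarative answer comes out, with a grammar-correction pass at the end. The rule-based wh-word substitution and the `grammar_correct` placeholder are illustrative assumptions; they do not reproduce the paper's constituency/dependency parse-tree transformation or the GECToR model.

```python
import re

def factoid_to_full_answer(question: str, factoid: str) -> str:
    """Naive stand-in for the paper's parse-tree-based transformation:
    substitute the factoid answer for the leading wh-word of the question."""
    q = question.strip().rstrip("?")
    m = re.match(r"(?i)^(who|what|when|where|which)\s+(.*)$", q)
    if m:
        # e.g. "Who wrote Hamlet?" + "Shakespeare" -> "Shakespeare wrote Hamlet."
        return f"{factoid} {m.group(2)}."
    # Fallback: return the factoid as a bare sentence.
    return f"{factoid}."

def grammar_correct(sentence: str) -> str:
    """Placeholder for the GECToR post-processing step: here it only
    normalizes whitespace and capitalization, whereas the real model is a
    transformer-based sequence tagger."""
    s = re.sub(r"\s+", " ", sentence).strip()
    return s[0].upper() + s[1:] if s else s

print(grammar_correct(factoid_to_full_answer("Who wrote Hamlet?", "Shakespeare")))
```

This only handles subject wh-questions; the paper's parse-tree approach is what makes the transformation general enough for object questions and existential (yes-no) questions.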
Related papers
- RAG-ConfusionQA: A Benchmark for Evaluating LLMs on Confusing Questions [52.33835101586687]
Conversational AI agents use Retrieval Augmented Generation (RAG) to provide verifiable document-grounded responses to user inquiries.
This paper presents a novel synthetic data generation method to efficiently create a diverse set of context-grounded confusing questions from a given document corpus.
arXiv Detail & Related papers (2024-10-18T16:11:29Z) - Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z) - Automatic Speech Recognition System-Independent Word Error Rate Estimation [23.25173244408922]
Word error rate (WER) is a metric used to evaluate the quality of transcriptions produced by Automatic Speech Recognition (ASR) systems.
In this paper, a hypothesis generation method for ASR System-Independent WER estimation is proposed.
arXiv Detail & Related papers (2024-04-25T16:57:05Z) - VANiLLa : Verbalized Answers in Natural Language at Large Scale [2.9098477555578333]
This dataset consists of over 100k simple questions adapted from the CSQA and SimpleQuestionsWikidata datasets.
The answer sentences in this dataset are syntactically and semantically closer to the question than to the triple fact.
arXiv Detail & Related papers (2021-05-24T16:57:54Z) - Stacking Neural Network Models for Automatic Short Answer Scoring [0.0]
We propose a stacking model based on neural networks and XGBoost for the classification process, using sentence-embedding features.
The best model obtained an F1-score of 0.821, exceeding previous work on the same dataset.
arXiv Detail & Related papers (2020-10-21T16:00:09Z) - Sequence-to-Sequence Learning for Indonesian Automatic Question Generator [0.0]
We construct an Indonesian automatic question generator, adapting the architecture from some previous works.
The system achieved BLEU1, BLEU2, BLEU3, BLEU4, and ROUGE-L scores of 38.35, 20.96, 10.68, 5.78, and 43.4 for SQuAD, and 39.9, 20.78, 10.26, 6.31, and 44.13 for TyDiQA.
arXiv Detail & Related papers (2020-09-29T09:25:54Z) - The Paradigm Discovery Problem [121.79963594279893]
We formalize the paradigm discovery problem and develop metrics for judging systems.
We report empirical results on five diverse languages.
Our code and data are available for public use.
arXiv Detail & Related papers (2020-05-04T16:38:54Z) - KPQA: A Metric for Generative Question Answering Using Keyphrase Weights [64.54593491919248]
KPQA is a new metric for evaluating the correctness of generative question answering systems.
Our new metric assigns different weights to each token via keyphrase prediction.
We show that our proposed metric has a significantly higher correlation with human judgments than existing metrics.
arXiv Detail & Related papers (2020-05-01T03:24:36Z) - AMR Parsing via Graph-Sequence Iterative Inference [62.85003739964878]
We propose a new end-to-end model that treats AMR parsing as a series of dual decisions on the input sequence and the incrementally constructed graph.
We show that the answers to these two questions are mutually dependent.
We design a model based on iterative inference that helps achieve better answers in both perspectives, leading to greatly improved parsing accuracy.
arXiv Detail & Related papers (2020-04-12T09:15:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.