Measuring the Quality of Answers in Political Q&As with Large Language Models
- URL: http://arxiv.org/abs/2404.08816v5
- Date: Thu, 20 Feb 2025 05:30:53 GMT
- Title: Measuring the Quality of Answers in Political Q&As with Large Language Models
- Authors: R. Michael Alvarez, Jacob Morrier
- Abstract summary: This article proposes a new approach for assessing the quality of answers in political question-and-answer sessions.
We measure the quality of an answer based on how easily and accurately it can be recognized in a random set of candidate answers given the question's text.
- Score: 0.5261718469769449
- License:
- Abstract: This article proposes a new approach for assessing the quality of answers in political question-and-answer sessions. We measure the quality of an answer based on how easily and accurately it can be recognized in a random set of candidate answers given the question's text. This measure reflects the answer's relevance and depth of engagement with the question. Like semantic search, we can implement this approach by training a language model on the corpus of observed questions and answers without additional human-labeled data. We showcase and validate our methodology within the context of the Question Period in the Canadian House of Commons. Our analysis reveals that while some answers have a weak semantic connection to questions, hinting at some evasion or obfuscation, they are generally at least moderately relevant, far exceeding what we would expect from random replies. We also find a meaningful correlation between answer quality and the party affiliation of the members of Parliament asking the questions.
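The recognition-based measure described in the abstract lends itself to a simple implementation. Below is a minimal sketch, assuming a sentence-transformers bi-encoder in place of the model the authors train on the Question Period corpus; the decoy sampling, cosine-similarity ranking, and reciprocal-rank score are illustrative stand-ins, not the paper's exact procedure.

```python
# Sketch of the recognition-based quality measure: score an answer by how
# easily it is recognized among random candidate answers given the question.
# Assumptions (not from the paper): sentence-transformers encoder, cosine
# similarity for ranking, reciprocal rank as the quality score.
import random
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for a corpus-tuned model

def answer_quality(question, true_answer, answer_pool, n_candidates=10, seed=0):
    """Rank the true answer among random decoys; higher score = easier to recognize."""
    rng = random.Random(seed)
    decoys = rng.sample([a for a in answer_pool if a != true_answer], n_candidates - 1)
    candidates = [true_answer] + decoys
    q_emb = model.encode(question, convert_to_tensor=True)
    c_emb = model.encode(candidates, convert_to_tensor=True)
    sims = util.cos_sim(q_emb, c_emb)[0]           # similarity of question to each candidate
    rank = int((sims > sims[0]).sum().item()) + 1  # 1 = true answer ranked first
    return 1.0 / rank                              # reciprocal rank as an illustrative score
```

Averaging this score over repeated random draws of decoy answers would approximate how easily and accurately the true answer can be recognized, which is the quantity the abstract describes.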
Related papers
- "I Never Said That": A dataset, taxonomy and baselines on response clarity classification [4.16330182801919]
We introduce a novel taxonomy that frames the task of detecting and classifying response clarity.
Our proposed two-level taxonomy addresses the clarity of a response in terms of the information provided for a given question.
We combine ChatGPT and human annotators to collect, validate and annotate discrete QA pairs from political interviews.
arXiv Detail & Related papers (2024-09-20T20:15:06Z)
- How to Engage Your Readers? Generating Guiding Questions to Promote Active Reading [60.19226384241482]
We introduce GuidingQ, a dataset of 10K in-text questions from textbooks and scientific articles.
We explore various approaches to generate such questions using language models.
We conduct a human study to understand the implication of such questions on reading comprehension.
arXiv Detail & Related papers (2024-07-19T13:42:56Z)
- CLARINET: Augmenting Language Models to Ask Clarification Questions for Retrieval [52.134133938779776]
We present CLARINET, a system that asks informative clarification questions by choosing questions whose answers would maximize certainty in the correct candidate.
Our approach works by augmenting a large language model (LLM) to condition on a retrieval distribution, finetuning end-to-end to generate the question that would have maximized the rank of the true candidate at each turn.
arXiv Detail & Related papers (2024-04-28T18:21:31Z)
- Controllable Decontextualization of Yes/No Question and Answers into Factual Statements [28.02936811004903]
We address the problem of controllable rewriting of answers to polar questions into decontextualized and succinct factual statements.
We propose a Transformer sequence to sequence model that utilizes soft-constraints to ensure controllable rewriting.
arXiv Detail & Related papers (2024-01-18T07:52:12Z)
- ExpertQA: Expert-Curated Questions and Attributed Answers [51.68314045809179]
We conduct human evaluation of responses from a few representative systems along various axes of attribution and factuality.
We collect expert-curated questions from 484 participants across 32 fields of study, and then ask the same experts to evaluate generated responses to their own questions.
The output of our analysis is ExpertQA, a high-quality long-form QA dataset with 2177 questions spanning 32 fields, along with verified answers and attributions for claims in the answers.
arXiv Detail & Related papers (2023-09-14T16:54:34Z)
- Answering Ambiguous Questions with a Database of Questions, Answers, and Revisions [95.92276099234344]
We present a new state-of-the-art for answering ambiguous questions that exploits a database of unambiguous questions generated from Wikipedia.
Our method improves performance by 15% on recall measures and 10% on measures which evaluate disambiguating questions from predicted outputs.
arXiv Detail & Related papers (2023-08-16T20:23:16Z)
- Selectively Answering Ambiguous Questions [38.83930394700588]
We find that the most reliable approach for deciding when to abstain involves quantifying repetition within sampled model outputs (see the sketch after this list).
Our results suggest that sampling-based confidence scores help calibrate answers to relatively unambiguous questions.
arXiv Detail & Related papers (2023-05-24T01:25:38Z)
- Improving the Question Answering Quality using Answer Candidate Filtering based on Natural-Language Features [117.44028458220427]
We address the problem of how the Question Answering (QA) quality of a given system can be improved.
Our main contribution is an approach capable of identifying wrong answers provided by a QA system.
In particular, our approach has shown its potential by removing, in many cases, the majority of incorrect answers.
arXiv Detail & Related papers (2021-12-10T11:09:44Z)
- Discourse Comprehension: A Question Answering Framework to Represent Sentence Connections [35.005593397252746]
A key challenge in building and evaluating models for discourse comprehension is the lack of annotated data.
This paper presents a novel paradigm that enables scalable data collection targeting the comprehension of news documents.
The resulting corpus, DCQA, consists of 22,430 question-answer pairs across 607 English documents.
arXiv Detail & Related papers (2021-11-01T04:50:26Z)
- A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers [66.11048565324468]
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
arXiv Detail & Related papers (2021-05-07T00:12:34Z)
- Stay Hungry, Stay Focused: Generating Informative and Specific Questions in Information-Seeking Conversations [41.74162467619795]
We investigate the problem of generating informative questions in information-asymmetric conversations.
To generate pragmatic questions, we use reinforcement learning to optimize an informativeness metric.
We demonstrate that the resulting pragmatic questioner substantially improves the informativeness and specificity of questions generated over a baseline model.
arXiv Detail & Related papers (2020-04-30T00:49:14Z)
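One idea from the list above, the repetition-based abstention in "Selectively Answering Ambiguous Questions," is concrete enough to sketch. The code below is a hypothetical illustration rather than that paper's implementation; the sampler callable, lowercasing normalization, and 0.5 threshold are assumptions.

```python
# Hypothetical sketch of repetition-based abstention: sample several answers,
# measure how often the most common one recurs, and abstain when agreement is low.
from collections import Counter

def should_abstain(sample_answer, question, n_samples=10, threshold=0.5):
    """sample_answer: callable that returns one sampled answer string per call."""
    answers = [sample_answer(question).strip().lower() for _ in range(n_samples)]
    top_count = Counter(answers).most_common(1)[0][1]
    agreement = top_count / n_samples  # fraction agreeing with the modal answer
    return agreement < threshold       # abstain when sampled answers disagree
```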
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.