TriviaHG: A Dataset for Automatic Hint Generation from Factoid Questions
- URL: http://arxiv.org/abs/2403.18426v2
- Date: Fri, 10 May 2024 17:10:47 GMT
- Title: TriviaHG: A Dataset for Automatic Hint Generation from Factoid Questions
- Authors: Jamshid Mozafari, Anubhav Jangra, Adam Jatowt
- Abstract summary: We introduce a framework for automatic hint generation for factoid questions.
We construct a novel large-scale dataset featuring 160,230 hints corresponding to 16,645 questions from the TriviaQA dataset.
To evaluate the TriviaHG dataset and the proposed evaluation method, we enlisted 10 individuals to annotate 2,791 hints and tasked 6 humans with answering questions using the provided hints.
- Score: 20.510164413931577
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Nowadays, individuals tend to engage in dialogues with Large Language Models, seeking answers to their questions. In times when such answers are readily accessible to anyone, stimulating and preserving humans' cognitive abilities, as well as ensuring that humans retain good reasoning skills, becomes crucial. This study addresses these needs by proposing hints (instead of final answers, or before giving answers) as a viable solution. We introduce a framework for automatic hint generation for factoid questions and employ it to construct TriviaHG, a novel large-scale dataset featuring 160,230 hints corresponding to 16,645 questions from the TriviaQA dataset. Additionally, we present an automatic evaluation method that measures the Convergence and Familiarity quality attributes of hints. To evaluate the TriviaHG dataset and the proposed evaluation method, we enlisted 10 individuals to annotate 2,791 hints and tasked 6 humans with answering questions using the provided hints. The effectiveness of hints varied, with success rates of 96%, 78%, and 36% for questions with easy, medium, and hard answers, respectively. Moreover, the proposed automatic evaluation method showed a robust correlation with the annotators' results. In conclusion, the findings highlight three key insights: the facilitative role of hints in resolving unknown questions, the dependence of hint quality on answer difficulty, and the feasibility of employing automatic evaluation methods for hint assessment.
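As a rough illustration of the Convergence attribute (the paper's actual metric is LLM-based; the helper names below are hypothetical stand-ins, not the authors' implementation), a hint can be scored by how many wrong candidate answers it rules out while remaining consistent with the correct answer:

```python
# Minimal sketch, assuming Convergence ~ fraction of wrong candidate answers
# a hint eliminates. `candidate_answers` and `is_consistent` stand in for an
# LLM-based answer generator and an answer/hint consistency check.
from typing import Callable, List

def convergence(hint: str,
                gold_answer: str,
                candidate_answers: List[str],
                is_consistent: Callable[[str, str], bool]) -> float:
    """Higher values mean the hint rules out more wrong candidates
    while staying consistent with the gold answer."""
    wrong = [a for a in candidate_answers if a != gold_answer]
    if not wrong or not is_consistent(hint, gold_answer):
        return 0.0
    eliminated = sum(1 for a in wrong if not is_consistent(hint, a))
    return eliminated / len(wrong)
```

Familiarity would be scored separately, e.g. from how common the entities mentioned in a hint are; that part is omitted in this sketch.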
Related papers
- FactGuard: Leveraging Multi-Agent Systems to Generate Answerable and Unanswerable Questions for Enhanced Long-Context LLM Extraction [25.00896070082754]
Extractive reading comprehension systems are designed to locate the correct answer to a question within a given text.
A persistent challenge lies in ensuring these models maintain high accuracy in answering questions while reliably recognizing unanswerable queries.
We propose an innovative data augmentation methodology grounded in a multi-agent collaborative framework.
arXiv Detail & Related papers (2025-04-08T01:45:16Z)
- HintEval: A Comprehensive Framework for Hint Generation and Evaluation for Questions [16.434748534272014]
HintEval is a Python library that makes it easy to access diverse datasets and provides multiple approaches to generate and evaluate hints.
By reducing barriers to entry, HintEval offers a major step forward for facilitating hint generation and analysis research within the NLP/IR community.
arXiv Detail & Related papers (2025-02-02T17:07:18Z)
- WikiHint: A Human-Annotated Dataset for Hint Ranking and Generation [15.144785147549713]
We first introduce a manually constructed hint dataset, WikiHint, which is based on Wikipedia and includes 5,000 hints created for 1,000 questions.
We assess the effectiveness of the hints with human participants who answer questions with and without the aid of hints.
Our findings show that (a) the dataset helps generate more effective hints, (b) including answer information along with questions generally improves the quality of generated hints, and (c) encoder-based models perform better than decoder-based models in hint ranking.
arXiv Detail & Related papers (2024-12-02T15:44:19Z)
- Pointwise Mutual Information as a Performance Gauge for Retrieval-Augmented Generation [78.28197013467157]
We show that the pointwise mutual information between a context and a question is an effective gauge for language model performance.
We propose two methods that use the pointwise mutual information between a document and a question as a gauge for selecting and constructing prompts that lead to better performance.
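A minimal sketch of this PMI gauge, estimated with a small off-the-shelf causal language model (the model choice and prompt format below are illustrative assumptions, not the paper's setup):

```python
# PMI(context, question) = log p(question | context) - log p(question),
# estimated with a causal LM ("gpt2" used here purely for illustration).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
lm.eval()

def seq_logprob(prefix: str, target: str) -> float:
    """Sum of log p(target tokens | prefix and preceding target tokens)."""
    prefix_ids = tok(prefix, return_tensors="pt").input_ids
    target_ids = tok(target, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, target_ids], dim=1)
    with torch.no_grad():
        logits = lm(input_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # position i predicts token i+1
    start = prefix_ids.shape[1] - 1  # first row that predicts a target token
    rows = log_probs[start:start + target_ids.shape[1]]
    return rows.gather(1, target_ids[0].unsqueeze(1)).sum().item()

def pmi(context: str, question: str) -> float:
    # The unconditional term uses the BOS token as an "empty" prefix.
    return seq_logprob(context + "\n", question) - seq_logprob(tok.bos_token, question)

print(pmi("The Eiffel Tower is in Paris.", "Where is the Eiffel Tower?"))
```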
arXiv Detail & Related papers (2024-11-12T13:14:09Z)
- Don't Just Say "I don't know"! Self-aligning Large Language Models for Responding to Unknown Questions with Explanations [70.6395572287422]
The self-alignment method not only refuses to answer but also provides explanations for why unknown questions are unanswerable.
We conduct disparity-driven self-curation to select qualified data for fine-tuning the LLM itself, aligning its responses to unknown questions as desired.
arXiv Detail & Related papers (2024-02-23T02:24:36Z)
- Qsnail: A Questionnaire Dataset for Sequential Question Generation [76.616068047362]
We present the first dataset specifically constructed for the questionnaire generation task, which comprises 13,168 human-written questionnaires.
We conduct experiments on Qsnail, and the results reveal that retrieval models and traditional generative models do not fully align with the given research topic and intents.
Despite enhancements through the chain-of-thought prompt and finetuning, questionnaires generated by language models still fall short of human-written questionnaires.
arXiv Detail & Related papers (2024-02-22T04:14:10Z)
- ExpertQA: Expert-Curated Questions and Attributed Answers [51.68314045809179]
We collect expert-curated questions from 484 participants across 32 fields of study, and then ask the same experts to evaluate generated responses to their own questions.
We also conduct human evaluation of responses from a few representative systems along various axes of attribution and factuality.
The output of our analysis is ExpertQA, a high-quality long-form QA dataset with 2177 questions spanning 32 fields, along with verified answers and attributions for claims in the answers.
arXiv Detail & Related papers (2023-09-14T16:54:34Z)
- Obj2Sub: Unsupervised Conversion of Objective to Subjective Questions [1.4680035572775534]
We propose a hybrid unsupervised approach that leverages rule-based methods and pre-trained dense retrievers for the novel task of automatically converting objective questions to subjective questions.
We observe that our approach outperforms the existing data-driven approaches by 36.45% as measured by Recall@k and Precision@k.
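For reference, Recall@k and Precision@k as cited above are the standard ranking metrics; a minimal sketch follows (the paper's exact evaluation protocol may differ):

```python
# Standard Precision@k / Recall@k over a ranked list of retrieved items.
from typing import List, Set

def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    top_k = ranked[:k]
    return sum(1 for item in top_k if item in relevant) / k

def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    if not relevant:
        return 0.0
    top_k = ranked[:k]
    return sum(1 for item in top_k if item in relevant) / len(relevant)

# Example: 2 of the top-3 ranked items are relevant.
ranked = ["q1", "q7", "q3", "q9"]
relevant = {"q1", "q3", "q5"}
print(precision_at_k(ranked, relevant, 3))  # ~0.67
print(recall_at_k(ranked, relevant, 3))     # ~0.67
```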
arXiv Detail & Related papers (2022-05-25T11:46:46Z)
- What should I Ask: A Knowledge-driven Approach for Follow-up Questions Generation in Conversational Surveys [63.51903260461746]
We propose a novel task for knowledge-driven follow-up question generation in conversational surveys.
We construct a new human-annotated dataset of human-written follow-up questions with dialogue history and labeled knowledge.
We then propose a two-staged knowledge-driven model for the task, which generates informative and coherent follow-up questions.
arXiv Detail & Related papers (2022-05-23T00:57:33Z)
- Reinforcement Learning for Abstractive Question Summarization with Question-aware Semantic Rewards [20.342580435464072]
We introduce a reinforcement learning-based framework for abstractive question summarization.
We propose two novel rewards obtained from the downstream tasks of (i) question-type identification and (ii) question-focus recognition.
These rewards ensure the generation of semantically valid questions and encourage the inclusion of key medical entities/foci in the question summary.
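One way such a composite reward could be wired up is sketched below; this is a hedged illustration, not the paper's implementation, and `type_scorer` and `focus_scorer` are hypothetical stand-ins for the two downstream models described above:

```python
# Assumed sketch: weighted combination of (i) agreement on the predicted
# question type and (ii) overlap of the recognized question focus (e.g. key
# medical entities) between the generated summary and the reference.
from typing import Callable

def semantic_reward(generated: str,
                    reference: str,
                    type_scorer: Callable[[str], str],
                    focus_scorer: Callable[[str, str], float],
                    w_type: float = 0.5,
                    w_focus: float = 0.5) -> float:
    r_type = 1.0 if type_scorer(generated) == type_scorer(reference) else 0.0
    r_focus = focus_scorer(generated, reference)  # assumed to return a value in [0, 1]
    return w_type * r_type + w_focus * r_focus
```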
arXiv Detail & Related papers (2021-07-01T02:06:46Z)
- Predicting respondent difficulty in web surveys: A machine-learning approach based on mouse movement features [3.6944296923226316]
This paper explores the predictive value of mouse-tracking data with regard to respondents' difficulty.
We use data from a survey on respondents' employment history and demographic information.
We develop a personalization method that adjusts for respondents' baseline mouse behavior and evaluate its performance.
arXiv Detail & Related papers (2020-11-05T10:54:33Z)
- Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)
- Stay Hungry, Stay Focused: Generating Informative and Specific Questions in Information-Seeking Conversations [41.74162467619795]
We investigate the problem of generating informative questions in information-asymmetric conversations.
To generate pragmatic questions, we use reinforcement learning to optimize an informativeness metric.
We demonstrate that the resulting pragmatic questioner substantially improves the informativeness and specificity of questions generated over a baseline model.
arXiv Detail & Related papers (2020-04-30T00:49:14Z)
- A Revised Generative Evaluation of Visual Dialogue [80.17353102854405]
We propose a revised evaluation scheme for the VisDial dataset.
We measure consensus between answers generated by the model and a set of relevant answers.
We release these sets and code for the revised evaluation scheme as DenseVisDial.
arXiv Detail & Related papers (2020-04-20T13:26:45Z)
- Review-guided Helpful Answer Identification in E-commerce [38.276241153439955]
Product-specific community question answering platforms can greatly help address the concerns of potential customers.
The user-provided answers on such platforms often vary a lot in their qualities.
Helpfulness votes from the community can indicate the overall quality of the answer, but they are often missing.
arXiv Detail & Related papers (2020-03-13T11:34:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.