WikiHint: A Human-Annotated Dataset for Hint Ranking and Generation
- URL: http://arxiv.org/abs/2412.01626v2
- Date: Sun, 02 Feb 2025 16:34:25 GMT
- Title: WikiHint: A Human-Annotated Dataset for Hint Ranking and Generation
- Authors: Jamshid Mozafari, Florian Gerhold, Adam Jatowt,
- Abstract summary: We first introduce a manually constructed hint dataset, WikiHint, which is based on Wikipedia and includes 5,000 hints created for 1,000 questions.
We then finetune open-source LLMs such as LLaMA-3.1 for hint generation in answer-aware and answeragnostic contexts.
We assess the effectiveness of the hints with human participants who answer questions with and without the aid of hints.
- Score: 15.144785147549713
- License:
- Abstract: The use of Large Language Models (LLMs) has increased significantly with users frequently asking questions to chatbots. In the time when information is readily accessible, it is crucial to stimulate and preserve human cognitive abilities and maintain strong reasoning skills. This paper addresses such challenges by promoting the use of hints as an alternative or a supplement to direct answers. We first introduce a manually constructed hint dataset, WikiHint, which is based on Wikipedia and includes 5,000 hints created for 1,000 questions. We then finetune open-source LLMs such as LLaMA-3.1 for hint generation in answer-aware and answeragnostic contexts. We assess the effectiveness of the hints with human participants who answer questions with and without the aid of hints. Additionally, we introduce a lightweight evaluation method, HintRank, to evaluate and rank hints in both answeraware and answer-agnostic settings. Our findings show that (a) the dataset helps generate more effective hints, (b) including answer information along with questions generally improves quality of generated hints, and (c) encoder-based models perform better than decoder-based models in hint ranking.
Related papers
- HintEval: A Comprehensive Framework for Hint Generation and Evaluation for Questions [16.434748534272014]
HintEval is a Python library that makes it easy to access diverse datasets and provides multiple approaches to generate and evaluate hints.
By reducing barriers to entry, HintEval offers a major step forward for facilitating hint generation and analysis research within the NLP/IR community.
arXiv Detail & Related papers (2025-02-02T17:07:18Z) - Optimizing Language Model's Reasoning Abilities with Weak Supervision [48.60598455782159]
We present textscPuzzleBen, a weakly supervised benchmark that comprises 25,147 complex questions, answers, and human-generated rationales.
A unique aspect of our dataset is the inclusion of 10,000 unannotated questions, enabling us to explore utilizing fewer supersized data to boost LLMs' inference capabilities.
arXiv Detail & Related papers (2024-05-07T07:39:15Z) - TriviaHG: A Dataset for Automatic Hint Generation from Factoid Questions [20.510164413931577]
We introduce a framework for the automatic hint generation for factoid questions.
We construct a novel large-scale dataset featuring 160,230 hints corresponding to 16,645 questions from the TriviaQA dataset.
To evaluate the TriviaHG dataset and the proposed evaluation method, we enlisted 10 individuals to annotate 2,791 hints and tasked 6 humans with answering questions using the provided hints.
arXiv Detail & Related papers (2024-03-27T10:27:28Z) - KIWI: A Dataset of Knowledge-Intensive Writing Instructions for
Answering Research Questions [63.307317584926146]
Large language models (LLMs) adapted to follow user instructions are now widely deployed as conversational agents.
In this work, we examine one increasingly common instruction-following task: providing writing assistance to compose a long-form answer.
We construct KIWI, a dataset of knowledge-intensive writing instructions in the scientific domain.
arXiv Detail & Related papers (2024-03-06T17:16:44Z) - Don't Just Say "I don't know"! Self-aligning Large Language Models for Responding to Unknown Questions with Explanations [70.6395572287422]
Self-alignment method is capable of not only refusing to answer but also providing explanation to the unanswerability of unknown questions.
We conduct disparity-driven self-curation to select qualified data for fine-tuning the LLM itself for aligning the responses to unknown questions as desired.
arXiv Detail & Related papers (2024-02-23T02:24:36Z) - Large Language Models Meet Knowledge Graphs to Answer Factoid Questions [57.47634017738877]
We propose a method for exploring pre-trained Text-to-Text Language Models enriched with additional information from Knowledge Graphs.
We procure easily interpreted information with Transformer-based models through the linearization of the extracted subgraphs.
Final re-ranking of the answer candidates with the extracted information boosts Hits@1 scores of the pre-trained text-to-text language models by 4-6%.
arXiv Detail & Related papers (2023-10-03T15:57:00Z) - UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present textscUltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z) - FOLLOWUPQG: Towards Information-Seeking Follow-up Question Generation [38.78216651059955]
We introduce the task of real-world information-seeking follow-up question generation (FQG)
We construct FOLLOWUPQG, a dataset of over 3K real-world (initial question, answer, follow-up question)s collected from a forum layman providing Reddit-friendly explanations for open-ended questions.
In contrast to existing datasets, questions in FOLLOWUPQG use more diverse pragmatic strategies to seek information, and they also show higher-order cognitive skills.
arXiv Detail & Related papers (2023-09-10T11:58:29Z) - Question Answering Survey: Directions, Challenges, Datasets, Evaluation
Matrices [0.0]
The research directions of QA field are analyzed based on the type of question, answer type, source of evidence-answer, and modeling approach.
This detailed followed by open challenges of the field like automatic question generation, similarity detection and, low resource availability for a language.
arXiv Detail & Related papers (2021-12-07T08:53:40Z) - Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.