Retrieval-augmented Generation to Improve Math Question-Answering:
Trade-offs Between Groundedness and Human Preference
- URL: http://arxiv.org/abs/2310.03184v2
- Date: Sat, 11 Nov 2023 01:26:22 GMT
- Title: Retrieval-augmented Generation to Improve Math Question-Answering:
Trade-offs Between Groundedness and Human Preference
- Authors: Zachary Levonian, Chenglu Li, Wangda Zhu, Anoushka Gade, Owen Henkel,
Millie-Ellen Postle, Wanli Xing
- Abstract summary: We design prompts that retrieve and use content from a high-quality open-source math textbook to generate responses to real student questions.
We evaluate the efficacy of this RAG system for middle-school algebra and geometry QA by administering a multi-condition survey.
We argue that while RAG is able to improve response quality, designers of math QA systems must consider trade-offs between generating responses preferred by students and responses closely matched to specific educational resources.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For middle-school math students, interactive question-answering (QA) with
tutors is an effective way to learn. The flexibility and emergent capabilities
of generative large language models (LLMs) have led to a surge of interest in
automating portions of the tutoring process - including interactive QA to
support conceptual discussion of mathematical concepts. However, LLM responses
to math questions can be incorrect or mismatched to the educational context -
such as being misaligned with a school's curriculum. One potential solution is
retrieval-augmented generation (RAG), which involves incorporating a vetted
external knowledge source in the LLM prompt to increase response quality. In
this paper, we design prompts that retrieve and use content from a
high-quality open-source math textbook to generate responses to real student
questions. We evaluate the efficacy of this RAG system for middle-school
algebra and geometry QA by administering a multi-condition survey, finding that
humans prefer responses generated using RAG, but not when responses are too
grounded in the textbook content. We argue that while RAG is able to improve
response quality, designers of math QA systems must consider trade-offs between
generating responses preferred by students and responses closely matched to
specific educational resources.
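The abstract describes RAG at a high level but includes no implementation. A minimal sketch of the retrieve-then-prompt loop it outlines might look like the following; the embedding model, the placeholder textbook chunks, and the generate() call are illustrative assumptions, not the authors' system.

```python
# Minimal sketch of the RAG setup described in the abstract (not the authors' code).
# Assumes a local sentence-embedding model and a hypothetical generate() LLM call.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed dependency

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Pre-chunked passages from an open-source math textbook (placeholder content).
textbook_chunks = [
    "The slope of a line measures its steepness: slope = rise / run.",
    "Two angles are supplementary when their measures sum to 180 degrees.",
]
chunk_vecs = encoder.encode(textbook_chunks, normalize_embeddings=True)

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k textbook chunks most similar to the student question."""
    q_vec = encoder.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q_vec  # cosine similarity (vectors are normalized)
    return [textbook_chunks[i] for i in np.argsort(scores)[::-1][:k]]

def build_prompt(question: str) -> str:
    """Prepend retrieved textbook content to ground the LLM's answer."""
    context = "\n".join(retrieve(question))
    return (
        "Answer the student's math question using the textbook excerpt below.\n"
        f"Textbook: {context}\n"
        f"Student: {question}\n"
        "Tutor:"
    )

# response = generate(build_prompt("What is slope?"))  # generate() is hypothetical
```

The knob the paper's survey probes is how tightly the prompt instructs the model to stay within the retrieved excerpt: stronger grounding in the textbook chunk is exactly what the study found can reduce human preference.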
Related papers
- Research on the Application of Large Language Models in Automatic Question Generation: A Case Study of ChatGLM in the Context of High School Information Technology Curriculum [3.0753648264454547]
The model is guided to generate diverse questions, which are then comprehensively evaluated by domain experts.
The results indicate that ChatGLM outperforms human-generated questions in terms of clarity and teachers' willingness to use them.
arXiv Detail & Related papers (2024-08-21T11:38:32Z)
- Multimodal Reranking for Knowledge-Intensive Visual Question Answering [77.24401833951096]
We introduce a multi-modal reranker to improve the ranking quality of knowledge candidates for answer generation.
Experiments on OK-VQA and A-OKVQA show that multi-modal reranker from distant supervision provides consistent improvements.
arXiv Detail & Related papers (2024-07-17T02:58:52Z)
- Crafting Interpretable Embeddings by Asking LLMs Questions [89.49960984640363]
Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks.
We introduce question-answering embeddings (QA-Emb), embeddings where each feature represents an answer to a yes/no question asked to an LLM.
We use QA-Emb to flexibly generate interpretable models for predicting fMRI voxel responses to language stimuli.
arXiv Detail & Related papers (2024-05-26T22:30:29Z)
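The QA-Emb entry above describes embeddings whose features are answers to yes/no questions. A rough, hypothetical rendering of that idea, in which the question list and the ask_llm callable are invented for illustration:

```python
# Sketch of the QA-Emb idea as summarized above: each embedding dimension is an
# LLM's yes/no answer to one hand-written question. ask_llm() is hypothetical.
QUESTIONS = [
    "Does the text mention a number?",
    "Is the text about motion?",
    "Does the text describe a person?",
]

def qa_embed(text: str, ask_llm) -> list[float]:
    """Build an interpretable embedding: one 0/1 feature per yes/no question."""
    features = []
    for q in QUESTIONS:
        answer = ask_llm(f"{q}\nText: {text}\nAnswer yes or no.")
        features.append(1.0 if answer.strip().lower().startswith("yes") else 0.0)
    return features

# Each feature is directly interpretable: feature i means "yes" to QUESTIONS[i].
```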
- Automatic question generation for propositional logical equivalences [6.221146613622175]
We develop and implement a method capable of generating tailored questions for each student.
Previous studies have investigated AQG frameworks in education that address validity, user-defined difficulty, and personalized problem generation.
Our new AQG approach produces logical equivalence problems for Discrete Mathematics, which is a core course for year-one computer science students.
arXiv Detail & Related papers (2024-05-09T02:44:42Z)
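The entry above gives no implementation details, but the task itself is easy to make concrete. A toy illustration (not the paper's method) that produces an equivalence question from De Morgan's law and verifies it by truth-table enumeration:

```python
# Illustrative sketch only: generate a propositional-logic equivalence question
# by applying De Morgan's law, then verify equivalence exhaustively.
from itertools import product

def demorgan_pair():
    """Return a (formula, equivalent formula) pair as Python predicates."""
    f = lambda p, q: not (p and q)       # ¬(p ∧ q)
    g = lambda p, q: (not p) or (not q)  # ¬p ∨ ¬q, by De Morgan's law
    return f, g

def equivalent(f, g, n_vars: int = 2) -> bool:
    """Check f ≡ g over every truth assignment of n_vars variables."""
    return all(f(*vals) == g(*vals)
               for vals in product([False, True], repeat=n_vars))

f, g = demorgan_pair()
assert equivalent(f, g)
print("Question: show that ¬(p ∧ q) ≡ ¬p ∨ ¬q")
```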
- How Teachers Can Use Large Language Models and Bloom's Taxonomy to Create Educational Quizzes [5.487297537295827]
This paper applies a large language model-based QG approach where questions are generated with learning goals derived from Bloom's taxonomy.
The results demonstrate that teachers prefer to write quizzes with automatically generated questions, and that such quizzes have no loss in quality compared to handwritten versions.
arXiv Detail & Related papers (2024-01-11T13:47:13Z)
- SQuArE: Automatic Question Answering Evaluation using Multiple Positive and Negative References [73.67707138779245]
We propose a new evaluation metric: SQuArE (Sentence-level QUestion AnsweRing Evaluation).
We evaluate SQuArE on both sentence-level extractive (Answer Selection) and generative (GenQA) QA systems.
arXiv Detail & Related papers (2023-09-21T16:51:30Z)
- Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning [43.83422798569986]
Multiple-choice questions (MCQs) are ubiquitous at almost all levels of education since they are easy to administer and grade, and are a reliable form of assessment.
To date, the task of crafting high-quality distractors has largely remained a labor-intensive process for teachers and learning content designers.
We propose a simple, in-context learning-based solution for automated distractor and corresponding feedback message generation.
arXiv Detail & Related papers (2023-08-07T01:03:04Z)
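The distractor-generation entry above relies on in-context learning. One plausible shape for such a few-shot prompt, with invented worked examples and a hypothetical complete() LLM call:

```python
# Sketch of an in-context (few-shot) prompt for distractor generation, in the
# spirit of the summary above; the examples and complete() call are assumptions.
FEW_SHOT = """\
Question: What is 3/4 + 1/4?
Answer: 1
Distractor: 4/8 (feedback: you added numerators and denominators separately)

Question: Solve 2x = 10.
Answer: x = 5
Distractor: x = 20 (feedback: you multiplied instead of dividing)
"""

def distractor_prompt(question: str, answer: str) -> str:
    """Few-shot prompt asking for a plausible wrong answer plus feedback."""
    return f"{FEW_SHOT}\nQuestion: {question}\nAnswer: {answer}\nDistractor:"

# output = complete(distractor_prompt("What is 30% of 50?", "15"))  # hypothetical
```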
- Covering Uncommon Ground: Gap-Focused Question Generation for Answer Assessment [75.59538732476346]
We focus on the problem of generating such gap-focused questions (GFQs) automatically.
We define the task, highlight key desired aspects of a good GFQ, and propose a model that satisfies these.
arXiv Detail & Related papers (2023-07-06T22:21:42Z)
- UKP-SQuARE: An Interactive Tool for Teaching Question Answering [61.93372227117229]
The exponential growth of question answering (QA) has made it an indispensable topic in any Natural Language Processing (NLP) course.
We introduce UKP-SQuARE as a platform for QA education.
Students can run, compare, and analyze various QA models from different perspectives.
arXiv Detail & Related papers (2023-05-31T11:29:04Z)
- Automatic Generation of Socratic Subquestions for Teaching Math Word Problems [16.97827669744673]
We explore the ability of large language models (LMs) in generating sequential questions for guiding math word problem-solving.
On both automatic and human quality evaluations, we find that LMs constrained with desirable question properties generate superior questions.
Results suggest that the difficulty level of problems plays an important role in determining whether questioning improves or hinders human performance.
arXiv Detail & Related papers (2022-11-23T10:40:22Z)
- Reinforced Multi-task Approach for Multi-hop Question Generation [47.15108724294234]
We take up multi-hop question generation, which aims at generating relevant questions based on supporting facts in the context.
We employ multitask learning with the auxiliary task of answer-aware supporting fact prediction to guide the question generator.
We demonstrate the effectiveness of our approach through experiments on the multi-hop question answering dataset, HotPotQA.
arXiv Detail & Related papers (2020-04-05T10:16:59Z)
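The last entry combines question generation with an auxiliary supporting-fact prediction task. A hedged sketch of what that joint objective could look like in PyTorch; the shapes and loss weighting are assumptions rather than the paper's exact model:

```python
# Sketch of the multitask setup described above: a question-generation token
# loss plus an auxiliary answer-aware supporting-fact prediction loss.
# The 0.5 weight and binary fact labels are illustrative assumptions.
import torch
import torch.nn.functional as F

def multitask_loss(qg_logits: torch.Tensor,    # (batch, seq_len, vocab)
                   qg_targets: torch.Tensor,   # (batch, seq_len) token ids
                   fact_logits: torch.Tensor,  # (batch, n_sentences)
                   fact_labels: torch.Tensor,  # (batch, n_sentences) 0/1 floats
                   aux_weight: float = 0.5) -> torch.Tensor:
    """Combine the question generator's token-level cross-entropy with the
    auxiliary binary supporting-fact prediction loss."""
    qg_loss = F.cross_entropy(qg_logits.view(-1, qg_logits.size(-1)),
                              qg_targets.view(-1))
    fact_loss = F.binary_cross_entropy_with_logits(fact_logits, fact_labels)
    return qg_loss + aux_weight * fact_loss
```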
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.