Related papers: Improving Socratic Question Generation using Data Augmentation and Preference Optimization

Improving Socratic Question Generation using Data Augmentation and Preference Optimization

URL: http://arxiv.org/abs/2403.00199v3
Date: Fri, 19 Apr 2024 02:36:26 GMT
Title: Improving Socratic Question Generation using Data Augmentation and Preference Optimization
Authors: Nischal Ashok Kumar, Andrew Lan,
Abstract summary: Large language models (LLMs) can be used to augment human effort by automatically generating Socratic questions for students. Existing methods that involve prompting these LLMs sometimes produce invalid outputs. We propose a data augmentation method to enrich existing Socratic questioning datasets with questions that are invalid in specific ways. Next, we propose a method to optimize open-source LLMs such as LLama 2 to prefer ground-truth questions over generated invalid ones.
Score: 2.1485350418225244
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The Socratic method is a way of guiding students toward solving a problem independently without directly revealing the solution to the problem. Although this method has been shown to significantly improve student learning outcomes, it remains a complex labor-intensive task for instructors. Large language models (LLMs) can be used to augment human effort by automatically generating Socratic questions for students. However, existing methods that involve prompting these LLMs sometimes produce invalid outputs, e.g., those that directly reveal the solution to the problem or provide irrelevant or premature questions. To alleviate this problem, inspired by reinforcement learning with AI feedback (RLAIF), we first propose a data augmentation method to enrich existing Socratic questioning datasets with questions that are invalid in specific ways. Next, we propose a method to optimize open-source LLMs such as LLama 2 to prefer ground-truth questions over generated invalid ones, using direct preference optimization (DPO). Our experiments on a Socratic questions dataset for student code debugging show that a DPO-optimized 7B LLama 2 model can effectively avoid generating invalid questions, and as a result, outperforms existing state-of-the-art prompting methods.

Related papers

GRAIT: Gradient-Driven Refusal-Aware Instruction Tuning for Effective Hallucination Mitigation [62.63014905981601]
Refusal-Aware Instruction Tuning (RAIT) aims to enhance Large Language Models (LLMs) by improving their ability to refuse responses to questions beyond their knowledge. Effective RAIT must address two key challenges: firstly, effectively reject unknown questions to minimize hallucinations; secondly, avoid over-refusal to ensure questions that can be correctly answered are not rejected. GraIT employs gradient-driven sample selection to effectively minimize hallucinations and (2) introduces an adaptive weighting mechanism during fine-tuning to reduce the risk of over-refusal.
arXiv Detail & Related papers (2025-02-09T14:11:30Z)
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling [62.19438812624467]
Large language models (LLMs) have exhibited their problem-solving abilities in mathematical reasoning. We propose OptiBench, a benchmark for End-to-end optimization problem-solving with human-readable inputs and outputs.
arXiv Detail & Related papers (2024-07-13T13:27:57Z)
MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-based searching method for large language models. It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths. It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
arXiv Detail & Related papers (2024-05-25T15:07:33Z)
Don't Just Say "I don't know"! Self-aligning Large Language Models for Responding to Unknown Questions with Explanations [70.6395572287422]
Self-alignment method is capable of not only refusing to answer but also providing explanation to the unanswerability of unknown questions. We conduct disparity-driven self-curation to select qualified data for fine-tuning the LLM itself for aligning the responses to unknown questions as desired.
arXiv Detail & Related papers (2024-02-23T02:24:36Z)
Can Language Models Employ the Socratic Method? Experiments with Code Debugging [1.2776694801834354]
This paper introduces a dataset of multi-turn Socratic advice that is aimed at helping a novice programmer fix buggy solutions to simple computational problems. The dataset is then used for benchmarking the Socratic debug abilities of a number of language models, ranging from fine-tuning the instruction-based text-to-text transformer-T5 to zero-shot and chain of thought prompting of the much larger GPT-4.
arXiv Detail & Related papers (2023-10-04T23:32:33Z)
Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL [62.824464372594576]
We aim to enhance arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization. We identify a previously overlooked objective of query dependency in such optimization. We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
arXiv Detail & Related papers (2023-09-13T01:12:52Z)
A Mutual Information Maximization Approach for the Spurious Solution Problem in Weakly Supervised Question Answering [60.768146126094955]
Weakly supervised question answering usually has only the final answers as supervision signals. There may exist many spurious solutions that coincidentally derive the correct answer, but training on such solutions can hurt model performance. We propose to explicitly exploit such semantic correlations by maximizing the mutual information between question-answer pairs and predicted solutions.
arXiv Detail & Related papers (2021-06-14T05:47:41Z)
Offline Inverse Reinforcement Learning [24.316047317028147]
offline RL is to learn optimal policies when a fixed exploratory demonstrations data-set is available. Inspired by the success of IRL techniques in achieving state of the art imitation performances in online settings, we exploit GAN based data augmentation procedures to construct the first offline IRL algorithm.
arXiv Detail & Related papers (2021-06-09T13:44:06Z)
An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives [54.29001037565384]
We propose a practical online method for solving a class of online distributionally robust optimization (DRO) problems. Our studies demonstrate important applications in machine learning for improving the robustness of networks.
arXiv Detail & Related papers (2020-06-17T20:19:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.