Code Soliloquies for Accurate Calculations in Large Language Models
- URL: http://arxiv.org/abs/2309.12161v2
- Date: Tue, 31 Oct 2023 20:27:53 GMT
- Title: Code Soliloquies for Accurate Calculations in Large Language Models
- Authors: Shashank Sonkar, MyCo Le, Xinghe Chen, Naiming Liu, Debshila Basu Mallick, Richard G. Baraniuk
- Abstract summary: High-quality conversational datasets are crucial for the successful development of Intelligent Tutoring Systems.
These datasets are generated using advanced GPT-4 models.
Our design orchestrates a mock conversation where both student and tutorbot roles are simulated by GPT-4.
Our approach notably enhances the quality of synthetic conversation datasets, especially for subjects that are calculation-intensive.
- Score: 22.1024285108075
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-quality conversational datasets are crucial for the successful
development of Intelligent Tutoring Systems (ITS) that utilize a Large Language
Model (LLM) backend. Synthetic student-teacher dialogues, generated using
advanced GPT-4 models, are a common strategy for creating these datasets.
However, subjects like physics that entail complex calculations pose a
challenge. While GPT-4 presents impressive language processing capabilities,
its limitations in fundamental mathematical reasoning curtail its efficacy for
such subjects. To tackle this limitation, we introduce in this paper an
innovative stateful prompt design. Our design orchestrates a mock conversation
where both student and tutorbot roles are simulated by GPT-4. Each student
response triggers an internal monologue, or "code soliloquy", in the
GPT-tutorbot, which assesses whether its subsequent response would necessitate
calculations. If a calculation is deemed necessary, it scripts the relevant
Python code and uses the Python output to construct a response to the student.
Our approach notably enhances the quality of synthetic conversation datasets,
especially for subjects that are calculation-intensive. Our preliminary Subject
Matter Expert evaluations reveal that our Higgs model, a fine-tuned LLaMA
model, effectively uses Python for computations, which significantly enhances
the accuracy and computational reliability of Higgs' responses. Code, models,
and datasets are available at https://github.com/luffycodes/Tutorbot-Spock-Phys.
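The stateful prompt design described in the abstract can be sketched as a small control loop: after each student message, the tutorbot first asks itself whether a calculation is needed, and if so scripts Python, executes it, and folds the output into its reply. The `llm_*` helpers below are hypothetical stand-ins for GPT-4 calls, not the paper's actual prompts; only the loop structure follows the abstract.

```python
# Minimal sketch of the "code soliloquy" control loop.
# llm_needs_calculation / llm_write_code are stubs standing in for
# GPT-4 calls so the end-to-end flow can be shown runnably.

import io
import contextlib

def llm_needs_calculation(student_msg: str) -> bool:
    # Stub for the internal monologue: "would my next response
    # require a calculation?" A real system would ask GPT-4.
    return any(ch.isdigit() for ch in student_msg)

def llm_write_code(student_msg: str) -> str:
    # Stub for the step where the tutorbot scripts Python code
    # (here: speed of a freely falling ball after 3 s).
    return "v = 9.8 * 3\nprint(f'{v:.1f} m/s')"

def run_python(code: str) -> str:
    # Execute the generated code and capture stdout; the pipeline
    # feeds this Python output back into the tutorbot's response.
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue().strip()

def tutorbot_reply(student_msg: str) -> str:
    if llm_needs_calculation(student_msg):
        output = run_python(llm_write_code(student_msg))
        return f"Let's compute it: the answer is {output}."
    return "Good question -- let's reason about the concept first."

print(tutorbot_reply("A ball falls for 3 seconds; what is its speed?"))
```

The key design point is that the soliloquy is invisible to the student: the code-writing and execution happen inside the tutorbot turn, and only the final composed reply is emitted into the dialogue.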
Related papers
- MIND: Math Informed syNthetic Dialogues for Pretraining LLMs [34.498175178707065]
We propose a novel large-scale and diverse Math Informed syNthetic Dialogue (MIND) generation method.
MIND generates synthetic conversations based on OpenWebMath (OWM), resulting in a new math corpus, MIND-OWM.
Our experiments with different conversational settings reveal that incorporating knowledge gaps between dialog participants is essential for generating high-quality math data.
arXiv Detail & Related papers (2024-10-15T18:25:53Z)
- JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models [110.45794710162241]
Existing work either collects large-scale math-related texts for pre-training, or relies on stronger LLMs to synthesize massive math problems.
We propose an efficient way that trains a small LLM for math problem synthesis, to efficiently generate sufficient high-quality pre-training data.
We leverage it to synthesize 6 million math problems for pre-training our JiuZhang3.0 model, which only needs to invoke GPT-4 API 9.3k times and pre-train on 4.6B data.
arXiv Detail & Related papers (2024-05-23T09:43:19Z)
- Single and Multi-Hop Question-Answering Datasets for Reticular Chemistry with GPT-4-Turbo [0.5110571587151475]
'RetChemQA' is a benchmark dataset designed to evaluate the capabilities of machine learning models in the domain of reticular chemistry.
This dataset includes both single-hop and multi-hop question-answer pairs, encompassing approximately 45,000 Q&As for each type.
The questions have been extracted from an extensive corpus of literature containing about 2,530 research papers from publishers including NAS, ACS, RSC, Elsevier, and Nature Publishing Group.
arXiv Detail & Related papers (2024-05-03T14:29:54Z)
- Language Models as Science Tutors [79.73256703631492]
We introduce TutorEval and TutorChat to measure real-life usability of LMs as scientific assistants.
We show that fine-tuning base models with existing dialogue datasets leads to poor performance on TutorEval.
We use TutorChat to fine-tune Llemma models with 7B and 34B parameters. These LM tutors specialized in math have a 32K-token context window, and they excel at TutorEval while performing strongly on GSM8K and MATH.
arXiv Detail & Related papers (2024-02-16T22:24:13Z)
- MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline [12.186691561822256]
We postulate that the inherent nature of large language models (LLMs) presents challenges in modeling mathematical reasoning.
This paper introduces a novel math dataset, enhanced with a capability to utilize a Python code interpreter.
We propose a tentative, easily replicable protocol for the fine-tuning of math-specific LLMs.
arXiv Detail & Related papers (2024-01-16T08:08:01Z)
- Pair Programming with Large Language Models for Sampling and Estimation of Copulas [0.0]
An example Monte Carlo simulation based application for dependence modeling with copulas is developed using a state-of-the-art large language model (LLM).
This includes interaction with ChatGPT in natural language and using mathematical formalism, which led to producing working code in Python and R.
Through careful prompt engineering, we separate successful solutions generated by ChatGPT from unsuccessful ones, resulting in a comprehensive list of related pros and cons.
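The copula-sampling task this paper develops with ChatGPT can be illustrated with a minimal Monte Carlo sketch; the bivariate Gaussian copula below is an assumed example (the paper covers a broader family), using only the Python standard library, with `rho` the correlation parameter.

```python
# Minimal Monte Carlo sampler for a bivariate Gaussian copula:
# draw correlated standard normals, then map each margin to (0, 1)
# through the normal CDF, yielding dependent uniform pairs.

import random
from statistics import NormalDist

def sample_gaussian_copula(rho: float, n: int, seed: int = 0):
    rng = random.Random(seed)
    nd = NormalDist()
    samples = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Build a normal with correlation rho to z1 (Cholesky step).
        z2 = rho * z1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        # The normal CDF maps each margin to a Uniform(0, 1) variate.
        samples.append((nd.cdf(z1), nd.cdf(z2)))
    return samples

pts = sample_gaussian_copula(rho=0.8, n=5000)

# Sanity check: the uniforms should co-vary positively for rho > 0.
mean_u = sum(u for u, _ in pts) / len(pts)
mean_v = sum(v for _, v in pts) / len(pts)
cov = sum((u - mean_u) * (v - mean_v) for u, v in pts) / len(pts)
print(round(cov, 3))
```

Feeding the dependent uniform pairs through inverse marginal CDFs then yields samples from any target joint distribution with that dependence structure, which is the standard use of copulas in dependence modeling.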
arXiv Detail & Related papers (2023-03-31T15:02:48Z)
- Logical Reasoning for Task Oriented Dialogue Systems [57.440956636333325]
We propose a novel method to fine-tune transformer models such as Roberta and T5 to reason over a set of facts in a given dialogue context.
Our method includes a synthetic data generation mechanism which helps the model learn logical relations.
We show that the transformer based model can perform logical reasoning to answer questions when the dialogue context contains all the required information.
arXiv Detail & Related papers (2022-02-08T21:46:27Z)
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
- Few-Shot Bot: Prompt-Based Learning for Dialogue Systems [58.27337673451943]
Learning to converse using only a few examples is a great challenge in conversational AI.
The current best conversational models are either good chit-chatters (e.g., BlenderBot) or goal-oriented systems (e.g., MinTL).
We propose prompt-based few-shot learning which does not require gradient-based fine-tuning but instead uses a few examples as the only source of learning.
arXiv Detail & Related papers (2021-10-15T14:36:45Z)
- Language Models as Few-Shot Learner for Task-Oriented Dialogue Systems [74.8759568242933]
Task-oriented dialogue systems use four connected modules, namely Natural Language Understanding (NLU), Dialogue State Tracking (DST), Dialogue Policy (DP), and Natural Language Generation (NLG).
A research challenge is to learn each module with the least amount of samples given the high cost related to the data collection.
We evaluate the priming few-shot ability of language models in the NLU, DP and NLG tasks.
arXiv Detail & Related papers (2020-08-14T08:23:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.