Code Soliloquies for Accurate Calculations in Large Language Models
- URL: http://arxiv.org/abs/2309.12161v2
- Date: Tue, 31 Oct 2023 20:27:53 GMT
- Title: Code Soliloquies for Accurate Calculations in Large Language Models
- Authors: Shashank Sonkar, MyCo Le, Xinghe Chen, Naiming Liu, Debshila Basu Mallick, Richard G. Baraniuk
- Abstract summary: High-quality conversational datasets are crucial for the successful development of Intelligent Tutoring Systems.
These datasets are generated using advanced GPT-4 models.
Our design orchestrates a mock conversation where both student and tutorbot roles are simulated by GPT-4.
Our approach notably enhances the quality of synthetic conversation datasets, especially for subjects that are calculation-intensive.
- Score: 22.1024285108075
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-quality conversational datasets are crucial for the successful
development of Intelligent Tutoring Systems (ITS) that utilize a Large Language
Model (LLM) backend. Synthetic student-teacher dialogues, generated using
advanced GPT-4 models, are a common strategy for creating these datasets.
However, subjects like physics that entail complex calculations pose a
challenge. While GPT-4 presents impressive language processing capabilities,
its limitations in fundamental mathematical reasoning curtail its efficacy for
such subjects. To tackle this limitation, we introduce in this paper an
innovative stateful prompt design. Our design orchestrates a mock conversation
where both student and tutorbot roles are simulated by GPT-4. Each student
response triggers an internal monologue, or 'code soliloquy', in the
GPT-tutorbot, which assesses whether its subsequent response would necessitate
calculations. If a calculation is deemed necessary, it scripts the relevant
Python code and uses the Python output to construct a response to the student.
Our approach notably enhances the quality of synthetic conversation datasets,
especially for subjects that are calculation-intensive. Our preliminary Subject
Matter Expert evaluations reveal that our Higgs model, a fine-tuned LLaMA
model, effectively uses Python for computations, which significantly enhances
the accuracy and computational reliability of Higgs' responses. Code, models,
and datasets is available at https://github.com/luffycodes/Tutorbot-Spock-Phys.
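
To make the stateful prompt design concrete, below is a minimal Python sketch of a single tutorbot turn with a code soliloquy: an internal yes/no check on whether the reply needs a calculation, followed, when needed, by scripting Python, executing it, and grounding the reply in its output. The prompt wording, the `ask_gpt` and `run_python` helpers, and the `exec`-based execution are illustrative assumptions, not the paper's actual prompts or pipeline (those are in the linked repository).

```python
# Rough sketch of one tutorbot turn with a code soliloquy.
# Illustrative only: prompts and helpers are assumptions, not the paper's pipeline.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask_gpt(system: str, user: str) -> str:
    """One chat-completion call with a system and a user message."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content

def run_python(code: str) -> str:
    """Run generated code and capture stdout (use a real sandbox in practice)."""
    import contextlib, io
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})  # no sandboxing here; illustration only
    return buf.getvalue().strip()

def tutorbot_turn(student_message: str) -> str:
    # Internal monologue: does the next reply require a calculation?
    decision = ask_gpt(
        "You are a physics tutorbot. Answer only 'yes' or 'no'.",
        "Does replying to this student message require a numeric calculation?\n\n"
        + student_message,
    )
    if decision.strip().lower().startswith("yes"):
        # Script the relevant Python, execute it, and answer with the output.
        code = ask_gpt(
            "Write plain Python (no code fences) that prints the numeric "
            "result needed to answer the student. Output code only.",
            student_message,
        )
        result = run_python(code)
        return ask_gpt(
            "You are a physics tutorbot. Ground your reply in the verified "
            "numeric result provided.",
            f"Student: {student_message}\nComputed result: {result}",
        )
    return ask_gpt("You are a physics tutorbot.", student_message)
```

In the paper's setup both roles are simulated by GPT-4, so a full synthetic conversation would alternate a turn like this with a student-simulation call.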
Related papers
- Text to Band Gap: Pre-trained Language Models as Encoders for Semiconductor Band Gap Prediction [6.349503549199403]
In this study, we explore the use of a transformer-based language model as an encoder to predict the band gaps of semiconductor materials.
We generate material descriptions in two formats: formatted strings combining features and natural language text generated using the ChatGPT API.
We demonstrate that the RoBERTa model, pre-trained on natural language processing tasks, performs effectively as an encoder for prediction tasks.
arXiv Detail & Related papers (2025-01-07T00:56:26Z)
- Molly: Making Large Language Model Agents Solve Python Problem More Logically [11.317420065020173]
The Molly agent parses the learners' questioning intent through scenario-based interaction.
At the generation stage, the agent reflects on the generated responses to ensure that they not only align with factual content but also effectively answer the user's queries.
arXiv Detail & Related papers (2024-12-24T02:08:38Z)
- MIND: Math Informed syNthetic Dialogues for Pretraining LLMs [34.498175178707065]
We propose a novel large-scale and diverse Math Informed syNthetic Dialogue (MIND) generation method.
MIND generates synthetic conversations based on OpenWebMath (OWM), resulting in a new math corpus, MIND-OWM.
Our experiments with different conversational settings reveal that incorporating knowledge gaps between dialog participants is essential for generating high-quality math data.
arXiv Detail & Related papers (2024-10-15T18:25:53Z)
- JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models [110.45794710162241]
Existing work either collects large-scale math-related texts for pre-training, or relies on stronger LLMs to synthesize massive math problems.
We propose training a small LLM for math problem synthesis, to efficiently generate sufficient high-quality pre-training data.
We leverage it to synthesize 6 million math problems for pre-training our JiuZhang3.0 model, which only needs to invoke the GPT-4 API 9.3k times and pre-train on 4.6B tokens of data.
arXiv Detail & Related papers (2024-05-23T09:43:19Z)
- Language Models as Science Tutors [79.73256703631492]
We introduce TutorEval and TutorChat to measure real-life usability of LMs as scientific assistants.
We show that fine-tuning base models with existing dialogue datasets leads to poor performance on TutorEval.
We use TutorChat to fine-tune Llemma models with 7B and 34B parameters. These LM tutors specialized in math have a 32K-token context window, and they excel at TutorEval while performing strongly on GSM8K and MATH.
arXiv Detail & Related papers (2024-02-16T22:24:13Z)
- Prompt Engineering or Fine-Tuning: An Empirical Assessment of LLMs for Code [7.760653867600283]
We evaluate GPT-4 using three prompt engineering strategies -- basic prompting, in-context learning, and task-specific prompting.
We compare it against 17 fine-tuned models across three code-related tasks: code summarization, generation, and translation.
arXiv Detail & Related papers (2023-10-11T00:21:00Z)
- Logical Reasoning for Task Oriented Dialogue Systems [57.440956636333325]
We propose a novel method to fine-tune transformer models such as RoBERTa and T5 to reason over a set of facts in a given dialogue context.
Our method includes a synthetic data generation mechanism which helps the model learn logical relations.
We show that the transformer based model can perform logical reasoning to answer questions when the dialogue context contains all the required information.
arXiv Detail & Related papers (2022-02-08T21:46:27Z)
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
- Few-Shot Bot: Prompt-Based Learning for Dialogue Systems [58.27337673451943]
Learning to converse using only a few examples is a great challenge in conversational AI.
The current best conversational models are either good chit-chatters (e.g., BlenderBot) or goal-oriented systems (e.g., MinTL).
We propose prompt-based few-shot learning which does not require gradient-based fine-tuning but instead uses a few examples as the only source of learning.
arXiv Detail & Related papers (2021-10-15T14:36:45Z)
- Language Models as Few-Shot Learner for Task-Oriented Dialogue Systems [74.8759568242933]
Task-oriented dialogue systems use four connected modules: Natural Language Understanding (NLU), Dialogue State Tracking (DST), Dialogue Policy (DP), and Natural Language Generation (NLG).
A research challenge is to learn each module with the least amount of samples given the high cost related to the data collection.
We evaluate the priming few-shot ability of language models in the NLU, DP and NLG tasks.
arXiv Detail & Related papers (2020-08-14T08:23:21Z)