Code Soliloquies for Accurate Calculations in Large Language Models
- URL: http://arxiv.org/abs/2309.12161v2
- Date: Tue, 31 Oct 2023 20:27:53 GMT
- Title: Code Soliloquies for Accurate Calculations in Large Language Models
- Authors: Shashank Sonkar, MyCo Le, Xinghe Chen, Naiming Liu, Debshila Basu Mallick, Richard G. Baraniuk
- Abstract summary: High-quality conversational datasets are crucial for the successful development of Intelligent Tutoring Systems.
These datasets are generated using advanced GPT-4 models.
Our design orchestrates a mock conversation where both student and tutorbot roles are simulated by GPT-4.
Our approach notably enhances the quality of synthetic conversation datasets, especially for subjects that are calculation-intensive.
- Score: 22.1024285108075
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-quality conversational datasets are crucial for the successful
development of Intelligent Tutoring Systems (ITS) that utilize a Large Language
Model (LLM) backend. Synthetic student-teacher dialogues, generated using
advanced GPT-4 models, are a common strategy for creating these datasets.
However, subjects like physics that entail complex calculations pose a
challenge. While GPT-4 presents impressive language processing capabilities,
its limitations in fundamental mathematical reasoning curtail its efficacy for
such subjects. To tackle this limitation, we introduce in this paper an
innovative stateful prompt design. Our design orchestrates a mock conversation
where both student and tutorbot roles are simulated by GPT-4. Each student
response triggers an internal monologue, or "code soliloquy", in the
GPT-tutorbot, which assesses whether its subsequent response would necessitate
calculations. If a calculation is deemed necessary, it scripts the relevant
Python code and uses the Python output to construct a response to the student.
Our approach notably enhances the quality of synthetic conversation datasets,
especially for subjects that are calculation-intensive. Our preliminary Subject
Matter Expert evaluations reveal that our Higgs model, a fine-tuned LLaMA
model, effectively uses Python for computations, which significantly enhances
the accuracy and computational reliability of Higgs' responses. Code, models,
and datasets are available at https://github.com/luffycodes/Tutorbot-Spock-Phys.
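The stateful prompt design described in the abstract can be sketched as a small control loop: after each student message, the tutorbot first asks itself whether a calculation is needed, and if so scripts Python, executes it, and folds the output into its reply. The `llm_*` helpers below are hypothetical stand-ins for GPT-4 calls, not the paper's actual prompts; only the loop structure follows the abstract.

```python
# Minimal sketch of the "code soliloquy" control loop.
# llm_needs_calculation / llm_write_code are stubs standing in for
# GPT-4 calls so the end-to-end flow can be shown runnably.

import io
import contextlib

def llm_needs_calculation(student_msg: str) -> bool:
    # Stub for the internal monologue: "would my next response
    # require a calculation?" A real system would ask GPT-4.
    return any(ch.isdigit() for ch in student_msg)

def llm_write_code(student_msg: str) -> str:
    # Stub for the step where the tutorbot scripts Python code
    # (here: speed of a freely falling ball after 3 s).
    return "v = 9.8 * 3\nprint(f'{v:.1f} m/s')"

def run_python(code: str) -> str:
    # Execute the generated code and capture stdout; the pipeline
    # feeds this Python output back into the tutorbot's response.
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue().strip()

def tutorbot_reply(student_msg: str) -> str:
    if llm_needs_calculation(student_msg):
        output = run_python(llm_write_code(student_msg))
        return f"Let's compute it: the answer is {output}."
    return "Good question -- let's reason about the concept first."

print(tutorbot_reply("A ball falls for 3 seconds; what is its speed?"))
```

The key design point is that the soliloquy is invisible to the student: the code-writing and execution happen inside the tutorbot turn, and only the final composed reply is emitted into the dialogue.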
Related papers
- MIND: Math Informed syNthetic Dialogues for Pretraining LLMs [34.498175178707065]
We propose a novel large-scale and diverse Math Informed syNthetic Dialogue (MIND) generation method.
MIND generates synthetic conversations based on OpenWebMath (OWM), resulting in a new math corpus, MIND-OWM.
Our experiments with different conversational settings reveal that incorporating knowledge gaps between dialog participants is essential for generating high-quality math data.
arXiv Detail & Related papers (2024-10-15T18:25:53Z)
- JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models [110.45794710162241]
Existing work either collects large-scale math-related texts for pre-training, or relies on stronger LLMs to synthesize massive math problems.
We propose an efficient way that trains a small LLM for math problem synthesis, to efficiently generate sufficient high-quality pre-training data.
We leverage it to synthesize 6 million math problems for pre-training our JiuZhang3.0 model, which only needs to invoke GPT-4 API 9.3k times and pre-train on 4.6B data.
arXiv Detail & Related papers (2024-05-23T09:43:19Z)
- Single and Multi-Hop Question-Answering Datasets for Reticular Chemistry with GPT-4-Turbo [0.5110571587151475]
'RetChemQA' is a benchmark dataset designed to evaluate the capabilities of machine learning models in the domain of reticular chemistry.
This dataset includes both single-hop and multi-hop question-answer pairs, encompassing approximately 45,000 Q&As for each type.
The questions have been extracted from an extensive corpus of literature containing about 2,530 research papers from publishers including NAS, ACS, RSC, Elsevier, and Nature Publishing Group.
arXiv Detail & Related papers (2024-05-03T14:29:54Z)
- Language Models as Science Tutors [79.73256703631492]
We introduce TutorEval and TutorChat to measure real-life usability of LMs as scientific assistants.
We show that fine-tuning base models with existing dialogue datasets leads to poor performance on TutorEval.
We use TutorChat to fine-tune Llemma models with 7B and 34B parameters. These LM tutors specialized in math have a 32K-token context window, and they excel at TutorEval while performing strongly on GSM8K and MATH.
arXiv Detail & Related papers (2024-02-16T22:24:13Z)
- MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline [12.186691561822256]
We postulate that the inherent nature of large language models (LLMs) presents challenges in modeling mathematical reasoning.
This paper introduces a novel math dataset, enhanced with a capability to utilize a Python code interpreter.
We propose a tentative, easily replicable protocol for the fine-tuning of math-specific LLMs.
arXiv Detail & Related papers (2024-01-16T08:08:01Z)
- Pair Programming with Large Language Models for Sampling and Estimation of Copulas [0.0]
An example Monte Carlo simulation based application for dependence modeling with copulas is developed using a state-of-the-art large language model (LLM).
This includes interaction with ChatGPT in natural language and using mathematical formalism, which led to producing working code in Python and R.
Through careful prompt engineering, we separate successful solutions generated by ChatGPT from unsuccessful ones, resulting in a comprehensive list of related pros and cons.
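The copula-sampling task this paper develops with ChatGPT can be illustrated with a minimal Monte Carlo sketch; the bivariate Gaussian copula below is an assumed example (the paper covers a broader family), using only the Python standard library, with `rho` the correlation parameter.

```python
# Minimal Monte Carlo sampler for a bivariate Gaussian copula:
# draw correlated standard normals, then map each margin to (0, 1)
# through the normal CDF, yielding dependent uniform pairs.

import random
from statistics import NormalDist

def sample_gaussian_copula(rho: float, n: int, seed: int = 0):
    rng = random.Random(seed)
    nd = NormalDist()
    samples = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Build a normal with correlation rho to z1 (Cholesky step).
        z2 = rho * z1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        # The normal CDF maps each margin to a Uniform(0, 1) variate.
        samples.append((nd.cdf(z1), nd.cdf(z2)))
    return samples

pts = sample_gaussian_copula(rho=0.8, n=5000)

# Sanity check: the uniforms should co-vary positively for rho > 0.
mean_u = sum(u for u, _ in pts) / len(pts)
mean_v = sum(v for _, v in pts) / len(pts)
cov = sum((u - mean_u) * (v - mean_v) for u, v in pts) / len(pts)
print(round(cov, 3))
```

Feeding the dependent uniform pairs through inverse marginal CDFs then yields samples from any target joint distribution with that dependence structure, which is the standard use of copulas in dependence modeling.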
arXiv Detail & Related papers (2023-03-31T15:02:48Z)
- Logical Reasoning for Task Oriented Dialogue Systems [57.440956636333325]
We propose a novel method to fine-tune transformer models such as Roberta and T5 to reason over a set of facts in a given dialogue context.
Our method includes a synthetic data generation mechanism which helps the model learn logical relations.
We show that the transformer based model can perform logical reasoning to answer questions when the dialogue context contains all the required information.
arXiv Detail & Related papers (2022-02-08T21:46:27Z)
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
- Few-Shot Bot: Prompt-Based Learning for Dialogue Systems [58.27337673451943]
Learning to converse using only a few examples is a great challenge in conversational AI.
The current best conversational models are either good chit-chatters (e.g., BlenderBot) or goal-oriented systems (e.g., MinTL).
We propose prompt-based few-shot learning which does not require gradient-based fine-tuning but instead uses a few examples as the only source of learning.
arXiv Detail & Related papers (2021-10-15T14:36:45Z)
- Language Models as Few-Shot Learner for Task-Oriented Dialogue Systems [74.8759568242933]
Task-oriented dialogue systems use four connected modules, namely Natural Language Understanding (NLU), Dialogue State Tracking (DST), Dialogue Policy (DP), and Natural Language Generation (NLG).
A research challenge is to learn each module with the least amount of samples given the high cost related to the data collection.
We evaluate the priming few-shot ability of language models in the NLU, DP and NLG tasks.
arXiv Detail & Related papers (2020-08-14T08:23:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.