ArthModel: Enhance Arithmetic Skills to Large Language Model
- URL: http://arxiv.org/abs/2311.18609v1
- Date: Thu, 30 Nov 2023 15:06:50 GMT
- Title: ArthModel: Enhance Arithmetic Skills to Large Language Model
- Authors: Yingdi Guo
- Abstract summary: This work provides different ways of thinking, training and using a language model.
The code and models will be released at https://github.com/eteced/arithmetic_finetuning_v1.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: With the great success of ChatGPT, the research of large language models has
become increasingly popular. However, the models have several limitations, such
as toxicity and poor performance of arithmetic solving. Meanwhile, LLM may have
some potential abilities that have yet to be exploited. In this paper, we
choose a different way to enhance the arithmetic ability of LLM. We propose to
train LLM to generate a postfix expression related to the arithmetic problem
and incorporate it with small pretrained models. Moreover, this small model
transfers the token embeddings into real dense numbers and invokes native
functions of a deep learning platform to get the correct answer. To generate
the final result, we propose prompt injection, which adds the result output by
the small model to the LLM. This work provides different ways of thinking, training
and using a language model. The code and models will be released at
https://github.com/eteced/arithmetic_finetuning_v1.
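To make the postfix idea in the abstract concrete, here is a minimal Python sketch of how a postfix expression can be evaluated with a stack and how its result might be appended to a prompt. It is an illustration only, not the paper's method: the paper uses a small pretrained model that maps token embeddings to numbers, whereas this sketch parses plain text tokens. The names `eval_postfix` and `inject_result`, the operator set, and the prompt wording are all assumptions.

```python
import operator

# Binary operators supported by this sketch (an assumption; the paper's
# operator set may differ).
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_postfix(tokens):
    """Evaluate a list of postfix tokens such as ["3", "4", "+", "2", "*"]."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            if len(stack) < 2:
                raise ValueError("operator without two operands: " + tok)
            # The most recently pushed value is the right-hand operand.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            # Anything that is not an operator is treated as a number.
            stack.append(float(tok))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


def inject_result(prompt, value):
    """Hypothetical helper: append the computed value to the prompt text."""
    return f"{prompt}\nComputed result: {value:g}"


if __name__ == "__main__":
    # (3 + 4) * 2 written in postfix form.
    result = eval_postfix("3 4 + 2 *".split())
    print(inject_result("What is (3 + 4) * 2?", result))
```

Postfix notation is convenient here because it needs no parentheses or precedence rules: a single left-to-right pass with a stack is enough to compute the value.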
Related papers
- Cross-model Control: Improving Multiple Large Language Models in One-time Training [34.98931804630706]
Cross-model Control (CMC) is a method that improves multiple large language models in one-time training.
Based on this insight, we incorporate a tiny language model with a minimal number of parameters.
We propose a novel token mapping strategy named PM-MinED to make this tiny language model applicable to models with different vocabularies.
arXiv Detail & Related papers (2024-10-23T06:52:09Z) - Empirical Studies of Parameter Efficient Methods for Large Language Models of Code and Knowledge Transfer to R [1.9799527196428242]
Large Language Models (LLMs) have gained a lot of attention in the Software Engineering (SE) community.
In this work, we empirically study PEFT methods, LoRA and Compacter, on CodeT5 and CodeLlama.
We will assess their performance compared to fully fine-tuned models, whether they can be used for knowledge transfer from natural language models to code, and their ability to adapt the learned knowledge to an unseen language.
arXiv Detail & Related papers (2024-03-16T03:12:45Z) - Harnessing Large Language Models as Post-hoc Correctors [6.288056740658763]
We show that an LLM can work as a post-hoc corrector to propose corrections for the predictions of an arbitrary Machine Learning model.
We form a contextual knowledge database by incorporating the dataset's label information and the ML model's predictions on the validation dataset.
Our experimental results on text analysis and challenging molecular predictions show that the approach improves the performance of a number of models by up to 39%.
arXiv Detail & Related papers (2024-02-20T22:50:41Z) - LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957]
CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities.
We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English.
When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
arXiv Detail & Related papers (2024-01-04T18:53:01Z) - The Ups and Downs of Large Language Model Inference with Vocabulary Trimming by Language Heuristics [74.99898531299148]
This research examines vocabulary trimming (VT) inspired by restricting embedding entries to the language of interest to bolster time and memory efficiency.
We apply two language heuristics to trim the full vocabulary - Unicode-based script filtering and corpus-based selection - to different language families and sizes.
It is found that VT reduces the memory usage of small models by nearly 50% and has an upper bound of 25% improvement in generation speed.
arXiv Detail & Related papers (2023-11-16T09:35:50Z) - Prompt2Model: Generating Deployable Models from Natural Language Instructions [74.19816829003729]
Large language models (LLMs) enable system builders to create competent NLP systems through prompting.
In other ways, LLMs are a step backward from traditional special-purpose NLP models.
We propose Prompt2Model, a general-purpose method that takes a natural language task description like the prompts provided to LLMs.
arXiv Detail & Related papers (2023-08-23T17:28:21Z) - Scaling Sentence Embeddings with Large Language Models [43.19994568210206]
In this work, we propose an in-context learning-based method aimed at improving sentence embeddings performance.
Our approach involves adapting the previous prompt-based representation method for autoregressive models.
By scaling model size, we find that scaling to more than tens of billions of parameters harms the performance on semantic textual similarity tasks.
arXiv Detail & Related papers (2023-07-31T13:26:03Z) - Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
arXiv Detail & Related papers (2023-06-25T02:39:19Z) - LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z) - CodeGen2: Lessons for Training LLMs on Programming and Natural Languages [116.74407069443895]
We unify encoder and decoder-based models into a single prefix-LM.
For learning methods, we explore the claim of a "free lunch" hypothesis.
For data distributions, the effect of a mixture distribution and multi-epoch training of programming and natural languages on model performance is explored.
arXiv Detail & Related papers (2023-05-03T17:55:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.