Orca 2: Teaching Small Language Models How to Reason
- URL: http://arxiv.org/abs/2311.11045v2
- Date: Tue, 21 Nov 2023 19:43:31 GMT
- Title: Orca 2: Teaching Small Language Models How to Reason
- Authors: Arindam Mitra, Luciano Del Corro, Shweti Mahajan, Andres Codas,
Clarisse Simoes, Sahaj Agarwal, Xuxi Chen, Anastasia Razdaibiedina, Erik
Jones, Kriti Aggarwal, Hamid Palangi, Guoqing Zheng, Corby Rosset, Hamed
Khanpour, Ahmed Awadallah
- Abstract summary: Orca 1 learns from rich signals, such as explanation traces, allowing it to outperform conventional instruction-tuned models.
Orca 2 significantly surpasses models of similar size and attains performance levels similar to or better than those of models 5-10x larger.
- Score: 35.0285407867139
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Orca 1 learns from rich signals, such as explanation traces, allowing it to
outperform conventional instruction-tuned models on benchmarks like BigBench
Hard and AGIEval. In Orca 2, we continue exploring how improved training
signals can enhance smaller LMs' reasoning abilities. Research on training
small LMs has often relied on imitation learning to replicate the output of
more capable models. We contend that excessive emphasis on imitation may
restrict the potential of smaller models. We seek to teach small LMs to employ
different solution strategies for different tasks, potentially different from
the one used by the larger model. For example, while larger models might
provide a direct answer to a complex task, smaller models may not have the same
capacity. In Orca 2, we teach the model various reasoning techniques
(step-by-step, recall then generate, recall-reason-generate, direct answer,
etc.). More crucially, we aim to help the model learn to determine the most
effective solution strategy for each task. We evaluate Orca 2 using a
comprehensive set of 15 diverse benchmarks (corresponding to approximately 100
tasks and over 36,000 unique prompts). Orca 2 significantly surpasses models of
similar size and attains performance levels similar to or better than those of
models 5-10x larger, as assessed on complex tasks that test advanced reasoning
abilities in zero-shot settings. We make Orca 2 weights publicly available at
aka.ms/orca-lm to support research on the development, evaluation, and
alignment of smaller LMs.
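The core recipe above, pairing each training task with the reasoning strategy best suited to it and then hiding that choice from the student at training time so the model learns to pick a strategy itself, can be sketched roughly as follows. This is a minimal illustration: the strategy names come from the abstract, while the routing heuristic, the generic replacement prompt, and the teacher_answer() stub are assumptions for illustration, not the paper's exact pipeline.
```python
# Minimal sketch of strategy-conditioned data construction with instruction "erasure".
# Hypothetical: teacher_answer() stands in for calls to a stronger teacher model,
# and route_strategy() is a made-up heuristic, not Orca 2's actual routing.

STRATEGY_PROMPTS = {
    "step_by_step": "Solve the task step by step, showing your reasoning.",
    "recall_then_generate": "First recall the relevant facts, then write the answer.",
    "recall_reason_generate": "Recall relevant facts, reason over them, then answer.",
    "direct_answer": "Give the final answer directly, without explanation.",
}

GENERIC_PROMPT = "You are a helpful assistant. Answer the user's request."


def route_strategy(task_type: str) -> str:
    """Made-up routing: pick a reasoning strategy per task type."""
    return {
        "math_word_problem": "step_by_step",
        "open_domain_qa": "recall_then_generate",
        "multi_hop_qa": "recall_reason_generate",
    }.get(task_type, "direct_answer")


def teacher_answer(system_prompt: str, user_prompt: str) -> str:
    """Placeholder for a call to a capable teacher model (e.g. an API)."""
    return f"<teacher response conditioned on: {system_prompt!r}>"


def build_example(task_type: str, user_prompt: str) -> dict:
    strategy = route_strategy(task_type)
    detailed_prompt = STRATEGY_PROMPTS[strategy]
    response = teacher_answer(detailed_prompt, user_prompt)  # teacher sees the strategy
    return {
        "system": GENERIC_PROMPT,      # student trains without the strategy hint
        "user": user_prompt,
        "response": response,
        "strategy": strategy,          # kept only for analysis
    }


if __name__ == "__main__":
    print(build_example("math_word_problem", "A train travels 60 km in 45 minutes..."))
```
Because the detailed strategy instruction is visible to the teacher but replaced with a generic prompt in the student's training example, the student is pushed to internalize which strategy fits which task.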
Related papers
- LLAVADI: What Matters For Multimodal Large Language Models Distillation [77.73964744238519]
In this work, we do not propose a new efficient model structure or train small-scale MLLMs from scratch.
Our studies involve training strategies, model choices, and distillation algorithms in the knowledge distillation process.
Evaluated across different benchmarks and with the proper strategy, even a 2.7B small-scale model can perform on par with larger models of 7B or 13B parameters.
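For reference, the most common distillation objective such a study would compare is soft-target KL between teacher and student logits at a temperature; the sketch below is that generic baseline, not necessarily the exact algorithm the paper settles on, and assumes logits over a matching vocabulary.
```python
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target knowledge distillation: KL(teacher || student) at temperature T."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2


# Random logits standing in for real student/teacher outputs.
student = torch.randn(4, 32000)   # (batch, vocab)
teacher = torch.randn(4, 32000)
print(kd_loss(student, teacher).item())
```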
arXiv Detail & Related papers (2024-07-28T06:10:47Z)
- Depth Anything V2 [84.88796880335283]
V2 produces much finer and more robust depth predictions through three key practices.
We replace all labeled real images with synthetic images, scale up the capacity of our teacher model, and teach student models via the bridge of large-scale pseudo-labeled real images.
Benefiting from their strong generalization capability, we fine-tune them with metric depth labels to obtain our metric depth models.
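The teacher-to-student pipeline described here, a teacher trained on synthetic data pseudo-labels a large pool of real images and the student trains on those pseudo-labels, can be sketched with tiny stand-in networks; the models and data below are placeholders, not the paper's architectures.
```python
import torch
import torch.nn as nn

# Tiny stand-in networks; the real teacher/student are large depth estimators.
teacher = nn.Conv2d(3, 1, kernel_size=3, padding=1)
student = nn.Conv2d(3, 1, kernel_size=3, padding=1)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

unlabeled_real_images = [torch.randn(1, 3, 64, 64) for _ in range(8)]  # placeholder data

# 1) Teacher (assumed already trained on synthetic labels) pseudo-labels real images.
with torch.no_grad():
    pseudo_labels = [teacher(img) for img in unlabeled_real_images]

# 2) Student trains on the pseudo-labeled real images.
for img, target in zip(unlabeled_real_images, pseudo_labels):
    pred = student(img)
    loss = nn.functional.l1_loss(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```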
arXiv Detail & Related papers (2024-06-13T17:59:56Z)
- Large Language Model Pruning [0.0]
We suggest a model pruning technique specifically focused on LLMs.
The proposed methodology emphasizes the explainability of deep learning models.
We also explore the difference between pruning on large-scale models vs. pruning on small-scale models.
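The summary does not spell out the pruning criterion, so as a point of reference here is plain magnitude pruning of linear layers with torch.nn.utils.prune, a common baseline rather than the paper's explainability-driven method.
```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy block standing in for an LLM layer.
model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))

# Zero out the 30% smallest-magnitude weights in every Linear layer (L1 unstructured).
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the mask into the weight tensor

zeroed = sum((m.weight == 0).sum().item() for m in model if isinstance(m, nn.Linear))
total = sum(m.weight.numel() for m in model if isinstance(m, nn.Linear))
print(f"zeroed weights: {zeroed}/{total}")
```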
arXiv Detail & Related papers (2024-05-24T18:22:15Z)
- Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought [51.240387516059535]
We introduce a novel framework, LM-Guided CoT, that leverages a lightweight (i.e., 1B) language model (LM) for guiding a black-box large (i.e., >10B) LM in reasoning tasks.
We optimize the model through 1) knowledge distillation and 2) reinforcement learning from rationale-oriented and task-oriented reward signals.
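A minimal sketch of the two-model inference pattern described above: the small LM drafts a rationale and the black-box large LM answers conditioned on it. The generate functions are stubs standing in for real model or API calls, and the prompt wording is an assumption; the distillation and RL training of the small LM are not shown.
```python
def small_lm_generate(prompt: str) -> str:
    """Stub for the lightweight rationale model."""
    return "<rationale produced by the small LM>"


def large_lm_generate(prompt: str) -> str:
    """Stub for the black-box large (>10B) model, e.g. behind an API."""
    return "<final answer from the large LM>"


def lm_guided_cot(question: str) -> str:
    # Step 1: the small LM writes a chain-of-thought rationale for the question.
    rationale = small_lm_generate(f"Question: {question}\nLet's think step by step:")
    # Step 2: the large LM answers, conditioned on the question plus the rationale.
    return large_lm_generate(
        f"Question: {question}\nRationale: {rationale}\nTherefore, the answer is:"
    )


print(lm_guided_cot("If a bag has 3 red and 5 blue marbles, how many marbles are there?"))
```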
arXiv Detail & Related papers (2024-04-04T12:46:37Z)
- Orca-Math: Unlocking the potential of SLMs in Grade School Math [10.206509967833664]
A recent study hypothesized that the smallest model size needed to achieve over 80% accuracy on the GSM8K benchmark is 34 billion parameters.
To reach this level of performance with smaller models, researchers often train SLMs to generate Python code or use tools that help avoid calculation errors.
Our approach has the following key elements: a high-quality synthetic dataset of 200K math problems, created using a multi-agent setup in which agents collaborate on the data.
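A rough sketch of what a collaborating multi-agent data generator can look like follows: a "writer" agent drafts a problem and solution and a "reviewer" agent accepts or rejects it. The agent roles, prompts, and acceptance rule here are invented for illustration, not Orca-Math's actual setup.
```python
import random

def writer_agent(seed_topic: str) -> dict:
    """Stub: in practice an LLM drafts a grade-school word problem plus a solution."""
    return {"problem": f"A word problem about {seed_topic}...", "answer": "42"}

def reviewer_agent(item: dict) -> bool:
    """Stub: in practice another LLM checks the solution and difficulty."""
    return random.random() > 0.2  # pretend ~80% of drafts pass review

def generate_dataset(topics, target_size):
    dataset = []
    while len(dataset) < target_size:
        draft = writer_agent(random.choice(topics))
        if reviewer_agent(draft):   # only reviewed-and-accepted items are kept
            dataset.append(draft)
    return dataset

data = generate_dataset(["fractions", "rates", "money"], target_size=10)
print(len(data), data[0])
```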
arXiv Detail & Related papers (2024-02-16T23:44:38Z)
- Orca: Progressive Learning from Complex Explanation Traces of GPT-4 [22.526048553548726]
We develop Orca, a 13-billion parameter model that learns to imitate the reasoning process of LFMs.
Orca learns from rich signals from GPT-4 including explanation traces; step-by-step thought processes; and other complex instructions.
Orca surpasses conventional state-of-the-art instruction-tuned models such as Vicuna-13B by more than 100% in complex zero-shot reasoning benchmarks.
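Learning from such explanation traces typically reduces to ordinary supervised fine-tuning in which the loss is computed only on the teacher's response tokens. The sketch below shows that masking with made-up token IDs and a random stand-in for model logits; it is a generic recipe, not Orca's exact training code.
```python
import torch
import torch.nn.functional as F

# Made-up token IDs: a user prompt followed by a teacher explanation trace.
prompt_ids   = torch.tensor([101, 2054, 2003, 1996])        # fake prompt tokens
response_ids = torch.tensor([3247, 2138, 2009, 2003, 102])  # fake explanation tokens

input_ids = torch.cat([prompt_ids, response_ids]).unsqueeze(0)   # (1, seq)
labels = input_ids.clone()
labels[0, : len(prompt_ids)] = -100   # ignore prompt tokens in the loss

# Stand-in "model": random next-token logits over a small vocabulary.
vocab_size = 32000
logits = torch.randn(1, input_ids.size(1), vocab_size)

# Standard causal-LM shift: predict token t+1 from positions <= t.
shift_logits = logits[:, :-1, :].reshape(-1, vocab_size)
shift_labels = labels[:, 1:].reshape(-1)
loss = F.cross_entropy(shift_logits, shift_labels, ignore_index=-100)
print(loss.item())
```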
arXiv Detail & Related papers (2023-06-05T08:58:39Z)
- Specializing Smaller Language Models towards Multi-Step Reasoning [56.78474185485288]
We show that such abilities can be distilled down from GPT-3.5 ($\ge$ 175B) to T5 variants ($\le$ 11B).
We propose model specialization, to specialize the model's ability towards a target task.
arXiv Detail & Related papers (2023-01-30T08:51:19Z)
- What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? [50.84738303888189]
We present a large-scale evaluation of modeling choices and their impact on zero-shot generalization.
We train models with over 5 billion parameters for more than 170 billion tokens.
We find that pretrained causal decoder models can be efficiently adapted into non-causal decoder models.
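The causal-versus-non-causal distinction mentioned here comes down to the attention mask: a causal decoder lets each position attend only to earlier positions, while a non-causal (prefix-LM) decoder lets the prompt portion attend bidirectionally. A small sketch constructing both masks, with an arbitrary prefix length, is below.
```python
import torch

seq_len, prefix_len = 6, 3   # arbitrary example: 3 prompt tokens, 3 generated tokens

# Causal mask: position i may attend to positions j <= i (lower-triangular).
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# Prefix-LM (non-causal decoder) mask: the prefix attends bidirectionally,
# the remaining positions stay causal.
prefix_lm = causal.clone()
prefix_lm[:prefix_len, :prefix_len] = True

print("causal:\n", causal.int())
print("prefix-LM:\n", prefix_lm.int())
```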
arXiv Detail & Related papers (2022-04-12T14:19:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.