FLM-101B: An Open LLM and How to Train It with $100K Budget
- URL: http://arxiv.org/abs/2309.03852v3
- Date: Tue, 14 Jan 2025 06:40:36 GMT
- Title: FLM-101B: An Open LLM and How to Train It with $100K Budget
- Authors: Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Xuying Meng, Siqi Fan, Peng Han, Jing Li, Li Du, Bowen Qin, Zheng Zhang, Aixin Sun, Yequan Wang,
- Abstract summary: We show that FLM-101B, trained with our growth strategy under a budget of $100K, reaches 80% of the baselines' performances with only 10% of their floating-point operations.<n>We believe that further studies on progressive training will benefit the community by cutting down the costs and promoting green AI.
- Score: 63.244403881531035
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are considered important approaches towards foundational machine intelligence, achieving remarkable success in Natural Language Processing and multimodal tasks, among others. However, the carbon footprints and financial costs originating from heavy pre-training computation is a non-negligible issue. Progressive training methods, inspired by the neurogenesis process that grows neural structures, have shown potential to accelerate LLM pre-training. However, the algorithms, implementation, and practices for progressively training LLMs beyond 100B parameters remain underexplored. In this paper, we show that our model, namely FLM-101B, trained with our growth strategy under a budget of \$100K, reaches 80\% of the baselines' performances with only 10\% of their floating-point operations. We believe that further studies on progressive training will benefit the community by cutting down the costs and promoting green AI. The checkpoint of FLM-101B is released at https://huggingface.co/CofeAI/FLM-101B.
Related papers
- Should We Still Pretrain Encoders with Masked Language Modeling? [27.19054714197245]
Recent evidence suggests that decoder models pretrained with Causal Language Modeling (CLM) can be effectively repurposed as encoders.<n>We train a total of 38 models ranging from 210 million to 1 billion parameters, and conduct over 15,000 fine-tuning and evaluation runs.<n>We find that while training with high-level CLM yields better performance across text representation tasks, CLM-trained models are more data-efficient and demonstrate improved fine-tuning stability.
arXiv Detail & Related papers (2025-07-01T17:45:48Z) - CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation [17.807249890437767]
We introduce CoLA and its memory-efficient implementation, CoLA-M.
We leverage the low-rank structure observed widely in model activations to reduce model size, boost model capacity and training efficiency.
Experiments on LLaMA models with 60 million to 7 billion parameters show that CoLA reduces the computing cost by $bf 2pmbtimes$ and improves training throughput by $bf 1.86pmbtimes$ while maintaining full-rank level performance.
arXiv Detail & Related papers (2025-02-16T01:05:16Z) - Control LLM: Controlled Evolution for Intelligence Retention in LLM [4.67235851066221]
We propose textbfControl LLM, a novel approach that leverages parallel pre-trained and expanded transformer blocks.
Experiments demonstrate the effectiveness of Control LLM in both Continuous Pre-training (CPT) and Continuous Supervised Fine-Tuning (CSFT)
It surpasses existing methods and achieves SOTA among open-source models tuned from the same base model, using substantially less data and compute.
arXiv Detail & Related papers (2025-01-19T08:06:06Z) - TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation [24.954629877691623]
TICK (Targeted Instruct-evaluation with ChecKlists) is a fully automated, interpretable evaluation protocol.
We first show that, given an instruction, LLMs can reliably produce high-quality, tailored evaluation checklists.
We then show that STICK can be used to improve generation quality across multiple benchmarks via self-refinement and Best-of-N selection.
arXiv Detail & Related papers (2024-10-04T17:09:08Z) - Assessing the Zero-Shot Capabilities of LLMs for Action Evaluation in RL [14.091146805312636]
Credit assignment problem is a central challenge in Reinforcement Learning (RL)
Credit Assignment with Language Models (CALM) is a novel approach to automate credit assignment via reward shaping and options discovery.
Preliminary results indicate that the knowledge of Large Language Models is a promising prior for credit assignment in RL.
arXiv Detail & Related papers (2024-09-19T14:08:09Z) - Efficient Hybrid Inference for LLMs: Reward-Based Token Modelling with Selective Cloud Assistance [0.0]
Large language models (LLMs) are known for their exceptional performance across a range of natural language processing tasks.
Smaller language models (SLMs), which can be deployed on lower-cost edge devices, struggle to match the performance of their larger counterparts.
This paper presents a novel hybrid inference approach that leverages the strengths of both model types.
arXiv Detail & Related papers (2024-09-15T15:12:45Z) - Beyond Next Token Prediction: Patch-Level Training for Large Language Models [69.67438563485887]
We introduce patch-level training for Large Language Models (LLMs)<n>During patch-level training, we feed the language model shorter sequences of patches and train it to predict the next patch.<n>We show that patch-level training can reduce the overall training costs to 0.5$times$, without compromising the model performance.
arXiv Detail & Related papers (2024-07-17T15:48:39Z) - Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding LLMs decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z) - Sparsity-Accelerated Training for Large Language Models [20.86225596276327]
Large language models (LLMs) have demonstrated proficiency across various natural language processing (NLP) tasks.
LLMs often require additional training, such as continual pre-training and supervised fine-tuning.
This paper proposes leveraging emphsparsity in pre-trained LLMs to expedite this training process.
arXiv Detail & Related papers (2024-06-03T14:56:09Z) - From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems [59.40480894948944]
Large language model (LLM) empowered agents are able to solve decision-making problems in the physical world.
Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting.
We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
arXiv Detail & Related papers (2024-05-30T09:42:54Z) - Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing [56.75702900542643]
We introduce AlphaLLM for the self-improvements of Large Language Models.
It integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop.
Our experimental results show that AlphaLLM significantly enhances the performance of LLMs without additional annotations.
arXiv Detail & Related papers (2024-04-18T15:21:34Z) - Can Large Language Models Play Games? A Case Study of A Self-Play
Approach [61.15761840203145]
Large Language Models (LLMs) harness extensive data from the Internet, storing a broad spectrum of prior knowledge.
Monte-Carlo Tree Search (MCTS) is a search algorithm that provides reliable decision-making solutions.
This work introduces an innovative approach that bolsters LLMs with MCTS self-play to efficiently resolve turn-based zero-sum games.
arXiv Detail & Related papers (2024-03-08T19:16:29Z) - MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT [87.4910758026772]
"Bigger the better" has been the predominant trend in recent Large Language Models (LLMs) development.
This paper explores the "less is more" paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource constrained devices.
arXiv Detail & Related papers (2024-02-26T18:59:03Z) - Optimizing Distributed Training on Frontier for Large Language Models [7.251642875697334]
Training large language models (LLMs) with billions of parameters poses significant challenges and requires considerable computational resources.
This research explores efficient distributed training strategies to extract this computation from Frontier, the world's first exascale supercomputer.
arXiv Detail & Related papers (2023-12-20T02:03:15Z) - Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models [50.11814354654953]
Key ability of an AI agent is to flexibly combine, as needed, the basic skills it has learned.
This work introduces Skill-Mix, a new evaluation to measure ability to combine skills.
arXiv Detail & Related papers (2023-10-26T16:55:05Z) - Large Language Model Cascades with Mixture of Thoughts Representations
for Cost-efficient Reasoning [19.472937476936636]
Large language models (LLMs) have exhibited remarkable performance in a variety of tasks, but this strong performance often comes with the high expense of using paid API services.
In this paper, we are motivated to study building an LLM cascade to save the cost of using LLMs.
Our proposed cascades can achieve performance comparable to using solely the stronger LLM but require only 40% of its cost.
arXiv Detail & Related papers (2023-10-04T18:21:17Z) - Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own [59.11934130045106]
We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models.
Within this framework, we introduce the Foundation-guided Actor-Critic (FAC) algorithm, which enables embodied agents to explore more efficiently with automatic reward functions.
Our method achieves remarkable performances in various manipulation tasks on both real robots and in simulation.
arXiv Detail & Related papers (2023-10-04T07:56:42Z) - GrowLength: Accelerating LLMs Pretraining by Progressively Growing
Training Length [65.24730341801468]
This paper introduces a novel, simple, and effective method named growlength'' to accelerate the pretraining process of Large Language Models.
Our method progressively increases the training length throughout the pretraining phase, thereby mitigating computational costs and enhancing efficiency.
arXiv Detail & Related papers (2023-10-01T05:25:24Z) - Knowledge Inheritance for Pre-trained Language Models [57.51305807391381]
We introduce a novel pre-training framework named "knowledge inheritance" (KI)
KI combines both self-learning and teacher-guided learning to efficiently train larger PLMs.
We show that KI can well support lifelong learning and knowledge transfer.
arXiv Detail & Related papers (2021-05-28T14:43:26Z) - Model-Augmented Q-learning [112.86795579978802]
We propose a MFRL framework that is augmented with the components of model-based RL.
Specifically, we propose to estimate not only the $Q$-values but also both the transition and the reward with a shared network.
We show that the proposed scheme, called Model-augmented $Q$-learning (MQL), obtains a policy-invariant solution which is identical to the solution obtained by learning with true reward.
arXiv Detail & Related papers (2021-02-07T17:56:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.