Matmul or No Matmal in the Era of 1-bit LLMs
- URL: http://arxiv.org/abs/2408.11939v2
- Date: Wed, 28 Aug 2024 19:51:04 GMT
- Title: Matmul or No Matmal in the Era of 1-bit LLMs
- Authors: Jinendra Malekar, Mohammed E. Elbtity, Ramtin Zand,
- Abstract summary: 1-bit large language models (LLMs) have attracted considerable attention and opened up new research opportunities.
However, 1-bit LLMs only improve a fraction of models by applying extreme quantization to the projection layers.
In this work, we present an adaptation of Amdahl's Law tailored for the 1-bit LLM context.
- Score: 0.48212500317840945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advent of 1-bit large language models (LLMs) has attracted considerable attention and opened up new research opportunities. However, 1-bit LLMs only improve a fraction of models by applying extreme quantization to the projection layers while leaving attention heads unchanged. Therefore, to avoid fundamentally wrong choices of goals in future research, it is crucial to understand the actual improvements in computation and memory usage that 1-bit LLMs can deliver. In this work, we present an adaptation of Amdahl's Law tailored for the 1-bit LLM context, which illustrates how partial improvements in 1-bit LLMs impact overall model performance. Through extensive experiments, we uncover key nuances across different model architectures and hardware configurations, offering a roadmap for future research in the era of 1-bit LLMs.
Related papers
- LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression.
LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model.
Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model.
arXiv Detail & Related papers (2025-02-15T02:55:22Z) - A Comprehensive Analysis on LLM-based Node Classification Algorithms [21.120619437937382]
We develop a comprehensive and testbed for node classification using Large Language Models (LLMs)
It includes ten datasets, eight LLM-based algorithms, and three learning paradigms, and is designed for easy extension with new methods and datasets.
We conduct extensive experiments, training and evaluating over 2,200 models, to determine the key settings that affect performance.
Our findings uncover eight insights, e.g., LLM-based methods can significantly outperform traditional methods in a semi-supervised setting, while the advantage is marginal in a supervised setting.
arXiv Detail & Related papers (2025-02-02T15:56:05Z) - Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities.
LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands.
We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z) - LLMs are Also Effective Embedding Models: An In-depth Overview [40.53941563464671]
Large language models (LLMs) have revolutionized natural language processing by achieving state-of-the-art performance across various tasks.
Recently, their effectiveness as embedding models has gained attention, marking a paradigm shift from traditional encoder-only models like ELMo and BERT to decoder-only, large-scale LLMs like GPT, LLaMA, and Mistral.
arXiv Detail & Related papers (2024-12-17T06:48:24Z) - Performance Law of Large Language Models [58.32539851241063]
Performance law can be used to guide the choice of LLM architecture and the effective allocation of computational resources.
Performance law can be used to guide the choice of LLM architecture and the effective allocation of computational resources without extensive experiments.
arXiv Detail & Related papers (2024-08-19T11:09:12Z) - An empirical study of LLaMA3 quantization: from LLMs to MLLMs [54.91212829143966]
The LLaMA family is one of the most powerful open-source large language models (LLMs)
LLaMA3 models have achieved impressive performance in various domains with super-large scale pre-training on over 15T tokens of data.
We evaluate the 10 existing post-training quantization and LoRA fine-tuning (LoRA-FT) methods of LLaMA3 on 1-8 bits and various datasets to reveal the low-bit quantization performance of LLaMA3.
arXiv Detail & Related papers (2024-04-22T10:03:03Z) - The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits [129.6765656933016]
We introduce a 1-bit Large Language Models (LLMs) variant, namely BitNet b1.58.
The 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs.
It enables a new paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.
arXiv Detail & Related papers (2024-02-27T18:56:19Z) - BiLLM: Pushing the Limit of Post-Training Quantization for LLMs [53.31402059062365]
BiLLM is a groundbreaking 1-bit post-training quantization scheme tailored for pretrained large language models.
It achieves for the first time high-accuracy inference (e.g. 8.41 perplexity on LLaMA2-70B) with only 1.08-bit weights across various LLMs families.
arXiv Detail & Related papers (2024-02-06T09:26:34Z) - OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models [44.848642930797155]
We release OpenMoE, a series of fully open-sourced and reproducible decoder-only Mixture-of-Experts (MoE) based large language models (LLMs)
Our investigation confirms that MoE-based LLMs can offer a more favorable cost-effectiveness trade-off than dense LLMs.
We find that routing decisions in MoE models are predominantly based on token IDs, with minimal context relevance.
arXiv Detail & Related papers (2024-01-29T12:05:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.