Matmul or No Matmul in the Era of 1-bit LLMs
- URL: http://arxiv.org/abs/2408.11939v2
- Date: Wed, 28 Aug 2024 19:51:04 GMT
- Title: Matmul or No Matmul in the Era of 1-bit LLMs
- Authors: Jinendra Malekar, Mohammed E. Elbtity, Ramtin Zand
- Abstract summary: 1-bit large language models (LLMs) have attracted considerable attention and opened up new research opportunities.
However, 1-bit LLMs improve only a fraction of the model, applying extreme quantization to the projection layers while leaving the attention heads unchanged.
In this work, we present an adaptation of Amdahl's Law tailored for the 1-bit LLM context.
- Score: 0.48212500317840945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advent of 1-bit large language models (LLMs) has attracted considerable attention and opened up new research opportunities. However, 1-bit LLMs improve only a fraction of the model, applying extreme quantization to the projection layers while leaving the attention heads unchanged. Therefore, to avoid setting fundamentally misguided goals in future research, it is crucial to understand the actual improvements in computation and memory usage that 1-bit LLMs can deliver. In this work, we present an adaptation of Amdahl's Law tailored for the 1-bit LLM context, which illustrates how partial improvements in 1-bit LLMs impact overall model performance. Through extensive experiments, we uncover key nuances across different model architectures and hardware configurations, offering a roadmap for future research in the era of 1-bit LLMs.
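The classic form of Amdahl's Law makes the core point concrete: if only the projection-layer matmuls are accelerated by 1-bit quantization, the end-to-end speedup is bounded by the fraction of work those layers represent. The Python sketch below is only an illustrative approximation with hypothetical numbers (fraction p and per-layer speedup s); the paper's tailored adaptation, which also accounts for memory and hardware configuration, may differ.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Classic Amdahl's Law: overall speedup when a fraction p of the
    workload (here, the projection-layer matmuls) is accelerated by a
    factor s, while the remaining 1 - p (e.g. attention) is unchanged."""
    return 1.0 / ((1.0 - p) + p / s)

# Hypothetical example: if projection layers take 80% of inference time
# and 1-bit kernels speed them up 10x, the end-to-end gain is bounded:
print(amdahl_speedup(p=0.8, s=10.0))   # ~3.57x, far below 10x
# Even with an infinitely fast 1-bit kernel, the ceiling is 1 / (1 - p):
print(amdahl_speedup(p=0.8, s=1e9))    # ~5x
```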
Related papers
- LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression.
LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model.
Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model.
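The penalty-factor mechanism can be pictured with the standard weighted-Lasso rescaling trick: dividing each feature column by its penalty factor and fitting an ordinary Lasso is equivalent to penalizing that feature's coefficient by the factor. The sketch below is a minimal illustration under that assumption; the `weighted_lasso` helper, the `alpha` value, and the toy penalty factors are hypothetical, not LLM-Lasso's actual implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def weighted_lasso(X, y, penalty_factors, alpha=0.1):
    """Per-feature weighted Lasso via column rescaling: dividing column j
    by its penalty factor w_j and fitting a plain Lasso penalizes
    |beta_j| by alpha * w_j. Lower factors (features the LLM deems more
    relevant) are therefore more likely to survive selection."""
    w = np.asarray(penalty_factors, dtype=float)
    model = Lasso(alpha=alpha).fit(X / w, y)   # broadcast over columns
    return model.coef_ / w                     # map back to original scale

# Toy usage with made-up penalty factors (not values from the paper):
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)
print(weighted_lasso(X, y, penalty_factors=[0.5, 2.0, 2.0, 2.0, 2.0]))
```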
arXiv Detail & Related papers (2025-02-15T02:55:22Z)
- A Comprehensive Analysis on LLM-based Node Classification Algorithms [21.120619437937382]
We develop a comprehensive testbed for node classification using Large Language Models (LLMs).
It includes ten datasets, eight LLM-based algorithms, and three learning paradigms, and is designed for easy extension with new methods and datasets.
We conduct extensive experiments, training and evaluating over 2,200 models, to determine the key settings that affect performance.
Our findings uncover eight insights, e.g., LLM-based methods can significantly outperform traditional methods in a semi-supervised setting, while the advantage is marginal in a supervised setting.
arXiv Detail & Related papers (2025-02-02T15:56:05Z)
- Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities.
LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands.
We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z)
- LLMs are Also Effective Embedding Models: An In-depth Overview [40.53941563464671]
Large language models (LLMs) have revolutionized natural language processing by achieving state-of-the-art performance across various tasks.
Recently, their effectiveness as embedding models has gained attention, marking a paradigm shift from traditional encoder-only models like ELMo and BERT to decoder-only, large-scale LLMs like GPT, LLaMA, and Mistral.
arXiv Detail & Related papers (2024-12-17T06:48:24Z)
- Performance Law of Large Language Models [58.32539851241063]
Performance law can be used to guide the choice of LLM architecture and the effective allocation of computational resources without extensive experiments.
arXiv Detail & Related papers (2024-08-19T11:09:12Z)
- An Empirical Study of LLaMA3 Quantization: From LLMs to MLLMs [54.91212829143966]
This study explores LLaMA3's capabilities when quantized to low bit-width.
We evaluate 10 existing post-training quantization and LoRA-finetuning methods of LLaMA3 on 1-8 bits and diverse datasets.
Our experimental results indicate that LLaMA3 still suffers non-negligible degradation in linguistic and visual contexts.
arXiv Detail & Related papers (2024-04-22T10:03:03Z)
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits [129.6765656933016]
We introduce a 1-bit Large Language Model (LLM) variant, namely BitNet b1.58.
The 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs.
It enables a new paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.
arXiv Detail & Related papers (2024-02-27T18:56:19Z)
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs [53.31402059062365]
BiLLM is a groundbreaking 1-bit post-training quantization scheme tailored for pretrained large language models.
It achieves, for the first time, high-accuracy inference (e.g., 8.41 perplexity on LLaMA2-70B) with only 1.08-bit weights across various LLM families.
arXiv Detail & Related papers (2024-02-06T09:26:34Z)
- Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward [29.81212051279456]
Recent advancements in model compression and system-level optimization methods aim to enhance LLM inference.
This survey offers an overview of these methods, emphasizing recent developments.
arXiv Detail & Related papers (2024-02-02T06:29:34Z)
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models [44.848642930797155]
We release OpenMoE, a series of fully open-sourced and reproducible decoder-only Mixture-of-Experts (MoE)-based large language models (LLMs).
Our investigation confirms that MoE-based LLMs can offer a more favorable cost-effectiveness trade-off than dense LLMs.
We find that routing decisions in MoE models are predominantly based on token IDs, with minimal context relevance.
arXiv Detail & Related papers (2024-01-29T12:05:02Z)
- Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs).
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
arXiv Detail & Related papers (2024-01-19T05:02:46Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.