Climbing the WOL: Training for Cheaper Inference
- URL: http://arxiv.org/abs/2007.01230v2
- Date: Fri, 3 Jul 2020 03:11:54 GMT
- Title: Climbing the WOL: Training for Cheaper Inference
- Authors: Zichang Liu, Zhaozhuo Xu, Alan Ji, Jonathan Li, Beidi Chen, Anshumali
Shrivastava
- Abstract summary: We argue that approximate MIPS subroutines are sub-optimal because they are tailored for retrieving large inner products with high recall.
We propose a novel learned hash approach, which is significantly more efficient than MIPS baselines while being sufficient for high inference accuracy.
- Score: 50.63998662655047
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient inference for wide output layers (WOLs) is an essential yet
challenging task in large-scale machine learning. Most approaches reduce this
problem to approximate maximum inner product search (MIPS), which relies
heavily on the observation that for a given model, ground truth labels
correspond to logits of highest value during full model inference. However,
such an assumption is restrictive in practice. In this paper, we argue that
approximate MIPS subroutines, despite having sub-linear computation time, are
sub-optimal because they are tailored for retrieving large inner products with
high recall instead of retrieving the correct labels. With WOL, the labels
often have moderate inner products, which makes approximate MIPS more
challenging. We propose an alternative problem formulation, called Label
Superior Sampling (LSS), where the objective is to tailor the system to ensure
retrieval of the correct label. Accordingly, we propose a novel learned hash
approach, which is significantly more efficient than MIPS baselines while
being sufficient for high inference accuracy. Our extensive evaluation
indicates that LSS can match or even outperform full-model inference accuracy
with around a 5x speedup and 87% energy reduction.
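To make the contrast concrete, the following minimal sketch (not taken from the paper) compares exact wide-output-layer inference against hash-bucket candidate retrieval. It uses a plain signed-random-projection (SimHash) table as a generic stand-in for the learned hash described above; the sizes, the number of hash bits, and the `simhash` helper are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch, not the paper's method: exact WOL inference vs. a hash-bucket
# candidate retrieval, with SimHash standing in for a learned hash.
import numpy as np

rng = np.random.default_rng(0)
num_labels, dim, num_bits = 100_000, 128, 12      # illustrative sizes
W = rng.standard_normal((num_labels, dim)).astype(np.float32)  # output-layer label embeddings
h = rng.standard_normal(dim).astype(np.float32)                # penultimate-layer activation

# Full inference: score every label, O(num_labels * dim) per query.
full_logits = W @ h
full_pred = int(np.argmax(full_logits))

# Hash-based retrieval: score only the labels whose bucket collides with the query's.
planes = rng.standard_normal((num_bits, dim)).astype(np.float32)

def simhash(x: np.ndarray) -> np.ndarray:
    """Signed random-projection hash; maps each row of x to an integer bucket id."""
    bits = (x @ planes.T > 0).astype(np.int64)     # (n, num_bits) sign pattern
    return bits @ (1 << np.arange(num_bits))       # pack sign bits into a bucket id

label_buckets = simhash(W)                         # built once, offline
query_bucket = simhash(h[None, :])[0]
candidates = np.flatnonzero(label_buckets == query_bucket)

if candidates.size:                                # score only the colliding labels
    hashed_pred = int(candidates[np.argmax(W[candidates] @ h)])
else:                                              # empty bucket: nothing retrieved
    hashed_pred = None

print(f"full argmax: {full_pred}")
print(f"labels scored: {candidates.size} of {num_labels}, hashed prediction: {hashed_pred}")
```

The point of the sketch is the cost structure: full inference scores all num_labels rows of W, while the hashed path scores only the labels that land in the query's bucket. Whether the correct label survives that filter is exactly the retrieval problem the abstract argues approximate MIPS is not tailored for, and which LSS addresses with a label-aware learned hash.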
Related papers
- Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty [13.843606627539597]
This study seeks to enhance the efficiency of large language models (LLMs) by promoting conciseness for simpler problems.
We manage the model's reasoning efficiency by dividing the reward function and including a novel penalty for output length.
Our approach has yielded impressive outcomes in benchmark evaluations across three datasets: GSM8K, MATH500, and AIME2024.
arXiv Detail & Related papers (2025-06-12T07:49:24Z)
- Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity [4.24164487223914]
We introduce Polar Sparsity, highlighting a key shift in sparsity importance from dense to Attention layers as we scale batch size and sequence length.
We develop hardware-efficient, sparsity-aware kernels for selective computation and Attention, delivering up to $2.2\times$ end-to-end speedups for models like OPT, LLaMA-2 & 3, across various batch sizes and sequence lengths without compromising accuracy.
arXiv Detail & Related papers (2025-05-20T20:15:42Z)
- Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints [14.341123057506827]
Large Language Models (LLMs) are indispensable in today's applications, but their inference procedure demands significant computational resources.
This paper formulates LLM inference optimization as a multi-stage online scheduling problem.
We develop a fluid dynamics approximation to provide a tractable benchmark that guides algorithm design.
arXiv Detail & Related papers (2025-04-15T16:00:21Z)
- Adaptive Sampled Softmax with Inverted Multi-Index: Methods, Theory and Applications [79.53938312089308]
The MIDX-Sampler is a novel adaptive sampling strategy based on an inverted multi-index approach.
Our method is backed by rigorous theoretical analysis, addressing key concerns such as sampling bias, gradient bias, convergence rates, and generalization error bounds.
arXiv Detail & Related papers (2025-01-15T04:09:21Z)
- Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data [54.934578742209716]
In real-world NLP applications, Large Language Models (LLMs) offer promising solutions due to their extensive training on vast datasets.
LLKD is an adaptive sample selection method that incorporates signals from both the teacher and student.
Our comprehensive experiments show that LLKD achieves superior performance across various datasets with higher data efficiency.
arXiv Detail & Related papers (2024-11-12T18:57:59Z)
- Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization [0.6445087473595953]
Large language models (LLMs) demonstrate outstanding performance in various machine learning tasks.
However, deploying LLM inference poses challenges due to the high compute and memory requirements.
We present Tender, an algorithm-hardware co-design solution that enables efficient deployment of LLM inference at low precision.
arXiv Detail & Related papers (2024-06-16T09:51:55Z)
- Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models [73.48675708831328]
We propose a novel parameter- and computation-efficient tuning method for Multi-modal Large Language Models (MLLMs).
The Efficient Attention Skipping (EAS) method evaluates the attention redundancy and skips the less important MHAs to speed up inference.
The experiments show that EAS not only retains high performance and parameter efficiency, but also greatly speeds up inference.
arXiv Detail & Related papers (2024-03-22T14:20:34Z)
- Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data and then assigns a minimal number of available labeled data points to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
- Learning with Noisy Labels: Interconnection of Two Expectation-Maximizations [41.65589788264123]
Labor-intensive labeling becomes a bottleneck in developing computer vision algorithms based on deep learning.
We address learning with noisy labels (LNL) problem, which is formalized as a task of finding a structured manifold in the midst of noisy data.
Our algorithm achieves state-of-the-art performance in multiple standard benchmarks with substantial margins under various types of label noise.
arXiv Detail & Related papers (2024-01-09T07:22:30Z)
- Optimization meets Machine Learning: An Exact Algorithm for Semi-Supervised Support Vector Machines [0.9831489366502302]
Support vector machines (SVMs) are well-studied supervised learning models for binary classification.
We present a new branch approach for S3VMs using semidefinite programming (SDP) relaxations.
The SDP relaxation provides bounds significantly stronger than those available in the literature.
arXiv Detail & Related papers (2023-12-15T13:44:54Z)
- IRLI: Iterative Re-partitioning for Learning to Index [104.72641345738425]
Methods have to trade off high accuracy against load balance and scalability in distributed settings.
We propose a novel approach called IRLI, which iteratively partitions the items by learning the relevant buckets directly from the query-item relevance data.
We mathematically show that IRLI retrieves the correct item with high probability under very natural assumptions and provides superior load balancing.
arXiv Detail & Related papers (2021-03-17T23:13:25Z)
- Semi-Supervised Learning with Meta-Gradient [123.26748223837802]
We propose a simple yet effective meta-learning algorithm in semi-supervised learning.
We find that the proposed algorithm performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2020-07-08T08:48:56Z)
- Online Metric Learning for Multi-Label Classification [22.484707213499714]
We propose a novel online metric learning paradigm for multi-label classification.
We first propose a new metric for multi-label classification based on $k$-Nearest Neighbour ($k$NN).
arXiv Detail & Related papers (2020-06-12T11:33:04Z)