Fine-Grained Address Segmentation for Attention-Based Variable-Degree
Prefetching
- URL: http://arxiv.org/abs/2205.02269v1
- Date: Sun, 1 May 2022 05:30:37 GMT
- Title: Fine-Grained Address Segmentation for Attention-Based Variable-Degree
Prefetching
- Authors: Pengmiao Zhang, Ajitesh Srivastava, Anant V. Nori, Rajgopal Kannan,
Viktor K. Prasanna
- Abstract summary: We propose TransFetch, a novel way to model prefetching.
To reduce vocabulary size, we use fine-grained address segmentation as input.
To predict unordered sets of future addresses, we use delta bitmaps for multiple outputs.
- Score: 10.128730975303407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning algorithms have shown potential to improve prefetching
performance by accurately predicting future memory accesses. Existing
approaches are based on the modeling of text prediction, considering
prefetching as a classification problem for sequence prediction. However, the
vast and sparse memory address space leads to large vocabulary, which makes
this modeling impractical. The number and order of outputs for multiple cache
line prefetching are also fundamentally different from text prediction. We
propose TransFetch, a novel way to model prefetching. To reduce vocabulary
size, we use fine-grained address segmentation as input. To predict unordered
sets of future addresses, we use delta bitmaps for multiple outputs. We apply
an attention-based network to learn the mapping between input and output.
Prediction experiments demonstrate that address segmentation achieves 26% - 36%
higher F1-score than delta inputs and 15% - 24% higher F1-score than page &
offset inputs for SPEC 2006, SPEC 2017, and GAP benchmarks. Simulation results
show that TransFetch achieves 38.75% IPC improvement compared with no
prefetching, outperforming the best-performing rule-based prefetcher BOP by
10.44%, and ML-based prefetcher Voyager by 6.64%.
Related papers
- ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera [53.20087549782785]
We introduce ET-Former, a novel end-to-end algorithm for semantic scene completion using a single monocular camera.
Our approach generates a semantic occupancy map from a single RGB observation while simultaneously providing uncertainty estimates for semantic predictions.
arXiv Detail & Related papers (2024-10-14T19:14:49Z)
- missForestPredict -- Missing data imputation for prediction settings [2.8461446020965435]
missForestPredict is a fast and user-friendly adaptation of the missForest imputation algorithm.
missForestPredict offers extended error monitoring and control over variables used in the imputation.
missForestPredict provides competitive results in prediction settings within short computation times.
arXiv Detail & Related papers (2024-07-02T17:45:46Z)
- TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction [61.295716741720284]
TokenUnify is a novel pretraining method that integrates random token prediction, next-token prediction, and next-all token prediction.
Together with TokenUnify, we have assembled a large-scale electron microscopy (EM) image dataset with ultra-high resolution.
This dataset includes over 120 million annotated voxels, making it the largest neuron segmentation dataset to date.
arXiv Detail & Related papers (2024-05-27T05:45:51Z)
- CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution [36.30158138035512]
We present a benchmark consisting of 800 Python functions (3-13 lines).
Each function comes with an input-output pair, leading to two natural tasks: input prediction and output prediction.
We show that simple CoT and fine-tuning schemes can improve performance on our benchmark but remain far from solving it.
arXiv Detail & Related papers (2024-01-05T20:53:51Z)
- AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation [64.9230895853942]
Domain generalization can be arbitrarily hard without exploiting target domain information.
Test-time adaptive (TTA) methods are proposed to address this issue.
In this work, we adopt a Non-Parametric Classifier to perform test-time Adaptation (AdaNPC).
arXiv Detail & Related papers (2023-04-25T04:23:13Z)
- Phases, Modalities, Temporal and Spatial Locality: Domain Specific ML Prefetcher for Accelerating Graph Analytics [7.52191887022819]
We propose MPGraph, an ML-based Prefetcher for Graph analytics using domain specific models.
MPGraph introduces three novel optimizations: soft detection for phase transitions, phase-specific multi-modality models for access prediction, and chain spatio-temporal (CST) prefetching.
Using CST, MPGraph achieves 12.52-21.23% IPC improvement, outperforming state-of-the-art non-ML prefetcher BO by 7.5-12.03% and ML-based prefetchers Voyager and TransFetch by 3.27-4.58%.
arXiv Detail & Related papers (2022-12-10T09:14:44Z)
- TransforMAP: Transformer for Memory Access Prediction [10.128730975303407]
Data Prefetching is a technique that can hide memory latency by fetching data before it is needed by a program.
We develop TransforMAP, based on the powerful Transformer model, that can learn from the whole address space.
We show that our approach achieves 35.67% MPKI improvement, higher than the state-of-the-art BO prefetcher and the ISB prefetcher.
arXiv Detail & Related papers (2022-05-29T22:14:38Z)
- Inverse is Better! Fast and Accurate Prompt for Few-shot Slot Tagging [54.557406779183495]
We introduce an inverse paradigm for prompting. Different from the classic prompts mapping tokens to labels, we reversely predict slot values given slot types.
We find, somewhat surprisingly, that the proposed method not only predicts faster but also significantly improves performance (an improvement of over 6.1 F1-score in the 10-shot setting).
arXiv Detail & Related papers (2022-04-02T15:41:19Z)
- Efficient and Differentiable Conformal Prediction with General Function Classes [96.74055810115456]
We propose a generalization of conformal prediction to multiple learnable parameters.
We show that it achieves approximate valid population coverage and near-optimal efficiency within class.
Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly.
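For context, the baseline this work generalizes is split conformal prediction, where a calibration set fixes an interval width that covers new targets with probability about 1 - alpha. A minimal sketch under toy assumptions (the function and variable names are illustrative, and the paper's learnable-parameter extension is not shown):

```python
# Hedged sketch of plain split conformal prediction; the paper
# generalizes this to multiple learnable parameters.
import numpy as np

rng = np.random.default_rng(0)

def split_conformal_halfwidth(preds_cal, y_cal, alpha=0.1):
    """Calibrate a symmetric interval half-width from absolute residuals
    so that fresh intervals cover the truth with probability ~1 - alpha."""
    scores = np.abs(y_cal - preds_cal)                 # nonconformity scores
    n = len(scores)
    level = np.ceil((n + 1) * (1 - alpha)) / n         # finite-sample correction
    return np.quantile(scores, level)

# Toy regression: the predictor is identically zero, targets are N(0, 1).
y_cal = rng.normal(size=1000)
q = split_conformal_halfwidth(np.zeros(1000), y_cal)

y_test = rng.normal(size=1000)
coverage = np.mean(np.abs(y_test) <= q)
print(round(coverage, 2))  # empirically close to the 0.9 target
```

The quantile step is what the paper's "valid population coverage" claim refers to; making that step differentiable is the core of their contribution.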
arXiv Detail & Related papers (2022-02-22T18:37:23Z)
- BB-ML: Basic Block Performance Prediction using Machine Learning Techniques [0.6020800302423842]
We propose to use Machine Learning (ML) techniques for performance prediction at a much finer granularity, namely at the Basic Block (BB) level.
We extrapolate the basic block execution counts of GPU applications and use them for predicting the performance for large input sizes from the counts of smaller input sizes.
We achieve an accuracy 93.5% in extrapolating the basic block counts for large input sets when trained on smaller input sets.
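The extrapolation idea in the blurb can be sketched as fitting a per-basic-block model of execution count versus input size on small inputs, then evaluating it at a large input. A minimal sketch, assuming a simple linear relationship and made-up counts (the paper's actual models are richer):

```python
# Hedged sketch: per-basic-block linear extrapolation of execution
# counts; the sizes and counts below are illustrative, not real profiles.
import numpy as np

sizes = np.array([1_000, 2_000, 4_000, 8_000])        # small training inputs
counts = {                                            # profiled counts per block
    "bb0": np.array([1_010, 2_010, 4_010, 8_010]),    # ~linear in input size
    "bb1": np.array([12, 12, 12, 12]),                # size-independent setup code
}

def extrapolate(sizes, counts, target_size):
    """Fit count = slope * size + intercept per block, then evaluate
    at a larger, unseen input size."""
    preds = {}
    for bb, y in counts.items():
        slope, intercept = np.polyfit(sizes, y, 1)    # least-squares line
        preds[bb] = slope * target_size + intercept
    return preds

preds = extrapolate(sizes, counts, 1_000_000)
print({k: round(v) for k, v in preds.items()})        # bb0 grows, bb1 stays flat
```

Separating per-block counts lets a downstream performance model weight each block by its cost, which is what makes the finer granularity useful.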
arXiv Detail & Related papers (2022-02-16T00:19:15Z)
- Span Pointer Networks for Non-Autoregressive Task-Oriented Semantic Parsing [55.97957664897004]
An effective recipe for building seq2seq, non-autoregressive, task-oriented parsers that map utterances to semantic frames proceeds in three steps.
These models are typically bottlenecked by length prediction.
In our work, we propose span pointer networks, non-autoregressive parsers that shift the decoding task from text generation to span prediction.
arXiv Detail & Related papers (2021-04-15T07:02:35Z)
- ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training [85.35910219651572]
We present a new sequence-to-sequence pre-training model called ProphetNet.
It introduces a novel self-supervised objective named future n-gram prediction.
We conduct experiments on CNN/DailyMail, Gigaword, and SQuAD 1.1 benchmarks for abstractive summarization and question generation tasks.
arXiv Detail & Related papers (2020-01-13T05:12:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences.