Automatic Microprocessor Performance Bug Detection
- URL: http://arxiv.org/abs/2011.08781v2
- Date: Thu, 19 Nov 2020 15:39:21 GMT
- Title: Automatic Microprocessor Performance Bug Detection
- Authors: Erick Carvajal Barboza and Sara Jacob and Mahesh Ketkar and Michael Kishinevsky and Paul Gratz and Jiang Hu
- Abstract summary: We present a two-stage, machine learning-based methodology that is able to detect the existence of performance bugs in microprocessors.
Our best technique detects 91.5% of microprocessor core performance bugs whose average IPC impact is greater than 1%.
When evaluated on memory system bugs, our technique achieves 100% detection with zero false positives.
- Score: 3.6462412165522466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Processor design validation and debug is a difficult and complex task, which
consumes the lion's share of the design process. Design bugs that affect
processor performance rather than its functionality are especially difficult to
catch, particularly in new microarchitectures. This is because, unlike
functional bugs, the correct processor performance of new microarchitectures on
complex, long-running benchmarks is typically not deterministically known.
Thus, when performance benchmarking new microarchitectures, performance teams
may assume that the design is correct when the performance of the new
microarchitecture exceeds that of the previous generation, despite significant
performance regressions existing in the design. In this work, we present a
two-stage, machine learning-based methodology that is able to detect the
existence of performance bugs in microprocessors. Our results show that our
best technique detects 91.5% of microprocessor core performance bugs whose
average IPC impact across the studied applications is greater than 1% versus a
bug-free design with zero false positives. When evaluated on memory system
bugs, our technique achieves 100% detection with zero false positives.
Moreover, the detection is automatic, requiring very little performance
engineer time.
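The core idea, comparing a candidate design's per-application IPC against an expected baseline and flagging consistent regressions, can be caricatured with a deliberately simplified sketch. Note one major simplification: this sketch assumes a bug-free reference IPC is known for every application, which is exactly the assumption the paper's machine-learning approach is designed to avoid. All names, data values, and the 1% threshold below are illustrative, not the authors' actual pipeline.

```python
# Minimal two-stage sketch of IPC-regression-based performance bug detection.
# Stage 1 computes per-application relative IPC deltas; stage 2 aggregates
# them and flags a bug when the average impact exceeds a threshold.
# Hypothetical example only: real detection per the paper uses ML models
# because a bug-free reference IPC is typically not known.

def ipc_deltas(reference_ipc, candidate_ipc):
    """Stage 1: relative IPC change per application (negative = regression)."""
    return {app: (candidate_ipc[app] - ref) / ref
            for app, ref in reference_ipc.items()}

def has_performance_bug(reference_ipc, candidate_ipc, threshold=-0.01):
    """Stage 2: flag a bug when average IPC impact is worse than 1%."""
    deltas = ipc_deltas(reference_ipc, candidate_ipc)
    average_impact = sum(deltas.values()) / len(deltas)
    return average_impact < threshold

# Illustrative (made-up) IPC numbers for three benchmark applications.
reference = {"gcc": 1.20, "mcf": 0.45, "x264": 2.10}
buggy     = {"gcc": 1.15, "mcf": 0.42, "x264": 2.05}  # consistent regression
healthy   = {"gcc": 1.21, "mcf": 0.45, "x264": 2.11}

print(has_performance_bug(reference, buggy))    # True
print(has_performance_bug(reference, healthy))  # False
```

In practice the paper replaces the hard-coded reference with learned models of expected performance, which is what lets the method work on new microarchitectures whose correct IPC is not deterministically known.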
Related papers
- Global Microprocessor Correctness in the Presence of Transient Execution [0.16385815610837165]
We advocate for the use of formal specifications, using the theory of refinement. We introduce notions of correctness that can be used to deal with transient execution attacks, including Meltdown and Spectre.
arXiv Detail & Related papers (2025-06-20T16:56:14Z) - Concorde: Fast and Accurate CPU Performance Modeling with Compositional Analytical-ML Fusion [15.06323814625609]
We present Concorde, a new methodology for learning fast and accurate performance models of microarchitectures.
Concorde predicts the behavior of a program based on compact performance distributions that capture the impact of different microarchitectural components.
Experiments show that Concorde is more than five orders of magnitude faster than a reference cycle-level simulator.
arXiv Detail & Related papers (2025-03-29T13:25:20Z) - LowFormer: Hardware Efficient Design for Convolutional Transformer Backbones [10.435069781620957]
Research in efficient vision backbones is evolving into models that are a mixture of convolutions and transformer blocks.
We analyze common modules and architectural design choices for backbones not in terms of MACs, but rather in actual throughput and latency.
We combine both macro and micro design to create a new family of hardware-efficient backbone networks called LowFormer.
arXiv Detail & Related papers (2024-09-05T12:18:32Z) - Automatic Build Repair for Test Cases using Incompatible Java Versions [7.4881561767138365]
We introduce an approach to repair test cases of Java projects by performing dependency minimization.
Unlike existing state-of-the-art techniques, our approach operates at the source level, which allows compile-time errors to be fixed.
arXiv Detail & Related papers (2024-04-27T07:55:52Z) - VeriBug: An Attention-based Framework for Bug-Localization in Hardware Designs [2.807347337531008]
In recent years, there has been an exponential growth in the size and complexity of System-on-Chip designs targeting different specialized applications.
The cost of an undetected bug in these systems is much higher than in traditional processor systems as it may imply the loss of property or life.
We propose VeriBug, which leverages recent advances in deep learning to accelerate debug at the Register-Transfer Level and generates explanations of likely root causes.
arXiv Detail & Related papers (2024-01-17T01:33:37Z) - Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits! [51.668411293817464]
Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines.
Academic research is often restricted to public datasets on the order of ten thousand samples.
We devise an approach to generate a benchmark of difficulty from a pool of available samples.
arXiv Detail & Related papers (2023-12-25T21:25:55Z) - PACE: A Program Analysis Framework for Continuous Performance Prediction [0.0]
PACE is a program analysis framework that provides continuous feedback on the performance impact of pending code updates.
We design performance microbenchmarks by mapping the execution time of functional test cases given a code update.
Our experiments achieve strong accuracy in predicting code performance, outperforming the current state-of-the-art by 75% on neural-represented code stylometry features.
arXiv Detail & Related papers (2023-12-01T20:43:34Z) - FuzzyFlow: Leveraging Dataflow To Find and Squash Program Optimization Bugs [92.47146416628965]
FuzzyFlow is a fault localization and test case extraction framework designed to test program optimizations.
We leverage dataflow program representations to capture a fully reproducible system state and area-of-effect for optimizations.
To reduce testing time, we design an algorithm for minimizing test inputs, trading off memory for recomputation.
arXiv Detail & Related papers (2023-06-28T13:00:17Z) - Task-Oriented Over-the-Air Computation for Multi-Device Edge AI [57.50247872182593]
6G networks supporting edge AI feature task-oriented techniques that focus on the effective and efficient execution of AI tasks.
A task-oriented over-the-air computation (AirComp) scheme is proposed in this paper for a multi-device split-inference system.
arXiv Detail & Related papers (2022-11-02T16:35:14Z) - GRANITE: A Graph Neural Network Model for Basic Block Throughput Estimation [3.739243122393041]
We introduce a new machine learning model that estimates throughput of basic blocks across different microarchitectures.
Results establish a new state-of-the-art for basic block performance estimation with an average test error of 6.9%.
We propose the use of multi-task learning with independent multi-layer feed forward decoder networks.
arXiv Detail & Related papers (2022-10-08T03:03:49Z) - Faster Attention Is What You Need: A Fast Self-Attention Neural Network Backbone Architecture for the Edge via Double-Condensing Attention Condensers [71.40595908386477]
We introduce a new faster attention condenser design called double-condensing attention condensers.
The resulting backbone (which we name AttendNeXt) achieves significantly higher inference throughput on an embedded ARM processor.
These promising results demonstrate that exploring different efficient architecture designs and self-attention mechanisms can lead to interesting new building blocks for TinyML applications.
arXiv Detail & Related papers (2022-08-15T02:47:33Z) - MAPLE: Microprocessor A Priori for Latency Estimation [81.91509153539566]
Modern deep neural networks must demonstrate state-of-the-art accuracy while exhibiting low latency and energy consumption.
Measuring the latency of every evaluated architecture adds a significant amount of time to the NAS process.
We propose MAPLE (Microprocessor A Priori for Latency Estimation), which does not rely on transfer learning or domain adaptation.
arXiv Detail & Related papers (2021-11-30T03:52:15Z) - TinyDefectNet: Highly Compact Deep Neural Network Architecture for High-Throughput Manufacturing Visual Quality Inspection [72.88856890443851]
TinyDefectNet is a highly compact deep convolutional network architecture tailored for high-throughput manufacturing visual quality inspection.
TinyDefectNet was deployed on an AMD EPYC 7R32, and achieved 7.6x faster throughput using the native TensorFlow environment and 9x faster throughput using the AMD ZenDNN accelerator library.
arXiv Detail & Related papers (2021-11-29T04:19:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.