Benchmarking Performance of Deep Learning Model for Material
Segmentation on Two HPC Systems
- URL: http://arxiv.org/abs/2307.14921v1
- Date: Thu, 27 Jul 2023 15:03:13 GMT
- Authors: Warren R. Williams, S. Ross Glandon, Luke L. Morris, Jing-Ru C. Cheng
- Abstract summary: Performance data is gathered on two ERDC DSRC systems, Onyx and Vulcanite.
Vulcanite has faster model times in a large number of benchmarks, but it is also more subject to environmental factors that can make it slower than Onyx.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Performance Benchmarking of HPC systems is an ongoing effort that seeks to
provide information that will allow for increased performance and improve the
job schedulers that manage these systems. We develop a benchmarking tool that
utilizes machine learning models and gathers performance data on
GPU-accelerated nodes while they perform material segmentation analysis. The
benchmark uses an ML model that has been converted from Caffe to PyTorch using
the MMdnn toolkit, together with the MINC-2500 dataset. Performance data is
gathered on two ERDC DSRC systems, Onyx and Vulcanite. The data reveals that
while Vulcanite has faster model times in a large number of benchmarks, it is
also more subject to environmental factors that can make it slower than Onyx.
In contrast, the model times from Onyx are consistent across benchmarks.
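The contrast the abstract draws (faster but more variable times on one system, consistent times on the other) can be illustrated with a minimal timing sketch. This is not the authors' tool: `time_runs`, `summarize`, and `stand_in_workload` are hypothetical names, and the workload is a CPU-only placeholder for a model forward pass. A real GPU benchmark in PyTorch would also need to synchronize the device (for example with `torch.cuda.synchronize()`) before reading the clock, which is omitted here.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(workload: Callable[[], object], repeats: int = 5, warmup: int = 1) -> List[float]:
    """Return the wall-clock time in seconds of each timed call to `workload`."""
    for _ in range(warmup):
        workload()  # warm-up calls are executed but not timed
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        times.append(time.perf_counter() - start)
    return times


def summarize(times: List[float]) -> Dict[str, float]:
    """Summarize run times: the mean shows speed, stdev and cv show consistency."""
    mean = statistics.mean(times)
    stdev = statistics.stdev(times) if len(times) > 1 else 0.0
    cv = stdev / mean if mean > 0 else 0.0  # coefficient of variation
    return {"mean": mean, "stdev": stdev, "cv": cv, "min": min(times), "max": max(times)}


def stand_in_workload() -> int:
    """Hypothetical placeholder for one model inference pass."""
    return sum(i * i for i in range(50_000))


if __name__ == "__main__":
    print(summarize(time_runs(stand_in_workload)))
```

Comparing two systems on both `mean` and `cv` separates "faster on average" from "more consistent", which is the distinction the abstract makes between Vulcanite and Onyx.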
Related papers
- ChipBench: A Next-Step Benchmark for Evaluating LLM Performance in AI-Aided Chip Design [15.71144418188142]
Large Language Models (LLMs) show significant potential in hardware engineering. Current benchmarks suffer from saturation and limited task diversity. We propose a comprehensive benchmark for AI-aided chip design.
arXiv Detail & Related papers (2026-01-29T09:26:55Z) - Predictive Modeling of I/O Performance for Machine Learning Training Pipelines: A Data-Driven Approach to Storage Optimization [0.0]
Modern machine learning training is increasingly bottlenecked by data I/O rather than compute. This paper presents a machine learning approach to predict I/O performance and recommend optimal storage configurations for ML training pipelines.
arXiv Detail & Related papers (2025-12-07T07:25:08Z) - Optimizing PyTorch Inference with LLM-Based Multi-Agent Systems [1.2289544895833646]
We present a framework for comparing multi-agent PyTorch optimization systems. We show that exploit-heavy strategies perform best when paired with error-fixing agents. The best implementation achieves an average 2.88x speedup on an H100 GPU.
arXiv Detail & Related papers (2025-11-21T05:37:38Z) - DS@GT at LongEval: Evaluating Temporal Performance in Web Search Systems and Topics with Two-Stage Retrieval [44.99833362998488]
The DS@GT competition team participated in the Longitudinal Evaluation of Model Performance (LongEval) lab at CLEF 2025. Our analysis of the Qwant web dataset includes exploratory data analysis with topic modeling over time. Our best system achieves an average NDCG@10 of 0.296 across the entire training and test dataset, with an overall best score of 0.395 on 2023-05.
arXiv Detail & Related papers (2025-07-11T07:23:08Z) - AIvaluateXR: An Evaluation Framework for on-Device AI in XR with Benchmarking Results [55.33807002543901]
We present AIvaluateXR, a comprehensive evaluation framework for benchmarking large language models (LLMs) running on XR devices. We deploy 17 selected LLMs across four XR platforms: Magic Leap 2, Meta Quest 3, Vivo X100s Pro, and Apple Vision Pro, and conduct an extensive evaluation. We propose a unified evaluation method based on the 3D Optimality theory to select the optimal device-model pairs from quality and speed objectives.
arXiv Detail & Related papers (2025-02-13T20:55:48Z) - SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation [81.36747103102459]
Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications.
Current state-of-the-art methods focus on training innovative architectural designs on confined datasets.
We investigate the impact of scaling up EHPS towards a family of generalist foundation models.
arXiv Detail & Related papers (2025-01-16T18:59:46Z) - LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content [62.816876067499415]
We propose LiveXiv: a scalable evolving live benchmark based on scientific ArXiv papers.
LiveXiv accesses domain-specific manuscripts at any given timestamp and proposes to automatically generate visual question-answer pairs.
We benchmark multiple open and proprietary Large Multi-modal Models (LMMs) on the first version of our benchmark, showing its challenging nature and exposing the models' true abilities.
arXiv Detail & Related papers (2024-10-14T17:51:23Z) - SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation [83.18930314027254]
Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications.
In this work, we investigate scaling up EHPS towards the first generalist foundation model (dubbed SMPLer-X) with up to ViT-Huge as the backbone.
With big data and the large model, SMPLer-X exhibits strong performance across diverse test benchmarks and excellent transferability to even unseen environments.
arXiv Detail & Related papers (2023-09-29T17:58:06Z) - Temporal Graph Benchmark for Machine Learning on Temporal Graphs [54.52243310226456]
Temporal Graph Benchmark (TGB) is a collection of challenging and diverse benchmark datasets.
We benchmark each dataset and find that the performance of common models can vary drastically across datasets.
TGB provides an automated machine learning pipeline for reproducible and accessible temporal graph research.
arXiv Detail & Related papers (2023-07-03T13:58:20Z) - Benchmarking Edge Computing Devices for Grape Bunches and Trunks
Detection using Accelerated Object Detection Single Shot MultiBox Deep
Learning Models [2.1922186455344796]
This work benchmarks the performance of different platforms for real-time object detection.
The authors used RetinaNet ResNet-50 fine-tuned on the natural Vine dataset.
arXiv Detail & Related papers (2022-11-21T17:02:33Z) - Tech Report: One-stage Lightweight Object Detectors [0.38073142980733]
This work designs one-stage lightweight detectors that perform well in terms of mAP and latency.
Starting from baseline models that target GPU and CPU respectively, various operations are substituted for the main operations in the backbone networks of the baseline models.
arXiv Detail & Related papers (2022-10-31T09:02:37Z) - PDEBENCH: An Extensive Benchmark for Scientific Machine Learning [20.036987098901644]
We introduce PDEBench, a benchmark suite of time-dependent simulation tasks based on Partial Differential Equations (PDEs).
PDEBench comprises both code and data to benchmark the performance of novel machine learning models against both classical numerical simulations and machine learning baselines.
arXiv Detail & Related papers (2022-10-13T17:03:36Z) - NumS: Scalable Array Programming for the Cloud [82.827921577004]
We present NumS, an array programming library that optimizes NumPy-like expressions on task-based distributed systems.
This is achieved through a novel scheduler called Load Simulated Hierarchical Scheduling (LSHS).
We show that LSHS enhances performance on Ray by decreasing network load by a factor of 2x, requiring 4x less memory, and reducing execution time by 10x on the logistic regression problem.
arXiv Detail & Related papers (2022-06-28T20:13:40Z) - Building a Performance Model for Deep Learning Recommendation Model
Training on GPUs [6.05245376098191]
We devise a performance model for GPU training of Deep Learning Recommendation Models (DLRM).
We show that both the device active time (the sum of kernel runtimes) and the device idle time are important components of the overall device time.
We propose a critical-path-based algorithm to predict the per-batch training time of DLRM by traversing its execution graph.
arXiv Detail & Related papers (2022-01-19T19:05:42Z) - MLPerfTM HPC: A Holistic Benchmark Suite for Scientific Machine Learning
on HPC Systems [32.621917787044396]
We introduceerf HPC, a benchmark suite of scientific machine learning training applications driven by the MLCommonsTM Association.
We develop a systematic framework for their joint analysis and compare them in terms of data staging, algorithmic convergence, and compute performance.
We conclude by characterizing each benchmark with respect to low-level memory, I/O, and network behavior.
arXiv Detail & Related papers (2021-10-21T20:30:12Z) - Providing Meaningful Data Summarizations Using Examplar-based Clustering
in Industry 4.0 [67.80123919697971]
We show that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms.
We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how the resulting summaries help steer this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z) - RadixSpline: A Single-Pass Learned Index [84.84747738666263]
We introduce RadixSpline (RS), a learned index that can be built in a single pass over the data.
RS achieves competitive results on all datasets, despite the fact that it only has two parameters.
arXiv Detail & Related papers (2020-04-30T01:56:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.