Scalable Deep-Learning-Accelerated Topology Optimization for Additively
Manufactured Materials
- URL: http://arxiv.org/abs/2011.14177v1
- Date: Sat, 28 Nov 2020 17:38:31 GMT
- Title: Scalable Deep-Learning-Accelerated Topology Optimization for Additively
Manufactured Materials
- Authors: Sirui Bi, Jiaxin Zhang, Guannan Zhang
- Abstract summary: Topology optimization (TO) is a popular and powerful computational approach for designing novel structures, materials, and devices.
To address these issues, we propose a general scalable deep-learning (DL) based TO framework, referred to as SDL-TO.
Our framework accelerates TO by learning from the iterative history data and simultaneously training on the mapping between a given design and its gradient.
- Score: 4.221095652322005
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Topology optimization (TO) is a popular and powerful computational approach
for designing novel structures, materials, and devices. Two computational
challenges have limited the applicability of TO to a variety of industrial
applications. First, a TO problem often involves a large number of design
variables to guarantee sufficient expressive power. Second, many TO problems
require a large number of expensive physical model simulations, and those
simulations cannot be parallelized. To address these issues, we propose a
general scalable deep-learning (DL) based TO framework, referred to as SDL-TO,
which utilizes parallel schemes in high performance computing (HPC) to
accelerate the TO process for designing additively manufactured (AM) materials.
Unlike existing studies of DL for TO, our framework accelerates TO by learning
from the iterative history data and simultaneously training on the mapping
between a given design and its gradient. The surrogate gradient is learned by
utilizing parallel computing on multiple CPUs combined with distributed DL
training on multiple GPUs. The learned TO gradient enables a fast online
update scheme instead of an expensive update based on the physical simulator or
solver. Using a local sampling strategy, we reduce the intrinsic high
dimensionality of the design space and improve the training accuracy and
scalability of the SDL-TO framework. The method is demonstrated on benchmark
examples and on AM materials design for heat conduction. The proposed SDL-TO
framework achieves performance competitive with the baseline methods while
significantly reducing the computational cost, with a speedup of around 8.6x
over the standard TO implementation.
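To make the core mechanism concrete, here is a minimal sketch (not the authors' code) of a surrogate-gradient TO loop: a small network is fit to (design, gradient) pairs harvested from earlier TO iterations, and the learned gradient then drives cheap online design updates in place of full physics solves. The names (GradientSurrogate, train_surrogate, online_update), the network size, the optimizer settings, and the crude volume projection are illustrative assumptions; SDL-TO additionally distributes the gradient-data generation across many CPUs, distributes the surrogate training across multiple GPUs, and samples locally around the current design to keep the learning problem low-dimensional.

import torch
import torch.nn as nn

class GradientSurrogate(nn.Module):
    """Hypothetical MLP that maps a flattened design field to its TO gradient."""
    def __init__(self, n_design: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_design, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_design),
        )

    def forward(self, x):
        return self.net(x)

def train_surrogate(model, designs, gradients, epochs=200, lr=1e-3):
    # Fit the surrogate on history data collected from past TO iterations.
    # In SDL-TO the (design, gradient) pairs would come from parallel physical
    # simulations on CPUs, with training distributed over multiple GPUs.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(designs), gradients)
        loss.backward()
        opt.step()
    return model

def online_update(model, x, step=0.05, vol_frac=0.4):
    # One fast design update driven by the learned gradient instead of a solver
    # call, followed by a crude volume-fraction rescaling (a placeholder for the
    # optimality-criteria / MMA update used in a standard TO implementation).
    with torch.no_grad():
        g = model(x.unsqueeze(0)).squeeze(0)
        x_new = torch.clamp(x - step * g, 0.0, 1.0)
        x_new = x_new * (vol_frac * x_new.numel() / x_new.sum().clamp(min=1e-8))
        return torch.clamp(x_new, 0.0, 1.0)

# Usage sketch with random placeholders standing in for real history data.
n = 64 * 64
history_designs = torch.rand(256, n)
history_gradients = torch.randn(256, n)
surrogate = train_surrogate(GradientSurrogate(n), history_designs, history_gradients)
new_design = online_update(surrogate, torch.full((n,), 0.4))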
Related papers
- SLaNC: Static LayerNorm Calibration [1.2016264781280588]
Quantization to lower precision formats naturally poses a number of challenges caused by the limited range of the available value representations.
In this article, we propose a computationally-efficient scaling technique that can be easily applied to Transformer models during inference.
Our method suggests a straightforward way of scaling the LayerNorm inputs based on the static weights of the immediately preceding linear layers.
arXiv Detail & Related papers (2024-10-14T14:32:55Z)
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for adapting LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
- Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI.
As the number of model parameters reaches hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios.
This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z)
- Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives.
We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis.
We find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z)
- Multiplicative update rules for accelerating deep learning training and increasing robustness [69.90473612073767]
We propose an optimization framework that applies to a wide range of machine learning algorithms and enables alternative update rules.
We claim that the proposed framework accelerates training while leading to more robust models than the traditionally used additive update rules.
arXiv Detail & Related papers (2023-07-14T06:44:43Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between self-supervised learning (SSL) and dynamic computation (DC) paradigms.
We show that it is feasible to simultaneously learn a dense and a gated sub-network from scratch in an SSL setting.
The co-evolution of the dense and gated encoders during pre-training offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
- Using Gradient to Boost the Generalization Performance of Deep Learning Models for Fluid Dynamics [0.0]
We present a novel approach to increase the generalization capabilities of deep learning models for fluid dynamics.
Our strategy shows good results toward better generalization of DL networks.
arXiv Detail & Related papers (2022-10-09T10:20:09Z)
- An Adaptive and Scalable ANN-based Model-Order-Reduction Method for Large-Scale TO Designs [22.35243726859667]
Topology Optimization (TO) provides a systematic approach for obtaining structural designs with optimal performance of interest.
Deep learning-based models have been developed to accelerate the process.
MapNet is a neural network which maps the field of interest from coarse-scale to fine-scale.
arXiv Detail & Related papers (2022-03-20T10:12:24Z)
- A Graph Deep Learning Framework for High-Level Synthesis Design Space Exploration [11.154086943903696]
High-Level Synthesis (HLS) is a solution for fast prototyping of application-specific hardware.
We propose, for the first time in the literature, graph neural networks for HLS that jointly predict acceleration performance and hardware costs.
We show that our approach achieves prediction accuracy comparable with that of commonly used simulators.
arXiv Detail & Related papers (2021-11-29T18:17:45Z)
- Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights [18.04657939198617]
This paper provides a comprehensive survey on the efficient execution of sparse and irregular tensor computations of machine learning models on hardware accelerators.
It examines different hardware designs and acceleration techniques and analyzes them in terms of hardware and execution costs.
The takeaways from this paper include: understanding the key challenges in accelerating sparse, irregular-shaped, and quantized tensors.
arXiv Detail & Related papers (2020-07-02T04:08:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.