Iterative Filter Pruning for Concatenation-based CNN Architectures
- URL: http://arxiv.org/abs/2405.03715v1
- Date: Sat, 4 May 2024 19:40:42 GMT
- Title: Iterative Filter Pruning for Concatenation-based CNN Architectures
- Authors: Svetlana Pavlitska, Oliver Bagge, Federico Peccia, Toghrul Mammadov, J. Marius Zöllner
- Abstract summary: Modern object detectors have highly interconnected convolutional layers with concatenations.
We propose a method to handle concatenation layers, based on the connectivity graph of convolutional layers.
We deploy pruned models to FPGA and NVIDIA Jetson Xavier AGX.
- Score: 9.651318927588934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model compression and hardware acceleration are essential for the resource-efficient deployment of deep neural networks. Modern object detectors have highly interconnected convolutional layers with concatenations. In this work, we study how pruning can be applied to such architectures, using YOLOv7 as an example. We propose a method to handle concatenation layers, based on the connectivity graph of convolutional layers. By automating iterative sensitivity analysis, pruning, and subsequent model fine-tuning, we can significantly reduce model size both in terms of the number of parameters and FLOPs, while keeping comparable model accuracy. Finally, we deploy pruned models to an FPGA and an NVIDIA Jetson Xavier AGX. Pruned models demonstrate a 2x speedup for the convolutional layers in comparison to their unpruned counterparts and reach real-time capability with 14 FPS on the FPGA. Our code is available at https://github.com/fzi-forschungszentrum-informatik/iterative-yolo-pruning.
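The repository linked above contains the authors' full pipeline; the snippet below is only a minimal, hypothetical PyTorch sketch of the central idea: rank filters by L1 norm, prune them, and use the connectivity through a concatenation to drop the matching input channels of the convolution that follows the concat. The layer sizes and the 30% pruning ratio are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

# Minimal sketch (not the authors' code): propagate filter-pruning decisions
# through a concatenation. Layer widths and the 30% ratio are illustrative.

def filters_to_keep(conv: nn.Conv2d, ratio: float) -> torch.Tensor:
    """Rank filters by L1 norm and keep the top (1 - ratio) fraction."""
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    n_keep = max(1, int(conv.out_channels * (1.0 - ratio)))
    return torch.sort(torch.topk(scores, n_keep).indices).values

def prune_out_channels(conv: nn.Conv2d, keep: torch.Tensor) -> nn.Conv2d:
    new = nn.Conv2d(conv.in_channels, len(keep), conv.kernel_size,
                    conv.stride, conv.padding, bias=conv.bias is not None)
    new.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        new.bias.data = conv.bias.data[keep].clone()
    return new

def prune_in_channels(conv: nn.Conv2d, keep: torch.Tensor) -> nn.Conv2d:
    new = nn.Conv2d(len(keep), conv.out_channels, conv.kernel_size,
                    conv.stride, conv.padding, bias=conv.bias is not None)
    new.weight.data = conv.weight.data[:, keep].clone()
    if conv.bias is not None:
        new.bias.data = conv.bias.data.clone()
    return new

# Two branches feed a concatenation; the conv after the concat must drop the
# input channels that correspond to filters removed from either branch.
branch_a = nn.Conv2d(3, 32, 3, padding=1)
branch_b = nn.Conv2d(3, 32, 3, padding=1)
after_concat = nn.Conv2d(64, 64, 3, padding=1)

keep_a = filters_to_keep(branch_a, ratio=0.3)
keep_b = filters_to_keep(branch_b, ratio=0.3)
branch_a = prune_out_channels(branch_a, keep_a)
branch_b = prune_out_channels(branch_b, keep_b)

# Offset branch-b indices by branch-a's original width, mirroring the concat order.
keep_concat = torch.cat([keep_a, keep_b + 32])
after_concat = prune_in_channels(after_concat, keep_concat)

x = torch.randn(1, 3, 64, 64)
y = after_concat(torch.cat([branch_a(x), branch_b(x)], dim=1))
print(y.shape)  # torch.Size([1, 64, 64, 64])
```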
Related papers
- RL-Pruner: Structured Pruning Using Reinforcement Learning for CNN Compression and Acceleration [0.0]
We propose RL-Pruner, which uses reinforcement learning to learn the optimal pruning distribution.
RL-Pruner can automatically extract dependencies between filters in the input model and perform pruning, without requiring model-specific pruning implementations.
arXiv Detail & Related papers (2024-11-10T13:35:10Z)
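RL-Pruner's agent, dependency extraction, and pruning distribution are not reproduced here; the following is a hypothetical REINFORCE-style sketch of the general idea of learning where pruning hurts least, where evaluate_after_pruning is a placeholder for pruning the chosen layer and measuring validation accuracy.

```python
import torch

# Hypothetical REINFORCE-style sketch (not the RL-Pruner implementation):
# a categorical policy over layers learns where pruning hurts least.
num_layers = 10
logits = torch.zeros(num_layers, requires_grad=True)   # policy parameters
optimizer = torch.optim.Adam([logits], lr=0.05)

def evaluate_after_pruning(layer_idx: int) -> float:
    # Placeholder reward: real code would prune `layer_idx` by one step,
    # fine-tune briefly, and return the change in validation accuracy.
    return float(torch.randn(()))

for step in range(100):
    policy = torch.distributions.Categorical(logits=logits)
    layer = policy.sample()
    reward = evaluate_after_pruning(int(layer))
    loss = -policy.log_prob(layer) * reward             # REINFORCE gradient estimator
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```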
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
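As a rough illustration of the "TC" stream's input representation (not the paper's exact settings), a Continuous Wavelet Transform turns a 1D behavioral signal into a 2D scale-time tensor that a CNN can consume; the Morlet wavelet, 30 Hz sampling rate, and 64 scales below are assumptions.

```python
import numpy as np
import pywt

# Illustrative only: turning a 1D signal into a 2D time-frequency tensor
# with a Continuous Wavelet Transform. Wavelet and scales are assumptions.
fs = 30.0                                   # assumed 30 Hz signal
t = np.arange(0, 10, 1 / fs)
signal = np.sin(2 * np.pi * 1.5 * t) + 0.3 * np.random.randn(t.size)

scales = np.arange(1, 65)
coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1 / fs)

# coeffs is a (scales x time) 2D array that can be fed to a CNN like an image.
print(coeffs.shape)  # (64, 300)
```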
- Sequence Modeling with Multiresolution Convolutional Memory [27.218134279968062]
We introduce a new building block for sequence modeling called a MultiresLayer.
The key component of our model is the multiresolution convolution, capturing multiscale trends in the input sequence.
Our model yields state-of-the-art performance on a number of sequence classification and autoregressive density estimation tasks.
arXiv Detail & Related papers (2023-05-02T17:50:54Z)
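The paper's MultiresLayer is not reproduced here; the sketch below is a generic multiscale stand-in that captures trends at several time scales by summing dilated causal convolutions, purely to illustrate the idea of a multiresolution convolution over a sequence.

```python
import torch
import torch.nn as nn

# Illustrative stand-in (not the paper's MultiresLayer): capture multiscale
# trends by summing dilated causal convolutions over a sequence.
class MultiScaleConv1d(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size,
                      padding=(kernel_size - 1) * d, dilation=d)
            for d in dilations
        ])

    def forward(self, x):  # x: (batch, channels, time)
        out = 0
        for conv in self.branches:
            y = conv(x)[..., : x.shape[-1]]   # trim to causal length
            out = out + y
        return out

x = torch.randn(2, 16, 128)
print(MultiScaleConv1d(16)(x).shape)  # torch.Size([2, 16, 128])
```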
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network, that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
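The dynamic graph module itself is not reproduced here; the hypothetical sketch below only illustrates why sampling helps: each query position aggregates messages from a small random subset of positions instead of a full, fully-connected graph.

```python
import torch

# Generic sketch of sparse, sampled message passing (not the paper's dynamic
# graph module): each query position aggregates messages from a small random
# subset of positions rather than from all positions.
def sampled_message_passing(x: torch.Tensor, k: int = 8) -> torch.Tensor:
    # x: (num_nodes, dim) flattened feature map
    n, d = x.shape
    idx = torch.randint(0, n, (n, k))                  # k sampled neighbors per node
    neighbors = x[idx]                                 # (n, k, d)
    attn = torch.softmax((neighbors @ x.unsqueeze(-1)).squeeze(-1) / d ** 0.5, dim=-1)
    return (attn.unsqueeze(-1) * neighbors).sum(dim=1)

x = torch.randn(64 * 64, 32)                           # e.g. a flattened 64x64 feature map
print(sampled_message_passing(x).shape)                # torch.Size([4096, 32])
```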
- Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining [58.10436813430554]
Mini-batch training of graph neural networks (GNNs) requires a lot of computation and data movement.
We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment.
We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler.
We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
arXiv Detail & Related papers (2021-10-16T02:41:35Z)
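The paper's performance-engineered sampler is not shown here; the toy sketch below only illustrates plain neighborhood sampling, where each seed node of a mini-batch keeps at most a fixed fan-out of randomly chosen neighbors.

```python
import torch

# Toy sketch of mini-batch neighborhood sampling (not the paper's sampler):
# for each seed node, keep at most `fanout` random incoming neighbors.
def sample_neighbors(edge_index: torch.Tensor, seeds: torch.Tensor, fanout: int):
    src, dst = edge_index
    kept = []
    for s in seeds.tolist():
        nbrs = src[dst == s]                            # incoming neighbors of seed s
        if nbrs.numel() > fanout:
            nbrs = nbrs[torch.randperm(nbrs.numel())[:fanout]]
        kept.append(torch.stack([nbrs, torch.full_like(nbrs, s)]))
    return torch.cat(kept, dim=1) if kept else edge_index.new_empty((2, 0))

edge_index = torch.randint(0, 100, (2, 1000))           # random toy graph
seeds = torch.randint(0, 100, (32,))                    # mini-batch of 32 seed nodes
batch_edges = sample_neighbors(edge_index, seeds, fanout=10)
print(batch_edges.shape)                                # (2, <= 320)
```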
- Adaptive Filters and Aggregator Fusion for Efficient Graph Convolutions [11.769185588579488]
We present state-of-the-art performance with lower memory consumption and latency, along with characteristics suited to accelerator implementation.
Our proposal uses memory proportional to the number of vertices in the graph, in contrast to competing methods which require memory proportional to the number of edges.
We propose aggregator fusion, a technique that enables GNNs to significantly boost their representational power with only a small (19%) increase in latency over standard sparse matrix multiplication.
arXiv Detail & Related papers (2021-04-03T20:54:36Z)
- Fast convolutional neural networks on FPGAs with hls4ml [0.22756183402372013]
We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks on FPGAs.
We demonstrate how to achieve an inference latency of 5 µs using convolutional architectures, while preserving state-of-the-art model performance.
arXiv Detail & Related papers (2021-01-13T14:47:11Z)
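A rough sketch of the publicly documented hls4ml workflow (not the paper's exact configuration): the toy Keras model, FPGA part number, and output directory are placeholders.

```python
import hls4ml
from tensorflow import keras

# Rough sketch following the public hls4ml workflow; model, part number and
# output directory are placeholders, not the paper's settings.
model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    keras.layers.Conv2D(8, 3, activation="relu"),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])

config = hls4ml.utils.config_from_keras_model(model, granularity="model")
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, output_dir="hls_prj", part="xcu250-figd2104-2L-e")
hls_model.compile()            # builds a C-simulation of the firmware
# hls_model.build(csim=False)  # runs the HLS synthesis step (slow)
```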
- Accurate, Efficient and Scalable Training of Graph Neural Networks [9.569918335816963]
Graph Neural Networks (GNNs) are powerful deep learning models to generate node embeddings on graphs.
It is still challenging to perform training in an efficient and scalable way.
We propose a novel parallel training framework that reduces training workload by orders of magnitude compared with state-of-the-art minibatch methods.
arXiv Detail & Related papers (2020-10-05T22:06:23Z)
- Approximated Bilinear Modules for Temporal Modeling [116.6506871576514]
Two-layer subnets in CNNs can be converted to temporal bilinear modules by adding an auxiliary-branch sampling.
Our models can outperform most state-of-the-art methods on the Something-Something V1 and V2 datasets without pretraining.
arXiv Detail & Related papers (2020-07-25T09:07:35Z)
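The paper's approximated bilinear modules are not reproduced here; the sketch below shows a generic low-rank temporal bilinear interaction between adjacent frames, only as a stand-in for the idea of adding multiplicative temporal interactions to a CNN.

```python
import torch
import torch.nn as nn

# Generic low-rank temporal bilinear sketch (not the paper's exact module):
# features of adjacent frames interact multiplicatively through two
# projections, approximating a bilinear form with far fewer parameters.
class TemporalBilinear(nn.Module):
    def __init__(self, dim: int, rank: int = 64):
        super().__init__()
        self.proj_a = nn.Linear(dim, rank)
        self.proj_b = nn.Linear(dim, rank)
        self.out = nn.Linear(rank, dim)

    def forward(self, x):                 # x: (batch, time, dim)
        a = self.proj_a(x[:, :-1])        # frame t
        b = self.proj_b(x[:, 1:])         # frame t+1
        return self.out(a * b)            # elementwise product ~ low-rank bilinear

x = torch.randn(2, 8, 256)
print(TemporalBilinear(256)(x).shape)     # torch.Size([2, 7, 256])
```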
- DHP: Differentiable Meta Pruning via HyperNetworks [158.69345612783198]
This paper introduces a differentiable pruning method via hypernetworks for automatic network pruning.
Latent vectors control the output channels of the convolutional layers in the backbone network and act as a handle for the pruning of the layers.
Experiments are conducted on various networks for image classification, single image super-resolution, and denoising.
arXiv Detail & Related papers (2020-03-30T17:59:18Z)
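DHP's full hypernetwork and differentiable search are not reproduced here; the hypothetical sketch below only shows how a per-layer latent vector that generates the convolution weights can serve as a handle for pruning output channels.

```python
import torch
import torch.nn as nn

# Minimal sketch of the hypernetwork idea (not the DHP implementation):
# one latent vector per output channel generates that channel's filter, so
# shrinking or zeroing latents acts as a handle for pruning output channels.
class HyperConv2d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, latent_dim: int = 8):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, k
        self.latent = nn.Parameter(torch.randn(out_ch, latent_dim))
        self.hyper = nn.Linear(latent_dim, in_ch * k * k, bias=False)

    def forward(self, x, keep_mask=None):
        latent = self.latent if keep_mask is None else self.latent * keep_mask[:, None]
        weight = self.hyper(latent).view(self.out_ch, self.in_ch, self.k, self.k)
        return nn.functional.conv2d(x, weight, padding=self.k // 2)

layer = HyperConv2d(3, 16)
x = torch.randn(1, 3, 32, 32)
mask = (torch.rand(16) > 0.5).float()      # zeroed latents produce zeroed filters
print(layer(x, mask).shape)                # torch.Size([1, 16, 32, 32])
```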
- Revisiting Graph based Collaborative Filtering: A Linear Residual Graph Convolutional Network Approach [55.44107800525776]
Graph Convolutional Networks (GCNs) are state-of-the-art graph based representation learning models.
In this paper, we revisit GCN-based Collaborative Filtering (CF) for Recommender Systems (RS).
We show that removing non-linearities enhances recommendation performance, consistent with the theory of simple graph convolutional networks.
We propose a residual network structure that is specifically designed for CF with user-item interaction modeling.
arXiv Detail & Related papers (2020-01-28T04:41:25Z)
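As a toy illustration of linear, residual graph propagation for collaborative filtering (not the paper's model or data), the sketch below propagates user and item embeddings through a normalized interaction graph without non-linearities and averages the layer outputs as a residual combination; the graph, depth, and dimensions are made up.

```python
import torch

# Toy sketch of linear, residual graph propagation for collaborative filtering
# (no non-linearities between layers); graph size and depth are illustrative.
n_users, n_items, dim, n_layers = 4, 5, 8, 3
emb = torch.randn(n_users + n_items, dim)                # user + item embeddings

# Symmetric-normalized adjacency from a few (user, item) interactions.
adj = torch.zeros(n_users + n_items, n_users + n_items)
for u, i in [(0, 0), (0, 2), (1, 1), (2, 3), (3, 4)]:
    adj[u, n_users + i] = adj[n_users + i, u] = 1.0
deg = adj.sum(dim=1).clamp(min=1.0)
a_hat = adj / torch.sqrt(deg[:, None] * deg[None, :])

# Linear propagation with a residual combination (mean over layer outputs).
outputs, h = [emb], emb
for _ in range(n_layers):
    h = a_hat @ h
    outputs.append(h)
final = torch.stack(outputs).mean(dim=0)

user_emb, item_emb = final[:n_users], final[n_users:]
scores = user_emb @ item_emb.T                            # predicted preferences
print(scores.shape)  # torch.Size([4, 5])
```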
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.