Enhancing Small Object Encoding in Deep Neural Networks: Introducing
Fast&Focused-Net with Volume-wise Dot Product Layer
- URL: http://arxiv.org/abs/2401.09823v1
- Date: Thu, 18 Jan 2024 09:31:25 GMT
- Title: Enhancing Small Object Encoding in Deep Neural Networks: Introducing
Fast&Focused-Net with Volume-wise Dot Product Layer
- Authors: Tofik Ali, Partha Pratim Roy
- Abstract summary: We introduce Fast&Focused-Net, a novel deep neural network architecture tailored for encoding small objects into fixed-length feature vectors.
Fast&Focused-Net employs a series of our newly proposed Volume-wise Dot Product (VDP) layers, designed to address several inherent limitations of CNNs.
For small object classification tasks, our network outperformed state-of-the-art methods on datasets such as CIFAR-10, CIFAR-100, STL-10, SVHN-Cropped, and Fashion-MNIST.
In the context of larger image classification, when combined with a transformer encoder (ViT), Fast&Focused-Net produced competitive results on OpenImages V6, ImageNet-1K, and Places365.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce Fast&Focused-Net, a novel deep neural network
architecture tailored for efficiently encoding small objects into fixed-length
feature vectors. Unlike conventional Convolutional Neural Networks (CNNs),
Fast&Focused-Net employs a series of our newly proposed Volume-wise Dot
Product (VDP) layers, designed to address several inherent limitations of
CNNs. Specifically, CNNs often exhibit a smaller effective receptive field than
their theoretical receptive field, limiting how much of the image each unit
can effectively see. Additionally, the
initial layers in CNNs produce low-dimensional feature vectors, presenting a
bottleneck for subsequent learning. Lastly, CNNs incur a significant
computational overhead, particularly in covering diverse image regions
with shared parameters. The VDP layer, at the heart of Fast&Focused-Net, aims to
remedy these issues by efficiently covering the entire image patch information
with reduced computational demand. Experimental results demonstrate the prowess
of Fast&Focused-Net in a variety of applications. For small object
classification tasks, our network outperformed state-of-the-art methods on
datasets such as CIFAR-10, CIFAR-100, STL-10, SVHN-Cropped, and Fashion-MNIST.
In the context of larger image classification, when combined with a transformer
encoder (ViT), Fast&Focused-Net produced competitive results for OpenImages V6,
ImageNet-1K, and Places365 datasets. Moreover, the same combination showcased
unparalleled performance in text recognition tasks across SVT, IC15, SVTP, and
HOST datasets. This paper presents the architecture, the underlying motivation,
and extensive empirical evidence suggesting that Fast&Focused-Net is a
promising direction for efficient and focused deep learning.
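The abstract describes what the VDP layer is meant to achieve but not its exact computation. As a rough, non-authoritative sketch of the stated idea — each output unit covering an entire input volume through an unshared dot product — a PyTorch illustration might look like the following; the class name, shapes, and initialization are our assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class VolumeWiseDotProduct(nn.Module):
    """Hypothetical VDP layer: one unshared dot product per input volume."""

    def __init__(self, in_channels, height, width, volume_size, out_dim):
        super().__init__()
        assert height % volume_size == 0 and width % volume_size == 0
        self.vs = volume_size
        self.n_volumes = (height // volume_size) * (width // volume_size)
        vol_elems = in_channels * volume_size * volume_size
        # One independent weight matrix per volume: no parameter sharing,
        # and each output sees its whole volume in a single step.
        self.weight = nn.Parameter(
            torch.randn(self.n_volumes, vol_elems, out_dim) * vol_elems ** -0.5)
        self.bias = nn.Parameter(torch.zeros(self.n_volumes, out_dim))

    def forward(self, x):  # x: (B, C, H, W)
        b = x.size(0)
        # Split into non-overlapping volumes: (B, n_volumes, C * vs * vs).
        vols = x.unfold(2, self.vs, self.vs).unfold(3, self.vs, self.vs)
        vols = vols.permute(0, 2, 3, 1, 4, 5).reshape(b, self.n_volumes, -1)
        # Volume-wise dot product: an independent projection per volume.
        return torch.einsum('bnv,nvo->bno', vols, self.weight) + self.bias
```

Under these assumptions, `VolumeWiseDotProduct(3, 32, 32, 8, 128)` would map a CIFAR-sized image to sixteen 128-dimensional vectors, each computed from a full 8x8x3 volume, so even the first layer produces high-dimensional features with a full view of its patch — addressing the receptive-field and low-dimensional-bottleneck limitations the abstract highlights.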
Related papers
- Multiscale Low-Frequency Memory Network for Improved Feature Extraction
in Convolutional Neural Networks [13.815116154370834]
We introduce a novel framework, the Multiscale Low-Frequency Memory (MLFM) Network.
The MLFM efficiently preserves low-frequency information, enhancing performance in targeted computer vision tasks.
Our work builds upon the existing CNN foundations and paves the way for future advancements in computer vision.
arXiv Detail & Related papers (2024-03-13T00:48:41Z)
- Deep Multi-Threshold Spiking-UNet for Image Processing [51.88730892920031]
This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture.
To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy.
Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart.
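As a toy illustration of the multi-threshold idea in the title — not the authors' code — a spiking neuron can be given several firing levels rather than one binary threshold, so each timestep transmits a graded value; the thresholds and the soft-reset rule below are illustrative assumptions.

```python
import torch

def if_multi_threshold(inputs, thresholds=(0.5, 1.0, 2.0)):
    """Toy integrate-and-fire neuron with multiple firing levels.

    inputs: (T, N) input currents over T timesteps.
    Returns a (T, N) train of graded spikes (0 or one of the thresholds).
    """
    v = torch.zeros_like(inputs[0])            # membrane potential
    spikes = []
    for x in inputs:
        v = v + x                              # integrate input current
        s = torch.zeros_like(v)
        for t in sorted(thresholds):           # highest reached level wins
            s = torch.where(v >= t, torch.full_like(v, t), s)
        v = v - s                              # soft reset by the emitted amount
        spikes.append(s)
    return torch.stack(spikes)
```

Compared with a single binary threshold, the graded output lets more information pass per timestep, which speaks to the high-fidelity-propagation challenge the summary mentions.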
arXiv Detail & Related papers (2023-07-20T16:00:19Z)
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
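One simple way to realize such a generalization — a sketch under our own assumptions, not the paper's implementation — is a dense convolution whose kernel is multiplied elementwise by a fixed binary mask; different mask patterns then recover pointwise, depthwise, or groupwise convolutions as special cases.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Module):
    """Toy structured-sparse convolution: dense kernel times a fixed 0/1 mask."""

    def __init__(self, in_ch, out_ch, k, mask):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.02)
        self.register_buffer('mask', mask)     # (out_ch, in_ch, k, k) of 0/1
        self.pad = k // 2

    def forward(self, x):
        # Zero out kernel entries according to the structured mask.
        return F.conv2d(x, self.weight * self.mask, padding=self.pad)

# Example: keeping only the kernel centre recovers a pointwise (1x1) conv.
mask = torch.zeros(64, 32, 3, 3)
mask[:, :, 1, 1] = 1.0
layer = MaskedConv2d(32, 64, 3, mask)
```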
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
- No More Strided Convolutions or Pooling: A New CNN Building Block for
Low-Resolution Images and Small Objects [3.096615629099617]
Convolutional neural networks (CNNs) have achieved resounding success in many computer vision tasks.
However, their performance degrades rapidly on tougher tasks where images are of low resolution or objects are small.
We propose a new CNN building block called SPD-Conv in place of each strided convolution layer and each pooling layer.
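The mechanism suggested here is downsampling that discards no pixel information. A minimal sketch — assuming a space-to-depth rearrangement followed by a stride-1 convolution, with PyTorch's PixelUnshuffle providing the rearrangement — might be:

```python
import torch.nn as nn

def spd_block(in_ch, out_ch, scale=2):
    """Sketch of an SPD-style block: space-to-depth, then a non-strided conv.

    PixelUnshuffle maps (C, H, W) to (C*scale^2, H/scale, W/scale), so the
    spatial resolution drops without discarding any pixels, unlike a strided
    convolution or a pooling layer.
    """
    return nn.Sequential(
        nn.PixelUnshuffle(scale),
        nn.Conv2d(in_ch * scale * scale, out_ch, kernel_size=3, padding=1),
    )

block = spd_block(64, 128)   # a drop-in for a stride-2 conv, 64 -> 128 channels
```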
arXiv Detail & Related papers (2022-08-07T05:09:18Z)
- Target Aware Network Architecture Search and Compression for Efficient
Knowledge Transfer [9.434523476406424]
We propose a two-stage framework called TASCNet which enables efficient knowledge transfer.
TASCNet reduces the computational complexity of pre-trained CNNs over the target task by reducing both trainable parameters and FLOPs.
In addition to computer vision tasks, we have also conducted experiments on the Movie Review sentiment analysis task.
arXiv Detail & Related papers (2022-05-12T09:11:00Z)
- CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded
Systems [0.0]
A Convolutional Neural Network (CNN) is a class of Deep Neural Networks (DNNs) widely used in the analysis of visual images captured by an image sensor.
In this paper, we propose a neoteric variant of deep convolutional neural network architecture to ameliorate the performance of existing CNN architectures for real-time inference on embedded systems.
arXiv Detail & Related papers (2021-12-01T18:20:52Z)
- Learning Versatile Neural Architectures by Propagating Network Codes [74.2450894473073]
We propose Network Coding Propagation (NCP), a novel "neural predictor" that is able to predict an architecture's performance on multiple datasets and tasks.
NCP learns from network codes rather than original data, enabling it to update the architecture efficiently across datasets.
arXiv Detail & Related papers (2021-03-24T15:20:38Z)
- The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network nor modifications to the original model.
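A common recipe for a dual activation/distance objective without a generator network — not necessarily the exact loss used in this paper — is gradient ascent on the input: maximize a chosen layer's activation while penalizing distance from a reference image. The function name and loss weighting below are illustrative assumptions.

```python
import torch

def visualize_layer(model, layer, ref_img, steps=200, lr=0.05, lam=0.1):
    """Optimize an image to excite `layer` while staying near `ref_img`."""
    acts = {}
    hook = layer.register_forward_hook(lambda m, i, o: acts.update(out=o))
    x = ref_img.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        model(x)                                   # populates acts['out']
        activation = acts['out'].mean()            # activation objective
        distance = (x - ref_img).pow(2).mean()     # distance objective
        (-activation + lam * distance).backward()
        opt.step()
    hook.remove()
    return x.detach()
```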
arXiv Detail & Related papers (2021-01-29T07:46:39Z)
- KiU-Net: Overcomplete Convolutional Architectures for Biomedical Image
and Volumetric Segmentation [71.79090083883403]
"Traditional" encoder-decoder based approaches perform poorly in detecting smaller structures and are unable to segment boundary regions precisely.
We propose KiU-Net which has two branches: (1) an overcomplete convolutional network Kite-Net which learns to capture fine details and accurate edges of the input, and (2) U-Net which learns high level features.
The proposed method achieves a better performance as compared to all the recent methods with an additional benefit of fewer parameters and faster convergence.
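A minimal sketch of the two-branch idea — toy layer sizes and fusion, not the authors' architecture — pairs an "overcomplete" branch convolving at an upsampled resolution (fine detail) with an "undercomplete" branch convolving at a downsampled resolution (context), then resizes and fuses the two feature maps:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchSeg(nn.Module):
    """Toy two-branch segmenter in the spirit of KiU-Net."""

    def __init__(self, in_ch=1, mid=16, n_classes=2):
        super().__init__()
        self.over = nn.Conv2d(in_ch, mid, 3, padding=1)    # high-res branch
        self.under = nn.Conv2d(in_ch, mid, 3, padding=1)   # low-res branch
        self.head = nn.Conv2d(2 * mid, n_classes, 1)       # fusion head

    def forward(self, x):
        size = x.shape[-2:]
        hi = F.interpolate(x, scale_factor=2.0, mode='bilinear', align_corners=False)
        lo = F.interpolate(x, scale_factor=0.5, mode='bilinear', align_corners=False)
        f_hi = F.relu(self.over(hi))    # fine details and accurate edges
        f_lo = F.relu(self.under(lo))   # high-level context
        f_hi = F.interpolate(f_hi, size=size, mode='bilinear', align_corners=False)
        f_lo = F.interpolate(f_lo, size=size, mode='bilinear', align_corners=False)
        return self.head(torch.cat([f_hi, f_lo], dim=1))
```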
arXiv Detail & Related papers (2020-10-04T19:23:33Z)
- Improved Residual Networks for Image and Video Recognition [98.10703825716142]
Residual networks (ResNets) represent a powerful type of convolutional neural network (CNN) architecture.
We show consistent improvements in accuracy and learning convergence over the baseline.
Our proposed approach allows us to train extremely deep networks, while the baseline shows severe optimization issues.
arXiv Detail & Related papers (2020-04-10T11:09:50Z)