Enhancing Small Object Encoding in Deep Neural Networks: Introducing
Fast&Focused-Net with Volume-wise Dot Product Layer
- URL: http://arxiv.org/abs/2401.09823v1
- Date: Thu, 18 Jan 2024 09:31:25 GMT
- Title: Enhancing Small Object Encoding in Deep Neural Networks: Introducing
Fast&Focused-Net with Volume-wise Dot Product Layer
- Authors: Tofik Ali, Partha Pratim Roy
- Abstract summary: We introduce Fast&Focused-Net, a novel deep neural network architecture tailored for encoding small objects into fixed-length feature vectors.
Fast&Focused-Net employs a series of our newly proposed Volume-wise Dot Product (VDP) layers, designed to address several inherent limitations of CNNs.
For small object classification tasks, our network outperformed state-of-the-art methods on datasets such as CIFAR-10, CIFAR-100, STL-10, SVHN-Cropped, and Fashion-MNIST.
In the context of larger image classification, when combined with a transformer encoder (ViT), Fast&Focused-Net produced competitive results on OpenImages V6, ImageNet-1K, and Places365.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce Fast&Focused-Net, a novel deep neural network
architecture tailored for efficiently encoding small objects into fixed-length
feature vectors. Unlike conventional Convolutional Neural Networks (CNNs),
Fast&Focused-Net employs a series of our newly proposed Volume-wise Dot
Product (VDP) layers, designed to address several inherent limitations of
CNNs. Specifically, CNNs often exhibit a smaller effective receptive field than
their theoretical receptive field, limiting how much of the image each unit
can effectively see. Additionally, the
initial layers in CNNs produce low-dimensional feature vectors, presenting a
bottleneck for subsequent learning. Lastly, CNNs incur a significant
computational overhead, particularly in covering diverse image regions
with shared parameters. The VDP layer, at the heart of Fast&Focused-Net, aims to
remedy these issues by efficiently covering the entire image patch information
with reduced computational demand. Experimental results demonstrate the prowess
of Fast&Focused-Net in a variety of applications. For small object
classification tasks, our network outperformed state-of-the-art methods on
datasets such as CIFAR-10, CIFAR-100, STL-10, SVHN-Cropped, and Fashion-MNIST.
In the context of larger image classification, when combined with a transformer
encoder (ViT), Fast&Focused-Net produced competitive results for OpenImages V6,
ImageNet-1K, and Places365 datasets. Moreover, the same combination showcased
unparalleled performance in text recognition tasks across SVT, IC15, SVTP, and
HOST datasets. This paper presents the architecture, the underlying motivation,
and extensive empirical evidence suggesting that Fast&Focused-Net is a
promising direction for efficient and focused deep learning.
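The abstract describes what the VDP layer is meant to achieve but not its exact computation. As a rough, non-authoritative sketch of the stated idea — each output unit covering an entire input volume through an unshared dot product — a PyTorch illustration might look like the following; the class name, shapes, and initialization are our assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class VolumeWiseDotProduct(nn.Module):
    """Hypothetical VDP layer: one unshared dot product per input volume."""

    def __init__(self, in_channels, height, width, volume_size, out_dim):
        super().__init__()
        assert height % volume_size == 0 and width % volume_size == 0
        self.vs = volume_size
        self.n_volumes = (height // volume_size) * (width // volume_size)
        vol_elems = in_channels * volume_size * volume_size
        # One independent weight matrix per volume: no parameter sharing,
        # and each output sees its whole volume in a single step.
        self.weight = nn.Parameter(
            torch.randn(self.n_volumes, vol_elems, out_dim) * vol_elems ** -0.5)
        self.bias = nn.Parameter(torch.zeros(self.n_volumes, out_dim))

    def forward(self, x):  # x: (B, C, H, W)
        b = x.size(0)
        # Split into non-overlapping volumes: (B, n_volumes, C * vs * vs).
        vols = x.unfold(2, self.vs, self.vs).unfold(3, self.vs, self.vs)
        vols = vols.permute(0, 2, 3, 1, 4, 5).reshape(b, self.n_volumes, -1)
        # Volume-wise dot product: an independent projection per volume.
        return torch.einsum('bnv,nvo->bno', vols, self.weight) + self.bias
```

Under these assumptions, `VolumeWiseDotProduct(3, 32, 32, 8, 128)` would map a CIFAR-sized image to sixteen 128-dimensional vectors, each computed from a full 8x8x3 volume, so even the first layer produces high-dimensional features with a full view of its patch — addressing the receptive-field and low-dimensional-bottleneck limitations the abstract highlights.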
Related papers
- Multiscale Low-Frequency Memory Network for Improved Feature Extraction
in Convolutional Neural Networks [13.815116154370834]
We introduce a novel framework, the Multiscale Low-Frequency Memory (MLFM) Network.
The MLFM efficiently preserves low-frequency information, enhancing performance in targeted computer vision tasks.
Our work builds upon the existing CNN foundations and paves the way for future advancements in computer vision.
arXiv Detail & Related papers (2024-03-13T00:48:41Z)
- Deep Multi-Threshold Spiking-UNet for Image Processing [51.88730892920031]
This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture.
To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy.
Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart.
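As a toy illustration of the multi-threshold idea in the title — not the authors' code — a spiking neuron can be given several firing levels rather than one binary threshold, so each timestep transmits a graded value; the thresholds and the soft-reset rule below are illustrative assumptions.

```python
import torch

def if_multi_threshold(inputs, thresholds=(0.5, 1.0, 2.0)):
    """Toy integrate-and-fire neuron with multiple firing levels.

    inputs: (T, N) input currents over T timesteps.
    Returns a (T, N) train of graded spikes (0 or one of the thresholds).
    """
    v = torch.zeros_like(inputs[0])            # membrane potential
    spikes = []
    for x in inputs:
        v = v + x                              # integrate input current
        s = torch.zeros_like(v)
        for t in sorted(thresholds):           # highest reached level wins
            s = torch.where(v >= t, torch.full_like(v, t), s)
        v = v - s                              # soft reset by the emitted amount
        spikes.append(s)
    return torch.stack(spikes)
```

Compared with a single binary threshold, the graded output lets more information pass per timestep, which speaks to the high-fidelity-propagation challenge the summary mentions.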
arXiv Detail & Related papers (2023-07-20T16:00:19Z)
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
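One simple way to realize such a generalization — a sketch under our own assumptions, not the paper's implementation — is a dense convolution whose kernel is multiplied elementwise by a fixed binary mask; different mask patterns then recover pointwise, depthwise, or groupwise convolutions as special cases.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Module):
    """Toy structured-sparse convolution: dense kernel times a fixed 0/1 mask."""

    def __init__(self, in_ch, out_ch, k, mask):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.02)
        self.register_buffer('mask', mask)     # (out_ch, in_ch, k, k) of 0/1
        self.pad = k // 2

    def forward(self, x):
        # Zero out kernel entries according to the structured mask.
        return F.conv2d(x, self.weight * self.mask, padding=self.pad)

# Example: keeping only the kernel centre recovers a pointwise (1x1) conv.
mask = torch.zeros(64, 32, 3, 3)
mask[:, :, 1, 1] = 1.0
layer = MaskedConv2d(32, 64, 3, mask)
```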
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
- No More Strided Convolutions or Pooling: A New CNN Building Block for
Low-Resolution Images and Small Objects [3.096615629099617]
Convolutional neural networks (CNNs) have achieved resounding success in many computer vision tasks.
However, their performance degrades rapidly on tougher tasks where images are of low resolution or objects are small.
We propose a new CNN building block called SPD-Conv in place of each strided convolution layer and each pooling layer.
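The mechanism suggested here is downsampling that discards no pixel information. A minimal sketch — assuming a space-to-depth rearrangement followed by a stride-1 convolution, with PyTorch's PixelUnshuffle providing the rearrangement — might be:

```python
import torch.nn as nn

def spd_block(in_ch, out_ch, scale=2):
    """Sketch of an SPD-style block: space-to-depth, then a non-strided conv.

    PixelUnshuffle maps (C, H, W) to (C*scale^2, H/scale, W/scale), so the
    spatial resolution drops without discarding any pixels, unlike a strided
    convolution or a pooling layer.
    """
    return nn.Sequential(
        nn.PixelUnshuffle(scale),
        nn.Conv2d(in_ch * scale * scale, out_ch, kernel_size=3, padding=1),
    )

block = spd_block(64, 128)   # a drop-in for a stride-2 conv, 64 -> 128 channels
```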
arXiv Detail & Related papers (2022-08-07T05:09:18Z)
- Target Aware Network Architecture Search and Compression for Efficient
Knowledge Transfer [9.434523476406424]
We propose a two-stage framework called TASCNet which enables efficient knowledge transfer.
TASCNet reduces the computational complexity of pre-trained CNNs over the target task by reducing both trainable parameters and FLOPs.
In addition to computer vision tasks, we have also conducted experiments on the Movie Review sentiment analysis task.
arXiv Detail & Related papers (2022-05-12T09:11:00Z)
- CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded
Systems [0.0]
A Convolutional Neural Network (CNN) is a class of Deep Neural Networks (DNNs) widely used in the analysis of visual images captured by an image sensor.
In this paper, we propose a neoteric variant of deep convolutional neural network architecture to ameliorate the performance of existing CNN architectures for real-time inference on embedded systems.
arXiv Detail & Related papers (2021-12-01T18:20:52Z)
- Learning Versatile Neural Architectures by Propagating Network Codes [74.2450894473073]
We propose Network Coding Propagation (NCP), a novel "neural predictor" that is able to predict an architecture's performance on multiple datasets and tasks.
NCP learns from network codes rather than original data, enabling it to update the architecture efficiently across datasets.
arXiv Detail & Related papers (2021-03-24T15:20:38Z)
- The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network nor modifications to the original model.
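A common recipe for a dual activation/distance objective without a generator network — not necessarily the exact loss used in this paper — is gradient ascent on the input: maximize a chosen layer's activation while penalizing distance from a reference image. The function name and loss weighting below are illustrative assumptions.

```python
import torch

def visualize_layer(model, layer, ref_img, steps=200, lr=0.05, lam=0.1):
    """Optimize an image to excite `layer` while staying near `ref_img`."""
    acts = {}
    hook = layer.register_forward_hook(lambda m, i, o: acts.update(out=o))
    x = ref_img.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        model(x)                                   # populates acts['out']
        activation = acts['out'].mean()            # activation objective
        distance = (x - ref_img).pow(2).mean()     # distance objective
        (-activation + lam * distance).backward()
        opt.step()
    hook.remove()
    return x.detach()
```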
arXiv Detail & Related papers (2021-01-29T07:46:39Z)
- KiU-Net: Overcomplete Convolutional Architectures for Biomedical Image
and Volumetric Segmentation [71.79090083883403]
"Traditional" encoder-decoder based approaches perform poorly in detecting smaller structures and are unable to segment boundary regions precisely.
We propose KiU-Net which has two branches: (1) an overcomplete convolutional network Kite-Net which learns to capture fine details and accurate edges of the input, and (2) U-Net which learns high level features.
The proposed method achieves a better performance as compared to all the recent methods with an additional benefit of fewer parameters and faster convergence.
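A minimal sketch of the two-branch idea — toy layer sizes and fusion, not the authors' architecture — pairs an "overcomplete" branch convolving at an upsampled resolution (fine detail) with an "undercomplete" branch convolving at a downsampled resolution (context), then resizes and fuses the two feature maps:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchSeg(nn.Module):
    """Toy two-branch segmenter in the spirit of KiU-Net."""

    def __init__(self, in_ch=1, mid=16, n_classes=2):
        super().__init__()
        self.over = nn.Conv2d(in_ch, mid, 3, padding=1)    # high-res branch
        self.under = nn.Conv2d(in_ch, mid, 3, padding=1)   # low-res branch
        self.head = nn.Conv2d(2 * mid, n_classes, 1)       # fusion head

    def forward(self, x):
        size = x.shape[-2:]
        hi = F.interpolate(x, scale_factor=2.0, mode='bilinear', align_corners=False)
        lo = F.interpolate(x, scale_factor=0.5, mode='bilinear', align_corners=False)
        f_hi = F.relu(self.over(hi))    # fine details and accurate edges
        f_lo = F.relu(self.under(lo))   # high-level context
        f_hi = F.interpolate(f_hi, size=size, mode='bilinear', align_corners=False)
        f_lo = F.interpolate(f_lo, size=size, mode='bilinear', align_corners=False)
        return self.head(torch.cat([f_hi, f_lo], dim=1))
```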
arXiv Detail & Related papers (2020-10-04T19:23:33Z)
- Improved Residual Networks for Image and Video Recognition [98.10703825716142]
Residual networks (ResNets) represent a powerful type of convolutional neural network (CNN) architecture.
We show consistent improvements in accuracy and learning convergence over the baseline.
Our proposed approach allows us to train extremely deep networks, while the baseline shows severe optimization issues.
arXiv Detail & Related papers (2020-04-10T11:09:50Z)