Related papers: A Simple Fix for Convolutional Neural Network via Coordinate Embedding

A Simple Fix for Convolutional Neural Network via Coordinate Embedding

URL: http://arxiv.org/abs/2003.10589v1
Date: Tue, 24 Mar 2020 00:31:27 GMT
Title: A Simple Fix for Convolutional Neural Network via Coordinate Embedding
Authors: Liliang Ren, Zhuonan Hao
Abstract summary: We propose a simple approach to incorporate the coordinate information to the CNN model through coordinate embedding. Our approach does not change the downstream model architecture and can be easily applied to the pre-trained models for the task like object detection.
Score: 2.1320960069210484
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Convolutional Neural Networks (CNN) has been widely applied in the realm of computer vision. However, given the fact that CNN models are translation invariant, they are not aware of the coordinate information of each pixel. Thus the generalization ability of CNN will be limited since the coordinate information is crucial for a model to learn affine transformations which directly operate on the coordinate of each pixel. In this project, we proposed a simple approach to incorporate the coordinate information to the CNN model through coordinate embedding. Our approach does not change the downstream model architecture and can be easily applied to the pre-trained models for the task like object detection. Our experiments on the German Traffic Sign Detection Benchmark show that our approach not only significantly improve the model performance but also have better robustness with respect to the affine transformation.

Related papers

Cross-Task Benchmarking of CNN Architectures [0.0]
This project provides a comparative study of dynamic convolutional neural networks (CNNs) for various tasks.<n>We compare five variants of CNNs: the vanilla CNN, the hard attention-based CNN, the soft attention-based CNN with local (pixel-wise) and global (image-wise) feature attention, and the omni-directional CNN (ODConv)<n>Experiments on Tiny ImageNet, Pascal VOC, and the UCR Time Series Classification Archive illustrate that attention mechanisms and dynamic convolution methods consistently exceed conventional CNNs in accuracy, efficiency, and computational performance.
arXiv Detail & Related papers (2026-02-25T15:20:21Z)
Learning from Frustration: Torsor CNNs on Graphs [0.4369550829556577]
We introduce Torsor CNNs, a framework for learning on graphs with local symmetries encoded as edge potentials.<n>We demonstrate its applicability to multi-view 3D recognition, where relative camera poses naturally define the required edge potentials.
arXiv Detail & Related papers (2025-10-27T12:59:45Z)
Exploring Kernel Transformations for Implicit Neural Representations [57.2225355625268]
Implicit neural representations (INRs) leverage neural networks to represent signals by mapping coordinates to their corresponding attributes. This work pioneers the exploration of the effect of kernel transformation of input/output while keeping the model itself unchanged. A byproduct of our findings is a simple yet effective method that combines scale and shift to significantly boost INR with negligible overhead.
arXiv Detail & Related papers (2025-04-07T04:43:50Z)
Model Parallel Training and Transfer Learning for Convolutional Neural Networks by Domain Decomposition [0.0]
Deep convolutional neural networks (CNNs) have been shown to be very successful in a wide range of image processing applications. Due to their increasing number of model parameters and an increasing availability of large amounts of training data, parallelization strategies to efficiently train complex CNNs are necessary.
arXiv Detail & Related papers (2024-08-26T17:35:01Z)
CNN-Transformer Rectified Collaborative Learning for Medical Image Segmentation [60.08541107831459]
This paper proposes a CNN-Transformer rectified collaborative learning framework to learn stronger CNN-based and Transformer-based models for medical image segmentation. Specifically, we propose a rectified logit-wise collaborative learning (RLCL) strategy which introduces the ground truth to adaptively select and rectify the wrong regions in student soft labels. We also propose a class-aware feature-wise collaborative learning (CFCL) strategy to achieve effective knowledge transfer between CNN-based and Transformer-based models in the feature space.
arXiv Detail & Related papers (2024-08-25T01:27:35Z)
Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components. CNNs are used to augment the local texture information of coarse priors. DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
B-cos Alignment for Inherently Interpretable CNNs and Vision Transformers [97.75725574963197]
We present a new direction for increasing the interpretability of deep neural networks (DNNs) by promoting weight-input alignment during training. We show that a sequence of such transformations induces a single linear transformation that faithfully summarises the full model computations. We show that the resulting explanations are of high visual quality and perform well under quantitative interpretability metrics.
arXiv Detail & Related papers (2023-06-19T12:54:28Z)
A Novel Hand Gesture Detection and Recognition system based on ensemble-based Convolutional Neural Network [3.5665681694253903]
Detection of hand portion has become a challenging task in computer vision and pattern recognition communities. Deep learning algorithm like convolutional neural network (CNN) architecture has become a very popular choice for classification tasks. In this paper, an ensemble of CNN-based approaches is presented to overcome some problems like high variance during prediction, overfitting problem and also prediction errors.
arXiv Detail & Related papers (2022-02-25T06:46:58Z)
Hybrid SNN-ANN: Energy-Efficient Classification and Object Detection for Event-Based Vision [64.71260357476602]
Event-based vision sensors encode local pixel-wise brightness changes in streams of events rather than image frames. Recent progress in object recognition from event-based sensors has come from conversions of deep neural networks. We propose a hybrid architecture for end-to-end training of deep neural networks for event-based pattern recognition and object detection.
arXiv Detail & Related papers (2021-12-06T23:45:58Z)
Container: Context Aggregation Network [83.12004501984043]
Recent finding shows that a simple based solution without any traditional convolutional or Transformer components can produce effective visual representations. We present the model (CONText Ion NERtwok), a general-purpose building block for multi-head context aggregation. In contrast to Transformer-based methods that do not scale well to downstream tasks that rely on larger input image resolutions, our efficient network, named modellight, can be employed in object detection and instance segmentation networks.
arXiv Detail & Related papers (2021-06-02T18:09:11Z)
Lookup subnet based Spatial Graph Convolutional neural Network [3.119764474774276]
We propose a cross-correlation based graph convolution method allowing to naturally generalize CNNs to non-Euclidean domains. Our method has achieved or matched popular state-of-the-art results across three established graph benchmarks.
arXiv Detail & Related papers (2021-02-04T13:05:30Z)
Spherical Transformer: Adapting Spherical Signal to CNNs [53.18482213611481]
Spherical Transformer can transform spherical signals into vectors that can be directly processed by standard CNNs. We evaluate our approach on the tasks of spherical MNIST recognition, 3D object classification and omnidirectional image semantic segmentation.
arXiv Detail & Related papers (2021-01-11T12:33:16Z)
Eigen-CAM: Class Activation Map using Principal Components [1.2691047660244335]
This paper builds on previous ideas to cope with the increasing demand for interpretable, robust, and transparent models. The proposed Eigen-CAM computes and visualizes the principle components of the learned features/representations from the convolutional layers.
arXiv Detail & Related papers (2020-08-01T17:14:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.