Impoola: The Power of Average Pooling for Image-Based Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2503.05546v1
- Date: Fri, 07 Mar 2025 16:19:19 GMT
- Title: Impoola: The Power of Average Pooling for Image-Based Deep Reinforcement Learning
- Authors: Raphael Trumpp, Ansgar Schäfftlein, Mirco Theile, Marco Caccamo
- Abstract summary: We show that replacing the flattening of output feature maps in Impala-CNN with global average pooling leads to a notable performance improvement. A decrease in the network's translation sensitivity may be central to this improvement. Our results demonstrate that network scaling is not just about increasing model size; efficient network design is also an essential factor.
- Score: 1.2937020918620652
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As image-based deep reinforcement learning tackles more challenging tasks, increasing model size has become an important factor in improving performance. Recent studies achieved this by focusing on the parameter efficiency of scaled networks, typically using Impala-CNN, a 15-layer ResNet-inspired network, as the image encoder. However, while Impala-CNN evidently outperforms older CNN architectures, potential advancements in network design for deep reinforcement learning-specific image encoders remain largely unexplored. We find that replacing the flattening of output feature maps in Impala-CNN with global average pooling leads to a notable performance improvement. This approach outperforms larger and more complex models in the Procgen Benchmark, particularly in terms of generalization. We call our proposed encoder model Impoola-CNN. A decrease in the network's translation sensitivity may be central to this improvement, as we observe the most significant gains in games without agent-centered observations. Our results demonstrate that network scaling is not just about increasing model size - efficient network design is also an essential factor.
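The core change described in the abstract, replacing the flattening of the encoder's final feature maps with global average pooling, can be illustrated with a short PyTorch sketch. This is not the authors' implementation: the class names, the 32x8x8 feature-map shape, and the 256-dimensional embedding are assumptions chosen only to make the comparison concrete.

```python
import torch
import torch.nn as nn


class FlattenHead(nn.Module):
    """Baseline Impala-CNN-style head: flatten the final feature maps.

    The linear layer's input size depends on the spatial resolution, so the
    parameter count grows with the feature-map size and the representation
    retains absolute spatial position.
    """

    def __init__(self, channels: int, spatial: int, embed_dim: int = 256):
        super().__init__()
        self.fc = nn.Linear(channels * spatial * spatial, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.fc(torch.flatten(x, start_dim=1)))


class ImpoolaStyleHead(nn.Module):
    """Impoola-style head: global average pooling before the linear layer.

    Pooling removes the dependence on absolute spatial position (reducing
    translation sensitivity) and keeps the linear layer's size independent
    of the feature-map resolution.
    """

    def __init__(self, channels: int, embed_dim: int = 256):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.fc = nn.Linear(channels, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pooled = self.pool(x).flatten(start_dim=1)  # (B, C, 1, 1) -> (B, C)
        return torch.relu(self.fc(pooled))


if __name__ == "__main__":
    feats = torch.randn(8, 32, 8, 8)  # hypothetical encoder output for 64x64 frames
    print(FlattenHead(32, 8)(feats).shape)    # torch.Size([8, 256])
    print(ImpoolaStyleHead(32)(feats).shape)  # torch.Size([8, 256])
```

Both heads produce the same embedding size, but the pooled variant needs far fewer weights in its linear layer (32 x 256 instead of 32 x 8 x 8 x 256), which is consistent with the parameter-efficiency framing above.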
Related papers
- VisionGRU: A Linear-Complexity RNN Model for Efficient Image Analysis [8.10783983193165]
Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) are two dominant models for image analysis.
This paper introduces VisionGRU, a novel RNN-based architecture designed for efficient image classification.
arXiv Detail & Related papers (2024-12-24T05:27:11Z) - A Light-weight Deep Learning Model for Remote Sensing Image Classification [70.66164876551674]
We present a high-performance and light-weight deep learning model for Remote Sensing Image Classification (RSIC).
By conducting extensive experiments on the NWPU-RESISC45 benchmark, our proposed teacher-student models outperform the state-of-the-art systems.
arXiv Detail & Related papers (2023-02-25T09:02:01Z) - Revisiting Image Deblurring with an Efficient ConvNet [24.703240497171503]
We propose a lightweight CNN network that features a large effective receptive field (ERF) and demonstrates comparable or even better performance than Transformers.
Our key design is an efficient CNN block dubbed LaKD, equipped with a large kernel depth-wise convolution and spatial-channel mixing structure.
We achieve +0.17dB / +0.43dB PSNR over the state-of-the-art Restormer on defocus / motion deblurring benchmark datasets with 32% fewer parameters and 39% fewer MACs.
arXiv Detail & Related papers (2023-02-04T20:42:46Z) - RDRN: Recursively Defined Residual Network for Image Super-Resolution [58.64907136562178]
Deep convolutional neural networks (CNNs) have obtained remarkable performance in single image super-resolution.
We propose a novel network architecture which utilizes attention blocks efficiently.
arXiv Detail & Related papers (2022-11-17T11:06:29Z) - A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z) - Impact of Scaled Image on Robustness of Deep Neural Networks [0.0]
Scaling the raw images creates out-of-distribution data, which can act as an adversarial attack that fools the networks.
In this work, we propose a Scaling-distortion dataset ImageNet-CS by Scaling a subset of the ImageNet Challenge dataset by different multiples.
arXiv Detail & Related papers (2022-09-02T08:06:58Z) - Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks.
However, Transformers that need to incorporate contextual information to extract features dynamically remain largely neglected.
We propose a lightweight Cross-receptive Focused Inference Network (CFIN) that consists of a cascade of CT Blocks mixed with CNN and Transformer.
arXiv Detail & Related papers (2022-07-06T16:32:29Z) - Image Super-resolution with An Enhanced Group Convolutional Neural Network [102.2483249598621]
CNNs with strong learning abilities are widely chosen to resolve the super-resolution problem.
We present an enhanced super-resolution group CNN (ESRGCNN) with a shallow architecture.
Experiments report that our ESRGCNN surpasses the state of the art in SISR in terms of performance, complexity, execution speed, image quality evaluation, and visual effect.
arXiv Detail & Related papers (2022-05-29T00:34:25Z) - DDCNet: Deep Dilated Convolutional Neural Network for Dense Prediction [0.0]
A large effective receptive field (ERF) and a higher resolution of spatial features within a network are essential for providing higher-resolution dense estimates.
We present a systematic approach to designing network architectures that can provide a larger receptive field while maintaining a higher spatial feature resolution.
arXiv Detail & Related papers (2021-07-09T23:15:34Z) - AdderSR: Towards Energy Efficient Image Super-Resolution [127.61437479490047]
This paper studies the single image super-resolution problem using adder neural networks (AdderNet).
Compared with convolutional neural networks, AdderNet uses additions to calculate the output features, thus avoiding the massive energy consumption of conventional multiplications; a conceptual sketch of this adder operation is shown below.
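The adder operation described above can be sketched in a few lines of PyTorch. This is a conceptual illustration, not the AdderSR/AdderNet implementation; the function name and the unfold-based formulation are illustrative choices, and the full method additionally involves components such as batch normalization and a modified training scheme.

```python
import torch
import torch.nn.functional as F


def adder_conv2d(x: torch.Tensor, weight: torch.Tensor,
                 stride: int = 1, padding: int = 0) -> torch.Tensor:
    """Adder-style 'convolution': each output feature is the negative L1
    distance between an input patch and a filter, so inputs and weights are
    combined with subtractions and additions instead of multiplications.

    x:      (B, C_in, H, W)
    weight: (C_out, C_in, K, K)
    """
    c_out, _, k, _ = weight.shape
    # Extract sliding patches: (B, C_in*K*K, L), where L = number of output positions.
    patches = F.unfold(x, kernel_size=k, stride=stride, padding=padding)
    b = patches.shape[0]
    w = weight.view(c_out, -1)                                   # (C_out, C_in*K*K)
    # |patch - filter| summed over the patch dimension, then negated.
    diff = patches.unsqueeze(1) - w.unsqueeze(0).unsqueeze(-1)   # (B, C_out, C_in*K*K, L)
    out = -diff.abs().sum(dim=2)                                 # (B, C_out, L)
    h_out = (x.shape[2] + 2 * padding - k) // stride + 1
    w_out = (x.shape[3] + 2 * padding - k) // stride + 1
    return out.view(b, c_out, h_out, w_out)


# Example: a 3x3 adder layer mapping 8 channels to 16 on a 32x32 input.
y = adder_conv2d(torch.randn(2, 8, 32, 32), torch.randn(16, 8, 3, 3), padding=1)
print(y.shape)  # torch.Size([2, 16, 32, 32])
```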
arXiv Detail & Related papers (2020-09-18T15:29:13Z) - Attentive Graph Neural Networks for Few-Shot Learning [74.01069516079379]
Graph Neural Networks (GNNs) have demonstrated superior performance in many challenging applications, including few-shot learning tasks.
Despite their powerful capacity to learn and generalize from few samples, GNNs usually suffer from severe over-fitting and over-smoothing as the model becomes deep.
We propose a novel Attentive GNN to tackle these challenges, by incorporating a triple-attention mechanism.
arXiv Detail & Related papers (2020-07-14T07:43:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.