Hyperparameter Analysis for Image Captioning
- URL: http://arxiv.org/abs/2006.10923v1
- Date: Fri, 19 Jun 2020 01:49:37 GMT
- Title: Hyperparameter Analysis for Image Captioning
- Authors: Amish Patel and Aravind Varier
- Abstract summary: We perform a thorough sensitivity analysis on state-of-the-art image captioning approaches using two different architectures: CNN+LSTM and CNN+Transformer.
The biggest takeaway from the experiments is that fine-tuning the CNN encoder outperforms the baseline.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we perform a thorough sensitivity analysis on state-of-the-art
image captioning approaches using two different architectures: CNN+LSTM and
CNN+Transformer. Experiments were carried out using the Flickr8k dataset. The
biggest takeaway from the experiments is that fine-tuning the CNN encoder
outperforms the baseline and all other experiments carried out for both
architectures.
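The paper's central finding is that unfreezing the CNN encoder during training beats keeping it as a fixed feature extractor. As a hedged illustration only (the class, layer sizes, and vocabulary here are invented stand-ins, not the authors' actual configuration), a minimal CNN+LSTM captioner with a freeze/unfreeze toggle might look like:

```python
# Hypothetical minimal CNN+LSTM captioner illustrating the experimental knob
# the paper varies: a frozen CNN encoder (baseline) vs. a fine-tuned one.
# All names and sizes are assumptions; a real setup would use a pretrained
# backbone such as a ResNet instead of this tiny stand-in encoder.
import torch
import torch.nn as nn

class TinyCaptioner(nn.Module):
    def __init__(self, vocab_size=100, embed_dim=32, hidden_dim=64):
        super().__init__()
        # Stand-in CNN encoder producing one feature vector per image.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, hidden_dim),
        )
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def set_encoder_trainable(self, trainable: bool):
        # Frozen encoder = feature-extractor baseline; unfrozen = fine-tuning.
        for p in self.encoder.parameters():
            p.requires_grad = trainable

    def forward(self, images, captions):
        # Image features initialize the LSTM hidden state: (1, B, H).
        h0 = self.encoder(images).unsqueeze(0)
        c0 = torch.zeros_like(h0)
        out, _ = self.lstm(self.embed(captions), (h0, c0))
        return self.head(out)  # per-step vocabulary logits: (B, T, vocab)

model = TinyCaptioner()
model.set_encoder_trainable(False)  # baseline: frozen encoder
model.set_encoder_trainable(True)   # fine-tuning variant from the paper
logits = model(torch.randn(2, 3, 32, 32), torch.randint(0, 100, (2, 5)))
```

With the encoder unfrozen, an optimizer over `model.parameters()` updates the convolutional weights together with the LSTM decoder, which is the configuration the abstract reports as outperforming the baseline.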
Related papers
- An evaluation of CNN models and data augmentation techniques in hierarchical localization of mobile robots [0.0]
This work presents an evaluation of CNN models and data augmentation to carry out the hierarchical localization of a mobile robot.
In this sense, an ablation study of different state-of-the-art CNN models used as backbone is presented.
A variety of data augmentation visual effects are proposed for addressing the visual localization of the robot.
arXiv Detail & Related papers (2024-07-15T10:20:00Z)
- Understanding and Improving CNNs with Complex Structure Tensor: A Biometrics Study [47.03015281370405]
We show that the use of the Complex Structure Tensor, which contains compact orientation features with certainties, improves identification accuracy compared to using grayscale inputs alone.
This suggests that the upfront use of orientation features in CNNs, a strategy seen in mammalian vision, not only mitigates their limitations but also enhances their explainability and relevance to thin-clients.
arXiv Detail & Related papers (2024-04-24T02:51:13Z)
- Transfer Learning for Microstructure Segmentation with CS-UNet: A Hybrid Algorithm with Transformer and CNN Encoders [0.2353157426758003]
We compare the segmentation performance of Transformer and CNN models pre-trained on microscopy images with those pre-trained on natural images.
We also find that for image segmentation, the combination of pre-trained Transformers and CNN encoders is consistently better than pre-trained CNN encoders alone.
arXiv Detail & Related papers (2023-08-26T16:56:15Z)
- Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval [68.61855682218298]
Cross-modal retrieval methods employ two-stream encoders with different architectures for images and texts.
Inspired by recent advances of Transformers in vision tasks, we propose to unify the encoder architectures with Transformers for both modalities.
We design a cross-modal retrieval framework purely based on two-stream Transformers, dubbed Hierarchical Alignment Transformers (HAT), which consists of an image Transformer, a text Transformer, and a hierarchical alignment module.
arXiv Detail & Related papers (2023-08-08T15:43:59Z)
- Classification of diffraction patterns using a convolutional neural network in single particle imaging experiments performed at X-ray free-electron lasers [53.65540150901678]
Single particle imaging (SPI) at X-ray free electron lasers (XFELs) is particularly well suited to determine the 3D structure of particles in their native environment.
For a successful reconstruction, diffraction patterns originating from a single hit must be isolated from a large number of acquired patterns.
We propose to formulate this task as an image classification problem and solve it using convolutional neural network (CNN) architectures.
arXiv Detail & Related papers (2021-12-16T17:03:14Z)
- Empirical Analysis of Image Caption Generation using Deep Learning [0.0]
We have implemented and experimented with various flavors of multi-modal image captioning networks.
The goal is to analyze the performance of each approach using various evaluation metrics.
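Captioning evaluations like the one above typically report n-gram metrics such as BLEU. As an illustrative simplification only (not the evaluation code of any paper listed here), a minimal BLEU-1 score — clipped unigram precision with a brevity penalty — can be sketched as:

```python
# Minimal sketch of BLEU-1: clipped unigram precision times a brevity
# penalty. Real evaluations combine higher-order n-grams (BLEU-4) and
# multiple references; this toy version uses a single reference caption.
from collections import Counter
import math

def bleu1(candidate: str, reference: str) -> float:
    cand, ref = candidate.split(), reference.split()
    ref_counts = Counter(ref)
    # Clipped matches: each reference word can be credited at most as many
    # times as it occurs in the reference.
    matches = sum(min(c, ref_counts[w]) for w, c in Counter(cand).items())
    precision = matches / len(cand)
    # Brevity penalty discourages trivially short candidates.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

score = bleu1("a dog runs on the grass", "a dog is running on the grass")
```

Here 5 of the 6 candidate words match the reference, and the candidate is one word shorter, so the precision of 5/6 is scaled down by the brevity penalty exp(1 - 7/6).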
arXiv Detail & Related papers (2021-05-14T05:38:13Z)
- Deep ensembles based on Stochastic Activation Selection for Polyp Segmentation [82.61182037130406]
This work deals with medical image segmentation and in particular with accurate polyp detection and segmentation during colonoscopy examinations.
A basic image segmentation architecture consists of an encoder and a decoder.
We compare several variants of the DeepLab architecture obtained by varying the decoder backbone.
arXiv Detail & Related papers (2021-04-02T02:07:37Z)
- The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network or modifications to the original model.
arXiv Detail & Related papers (2021-01-29T07:46:39Z)
- Combining pretrained CNN feature extractors to enhance clustering of complex natural images [27.784346095205358]
This paper aims at providing insight on the use of pretrained CNN features for image clustering (IC).
To solve this issue, we propose to rephrase the IC problem as a multi-view clustering (MVC) problem.
We then propose a multi-input neural network architecture that is trained end-to-end to solve the MVC problem effectively.
arXiv Detail & Related papers (2021-01-07T21:23:04Z)
- Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement [53.47564132861866]
We find that a hybrid architecture, namely CNN-TT, is capable of maintaining a good quality performance with a reduced model parameter size.
CNN-TT is composed of several convolutional layers at the bottom for feature extraction to improve speech quality.
arXiv Detail & Related papers (2020-07-25T22:21:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.