Towards Learning a Vocabulary of Visual Concepts and Operators using
Deep Neural Networks
- URL: http://arxiv.org/abs/2109.00479v1
- Date: Wed, 1 Sep 2021 16:34:57 GMT
- Title: Towards Learning a Vocabulary of Visual Concepts and Operators using
Deep Neural Networks
- Authors: Sunil Kumar Vengalil and Neelam Sinha
- Abstract summary: We analyze the learned feature maps of models trained on MNIST images to achieve more explainable predictions.
We illustrate the idea by generating visual concepts from a Variational Autoencoder trained on MNIST images.
We were able to reduce the reconstruction loss (mean squared error) from an initial value of 120 without augmentation to 60 with augmentation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep neural networks have become the default choice for many
applications such as image and video recognition, segmentation, and other
image- and video-related tasks. However, a critical challenge with these
models is their lack of explainability. The need for explainable predictions
has motivated the research community to perform various analyses on trained
models. In this study, we analyze the learned feature maps of models trained
on MNIST images in order to achieve more explainable predictions. Our study
focuses on deriving a set of primitive elements, here called visual concepts,
that can be used to generate any arbitrary sample from the data-generating
distribution. We derive these primitive elements from the feature maps
learned by the model. We illustrate the idea by generating visual concepts
from a Variational Autoencoder trained on MNIST images. We augment the MNIST
training data by adding about 60,000 new images generated from visual
concepts chosen at random. With this augmentation, we reduced the
reconstruction loss (mean squared error) from an initial value of 120 to 60.
Our approach is a first step towards the final goal of trained deep neural
network models whose predictions, hidden-layer features, and learned filters
can be well explained. Such a model, when deployed in production, can easily
be modified to adapt to new data, whereas existing deep learning models
require retraining or fine-tuning, which in turn demands a large number of
data samples that are not easy to generate unless the model has good
explainability.
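The augmentation step the abstract describes (synthesizing new training images from randomly chosen visual concepts) can be sketched as follows. This is a minimal illustrative stand-in: the hand-drawn stroke primitives below play the role of the visual concepts that the paper instead derives from a VAE's learned feature maps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative "visual concepts": 28x28 canvases, each holding one primitive
# stroke. In the paper these are derived from trained feature maps, not drawn
# by hand as here.
def make_concept(kind):
    canvas = np.zeros((28, 28), dtype=np.float32)
    if kind == "hbar":
        canvas[13:15, 4:24] = 1.0      # horizontal bar
    elif kind == "vbar":
        canvas[4:24, 13:15] = 1.0      # vertical bar
    elif kind == "diag":
        for i in range(4, 24):         # main-diagonal stroke
            canvas[i, i] = 1.0
    return canvas

concepts = [make_concept(k) for k in ("hbar", "vbar", "diag")]

def synthesize(n_samples, n_concepts=2):
    """Generate augmentation images by overlaying randomly chosen concepts."""
    images = []
    for _ in range(n_samples):
        idx = rng.choice(len(concepts), size=n_concepts, replace=False)
        img = np.clip(sum(concepts[i] for i in idx), 0.0, 1.0)
        images.append(img)
    return np.stack(images)

augmented = synthesize(60)  # scaled-down stand-in for the ~60,000 added images
print(augmented.shape)  # (60, 28, 28)
```

The generated batch would then be appended to the original MNIST training set before retraining the VAE, which is the mechanism the abstract credits for halving the reconstruction loss.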
Related papers
- Restyling Unsupervised Concept Based Interpretable Networks with Generative Models [14.604305230535026]
We propose a novel method that relies on mapping the concept features to the latent space of a pretrained generative model.
We quantitatively ascertain the efficacy of our method in terms of accuracy of the interpretable prediction network, fidelity of reconstruction, as well as faithfulness and consistency of learnt concepts.
arXiv Detail & Related papers (2024-07-01T14:39:41Z)
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- Sequential Modeling Enables Scalable Learning for Large Vision Models [120.91839619284431]
We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data.
We define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources.
arXiv Detail & Related papers (2023-12-01T18:59:57Z)
- DreamTeacher: Pretraining Image Backbones with Deep Generative Models [103.62397699392346]
We introduce a self-supervised feature representation learning framework that utilizes generative networks for pre-training downstream image backbones.
We investigate two types of knowledge distillation: 1) distilling learned generative features onto target image backbones as an alternative to pretraining these backbones on large labeled datasets such as ImageNet.
We empirically find that our DreamTeacher significantly outperforms existing self-supervised representation learning approaches across the board.
arXiv Detail & Related papers (2023-07-14T17:17:17Z)
- Learning Multi-Object Dynamics with Compositional Neural Radiance Fields [63.424469458529906]
We present a method to learn compositional predictive models from image observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and graph neural networks.
NeRFs have become a popular choice for representing scenes due to their strong 3D prior.
For planning, we utilize RRTs in the learned latent space, where we can exploit our model and the implicit object encoder to make sampling the latent space informative and more efficient.
arXiv Detail & Related papers (2022-02-24T01:31:29Z)
- Meta Internal Learning [88.68276505511922]
Internal learning for single-image generation is a framework in which a generator is trained to produce novel images based on a single image.
We propose a meta-learning approach that enables training over a collection of images, in order to model the internal statistics of the sample image more effectively.
Our results show that the models obtained are as suitable as single-image GANs for many common image applications.
arXiv Detail & Related papers (2021-10-06T16:27:38Z)
- Deep Neural Networks are Surprisingly Reversible: A Baseline for Zero-Shot Inversion [90.65667807498086]
This paper presents a zero-shot direct model inversion framework that recovers the input to the trained model given only the internal representation.
We empirically show that modern classification models on ImageNet can, surprisingly, be inverted, allowing an approximate recovery of the original 224x224px images from a representation after more than 20 layers.
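The inversion idea in this entry, recovering an input given only the network's internal representation, can be illustrated on a toy problem. The two-layer tanh network below is an assumed stand-in for the trained ImageNet classifiers the paper inverts; the recovery is plain gradient descent on the input.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "trained" network with fixed random weights: two tanh layers.
W1 = 0.5 * rng.normal(size=(8, 4))
W2 = 0.5 * rng.normal(size=(6, 8))

def forward(x):
    return np.tanh(W2 @ np.tanh(W1 @ x))

# Internal representation of an unknown input (the only thing the inverter sees).
x_true = rng.normal(size=4)
target = forward(x_true)

def loss_and_grad(x):
    """Squared error to the target representation, with analytic gradients."""
    h = np.tanh(W1 @ x)
    f = np.tanh(W2 @ h)
    e = f - target
    # Backpropagate through both tanh layers down to the input.
    g = W1.T @ ((W2.T @ (2.0 * e * (1.0 - f**2))) * (1.0 - h**2))
    return np.sum(e**2), g

# Zero-shot inversion: optimize the input alone, starting from zeros,
# so that its representation matches the observed one.
x = np.zeros(4)
initial_loss, _ = loss_and_grad(x)
for _ in range(2000):
    _, grad = loss_and_grad(x)
    x -= 0.01 * grad

final_loss, _ = loss_and_grad(x)
print(final_loss < initial_loss)  # True
```

The paper's contribution is showing that the same optimization remains feasible through the 20+ layers of real classifiers; this sketch only captures the objective being minimized.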
arXiv Detail & Related papers (2021-07-13T18:01:43Z)
- An Interaction-based Convolutional Neural Network (ICNN) Towards Better Understanding of COVID-19 X-ray Images [0.0]
We propose a novel Interaction-based Convolutional Neural Network (ICNN) that does not make assumptions about the relevance of local information.
We demonstrate that the proposed method produces state-of-the-art prediction performance of 99.8% on a real-world data set classifying COVID-19 Chest X-ray images.
arXiv Detail & Related papers (2021-06-13T04:41:17Z)
- Embracing New Techniques in Deep Learning for Estimating Image Memorability [0.0]
We propose and evaluate five alternative deep learning models to predict image memorability.
Our findings suggest that the key prior memorability network had overstated its generalizability and was overfit on its training set.
We make our new state-of-the-art model readily available to the research community, allowing memory researchers to make predictions about memorability on a wider range of images.
arXiv Detail & Related papers (2021-05-21T23:05:23Z)
- Comparative evaluation of CNN architectures for Image Caption Generation [1.2183405753834562]
We have evaluated 17 different Convolutional Neural Networks on two popular Image Caption Generation frameworks.
We observe that the model complexity of a Convolutional Neural Network, as measured by its number of parameters, and its accuracy on the Object Recognition task do not necessarily correlate with its efficacy at feature extraction for the Image Caption Generation task.
arXiv Detail & Related papers (2021-02-23T05:43:54Z)
- Text-to-Image Generation with Attention Based Recurrent Neural Networks [1.2599533416395765]
We develop a tractable and stable caption-based image generation model.
Experiments are performed on Microsoft datasets.
Results show that the proposed model performs better than contemporary approaches.
arXiv Detail & Related papers (2020-01-18T12:19:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.