Towards Learning a Vocabulary of Visual Concepts and Operators using
Deep Neural Networks
- URL: http://arxiv.org/abs/2109.00479v1
- Date: Wed, 1 Sep 2021 16:34:57 GMT
- Title: Towards Learning a Vocabulary of Visual Concepts and Operators using
Deep Neural Networks
- Authors: Sunil Kumar Vengalil and Neelam Sinha
- Abstract summary: We analyze the learned feature maps of models trained on MNIST images to achieve more explainable predictions.
We illustrate the idea by generating visual concepts from a Variational Autoencoder trained on MNIST images.
We were able to reduce the reconstruction loss (mean squared error) from an initial value of 120 without augmentation to 60 with augmentation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep neural networks have become the default choice for many
applications such as image and video recognition, segmentation, and other
image- and video-related tasks. However, a critical challenge with these
models is their lack of explainability. The need for explainable predictions
has motivated the research community to perform various analyses on trained
models. In this study, we analyze the learned feature maps of models trained
on MNIST images in order to achieve more explainable predictions. Our study
focuses on deriving a set of primitive elements, here called visual concepts,
that can be used to generate any arbitrary sample from the data-generating
distribution. We derive these primitive elements from the feature maps
learned by the model. We illustrate the idea by generating visual concepts
from a Variational Autoencoder trained on MNIST images. We augment the MNIST
training data by adding about 60,000 new images generated from visual
concepts chosen at random. With this augmentation, we reduced the
reconstruction loss (mean squared error) from an initial value of 120 to 60.
Our approach is a first step towards the final goal of trained deep neural
network models whose predictions, hidden-layer features, and learned filters
can be well explained. Such a model, when deployed in production, can easily
be modified to adapt to new data, whereas existing deep learning models
require retraining or fine-tuning, which in turn demands a large number of
data samples that are not easy to generate unless the model has good
explainability.
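The augmentation step the abstract describes (synthesizing new training images from randomly chosen visual concepts) can be sketched as follows. This is a minimal illustrative stand-in: the hand-drawn stroke primitives below play the role of the visual concepts that the paper instead derives from a VAE's learned feature maps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative "visual concepts": 28x28 canvases, each holding one primitive
# stroke. In the paper these are derived from trained feature maps, not drawn
# by hand as here.
def make_concept(kind):
    canvas = np.zeros((28, 28), dtype=np.float32)
    if kind == "hbar":
        canvas[13:15, 4:24] = 1.0      # horizontal bar
    elif kind == "vbar":
        canvas[4:24, 13:15] = 1.0      # vertical bar
    elif kind == "diag":
        for i in range(4, 24):         # main-diagonal stroke
            canvas[i, i] = 1.0
    return canvas

concepts = [make_concept(k) for k in ("hbar", "vbar", "diag")]

def synthesize(n_samples, n_concepts=2):
    """Generate augmentation images by overlaying randomly chosen concepts."""
    images = []
    for _ in range(n_samples):
        idx = rng.choice(len(concepts), size=n_concepts, replace=False)
        img = np.clip(sum(concepts[i] for i in idx), 0.0, 1.0)
        images.append(img)
    return np.stack(images)

augmented = synthesize(60)  # scaled-down stand-in for the ~60,000 added images
print(augmented.shape)  # (60, 28, 28)
```

The generated batch would then be appended to the original MNIST training set before retraining the VAE, which is the mechanism the abstract credits for halving the reconstruction loss.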
Related papers
- Restyling Unsupervised Concept Based Interpretable Networks with Generative Models [14.604305230535026]
We propose a novel method that relies on mapping the concept features to the latent space of a pretrained generative model.
We quantitatively ascertain the efficacy of our method in terms of accuracy of the interpretable prediction network, fidelity of reconstruction, as well as faithfulness and consistency of learnt concepts.
arXiv Detail & Related papers (2024-07-01T14:39:41Z)
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- Sequential Modeling Enables Scalable Learning for Large Vision Models [120.91839619284431]
We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data.
We define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources.
arXiv Detail & Related papers (2023-12-01T18:59:57Z)
- DreamTeacher: Pretraining Image Backbones with Deep Generative Models [103.62397699392346]
We introduce a self-supervised feature representation learning framework that utilizes generative networks for pre-training downstream image backbones.
We investigate two types of knowledge distillation: 1) distilling learned generative features onto target image backbones as an alternative to pretraining these backbones on large labeled datasets such as ImageNet.
We empirically find that our DreamTeacher significantly outperforms existing self-supervised representation learning approaches across the board.
arXiv Detail & Related papers (2023-07-14T17:17:17Z)
- Learning Multi-Object Dynamics with Compositional Neural Radiance Fields [63.424469458529906]
We present a method to learn compositional predictive models from image observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and graph neural networks.
NeRFs have become a popular choice for representing scenes due to their strong 3D prior.
For planning, we utilize RRTs in the learned latent space, where we can exploit our model and the implicit object encoder to make sampling the latent space informative and more efficient.
arXiv Detail & Related papers (2022-02-24T01:31:29Z)
- Meta Internal Learning [88.68276505511922]
Internal learning for single-image generation is a framework in which a generator is trained to produce novel images based on a single image.
We propose a meta-learning approach that enables training over a collection of images, in order to model the internal statistics of the sample image more effectively.
Our results show that the models obtained are as suitable as single-image GANs for many common image applications.
arXiv Detail & Related papers (2021-10-06T16:27:38Z)
- Deep Neural Networks are Surprisingly Reversible: A Baseline for Zero-Shot Inversion [90.65667807498086]
This paper presents a zero-shot direct model inversion framework that recovers the input to the trained model given only the internal representation.
We empirically show that modern classification models on ImageNet can, surprisingly, be inverted, allowing an approximate recovery of the original 224x224px images from a representation after more than 20 layers.
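The inversion idea in this entry, recovering an input given only the network's internal representation, can be illustrated on a toy problem. The two-layer tanh network below is an assumed stand-in for the trained ImageNet classifiers the paper inverts; the recovery is plain gradient descent on the input.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "trained" network with fixed random weights: two tanh layers.
W1 = 0.5 * rng.normal(size=(8, 4))
W2 = 0.5 * rng.normal(size=(6, 8))

def forward(x):
    return np.tanh(W2 @ np.tanh(W1 @ x))

# Internal representation of an unknown input (the only thing the inverter sees).
x_true = rng.normal(size=4)
target = forward(x_true)

def loss_and_grad(x):
    """Squared error to the target representation, with analytic gradients."""
    h = np.tanh(W1 @ x)
    f = np.tanh(W2 @ h)
    e = f - target
    # Backpropagate through both tanh layers down to the input.
    g = W1.T @ ((W2.T @ (2.0 * e * (1.0 - f**2))) * (1.0 - h**2))
    return np.sum(e**2), g

# Zero-shot inversion: optimize the input alone, starting from zeros,
# so that its representation matches the observed one.
x = np.zeros(4)
initial_loss, _ = loss_and_grad(x)
for _ in range(2000):
    _, grad = loss_and_grad(x)
    x -= 0.01 * grad

final_loss, _ = loss_and_grad(x)
print(final_loss < initial_loss)  # True
```

The paper's contribution is showing that the same optimization remains feasible through the 20+ layers of real classifiers; this sketch only captures the objective being minimized.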
arXiv Detail & Related papers (2021-07-13T18:01:43Z)
- An Interaction-based Convolutional Neural Network (ICNN) Towards Better Understanding of COVID-19 X-ray Images [0.0]
We propose a novel Interaction-based Convolutional Neural Network (ICNN) that does not make assumptions about the relevance of local information.
We demonstrate that the proposed method produces state-of-the-art prediction performance of 99.8% on a real-world data set classifying COVID-19 Chest X-ray images.
arXiv Detail & Related papers (2021-06-13T04:41:17Z)
- Embracing New Techniques in Deep Learning for Estimating Image Memorability [0.0]
We propose and evaluate five alternative deep learning models to predict image memorability.
Our findings suggest that the key prior memorability network had overstated its generalizability and was overfit on its training set.
We make our new state-of-the-art model readily available to the research community, allowing memory researchers to make predictions about memorability on a wider range of images.
arXiv Detail & Related papers (2021-05-21T23:05:23Z)
- Comparative evaluation of CNN architectures for Image Caption Generation [1.2183405753834562]
We have evaluated 17 different Convolutional Neural Networks on two popular Image Caption Generation frameworks.
We observe that the model complexity of a Convolutional Neural Network, as measured by its number of parameters, and its accuracy on the Object Recognition task do not necessarily correlate with its efficacy at feature extraction for the Image Caption Generation task.
arXiv Detail & Related papers (2021-02-23T05:43:54Z)
- Text-to-Image Generation with Attention Based Recurrent Neural Networks [1.2599533416395765]
We develop a tractable and stable caption-based image generation model.
Experiments are performed on Microsoft datasets.
Results show that the proposed model performs better than contemporary approaches.
arXiv Detail & Related papers (2020-01-18T12:19:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.