Leveraging Systematic Knowledge of 2D Transformations
- URL: http://arxiv.org/abs/2206.00893v2
- Date: Tue, 23 Apr 2024 03:23:10 GMT
- Title: Leveraging Systematic Knowledge of 2D Transformations
- Authors: Jiachen Kang, Wenjing Jia, Xiangjian He
- Abstract summary: Humans have a remarkable ability to interpret images, even if the scenes in the images are rare.
This work focuses on 1) the acquisition of systematic knowledge of 2D transformations, and 2) architectural components that can leverage the learned knowledge in image classification tasks.
- Score: 6.668181653599057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing deep learning models suffer from an out-of-distribution (o.o.d.) performance drop in computer vision tasks. In comparison, humans have a remarkable ability to interpret images, even if the scenes in the images are rare, thanks to the systematicity of acquired knowledge. This work focuses on 1) the acquisition of systematic knowledge of 2D transformations, and 2) architectural components that can leverage the learned knowledge in image classification tasks in an o.o.d. setting. With a new training methodology based on synthetic datasets constructed under the causal framework, deep neural networks acquire knowledge from semantically different domains (e.g. even from noise) and exhibit a certain level of systematicity in parameter estimation experiments. Based on this, a novel architecture is devised consisting of a classifier, an estimator and an identifier (abbreviated as "CED"). By emulating the "hypothesis-verification" process in human visual perception, CED significantly improves classification accuracy on test sets under covariate shift.
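The "hypothesis-verification" loop that CED emulates can be illustrated with a minimal sketch. Note the stand-ins: the paper's classifier, estimator, and identifier are learned networks, whereas below they are replaced with brute-force toy versions, and the transformation family is restricted to 90-degree rotations; the class templates and all names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rotate90(img, k):
    """Apply a toy 2D transformation: rotate by k * 90 degrees."""
    return np.rot90(img, k)

class CED:
    """Toy sketch of the hypothesis-verification loop.

    templates: dict mapping class label -> canonical 2D array.
    Each class hypothesis is verified by estimating the transformation
    that best explains the input and checking the resulting agreement.
    """
    def __init__(self, templates):
        self.templates = templates

    def estimate(self, x, template):
        # "Estimator" stand-in: brute-force the rotation count that best
        # maps the class template onto the observed input.
        errors = [np.abs(rotate90(template, k) - x).sum() for k in range(4)]
        return int(np.argmin(errors))

    def identify(self, x, template, k):
        # "Identifier" stand-in: verify the hypothesis by applying the
        # estimated transformation and scoring agreement (higher = better).
        return -np.abs(rotate90(template, k) - x).sum()

    def classify(self, x):
        # "Classifier" + verification: score every class hypothesis and
        # return the best-verified label.
        scores = {}
        for label, template in self.templates.items():
            k = self.estimate(x, template)
            scores[label] = self.identify(x, template, k)
        return max(scores, key=scores.get)

# Two hypothetical classes with tiny canonical templates.
templates = {
    "L": np.array([[1, 0], [1, 1]], dtype=float),
    "I": np.array([[1, 0], [1, 0]], dtype=float),
}
ced = CED(templates)
x = rotate90(templates["L"], 1)  # a transformed (covariate-shifted) input
print(ced.classify(x))  # prints: L
```

The input is classified correctly even though it never appears untransformed, because the verification step explains it as a known class under a known transformation; this decoupling of "what" from "how it was transformed" is the systematicity the abstract describes.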
Related papers
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
- Dual Cognitive Architecture: Incorporating Biases and Multi-Memory Systems for Lifelong Learning [21.163070161951868]
We introduce Dual Cognitive Architecture (DUCA), which includes multiple sub-systems, implicit and explicit knowledge representation, inductive bias, and a multi-memory system.
DUCA shows improvement across different settings and datasets, and it also exhibits reduced task recency bias, without the need for extra information.
To further test the versatility of lifelong learning methods on a challenging distribution shift, we introduce a novel domain-incremental dataset DN4IL.
arXiv Detail & Related papers (2023-10-17T15:24:02Z)
- Defect Classification in Additive Manufacturing Using CNN-Based Vision Processing [76.72662577101988]
This paper examines two scenarios: first, using convolutional neural networks (CNNs) to accurately classify defects in an image dataset from AM and second, applying active learning techniques to the developed classification model.
This enables a human-in-the-loop mechanism that reduces the amount of data required for training and assists in generating training data.
arXiv Detail & Related papers (2023-07-14T14:36:58Z)
- Recognizing Unseen Objects via Multimodal Intensive Knowledge Graph Propagation [68.13453771001522]
We propose a multimodal intensive ZSL framework that matches regions of images with corresponding semantic embeddings.
We conduct extensive experiments and evaluate our model on large-scale real-world data.
arXiv Detail & Related papers (2023-06-14T13:07:48Z)
- CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images [7.868449549351487]
This article proposes to enhance our ability to recognise AI-generated images through computer vision.
The two sets of data form a binary classification problem: whether a photograph is real or generated by AI.
This study proposes the use of a Convolutional Neural Network (CNN) to classify the images into two categories: Real or Fake.
arXiv Detail & Related papers (2023-03-24T16:33:06Z)
- Top-down Inference in an Early Visual Cortex Inspired Hierarchical Variational Autoencoder [0.0]
We exploit advances in Variational Autoencoders to investigate the early visual cortex with sparse coding hierarchical VAEs trained on natural images.
We show that representations similar to the one found in the primary and secondary visual cortices naturally emerge under mild inductive biases.
We show that a neuroscience-inspired choice of the recognition model is critical for two signatures of computations with generative models.
arXiv Detail & Related papers (2022-06-01T12:21:58Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Lifelong 3D Object Recognition and Grasp Synthesis Using Dual Memory Recurrent Self-Organization Networks [0.0]
Humans learn to recognize and manipulate new objects in lifelong settings without forgetting the previously gained knowledge.
In most conventional deep neural networks, this is not possible due to the problem of catastrophic forgetting.
We propose a hybrid model architecture consisting of a dual-memory recurrent neural network and an autoencoder to tackle object recognition and grasping simultaneously.
arXiv Detail & Related papers (2021-09-23T11:14:13Z)
- Joint Learning of Neural Transfer and Architecture Adaptation for Image Recognition [77.95361323613147]
Current state-of-the-art visual recognition systems rely on pretraining a neural network on a large-scale dataset and finetuning the network weights on a smaller dataset.
In this work, we show that dynamically adapting network architectures tailored to each domain task, along with weight finetuning, benefits both efficiency and effectiveness.
Our method can be easily generalized to an unsupervised paradigm by replacing supernet training with self-supervised learning in the source domain tasks and performing linear evaluation in the downstream tasks.
arXiv Detail & Related papers (2021-03-31T08:15:17Z)
- Understanding the Role of Individual Units in a Deep Neural Network [85.23117441162772]
We present an analytic framework to systematically identify hidden units within image classification and image generation networks.
First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts.
Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
arXiv Detail & Related papers (2020-09-10T17:59:10Z)
- Deep Adaptive Semantic Logic (DASL): Compiling Declarative Knowledge into Deep Neural Networks [11.622060073764944]
We introduce Deep Adaptive Semantic Logic (DASL), a novel framework for automating the generation of deep neural networks.
DASL incorporates user-provided formal knowledge to improve learning from data.
We evaluate DASL on a visual relationship detection task and demonstrate that the addition of commonsense knowledge improves performance by 10.7% in a data-scarce setting.
arXiv Detail & Related papers (2020-03-16T17:37:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.