Context-based Deep Learning Architecture with Optimal Integration Layer
for Image Parsing
- URL: http://arxiv.org/abs/2204.06214v1
- Date: Wed, 13 Apr 2022 07:35:39 GMT
- Title: Context-based Deep Learning Architecture with Optimal Integration Layer
for Image Parsing
- Authors: Ranju Mandal, Basim Azam, and Brijesh Verma
- Abstract summary: The proposed three-layer context-based deep architecture is capable of integrating context explicitly with visual information.
The experimental outcomes when evaluated on benchmark datasets are promising.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models have recently proven effective on image parsing tasks.
However, deep learning models are not fully capable of exploiting visual and
contextual information simultaneously. The proposed three-layer context-based
deep architecture is capable of integrating context explicitly with visual
information. The novel idea here is to have a visual layer to learn visual
characteristics from binary class-based learners, a contextual layer to learn
context, and then an integration layer to learn from both via genetic
algorithm-based optimal fusion to produce a final decision. The experimental
outcomes when evaluated on benchmark datasets are promising. Further analysis
shows that optimized network weights can improve performance and make stable
predictions.
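
The integration step described in the abstract (fusing visual-layer and contextual-layer outputs with weights found by a genetic algorithm) can be illustrated with a small sketch. This is not the authors' implementation: the per-class convex weighting, the accuracy fitness, the tournament/crossover/mutation operators, and all names (`fuse`, `accuracy`, `ga_optimize`, the toy scores) are assumptions chosen only to show the general idea.

```python
import random

def fuse(visual, context, weights):
    """Per-class convex combination of visual and contextual scores (assumed form)."""
    return [w * v + (1.0 - w) * c for v, c, w in zip(visual, context, weights)]

def accuracy(weights, visual_scores, context_scores, labels):
    """Fraction of samples whose fused arg-max class matches the label."""
    correct = 0
    for v, c, y in zip(visual_scores, context_scores, labels):
        fused = fuse(v, c, weights)
        if max(range(len(fused)), key=fused.__getitem__) == y:
            correct += 1
    return correct / len(labels)

def ga_optimize(visual_scores, context_scores, labels, n_classes,
                pop_size=30, generations=40, mutation_std=0.1, seed=0):
    """Search for fusion weights in [0, 1] with a simple elitist GA."""
    rng = random.Random(seed)

    def fitness(w):
        return accuracy(w, visual_scores, context_scores, labels)

    # Seed the population with the two trivial solutions (visual-only and
    # context-only) so the result can never be worse than either layer alone.
    population = [[1.0] * n_classes, [0.0] * n_classes]
    population += [[rng.random() for _ in range(n_classes)]
                   for _ in range(pop_size - 2)]

    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = [ranked[0][:], ranked[1][:]]  # elitism: keep the two best
        while len(next_gen) < pop_size:
            # Tournament selection (size 3) for each parent.
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            # Uniform crossover followed by clipped Gaussian mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, mutation_std)))
                     for g in child]
            next_gen.append(child)
        population = next_gen

    return max(population, key=fitness)

# Toy, made-up class scores for two samples and two classes.
visual_scores = [[0.9, 0.1], [0.6, 0.4]]
context_scores = [[0.3, 0.7], [0.1, 0.9]]
labels = [0, 1]

best = ga_optimize(visual_scores, context_scores, labels, n_classes=2)
best_acc = accuracy(best, visual_scores, context_scores, labels)
visual_only = accuracy([1.0, 1.0], visual_scores, context_scores, labels)
context_only = accuracy([0.0, 0.0], visual_scores, context_scores, labels)
print(best, best_acc, visual_only, context_only)
```

In this toy setup each layer alone misclassifies one of the two samples, so any gain comes purely from the learned weighting. A real system would evaluate fitness on held-out data rather than on the samples used for the search.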
Related papers
- Advanced Multimodal Deep Learning Architecture for Image-Text Matching [33.8315200009152]
Image-text matching is a key multimodal task that aims to model the semantic association between images and text as a matching relationship.
We introduce an advanced multimodal deep learning architecture, which combines the high-level abstract representation ability of deep neural networks for visual information with the advantages of natural language processing models for text semantic understanding.
Experiments show that compared with existing image-text matching models, the optimized new model has significantly improved performance on a series of benchmark data sets.
arXiv Detail & Related papers (2024-06-13T08:32:24Z)
- SwinMTL: A Shared Architecture for Simultaneous Depth Estimation and Semantic Segmentation from Monocular Camera Images [4.269350826756809]
This research paper presents an innovative multi-task learning framework that allows concurrent depth estimation and semantic segmentation using a single camera.
The proposed approach is based on a shared encoder-decoder architecture, which integrates various techniques to improve the accuracy of the depth estimation and semantic segmentation task without compromising computational efficiency.
The framework is thoroughly evaluated on two datasets - the outdoor Cityscapes dataset and the indoor NYU Depth V2 dataset - and it outperforms existing state-of-the-art methods in both segmentation and depth estimation tasks.
arXiv Detail & Related papers (2024-03-15T20:04:27Z)
- Bilevel Fast Scene Adaptation for Low-Light Image Enhancement [50.639332885989255]
Enhancing images in low-light scenes is a challenging but widely studied task in computer vision.
The main obstacle lies in the modeling conundrum caused by distribution discrepancy across different scenes.
We introduce the bilevel paradigm to model the above latent correspondence.
A bilevel learning framework is constructed to endow the scene-irrelevant generality of the encoder towards diverse scenes.
arXiv Detail & Related papers (2023-06-02T08:16:21Z)
- Deep Learning Model with GA based Feature Selection and Context Integration [2.3472688456025756]
We propose a novel three-layered deep learning model that assimilates, or learns independently, global and local contextual information alongside visual features.
The novelty of the proposed model is that One-vs-All binary class-based learners are introduced to learn Genetic Algorithm (GA) optimized features in the visual layer.
Optimized visual features with global and local contextual information play a significant role in improving accuracy and producing stable predictions comparable to state-of-the-art deep CNN models.
arXiv Detail & Related papers (2022-04-13T06:28:41Z)
- DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting [91.56988987393483]
We present a new framework for dense prediction by implicitly and explicitly leveraging the pre-trained knowledge from CLIP.
Specifically, we convert the original image-text matching problem in CLIP to a pixel-text matching problem and use the pixel-text score maps to guide the learning of dense prediction models.
Our method is model-agnostic, which can be applied to arbitrary dense prediction systems and various pre-trained visual backbones.
arXiv Detail & Related papers (2021-12-02T18:59:32Z)
- Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization [81.26077816854449]
We first explore the use of constituency parse trees for encoding structured input.
Second, we augment the structured input with commonsense information and study the impact of this external knowledge on the generation of the visual story.
Third, we incorporate visual structure via bounding boxes and dense captioning to provide feedback about the characters/objects in generated images.
arXiv Detail & Related papers (2021-10-21T00:16:02Z)
- Exploiting Contextual Information with Deep Neural Networks [5.787117733071416]
In this thesis, we show that contextual information can be exploited in 2 fundamentally different ways: implicitly and explicitly.
arXiv Detail & Related papers (2020-06-21T03:40:30Z)
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
We present two self-supervised tasks learning over raw text with the guidance from knowledge graphs.
Building upon entity-level masked language models, our first contribution is an entity masking scheme.
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
arXiv Detail & Related papers (2020-04-29T14:22:42Z)
- Towards Analysis-friendly Face Representation with Scalable Feature and Texture Compression [113.30411004622508]
We show that a universal and collaborative visual information representation can be achieved in a hierarchical way.
Based on the strong generative capability of deep neural networks, the gap between the base feature layer and enhancement layer is further filled with the feature level texture reconstruction.
To improve the efficiency of the proposed framework, the base layer neural network is trained in a multi-task manner.
arXiv Detail & Related papers (2020-04-21T14:32:49Z)
- Two-shot Spatially-varying BRDF and Shape Estimation [89.29020624201708]
We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF.
We create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials.
Experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images.
arXiv Detail & Related papers (2020-04-01T12:56:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.