Visual-Semantic Embedding Model Informed by Structured Knowledge
- URL: http://arxiv.org/abs/2009.10026v1
- Date: Mon, 21 Sep 2020 17:04:32 GMT
- Title: Visual-Semantic Embedding Model Informed by Structured Knowledge
- Authors: Mirantha Jayathilaka, Tingting Mu, Uli Sattler
- Abstract summary: We propose a novel approach to improve a visual-semantic embedding model by incorporating concept representations captured from an external structured knowledge base.
We investigate its performance on image classification under both standard and zero-shot settings.
- Score: 3.2734466030053175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel approach to improve a visual-semantic embedding model by
incorporating concept representations captured from an external structured
knowledge base. We investigate its performance on image classification under
both standard and zero-shot settings. We propose two novel evaluation
frameworks to analyse classification errors with respect to the class hierarchy
indicated by the knowledge base. The approach is tested using the ILSVRC 2012
image dataset and a WordNet knowledge base. With respect to both standard and
zero-shot image classification, our approach shows superior performance
compared with the original approach, which uses word embeddings.
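The idea can be sketched compactly. The snippet below is a minimal illustration under assumed names and dimensions, not the authors' implementation: class concepts arrive as fixed vectors derived from a knowledge base such as WordNet, image features from a CNN backbone are projected into that concept space, and classification, including zero-shot classification over unseen classes, reduces to nearest-concept retrieval by cosine similarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptInformedEmbedder(nn.Module):
    """Projects CNN image features into a concept-embedding space whose
    class vectors come from a structured knowledge base (e.g., WordNet).
    Hypothetical sketch, not the paper's code."""

    def __init__(self, feature_dim: int, concept_dim: int):
        super().__init__()
        self.project = nn.Linear(feature_dim, concept_dim)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # Map image features into the concept space and L2-normalise.
        return F.normalize(self.project(image_features), dim=-1)

def classify(image_features, concept_vectors, embedder):
    """Nearest-concept classification by cosine similarity.
    `concept_vectors` is a (num_classes, concept_dim) matrix of
    knowledge-base concept embeddings; for zero-shot evaluation it
    simply includes rows for classes unseen during training."""
    image_emb = embedder(image_features)              # (B, D)
    concepts = F.normalize(concept_vectors, dim=-1)   # (C, D)
    similarity = image_emb @ concepts.t()             # (B, C)
    return similarity.argmax(dim=-1)

# Toy usage with random stand-ins for CNN features and WordNet-derived vectors.
embedder = ConceptInformedEmbedder(feature_dim=2048, concept_dim=300)
features = torch.randn(4, 2048)    # e.g., ResNet pooled features
concepts = torch.randn(1000, 300)  # one vector per ILSVRC class
print(classify(features, concepts, embedder))
```

In such a setup, training would typically pull each image embedding toward its ground-truth concept vector (e.g., with a ranking or cosine loss), which is what allows the concept vectors of unseen classes to be used directly at test time.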
Related papers
- Contextuality Helps Representation Learning for Generalized Category Discovery [5.885208652383516]
This paper introduces a novel approach to Generalized Category Discovery (GCD) by leveraging the concept of contextuality.
Our model integrates contextuality at two levels: the instance level, where nearest-neighbor contexts are used for contrastive learning, and the cluster level, where contrastive learning is likewise employed.
The integration of the contextual information effectively improves the feature learning and thereby the classification accuracy of all categories.
arXiv Detail & Related papers (2024-07-29T07:30:41Z)
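As a generic illustration of the instance-level idea in the entry above, the sketch below implements an InfoNCE-style loss whose positives are nearest neighbours drawn from a feature bank. The shapes, the bank, and the temperature are assumptions for the sketch; it is not the paper's model.

```python
import torch
import torch.nn.functional as F

def nn_contrastive_loss(anchors, candidates, bank, temperature=0.1):
    """InfoNCE loss where each anchor's positive is its nearest
    neighbour drawn from a feature bank (instance-level context).
    anchors, candidates: (B, D) embeddings of two views of a batch.
    bank: (N, D) embeddings of previously seen instances.
    Illustrative sketch only."""
    anchors = F.normalize(anchors, dim=-1)
    candidates = F.normalize(candidates, dim=-1)
    bank = F.normalize(bank, dim=-1)

    # Replace each anchor by its nearest neighbour in the bank,
    # so the positive pair becomes (neighbour, candidate view).
    nn_idx = (anchors @ bank.t()).argmax(dim=-1)        # (B,)
    positives = bank[nn_idx]                            # (B, D)

    logits = positives @ candidates.t() / temperature   # (B, B)
    targets = torch.arange(logits.size(0))
    return F.cross_entropy(logits, targets)

# Toy usage with random embeddings.
loss = nn_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128),
                           torch.randn(256, 128))
print(loss.item())
```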
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- Dual-Image Enhanced CLIP for Zero-Shot Anomaly Detection [58.228940066769596]
We introduce a Dual-Image Enhanced CLIP approach, leveraging a joint vision-language scoring system.
Our methods process pairs of images, utilizing each as a visual reference for the other, thereby enriching the inference process with visual context.
Our approach exploits the potential of joint vision-language anomaly detection and demonstrates performance comparable with current SOTA methods across various datasets.
arXiv Detail & Related papers (2024-05-08T03:13:20Z)
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address the mismatch between CLIP's generic pre-training and the IQA task using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
- Progressive Tree-Structured Prototype Network for End-to-End Image Captioning [74.8547752611337]
We propose a novel Progressive Tree-Structured Prototype Network (dubbed PTSN).
PTSN is the first attempt to narrow down the scope of prediction words with appropriate semantics by modeling the hierarchical textual semantics.
Our method achieves new state-of-the-art performance with 144.2% (single model) and 146.5% (ensemble of 4 models) CIDEr scores on the Karpathy split, and 141.4% (c5) and 143.9% (c40) CIDEr scores on the official online test server.
arXiv Detail & Related papers (2022-11-17T11:04:00Z)
- VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning [113.50220968583353]
We propose to discover semantic embeddings containing discriminative visual properties for zero-shot learning.
Our model visually divides a set of images from seen classes into clusters of local image regions according to their visual similarity.
We demonstrate that our visually-grounded semantic embeddings further improve performance over word embeddings across various ZSL models by a large margin.
arXiv Detail & Related papers (2022-03-20T03:49:02Z)
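The VGSE entry above describes grouping local image regions by visual similarity; one simple way to realise that idea (not the published pipeline) is to cluster region features with k-means and describe each seen class by its normalised histogram over the discovered clusters, as sketched below with illustrative shapes.

```python
import numpy as np
from sklearn.cluster import KMeans

def class_embeddings_from_regions(region_feats, region_labels,
                                  num_clusters=64, seed=0):
    """Cluster local image-region features by visual similarity, then
    describe each class by its normalised histogram over the clusters.
    region_feats: (N, D) features of image regions from seen classes.
    region_labels: (N,) class id of the image each region came from.
    Returns a (num_classes, num_clusters) semantic embedding matrix."""
    kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=seed)
    assignments = kmeans.fit_predict(region_feats)

    classes = np.unique(region_labels)
    embeddings = np.zeros((len(classes), num_clusters))
    for row, c in enumerate(classes):
        hist = np.bincount(assignments[region_labels == c],
                           minlength=num_clusters)
        embeddings[row] = hist / max(hist.sum(), 1)
    return embeddings

# Toy usage: 1000 random region features from 10 seen classes.
emb = class_embeddings_from_regions(np.random.randn(1000, 512),
                                    np.random.randint(0, 10, size=1000))
print(emb.shape)  # (10, 64)
```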
- Ontology-based n-ball Concept Embeddings Informing Few-shot Image Classification [5.247029505708008]
ViOCE integrates symbolic knowledge in the form of $n$-ball concept embeddings into a neural network based vision architecture.
We evaluate ViOCE using the task of few-shot image classification, where it demonstrates superior performance on two standard benchmarks.
arXiv Detail & Related papers (2021-09-19T05:35:43Z)
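The n-ball idea in the ViOCE entry above can be sketched compactly: each concept is a ball (centre plus radius) in n-dimensional space, and subsumption from the ontology is encoded by requiring a subclass ball to lie inside its superclass ball. The hinge loss below is one straightforward way to penalise containment violations; it is an illustrative assumption, not the paper's training objective.

```python
import torch
import torch.nn as nn

class NBallEmbeddings(nn.Module):
    """Each concept i is a ball with centre c_i and radius r_i > 0.
    Sub-concept balls should be contained in super-concept balls."""

    def __init__(self, num_concepts: int, dim: int):
        super().__init__()
        self.centres = nn.Parameter(torch.randn(num_concepts, dim) * 0.1)
        self.log_radii = nn.Parameter(torch.zeros(num_concepts))

    def containment_loss(self, sub_idx, sup_idx, margin=0.01):
        """Penalise pairs where ball(sub) is not inside ball(sup):
        containment holds when ||c_sub - c_sup|| + r_sub <= r_sup."""
        c_sub, c_sup = self.centres[sub_idx], self.centres[sup_idx]
        r_sub = self.log_radii[sub_idx].exp()
        r_sup = self.log_radii[sup_idx].exp()
        violation = (c_sub - c_sup).norm(dim=-1) + r_sub - r_sup + margin
        return torch.relu(violation).mean()

# Toy usage: learn embeddings for a tiny hierarchy {0 is-a 1, 2 is-a 1}.
model = NBallEmbeddings(num_concepts=3, dim=16)
optim = torch.optim.Adam(model.parameters(), lr=0.05)
sub = torch.tensor([0, 2])
sup = torch.tensor([1, 1])
for _ in range(200):
    optim.zero_grad()
    loss = model.containment_loss(sub, sup)
    loss.backward()
    optim.step()
print(model.containment_loss(sub, sup).item())  # should approach 0
```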
- Detection and Captioning with Unseen Object Classes [12.894104422808242]
Test images may contain visual objects with no corresponding visual or textual training examples.
We propose a detection-driven approach based on a generalized zero-shot detection model and a template-based sentence generation model.
Our experiments show that the proposed zero-shot detection model obtains state-of-the-art performance on the MS-COCO dataset.
arXiv Detail & Related papers (2021-08-13T10:43:20Z)
- Hierarchical Image Classification using Entailment Cone Embeddings [68.82490011036263]
We first inject label-hierarchy knowledge into an arbitrary CNN-based classifier.
We empirically show that availability of such external semantic information in conjunction with the visual semantics from images boosts overall performance.
arXiv Detail & Related papers (2020-04-02T10:22:02Z)
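A common, much simpler way to inject label-hierarchy knowledge into a standard classifier, in the spirit of the entry above though not its entailment-cone formulation, is to soften the training targets so that ancestors of the true class also receive probability mass. The sketch below illustrates that idea with hypothetical helper names.

```python
import torch
import torch.nn.functional as F

def hierarchy_soft_targets(labels, ancestors, num_classes, ancestor_weight=0.3):
    """Build soft targets that keep most mass on the true class and
    spread `ancestor_weight` across its ancestors in the label hierarchy.
    labels: (B,) leaf class ids.
    ancestors: dict mapping class id -> list of ancestor class ids.
    Illustrative alternative to entailment-cone embeddings."""
    targets = torch.zeros(labels.size(0), num_classes)
    for i, y in enumerate(labels.tolist()):
        anc = ancestors.get(y, [])
        targets[i, y] = 1.0 - (ancestor_weight if anc else 0.0)
        for a in anc:
            targets[i, a] = ancestor_weight / len(anc)
    return targets

def hierarchy_aware_loss(logits, labels, ancestors):
    # Cross-entropy against hierarchy-smoothed soft targets.
    targets = hierarchy_soft_targets(labels, ancestors, logits.size(-1))
    return -(targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

# Toy usage: 5 classes where classes 3 and 4 share ancestor 0.
ancestors = {3: [0], 4: [0]}
logits = torch.randn(2, 5)
labels = torch.tensor([3, 4])
print(hierarchy_aware_loss(logits, labels, ancestors).item())
```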
This list is automatically generated from the titles and abstracts of the papers on this site.