Tuning-less Object Naming with a Foundation Model
- URL: http://arxiv.org/abs/2311.04924v2
- Date: Mon, 26 Feb 2024 13:08:43 GMT
- Title: Tuning-less Object Naming with a Foundation Model
- Authors: Andrej Lucny, Pavel Petrovic
- Abstract summary: We implement a real-time object naming system that enables learning a set of named entities never seen before.
Our contribution is using the association mechanism known in transformers as attention.
As a result, the system can work in a one-shot manner and correctly name objects encountered in different contexts.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We implement a real-time object naming system that enables learning a set of
named entities never seen before. Our approach employs an existing foundation model
that we consider ready to see anything before starting. It turns seen images
into relatively small feature vectors that we associate with an index into a
gradually built vocabulary, without any training or fine-tuning of the model.
Our contribution is using the association mechanism known in transformers as
attention. It has features that support abstracting away information irrelevant
to distinguishing the entities and that potentially enable associating
the features with much more than indices into a vocabulary. As a result, the system can work in
a one-shot manner and correctly name objects encountered in different contexts. We
also outline implementation details of the system modules, which are integrated by a
blackboard architecture. Finally, we investigate the system's quality, mainly
how many objects it can handle in this way.
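The attention-based association described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: stored feature vectors act as attention keys, indices into the gradually built vocabulary act as values, and a new image's feature vector is the query. The class name, the `temperature` parameter, and the toy 3-dimensional features are all hypothetical; in the paper's setting, the features would come from a frozen foundation model's image encoder.

```python
import math

# Sketch of one-shot object naming via attention-style association
# (all names and parameters here are illustrative, not from the paper).
class AttentionNamer:
    def __init__(self, temperature=0.1):
        self.keys = []      # stored feature vectors (unit-normalized) = attention keys
        self.values = []    # vocabulary indices associated with the keys = attention values
        self.vocab = []     # gradually built vocabulary of names
        self.temperature = temperature

    @staticmethod
    def _normalize(v):
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / n for x in v]

    def learn(self, feature, name):
        """One-shot learning: store a single feature -> vocabulary-index association."""
        if name not in self.vocab:
            self.vocab.append(name)
        self.keys.append(self._normalize(feature))
        self.values.append(self.vocab.index(name))

    def name(self, feature):
        """Name a new image: softmax attention over stored keys retrieves a vocabulary index."""
        q = self._normalize(feature)
        scores = [sum(a * b for a, b in zip(q, k)) / self.temperature
                  for k in self.keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # accumulate attention mass per vocabulary entry and pick the heaviest
        mass = [0.0] * len(self.vocab)
        for w, v in zip(weights, self.values):
            mass[v] += w / total
        return self.vocab[max(range(len(self.vocab)), key=mass.__getitem__)]
```

Because no weights are trained, learning reduces to appending one key-value pair, which is what makes the scheme one-shot and tuning-free.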
Related papers
- Hypernymization of named entity-rich captions for grounding-based multi-modal pretraining [36.75629570208193]
We investigate hypernymization as a way to deal with named entities for pretraining grounding-based multi-modal models.
We report improved pretraining performance on objects of interest following hypernymization.
We show the promise of hypernymization on open-vocabulary detection, specifically on classes not seen during training.
arXiv Detail & Related papers (2023-04-25T20:17:40Z)
- What's in a Name? Beyond Class Indices for Image Recognition [31.68225941659493]
We propose a vision-language model to assign class names to images given only a large and essentially unconstrained vocabulary of categories as prior information.
Specifically, we propose iteratively clustering the data and voting on class names within them, showing that this enables a roughly 50% improvement over the baseline on ImageNet.
arXiv Detail & Related papers (2023-04-05T11:01:23Z)
- Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting [77.69172089359606]
We study multi-modal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection.
Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning.
We comprehensively evaluate the proposed multi-modal FSOD models on multiple few-shot object detection benchmarks, achieving promising results.
arXiv Detail & Related papers (2022-04-16T16:45:06Z)
- Learning Structured Representations of Entity Names using Active Learning and Weak Supervision [19.780301040411008]
In this paper, we present a novel learning framework that combines active learning and weak supervision to solve this problem.
Our experimental evaluation shows that this framework enables the learning of high-quality models from merely a dozen or so labeled examples.
arXiv Detail & Related papers (2020-10-30T21:01:22Z)
- AssembleNet++: Assembling Modality Representations via Attention Connections [83.50084190050093]
We create a family of powerful video models which are able to: (i) learn interactions between semantic object information and raw appearance and motion features, and (ii) deploy attention in order to better learn the importance of features at each convolutional block of the network.
A new network component named peer-attention is introduced, which dynamically learns the attention weights using another block or input modality.
arXiv Detail & Related papers (2020-08-18T17:54:08Z)
- Object Files and Schemata: Factorizing Declarative and Procedural Knowledge in Dynamical Systems [135.10772866688404]
Black-box models with a monolithic hidden state often fail to apply procedural knowledge consistently and uniformly.
We address this issue via an architecture that factorizes declarative and procedural knowledge.
arXiv Detail & Related papers (2020-06-29T17:45:03Z)
- Empower Entity Set Expansion via Language Model Probing [58.78909391545238]
Existing set expansion methods bootstrap the seed entity set by adaptively selecting context features and extracting new entities.
A key challenge for entity set expansion is to avoid selecting ambiguous context features which will shift the class semantics and lead to accumulative errors in later iterations.
We propose a novel iterative set expansion framework that leverages automatically generated class names to address the semantic drift issue.
arXiv Detail & Related papers (2020-04-29T00:09:43Z)
- Interpretability Analysis for Named Entity Recognition to Understand System Predictions and How They Can Improve [49.878051587667244]
We examine the performance of several variants of LSTM-CRF architectures for named entity recognition.
We find that context representations do contribute to system performance, but that the main factor driving high performance is learning the name tokens themselves.
We enlist human annotators to assess whether entity types can be inferred from the context alone and find that, while people are likewise unable to infer the entity type for the majority of the errors made by the context-only system, there is some room for improvement.
arXiv Detail & Related papers (2020-04-09T14:37:12Z)
- Look-into-Object: Self-supervised Structure Modeling for Object Recognition [71.68524003173219]
We propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions.
We show the recognition backbone can be substantially enhanced for more robust representation learning.
Our approach achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft).
arXiv Detail & Related papers (2020-03-31T12:22:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.