Tuning-less Object Naming with a Foundation Model
- URL: http://arxiv.org/abs/2311.04924v2
- Date: Mon, 26 Feb 2024 13:08:43 GMT
- Title: Tuning-less Object Naming with a Foundation Model
- Authors: Andrej Lucny, Pavel Petrovic
- Abstract summary: We implement a real-time object naming system that enables learning a set of never-before-seen named entities.
Our contribution is the use of the association mechanism known from transformers as attention.
As a result, the system can work in a one-shot manner and correctly name objects presented in different contexts.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We implement a real-time object naming system that enables learning a set of
never-before-seen named entities. Our approach employs an existing foundation model
that we consider ready to see anything before starting. It turns seen images
into relatively small feature vectors that we associate with indices into a
gradually built vocabulary, without any training or fine-tuning of the model.
Our contribution is the use of the association mechanism known from transformers as
attention. It has features that support generalization away from information
irrelevant to distinguishing the entities and potentially enables associating
with much more than indices into a vocabulary. As a result, the system can work in
a one-shot manner and correctly name objects presented in different contexts. We
also outline implementation details of the system modules, which are integrated by a
blackboard architecture. Finally, we investigate the system's quality, mainly
how many objects it can handle in this way.
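The association mechanism described in the abstract can be illustrated as a key-value memory queried with transformer-style attention: stored image feature vectors serve as keys, and vocabulary indices serve as values. The sketch below is a minimal NumPy illustration of that idea, not the authors' implementation; the class name, the temperature parameter, and the cosine-similarity scoring are assumptions for the example.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

class AssociativeNamer:
    """One-shot key-value memory: keys are (normalized) image feature
    vectors, values are indices into a gradually built vocabulary."""

    def __init__(self, dim, temperature=0.1):
        self.keys = np.empty((0, dim))  # one stored feature vector per row
        self.names = []                 # vocabulary, index-aligned with keys
        self.temperature = temperature

    def learn(self, feature, name):
        """Associate one feature vector with a name (one-shot learning)."""
        f = feature / np.linalg.norm(feature)
        self.keys = np.vstack([self.keys, f])
        self.names.append(name)

    def name(self, feature):
        """Attend over stored keys and return the best-matching name."""
        f = feature / np.linalg.norm(feature)
        attn = softmax(self.keys @ f / self.temperature)
        return self.names[int(attn.argmax())]
```

Because the memory stores values alongside keys, the same attention lookup could in principle return richer payloads than vocabulary indices, as the abstract suggests.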
Related papers
- Bootstrapping Top-down Information for Self-modulating Slot Attention [29.82550058869251]
We propose a novel OCL framework incorporating a top-down pathway.
This pathway bootstraps the semantics of individual objects and then modulates the model to prioritize features relevant to these semantics.
Our framework achieves state-of-the-art performance across multiple synthetic and real-world object-discovery benchmarks.
arXiv Detail & Related papers (2024-11-04T05:00:49Z)
- What's in a Name? Beyond Class Indices for Image Recognition [28.02490526407716]
We propose a vision-language model that assigns class names to images, given only a large (essentially unconstrained) vocabulary of categories as prior information.
We leverage non-parametric methods to establish meaningful relationships between images, allowing the model to automatically narrow down the pool of candidate names.
Our method leads to a roughly 50% improvement over the baseline on ImageNet in the unsupervised setting.
arXiv Detail & Related papers (2023-04-05T11:01:23Z)
- Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting [77.69172089359606]
We study multi-modal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection.
Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning.
We comprehensively evaluate the proposed multi-modal FSOD models on multiple few-shot object detection benchmarks, achieving promising results.
arXiv Detail & Related papers (2022-04-16T16:45:06Z)
- Learning Structured Representations of Entity Names using Active Learning and Weak Supervision [19.780301040411008]
In this paper, we present a novel learning framework that combines active learning and weak supervision to solve this problem.
Our experimental evaluation shows that this framework enables the learning of high-quality models from merely a dozen or so labeled examples.
arXiv Detail & Related papers (2020-10-30T21:01:22Z)
- AssembleNet++: Assembling Modality Representations via Attention Connections [83.50084190050093]
We create a family of powerful video models which are able to: (i) learn interactions between semantic object information and raw appearance and motion features, and (ii) deploy attention in order to better learn the importance of features at each convolutional block of the network.
A new network component named peer-attention is introduced, which dynamically learns the attention weights using another block or input modality.
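Peer-attention, as summarized above, gates one block's features using attention weights computed from another block or input modality. The following is a minimal NumPy sketch of that idea under stated assumptions (global-average-pooled peer context, a sigmoid gate, and a hypothetical projection matrix `w`); it is not the AssembleNet++ implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def peer_attention(block_out, peer_out, w):
    """Modulate the channels of one block's output using attention
    weights computed from a peer block or input modality.

    block_out: (H, W, C) features to be gated
    peer_out:  (H, W, C_peer) features providing the attention signal
    w:         (C_peer, C) projection from peer channels to gates
    """
    # Global-average-pool the peer features over space: (H, W, C_peer) -> (C_peer,)
    context = peer_out.mean(axis=(0, 1))
    # Project to one gate per output channel, squashed into (0, 1)
    gates = sigmoid(context @ w)
    # Channel-wise modulation of the block's output
    return block_out * gates
```

The gates depend only on the peer modality, so the same mechanism can connect any pair of blocks in the network.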
arXiv Detail & Related papers (2020-08-18T17:54:08Z)
- Object Files and Schemata: Factorizing Declarative and Procedural Knowledge in Dynamical Systems [135.10772866688404]
Black-box models with a monolithic hidden state often fail to apply procedural knowledge consistently and uniformly.
We address this issue via an architecture that factorizes declarative and procedural knowledge.
arXiv Detail & Related papers (2020-06-29T17:45:03Z)
- Empower Entity Set Expansion via Language Model Probing [58.78909391545238]
Existing set expansion methods bootstrap the seed entity set by adaptively selecting context features and extracting new entities.
A key challenge for entity set expansion is to avoid selecting ambiguous context features which will shift the class semantics and lead to accumulative errors in later iterations.
We propose a novel iterative set expansion framework that leverages automatically generated class names to address the semantic drift issue.
arXiv Detail & Related papers (2020-04-29T00:09:43Z)
- Interpretability Analysis for Named Entity Recognition to Understand System Predictions and How They Can Improve [49.878051587667244]
We examine the performance of several variants of LSTM-CRF architectures for named entity recognition.
We find that context representations do contribute to system performance, but that the main factor driving high performance is learning the name tokens themselves.
We enlist human annotators to evaluate the feasibility of inferring entity types from the context alone and find that, although people also fail to infer the entity type for the majority of the errors made by the context-only system, there is some room for improvement.
arXiv Detail & Related papers (2020-04-09T14:37:12Z)
- Look-into-Object: Self-supervised Structure Modeling for Object Recognition [71.68524003173219]
We propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions.
We show the recognition backbone can be substantially enhanced for more robust representation learning.
Our approach achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft).
arXiv Detail & Related papers (2020-03-31T12:22:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.