Tuning-less Object Naming with a Foundation Model
- URL: http://arxiv.org/abs/2311.04924v2
- Date: Mon, 26 Feb 2024 13:08:43 GMT
- Title: Tuning-less Object Naming with a Foundation Model
- Authors: Andrej Lucny, Pavel Petrovic
- Abstract summary: We implement a real-time object naming system that enables learning a set of never-before-seen named entities.
Our contribution is the use of the association mechanism known from transformers as attention.
As a result, the system can work in a one-shot manner and correctly name objects presented in different contexts.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We implement a real-time object naming system that enables learning a set of
never-before-seen named entities. Our approach employs an existing foundation model
that we consider ready to see anything before starting. It turns seen images
into relatively small feature vectors that we associate with indices into a
gradually built vocabulary, without any training or fine-tuning of the model.
Our contribution is the use of the association mechanism known from transformers as
attention. It has features that support generalization away from information
irrelevant to distinguishing the entities and potentially enables associating
with much more than indices into a vocabulary. As a result, the system can work in
a one-shot manner and correctly name objects presented in different contexts. We
also outline implementation details of the system modules, which are integrated by a
blackboard architecture. Finally, we investigate the system's quality, mainly
how many objects it can handle in this way.
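The association mechanism described in the abstract can be illustrated as a key-value memory queried with transformer-style attention: stored image feature vectors serve as keys, and vocabulary indices serve as values. The sketch below is a minimal NumPy illustration of that idea, not the authors' implementation; the class name, the temperature parameter, and the cosine-similarity scoring are assumptions for the example.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

class AssociativeNamer:
    """One-shot key-value memory: keys are (normalized) image feature
    vectors, values are indices into a gradually built vocabulary."""

    def __init__(self, dim, temperature=0.1):
        self.keys = np.empty((0, dim))  # one stored feature vector per row
        self.names = []                 # vocabulary, index-aligned with keys
        self.temperature = temperature

    def learn(self, feature, name):
        """Associate one feature vector with a name (one-shot learning)."""
        f = feature / np.linalg.norm(feature)
        self.keys = np.vstack([self.keys, f])
        self.names.append(name)

    def name(self, feature):
        """Attend over stored keys and return the best-matching name."""
        f = feature / np.linalg.norm(feature)
        attn = softmax(self.keys @ f / self.temperature)
        return self.names[int(attn.argmax())]
```

Because the memory stores values alongside keys, the same attention lookup could in principle return richer payloads than vocabulary indices, as the abstract suggests.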
Related papers
- Bootstrapping Top-down Information for Self-modulating Slot Attention [29.82550058869251]
We propose a novel OCL framework incorporating a top-down pathway.
This pathway bootstraps the semantics of individual objects and then modulates the model to prioritize features relevant to these semantics.
Our framework achieves state-of-the-art performance across multiple synthetic and real-world object-discovery benchmarks.
arXiv Detail & Related papers (2024-11-04T05:00:49Z)
- What's in a Name? Beyond Class Indices for Image Recognition [28.02490526407716]
We propose a vision-language model that assigns class names to images, given only a large (essentially unconstrained) vocabulary of categories as prior information.
We leverage non-parametric methods to establish meaningful relationships between images, allowing the model to automatically narrow down the pool of candidate names.
Our method leads to a roughly 50% improvement over the baseline on ImageNet in the unsupervised setting.
arXiv Detail & Related papers (2023-04-05T11:01:23Z)
- Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting [77.69172089359606]
We study multi-modal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection.
Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning.
We comprehensively evaluate the proposed multi-modal FSOD models on multiple few-shot object detection benchmarks, achieving promising results.
arXiv Detail & Related papers (2022-04-16T16:45:06Z)
- Learning Structured Representations of Entity Names using Active Learning and Weak Supervision [19.780301040411008]
In this paper, we present a novel learning framework that combines active learning and weak supervision to solve this problem.
Our experimental evaluation shows that this framework enables the learning of high-quality models from merely a dozen or so labeled examples.
arXiv Detail & Related papers (2020-10-30T21:01:22Z)
- AssembleNet++: Assembling Modality Representations via Attention Connections [83.50084190050093]
We create a family of powerful video models which are able to: (i) learn interactions between semantic object information and raw appearance and motion features, and (ii) deploy attention in order to better learn the importance of features at each convolutional block of the network.
A new network component named peer-attention is introduced, which dynamically learns the attention weights using another block or input modality.
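Peer-attention, as summarized above, gates one block's features using attention weights computed from another block or input modality. The following is a minimal NumPy sketch of that idea under stated assumptions (global-average-pooled peer context, a sigmoid gate, and a hypothetical projection matrix `w`); it is not the AssembleNet++ implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def peer_attention(block_out, peer_out, w):
    """Modulate the channels of one block's output using attention
    weights computed from a peer block or input modality.

    block_out: (H, W, C) features to be gated
    peer_out:  (H, W, C_peer) features providing the attention signal
    w:         (C_peer, C) projection from peer channels to gates
    """
    # Global-average-pool the peer features over space: (H, W, C_peer) -> (C_peer,)
    context = peer_out.mean(axis=(0, 1))
    # Project to one gate per output channel, squashed into (0, 1)
    gates = sigmoid(context @ w)
    # Channel-wise modulation of the block's output
    return block_out * gates
```

The gates depend only on the peer modality, so the same mechanism can connect any pair of blocks in the network.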
arXiv Detail & Related papers (2020-08-18T17:54:08Z)
- Object Files and Schemata: Factorizing Declarative and Procedural Knowledge in Dynamical Systems [135.10772866688404]
Black-box models with a monolithic hidden state often fail to apply procedural knowledge consistently and uniformly.
We address this issue via an architecture that factorizes declarative and procedural knowledge.
arXiv Detail & Related papers (2020-06-29T17:45:03Z)
- Empower Entity Set Expansion via Language Model Probing [58.78909391545238]
Existing set expansion methods bootstrap the seed entity set by adaptively selecting context features and extracting new entities.
A key challenge for entity set expansion is to avoid selecting ambiguous context features which will shift the class semantics and lead to accumulative errors in later iterations.
We propose a novel iterative set expansion framework that leverages automatically generated class names to address the semantic drift issue.
arXiv Detail & Related papers (2020-04-29T00:09:43Z)
- Interpretability Analysis for Named Entity Recognition to Understand System Predictions and How They Can Improve [49.878051587667244]
We examine the performance of several variants of LSTM-CRF architectures for named entity recognition.
We find that context representations do contribute to system performance, but that the main factor driving high performance is learning the name tokens themselves.
We enlist human annotators to evaluate the feasibility of inferring entity types from the context alone and find that, although people also fail to infer the entity type for the majority of the errors made by the context-only system, there is some room for improvement.
arXiv Detail & Related papers (2020-04-09T14:37:12Z)
- Look-into-Object: Self-supervised Structure Modeling for Object Recognition [71.68524003173219]
We propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions.
We show the recognition backbone can be substantially enhanced for more robust representation learning.
Our approach achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft).
arXiv Detail & Related papers (2020-03-31T12:22:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.