ADAM: Autonomous Discovery and Annotation Model using LLMs for Context-Aware Annotations
- URL: http://arxiv.org/abs/2506.08968v1
- Date: Tue, 10 Jun 2025 16:41:33 GMT
- Title: ADAM: Autonomous Discovery and Annotation Model using LLMs for Context-Aware Annotations
- Authors: Amirreza Rouhi, Solmaz Arezoomandan, Knut Peterson, Joseph T. Woods, David K. Han,
- Abstract summary: We introduce ADAM: Autonomous Discovery and Annotation Model, a training-free, self-refining framework for open-world object labeling. ADAM generates candidate labels for unknown objects based on contextual information from known entities within a scene. It then retrieves visually similar instances from an Embedding-Label Repository and applies frequency-based voting and cross-modal re-ranking to assign a robust label.
- Score: 7.0524023948087375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object detection models typically rely on predefined categories, limiting their ability to identify novel objects in open-world scenarios. To overcome this constraint, we introduce ADAM: Autonomous Discovery and Annotation Model, a training-free, self-refining framework for open-world object labeling. ADAM leverages large language models (LLMs) to generate candidate labels for unknown objects based on contextual information from known entities within a scene. These labels are paired with visual embeddings from CLIP to construct an Embedding-Label Repository (ELR) that enables inference without category supervision. For a newly encountered unknown object, ADAM retrieves visually similar instances from the ELR and applies frequency-based voting and cross-modal re-ranking to assign a robust label. To further enhance consistency, we introduce a self-refinement loop that re-evaluates repository labels using visual cohesion analysis and k-nearest-neighbor-based majority re-labeling. Experimental results on the COCO and PASCAL datasets demonstrate that ADAM effectively annotates novel categories using only visual and contextual signals, without requiring any fine-tuning or retraining.
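The retrieval-and-voting pipeline described in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical: the class and function names (`EmbeddingLabelRepository`, `vote_label`, `refine_labels`) are invented for illustration, toy 2-D vectors stand in for CLIP embeddings, and the cross-modal re-ranking and LLM candidate-generation steps are omitted. It shows only the frequency-based voting over nearest neighbors and the k-nearest-neighbor majority re-labeling used in the self-refinement loop.

```python
from collections import Counter
import numpy as np

class EmbeddingLabelRepository:
    """Toy Embedding-Label Repository (ELR): stores (embedding, label) pairs."""

    def __init__(self):
        self.embeddings = []  # unit-normalized vectors
        self.labels = []

    def add(self, embedding, label):
        v = np.asarray(embedding, dtype=float)
        self.embeddings.append(v / np.linalg.norm(v))
        self.labels.append(label)

    def query(self, embedding, k=5):
        """Return labels of the k most cosine-similar stored embeddings."""
        v = np.asarray(embedding, dtype=float)
        v = v / np.linalg.norm(v)
        sims = np.stack(self.embeddings) @ v  # cosine similarity (unit vectors)
        top = np.argsort(sims)[::-1][:k]
        return [self.labels[i] for i in top]

def vote_label(repo, embedding, k=5):
    """Frequency-based voting: assign the most common label among neighbors."""
    return Counter(repo.query(embedding, k)).most_common(1)[0][0]

def refine_labels(repo, k=3):
    """Self-refinement step: each entry adopts the majority label of its
    k nearest neighbors (excluding itself), smoothing inconsistent labels."""
    E = np.stack(repo.embeddings)
    sims = E @ E.T
    np.fill_diagonal(sims, -np.inf)  # an entry never votes for itself
    new_labels = []
    for i in range(len(repo.labels)):
        nbrs = np.argsort(sims[i])[::-1][:k]
        new_labels.append(Counter(repo.labels[j] for j in nbrs).most_common(1)[0][0])
    repo.labels = new_labels
```

In the real system the embeddings would come from CLIP's image encoder and the labels from LLM-generated candidates; this sketch only makes the voting and re-labeling mechanics concrete.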
Related papers
- LLM-Guided Agentic Object Detection for Open-World Understanding [45.08126325125808]
Object detection traditionally relies on fixed category sets, requiring costly re-training to handle novel objects. We propose an LLM-guided agentic object detection framework that enables fully label-free, zero-shot detection. Our method offers enhanced autonomy and adaptability for open-world understanding.
arXiv Detail & Related papers (2025-07-14T22:30:48Z) - Leveraging Unknown Objects to Construct Labeled-Unlabeled Meta-Relationships for Zero-Shot Object Navigation [14.336117107170153]
Zero-shot object navigation (ZSON) addresses the situation where an agent must navigate to an unseen object that is not present in the training set.
We introduce seen objects without labels into the training procedure to enrich the agent's knowledge base with distinguishable but previously overlooked information.
arXiv Detail & Related papers (2024-05-24T05:26:18Z) - Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we only require sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z) - Towards Open-Domain Topic Classification [69.21234350688098]
We introduce an open-domain topic classification system that accepts user-defined taxonomy in real time.
Users can classify a text snippet with respect to any candidate labels they want and get an instant response from our web interface.
arXiv Detail & Related papers (2023-06-29T20:25:28Z) - Open-World Weakly-Supervised Object Localization [26.531408294517416]
We introduce a new weakly-supervised object localization task called OWSOL (Open-World Weakly-Supervised Object Localization).
We propose a novel paradigm of contrastive representation co-learning using both labeled and unlabeled data to generate a complete G-CAM for object localization.
We re-organize two widely used datasets, i.e., ImageNet-1K and iNatLoc500, and propose OpenImages150 to serve as evaluation benchmarks for OWSOL.
arXiv Detail & Related papers (2023-04-17T13:31:59Z) - Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in vision-language models, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z) - Exploiting Unlabeled Data with Vision and Language Models for Object Detection [64.94365501586118]
Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets.
We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images.
We demonstrate the value of the generated pseudo labels in two specific tasks, open-vocabulary detection and semi-supervised object detection.
arXiv Detail & Related papers (2022-07-18T21:47:15Z) - Saliency Guided Inter- and Intra-Class Relation Constraints for Weakly Supervised Semantic Segmentation [66.87777732230884]
We propose a saliency guided Inter- and Intra-Class Relation Constrained (I$2$CRC) framework to assist the expansion of the activated object regions.
We also introduce an object-guided label refinement module to make full use of both the segmentation prediction and the initial labels to obtain superior pseudo-labels.
arXiv Detail & Related papers (2022-06-20T03:40:56Z) - Towards Few-shot Entity Recognition in Document Images: A Label-aware Sequence-to-Sequence Framework [28.898240725099782]
We build an entity recognition model requiring only a few shots of annotated document images.
We develop a novel label-aware seq2seq framework, LASER.
Experiments on two benchmark datasets demonstrate the superiority of LASER under the few-shot setting.
arXiv Detail & Related papers (2022-03-30T18:30:42Z) - Learning to Detect Instance-level Salient Objects Using Complementary Image Labels [55.049347205603304]
We present the first weakly-supervised approach to the salient instance detection problem.
We propose a novel weakly-supervised network with three branches: a Saliency Detection Branch leveraging class consistency information to locate candidate objects; a Boundary Detection Branch exploiting class discrepancy information to delineate object boundaries; and a Centroid Detection Branch using subitizing information to detect salient instance centroids.
arXiv Detail & Related papers (2021-11-19T10:15:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.