Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation
- URL: http://arxiv.org/abs/2405.17859v1
- Date: Tue, 28 May 2024 06:16:57 GMT
- Title: Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation
- Authors: Yangxiao Lu, Jishnu Jaykumar P, Yunhui Guo, Nicholas Ruozzi, Yu Xiang,
- Abstract summary: We propose a unified framework (NIDS-Net) comprising object proposal generation, embedding creation for both instance templates and proposal regions, and embedding matching for instance label assignment.
Our framework surpasses current state-of-the-art methods, demonstrating notable improvements of 22.3, 46.2, 10.3, and 24.0 in average precision (AP) across four detection datasets.
- Score: 15.414518995812754
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Novel Instance Detection and Segmentation (NIDS) aims at detecting and segmenting novel object instances given a few examples of each instance. We propose a unified framework (NIDS-Net) comprising object proposal generation, embedding creation for both instance templates and proposal regions, and embedding matching for instance label assignment. Leveraging recent advancements in large vision methods, we utilize the Grounding DINO and Segment Anything Model (SAM) to obtain object proposals with accurate bounding boxes and masks. Central to our approach is the generation of high-quality instance embeddings. We utilize foreground feature averages of patch embeddings from the DINOv2 ViT backbone, followed by refinement through a weight adapter mechanism that we introduce. We show experimentally that our weight adapter can adjust the embeddings locally within their feature space and effectively limit overfitting. This methodology enables a straightforward matching strategy, resulting in significant performance gains. Our framework surpasses current state-of-the-art methods, demonstrating notable improvements of 22.3, 46.2, 10.3, and 24.0 in average precision (AP) across four detection datasets. In instance segmentation tasks on seven core datasets of the BOP challenge, our method outperforms the top RGB methods by 3.6 AP and remains competitive with the best RGB-D method. Code is available at: https://github.com/YoungSean/NIDS-Net
Related papers
- Object-Aware Video Matting with Cross-Frame Guidance [35.785998735049006]
We present a trimap-free Object-Aware Video Matting (OAVM) framework, which can perceive different objects, enabling joint recognition of foreground objects and refinement of edge details.
Specifically, we propose an Object-Guided Correction and Refinement (OGCR) module, which employs cross-frame guidance to aggregate object-level instance information into pixel-level detail features.
We also design a Sequential Foreground Merging augmentation strategy to diversify sequential scenarios and enhance capacity of the network for object discrimination.
arXiv Detail & Related papers (2025-03-03T07:40:32Z) - UIFormer: A Unified Transformer-based Framework for Incremental Few-Shot Object Detection and Instance Segmentation [38.331860053615955]
This paper introduces a novel framework for unified incremental few-shot object detection (iFSOD) and instance segmentation (iFSIS) using the Transformer architecture.
Our goal is to create an optimal solution for situations where only a few examples of novel object classes are available.
arXiv Detail & Related papers (2024-11-13T12:29:44Z) - Weakly Supervised Open-Vocabulary Object Detection [31.605276665964787]
We propose a novel weakly supervised open-vocabulary object detection framework, namely WSOVOD, to extend traditional WSOD.
To achieve this, we explore three vital strategies, including dataset-level feature adaptation, image-level salient object localization, and region-level vision-language alignment.
arXiv Detail & Related papers (2023-12-19T18:59:53Z) - Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast
Contrastive Fusion [110.84357383258818]
We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
arXiv Detail & Related papers (2023-06-07T17:57:45Z) - Transformer-based Multi-Instance Learning for Weakly Supervised Object
Detection [43.481591776038144]
Weakly Supervised Object Detection (WSOD) enables the training of object detection models using only image-level annotations.
We propose a novel backbone for WSOD based on our tailored Vision Transformer named Weakly Supervised Transformer Detection Network (WSTDN)
arXiv Detail & Related papers (2023-03-27T08:42:45Z) - Revisiting Color-Event based Tracking: A Unified Network, Dataset, and
Metric [53.88188265943762]
We propose a single-stage backbone network for Color-Event Unified Tracking (CEUTrack), which achieves the above functions simultaneously.
Our proposed CEUTrack is simple, effective, and efficient, which achieves over 75 FPS and new SOTA performance.
arXiv Detail & Related papers (2022-11-20T16:01:31Z) - Prototypical VoteNet for Few-Shot 3D Point Cloud Object Detection [37.48935478836176]
Prototypical VoteNet is a few-shot 3D point cloud object detection approach.
It incorporates two new modules: Prototypical Vote Module (PVM) and Prototypical Head Module (PHM)
arXiv Detail & Related papers (2022-10-11T16:25:38Z) - Collaborative Propagation on Multiple Instance Graphs for 3D Instance
Segmentation with Single-point Supervision [63.429704654271475]
We propose a novel weakly supervised method RWSeg that only requires labeling one object with one point.
With these sparse weak labels, we introduce a unified framework with two branches to propagate semantic and instance information.
Specifically, we propose a Cross-graph Competing Random Walks (CRW) algorithm that encourages competition among different instance graphs.
arXiv Detail & Related papers (2022-08-10T02:14:39Z) - PointInst3D: Segmenting 3D Instances by Points [136.7261709896713]
We propose a fully-convolutional 3D point cloud instance segmentation method that works in a per-point prediction fashion.
We find the key to its success is assigning a suitable target to each sampled point.
Our approach achieves promising results on both ScanNet and S3DIS benchmarks.
arXiv Detail & Related papers (2022-04-25T02:41:46Z) - Sparse Instance Activation for Real-Time Instance Segmentation [72.23597664935684]
We propose a conceptually novel, efficient, and fully convolutional framework for real-time instance segmentation.
SparseInst has extremely fast inference speed and achieves 40 FPS and 37.9 AP on the COCO benchmark.
arXiv Detail & Related papers (2022-03-24T03:15:39Z) - Plug-and-Play Few-shot Object Detection with Meta Strategy and Explicit
Localization Inference [78.41932738265345]
This paper proposes a plug detector that can accurately detect the objects of novel categories without fine-tuning process.
We introduce two explicit inferences into the localization process to reduce its dependence on annotated data.
It shows a significant lead in both efficiency, precision, and recall under varied evaluation protocols.
arXiv Detail & Related papers (2021-10-26T03:09:57Z) - Meta Faster R-CNN: Towards Accurate Few-Shot Object Detection with
Attentive Feature Alignment [33.446875089255876]
Few-shot object detection (FSOD) aims to detect objects using only few examples.
We propose a meta-learning based few-shot object detection method by transferring meta-knowledge learned from data-abundant base classes to data-scarce novel classes.
arXiv Detail & Related papers (2021-04-15T19:01:27Z) - Target-Aware Object Discovery and Association for Unsupervised Video
Multi-Object Segmentation [79.6596425920849]
This paper addresses the task of unsupervised video multi-object segmentation.
We introduce a novel approach for more accurate and efficient unseen-temporal segmentation.
We evaluate the proposed approach on DAVIS$_17$ and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods both in segmentation accuracy and inference speed.
arXiv Detail & Related papers (2021-04-10T14:39:44Z) - I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage
Object Detectors [64.93963042395976]
Implicit Instance-Invariant Network (I3Net) is tailored for adapting one-stage detectors.
I3Net implicitly learns instance-invariant features via exploiting the natural characteristics of deep features in different layers.
Experiments reveal that I3Net exceeds the state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2021-03-25T11:14:36Z) - Segment as Points for Efficient Online Multi-Object Tracking and
Segmentation [66.03023110058464]
We propose a highly effective method for learning instance embeddings based on segments by converting the compact image representation to un-ordered 2D point cloud representation.
Our method generates a new tracking-by-points paradigm where discriminative instance embeddings are learned from randomly selected points rather than images.
The resulting online MOTS framework, named PointTrack, surpasses all the state-of-the-art methods by large margins.
arXiv Detail & Related papers (2020-07-03T08:29:35Z) - 1st Place Solutions for OpenImage2019 -- Object Detection and Instance
Segmentation [116.25081559037872]
This article introduces the solutions of the two champion teams, MMfruit' for the detection track and MMfruitSeg' for the segmentation track, in OpenImage Challenge 2019.
It is commonly known that for an object detector, the shared feature at the end of the backbone is not appropriate for both classification and regression.
We propose the Decoupling Head (DH) to disentangle the object classification and regression via the self-learned optimal feature extraction.
arXiv Detail & Related papers (2020-03-17T06:45:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.