MatchDet: A Collaborative Framework for Image Matching and Object Detection
- URL: http://arxiv.org/abs/2312.10983v3
- Date: Wed, 17 Jul 2024 04:03:31 GMT
- Title: MatchDet: A Collaborative Framework for Image Matching and Object Detection
- Authors: Jinxiang Lai, Wenlong Wu, Bin-Bin Gao, Jun Liu, Jiawei Zhan, Congchong Nie, Yi Zeng, Chengjie Wang
- Abstract summary: We propose a collaborative framework called MatchDet for image matching and object detection.
To achieve the collaborative learning of the two tasks, we propose three novel modules.
We evaluate the approaches on a new benchmark with two datasets called Warp-COCO and miniScanNet.
- Score: 33.09209198536698
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Image matching and object detection are two fundamental and challenging tasks, yet many related applications treat them as two individual tasks (i.e. task-individual). In this paper, a collaborative framework called MatchDet (i.e. task-collaborative) is proposed for image matching and object detection to obtain mutual improvements. To achieve collaborative learning of the two tasks, we propose three novel modules: a Weighted Spatial Attention Module (WSAM) for the Detector, and a Weighted Attention Module (WAM) and Box Filter for the Matcher. Specifically, the WSAM highlights the foreground regions of the target image to benefit the subsequent detector, the WAM enhances the connection between the foreground regions of the image pair to ensure high-quality matches, and the Box Filter mitigates the impact of false matches. We evaluate the approaches on a new benchmark with two datasets called Warp-COCO and miniScanNet. Experimental results show our approaches are effective and achieve competitive improvements.
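The abstract does not give implementation details for the Box Filter, but one plausible reading is that matches are pruned using the detector's boxes: a match is kept only if its target-image keypoint falls inside some detected bounding box. The sketch below illustrates that idea; the function name, data layout, and the exact filtering rule are all assumptions, not the paper's code.

```python
# Hypothetical sketch of a "Box Filter" step: keep only matches whose
# target-image keypoint lands inside at least one detected bounding box.
# Data layout and filtering rule are assumptions for illustration.

def box_filter(matches, boxes):
    """Filter keypoint matches using detected boxes.

    matches: list of ((x1, y1), (x2, y2)) keypoint pairs, where
             (x2, y2) is the keypoint in the target image.
    boxes:   list of (xmin, ymin, xmax, ymax) detections in the
             target image.
    Returns the subset of matches whose target keypoint lies inside
    some box (false matches on background regions are discarded).
    """
    def inside(pt, box):
        x, y = pt
        xmin, ymin, xmax, ymax = box
        return xmin <= x <= xmax and ymin <= y <= ymax

    return [m for m in matches if any(inside(m[1], b) for b in boxes)]


# Example: only the first match lands inside the detected box.
matches = [((10, 10), (12, 11)), ((50, 50), (200, 200))]
boxes = [(0, 0, 30, 30)]
kept = box_filter(matches, boxes)
```

Under this reading, the detector's foreground evidence directly suppresses background matches, which is one concrete way the two tasks could reinforce each other.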
Related papers
- Transformer based Multitask Learning for Image Captioning and Object Detection [13.340784876489927]
This work introduces a novel multitask learning framework that combines image captioning and object detection into a joint model.
We propose TICOD, Transformer-based Image Captioning and Object detection model for jointly training both tasks.
Our model outperforms the baselines from image captioning literature by achieving a 3.65% improvement in BERTScore.
arXiv Detail & Related papers (2024-03-10T19:31:13Z)
- Dense Affinity Matching for Few-Shot Segmentation [83.65203917246745]
Few-Shot Segmentation (FSS) aims to segment novel class images with only a few samples.
We propose a dense affinity matching framework to exploit the support-query interaction.
We show that our framework performs very competitively under different settings with only 0.68M parameters.
arXiv Detail & Related papers (2023-07-17T12:27:15Z)
- Few-Shot Object Detection with Fully Cross-Transformer [35.49840687007507]
Few-shot object detection (FSOD) aims to detect novel objects using very few training examples.
We propose a novel Fully Cross-Transformer based model (FCT) for FSOD by incorporating cross-transformer into both the feature backbone and detection head.
Our model can improve the few-shot similarity learning between the two branches by introducing the multi-level interactions.
arXiv Detail & Related papers (2022-03-28T18:28:51Z)
- A Unified Transformer Framework for Group-based Segmentation: Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection [59.21990697929617]
Humans tend to mine objects by learning from a group of images or several frames of video since we live in a dynamic world.
Previous approaches design different networks on similar tasks separately, and they are difficult to apply to each other.
We introduce a unified framework, termed UFO (Unified Framework for Co-Object segmentation), to tackle these issues.
arXiv Detail & Related papers (2022-03-09T13:35:19Z)
- RelationRS: Relationship Representation Network for Object Detection in Aerial Images [15.269897893563417]
We propose a relationship representation network for object detection in aerial images (RelationRS)
The dual relationship module learns potential relationships between features of different scales, as well as relationships between different scenes from different patches, within the same iteration.
The bridging visual representations module (BVR) is introduced into the field of aerial images to improve the object detection effect in images with complex backgrounds.
arXiv Detail & Related papers (2021-10-13T14:02:33Z)
- CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection [91.91911418421086]
Co-Salient Object Detection (CoSOD) aims at discovering salient objects that repeatedly appear in a given query group containing two or more relevant images.
One challenging issue is how to effectively capture co-saliency cues by modeling and exploiting inter-image relationships.
We present an end-to-end collaborative aggregation-and-distribution network (CoADNet) to capture both salient and repetitive visual patterns from multiple images.
arXiv Detail & Related papers (2020-11-10T04:28:11Z)
- Tasks Integrated Networks: Joint Detection and Retrieval for Image Search [99.49021025124405]
In many real-world searching scenarios (e.g., video surveillance), the objects are seldom accurately detected or annotated.
We first introduce an end-to-end Integrated Net (I-Net), which has three merits.
We further propose an improved I-Net, called DC-I-Net, which makes two new contributions.
arXiv Detail & Related papers (2020-09-03T03:57:50Z)
- Dense Scene Multiple Object Tracking with Box-Plane Matching [73.54369833671772]
Multiple Object Tracking (MOT) is an important task in computer vision.
We propose the Box-Plane Matching (BPM) method to improve MOT performance in dense scenes.
With the effectiveness of the three modules, our team achieves the 1st place on the Track-1 leaderboard in the ACM MM Grand Challenge HiEve 2020.
arXiv Detail & Related papers (2020-07-30T16:39:22Z)
- Co-Attention for Conditioned Image Matching [91.43244337264454]
We propose a new approach to determine correspondences between image pairs in the wild under large changes in illumination, viewpoint, context, and material.
While other approaches find correspondences between pairs of images by treating the images independently, we instead condition on both images to implicitly take account of the differences between them.
arXiv Detail & Related papers (2020-07-16T17:32:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.