Contrastive Proposal Extension with LSTM Network for Weakly Supervised
Object Detection
- URL: http://arxiv.org/abs/2110.07511v2
- Date: Sat, 16 Oct 2021 12:17:18 GMT
- Title: Contrastive Proposal Extension with LSTM Network for Weakly Supervised
Object Detection
- Authors: Pei Lv, Suqi Hu, Tianran Hao, Haohan Ji, Lisha Cui, Haoyi Fan,
Mingliang Xu and Changsheng Xu
- Abstract summary: Weakly supervised object detection (WSOD) has attracted more and more attention since it only uses image-level labels and can save huge annotation costs.
We propose a new method by comparing the initial proposals and the extension ones to optimize those initial proposals.
Experiments on the PASCAL VOC 2007, VOC 2012 and MS-COCO datasets show that our method achieves state-of-the-art results.
- Score: 52.86681130880647
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Weakly supervised object detection (WSOD) has attracted increasing
attention because it uses only image-level labels and can save substantial
annotation costs. Most WSOD methods use Multiple Instance Learning (MIL) as
their basic framework, which treats detection as an instance classification
problem. However, MIL-based methods tend to converge only on the most
discriminative regions of different instances rather than on their complete
regions; that is, the detections lack integrity. Inspired by how humans
observe things, we propose a new method that compares the initial proposals
with their extensions to optimize the initial proposals.
Specifically, we propose a new strategy for WSOD based on contrastive
proposal extension (CPE), which consists of multiple directional contrastive
proposal extensions (D-CPE); each D-CPE contains LSTM-based encoders and
corresponding decoders. First, the boundary of each initial proposal
in MIL is extended to different positions according to a well-designed
sequential order. Then, CPE compares the extended proposal with the initial
proposal by extracting their feature semantics using the encoders, and
calculates the integrity of the initial proposal to optimize its score.
These contrastive contextual semantics guide the basic WSOD to
suppress bad proposals and improve the scores of good ones. In addition, a
simple two-stream network is designed as the decoder to constrain the temporal
coding of the LSTM and further improve the performance of WSOD. Experiments on
the PASCAL VOC 2007, VOC 2012 and MS-COCO datasets show that our method
achieves state-of-the-art results.
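The boundary-extension step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the step size, the number of steps, or the exact sequential order, so the function name `extend_proposal`, the `steps` and `ratio` parameters, and the four-direction layout below are all assumptions made for illustration, not the authors' implementation. The sketch only generates the ordered sequences of extended boxes that a per-direction LSTM encoder might consume.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def extend_proposal(
    box: Box,
    image_size: Tuple[int, int],
    steps: int = 3,       # hypothetical: number of extensions per direction
    ratio: float = 0.1,   # hypothetical: step size as a fraction of box size
) -> Dict[str, List[Box]]:
    """Return, per direction, an ordered sequence of extended boxes.

    Each sequence moves one boundary of the initial proposal outward
    step by step, clipped to the image, while the other three
    boundaries stay fixed.
    """
    x1, y1, x2, y2 = box
    width, height = image_size
    dx = (x2 - x1) * ratio
    dy = (y2 - y1) * ratio

    sequences: Dict[str, List[Box]] = {
        "left": [], "right": [], "up": [], "down": []
    }
    for k in range(1, steps + 1):
        sequences["left"].append((max(0.0, x1 - k * dx), y1, x2, y2))
        sequences["right"].append((x1, y1, min(float(width), x2 + k * dx), y2))
        sequences["up"].append((x1, max(0.0, y1 - k * dy), x2, y2))
        sequences["down"].append((x1, y1, x2, min(float(height), y2 + k * dy)))
    return sequences


if __name__ == "__main__":
    seqs = extend_proposal((10.0, 20.0, 110.0, 70.0), image_size=(200, 100))
    for direction, boxes in seqs.items():
        print(direction, boxes)
```

In a full pipeline, features pooled from each box in a sequence would be fed in order to that direction's encoder and compared with the features of the initial proposal; that part depends on details the abstract does not give and is omitted here.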
Related papers
- Towards Scalable Semantic Representation for Recommendation [65.06144407288127]
Mixture-of-Codes is proposed to construct semantic IDs based on large language models (LLMs).
Our method achieves superior discriminability and dimension robustness scalability, leading to the best scale-up performance in recommendations.
arXiv Detail & Related papers (2024-10-12T15:10:56Z)
- Unleash LLMs Potential for Recommendation by Coordinating Twin-Tower Dynamic Semantic Token Generator [60.07198935747619]
We propose Twin-Tower Dynamic Semantic Recommender (TTDS), the first generative RS which adopts dynamic semantic index paradigm.
To be more specific, we for the first time contrive a dynamic knowledge fusion framework which integrates a twin-tower semantic token generator into the LLM-based recommender.
The proposed TTDS recommender achieves an average improvement of 19.41% in Hit-Rate and 20.84% in NDCG metric, compared with the leading baseline methods.
arXiv Detail & Related papers (2024-09-14T01:45:04Z)
- CPN: Complementary Proposal Network for Unconstrained Text Detection [7.524080426954018]
We propose a Complementary Proposal Network that seamlessly integrates semantic and geometric information for superior performance.
By leveraging both complementary proposals and features, CPN outperforms state-of-the-art approaches with significant margins under comparable cost.
arXiv Detail & Related papers (2024-02-18T10:43:53Z)
- Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction [57.16121098944589]
RDA is a pioneering approach designed to address two primary deficiencies prevalent in previous endeavors aiming at stealing pre-trained encoders.
It is accomplished via a sample-wise prototype, which consolidates the target encoder's representations for a given sample's various perspectives.
For more potent efficacy, we develop a multi-relational extraction loss that trains the surrogate encoder to Discriminate mismatched embedding-prototype pairs.
arXiv Detail & Related papers (2023-12-01T15:03:29Z)
- UniInst: Unique Representation for End-to-End Instance Segmentation [29.974973664317485]
We propose a box-free and NMS-free end-to-end instance segmentation framework, termed UniInst.
Specifically, we design an instance-aware one-to-one assignment scheme, which dynamically assigns one unique representation to each instance.
With these techniques, our UniInst, the first FCN-based end-to-end instance segmentation framework, achieves competitive performance.
arXiv Detail & Related papers (2022-05-25T10:40:26Z)
- ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues [49.88590455664064]
ProposalCLIP is able to predict proposals for a large variety of object categories without annotations.
ProposalCLIP also shows benefits for downstream tasks, such as unsupervised object detection.
arXiv Detail & Related papers (2022-01-18T01:51:35Z)
- Adaptive Proposal Generation Network for Temporal Sentence Localization in Videos [58.83440885457272]
We address the problem of temporal sentence localization in videos (TSLV).
Traditional methods follow a top-down framework which localizes the target segment with pre-defined segment proposals.
We propose an Adaptive Proposal Generation Network (APGN) to maintain the segment-level interaction while speeding up the efficiency.
arXiv Detail & Related papers (2021-09-14T02:02:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.