Contrastive Proposal Extension with LSTM Network for Weakly Supervised
Object Detection
- URL: http://arxiv.org/abs/2110.07511v2
- Date: Sat, 16 Oct 2021 12:17:18 GMT
- Title: Contrastive Proposal Extension with LSTM Network for Weakly Supervised
Object Detection
- Authors: Pei Lv, Suqi Hu, Tianran Hao, Haohan Ji, Lisha Cui, Haoyi Fan,
Mingliang Xu and Changsheng Xu
- Abstract summary: Weakly supervised object detection (WSOD) has attracted more and more attention since it only uses image-level labels and can save huge annotation costs.
We propose a new method that compares initial proposals with their extended counterparts to optimize those initial proposals.
Experiments on the PASCAL VOC 2007, VOC 2012 and MS-COCO datasets show that our method achieves state-of-the-art results.
- Score: 52.86681130880647
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Weakly supervised object detection (WSOD) has attracted growing
attention since it uses only image-level labels and can save enormous annotation
costs. Most WSOD methods use Multiple Instance Learning (MIL) as their basic
framework, which treats detection as an instance classification problem.
However, these MIL-based methods tend to converge only on the most
discriminative regions of different instances, rather than their corresponding
complete regions; that is, they suffer from insufficient integrity. Inspired by
the way humans observe objects, we propose a new method that compares initial
proposals with their extended counterparts to optimize those initial proposals.
Specifically, we propose a new strategy for WSOD involving contrastive
proposal extension (CPE), which consists of multiple directional contrastive
proposal extensions (D-CPE), where each D-CPE contains encoders based on an LSTM
network and corresponding decoders. First, the boundary of each initial proposal
in MIL is extended to different positions according to a well-designed sequential
order. Then, CPE compares the extended proposal with the initial proposal by
extracting their semantic features with the encoders, and computes the
integrity of the initial proposal to optimize its score. These contrastive
contextual semantics guide the basic WSOD network to suppress bad proposals and
raise the scores of good ones. In addition, a simple two-stream network is
designed as the decoder to constrain the temporal coding of the LSTM and
further improve WSOD performance. Experiments on the PASCAL VOC 2007, VOC 2012
and MS-COCO datasets show that our method achieves state-of-the-art results.
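As a rough illustration of the pipeline described above (not the authors' implementation: the paper's learned LSTM encoders and two-stream decoders are replaced here by a hypothetical `feature_fn` stand-in), the directional extension of a proposal box and a contrastive integrity score can be sketched in plain Python:

```python
# Illustrative sketch of directional contrastive proposal extension (D-CPE).
# All names are hypothetical; `feature_fn` stands in for the learned encoder.

def extend_proposal(box, direction, step):
    """Extend one boundary of an (x1, y1, x2, y2) box outward by `step` pixels."""
    x1, y1, x2, y2 = box
    if direction == "left":
        return (x1 - step, y1, x2, y2)
    if direction == "right":
        return (x1, y1, x2 + step, y2)
    if direction == "up":
        return (x1, y1 - step, x2, y2)
    if direction == "down":
        return (x1, y1, x2, y2 + step)
    raise ValueError(f"unknown direction: {direction}")

def integrity_score(box, feature_fn,
                    directions=("left", "right", "up", "down"), step=8):
    """Compare the initial proposal's features against its directional
    extensions via cosine similarity. If the box already covers the complete
    object, extending it mostly adds background, so the contrast between the
    base features and each extension serves as an integrity cue that can
    re-weight the proposal's MIL score."""
    base = feature_fn(box)
    sims = []
    for d in directions:
        ext = feature_fn(extend_proposal(box, d, step))
        dot = sum(a * b for a, b in zip(base, ext))
        na = sum(a * a for a in base) ** 0.5
        nb = sum(b * b for b in ext) ** 0.5
        sims.append(dot / (na * nb + 1e-8))
    return sum(sims) / len(sims)
```

For example, with a toy `feature_fn` that returns the box's width and height, `integrity_score((0, 0, 10, 10), lambda b: [b[2] - b[0], b[3] - b[1]])` yields a similarity in (0, 1]; in the paper this contrastive signal is produced by LSTM encoders over the sequentially extended boundaries rather than by raw geometry.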
Related papers
- Unleashing Network Potentials for Semantic Scene Completion [50.95486458217653]
This paper proposes a novel SSC framework, the Adversarial Modality Modulation Network (AMMNet).
AMMNet introduces two core modules: a cross-modal modulation enabling the interdependence of gradient flows between modalities, and a customized adversarial training scheme leveraging dynamic gradient competition.
Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin.
arXiv Detail & Related papers (2024-03-12T11:48:49Z) - CPN: Complementary Proposal Network for Unconstrained Text Detection [7.524080426954018]
We propose a Complementary Proposal Network that seamlessly integrates semantic and geometric information for superior performance.
By leveraging both complementary proposals and features, CPN outperforms state-of-the-art approaches with significant margins under comparable cost.
arXiv Detail & Related papers (2024-02-18T10:43:53Z) - PETDet: Proposal Enhancement for Two-Stage Fine-Grained Object Detection [26.843891792018447]
We present PETDet (Proposal Enhancement for Two-stage fine-grained object detection) to better handle the sub-tasks in two-stage FGOD methods.
An anchor-free Quality Oriented Proposal Network (QOPN) is proposed with dynamic label assignment and attention-based decomposition.
A novel Adaptive Recognition Loss (ARL) offers guidance for the R-CNN head to focus on high-quality proposals.
arXiv Detail & Related papers (2023-12-16T18:04:56Z) - Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction [57.16121098944589]
RDA is a pioneering approach designed to address two primary deficiencies prevalent in previous endeavors aiming at stealing pre-trained encoders.
It is accomplished via a sample-wise prototype, which consolidates the target encoder's representations for a given sample's various perspectives.
To further improve efficacy, we develop a multi-relational extraction loss that trains the surrogate encoder to discriminate mismatched embedding-prototype pairs.
arXiv Detail & Related papers (2023-12-01T15:03:29Z) - UniInst: Unique Representation for End-to-End Instance Segmentation [29.974973664317485]
We propose a box-free and NMS-free end-to-end instance segmentation framework, termed UniInst.
Specifically, we design an instance-aware one-to-one assignment scheme, which dynamically assigns one unique representation to each instance.
With these techniques, our UniInst, the first FCN-based end-to-end instance segmentation framework, achieves competitive performance.
arXiv Detail & Related papers (2022-05-25T10:40:26Z) - ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via
Exploiting CLIP Cues [49.88590455664064]
ProposalCLIP is able to predict proposals for a large variety of object categories without annotations.
ProposalCLIP also shows benefits for downstream tasks, such as unsupervised object detection.
arXiv Detail & Related papers (2022-01-18T01:51:35Z) - Adaptive Proposal Generation Network for Temporal Sentence Localization
in Videos [58.83440885457272]
We address the problem of temporal sentence localization in videos (TSLV).
Traditional methods follow a top-down framework which localizes the target segment with pre-defined segment proposals.
We propose an Adaptive Proposal Generation Network (APGN) to maintain the segment-level interaction while speeding up the efficiency.
arXiv Detail & Related papers (2021-09-14T02:02:36Z) - VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language
Matching [75.71523183166799]
The prevailing framework for matching multimodal inputs is based on a two-stage process.
We argue that these methods overlook an obvious mismatch between the roles of proposals in the two stages.
We propose VL-NMS, which is the first method to yield query-aware proposals at the first stage.
arXiv Detail & Related papers (2021-05-12T13:05:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.