Boosting Weak Positives for Text Based Person Search
- URL: http://arxiv.org/abs/2501.17586v2
- Date: Thu, 30 Jan 2025 10:37:04 GMT
- Title: Boosting Weak Positives for Text Based Person Search
- Authors: Akshay Modi, Ashhar Aziz, Nilanjana Chatterjee, A V Subramanyam
- Abstract summary: We introduce a boosting technique that dynamically identifies and emphasizes challenging samples during training.
Our method achieves improved performance across four pedestrian datasets, demonstrating the effectiveness of our proposed module.
- Score: 0.0
- Abstract: Large vision-language models have revolutionized cross-modal object retrieval, but text-based person search (TBPS) remains challenging because data is limited and the task is fine-grained. Existing methods primarily focus on aligning image-text pairs in a common representation space, often disregarding the fact that real-world positive image-text pairs share varying degrees of similarity. This leads models to prioritize easy pairs, and in some recent approaches challenging samples are discarded as noise during training. In this work, we introduce a boosting technique that dynamically identifies and emphasizes these challenging samples during training. Our approach is motivated by classical boosting and dynamically updates the weights of the weak positives, i.e., pairs for which the rank-1 match does not share the identity of the query. The weight makes these misranked pairs contribute more to the loss, so the network has to pay more attention to such samples. Our method achieves improved performance across four pedestrian datasets, demonstrating the effectiveness of our proposed module.
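The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual update rule or loss, so everything here is an assumption made for illustration: the multiplicative `factor`, the renormalisation step, the softmax cross-entropy loss, and the helper names `rank1_mismatch`, `boost_weights`, and `weighted_loss` are all hypothetical.

```python
import math

def rank1_mismatch(sim_row, gallery_ids, query_id):
    """True if the top-ranked gallery image has a different identity than the query."""
    top = max(range(len(sim_row)), key=lambda j: sim_row[j])
    return gallery_ids[top] != query_id

def boost_weights(weights, sims, gallery_ids, query_ids, factor=2.0):
    """Scale up the weight of each misranked (weak positive) query.

    Assumption: a fixed multiplicative factor followed by renormalisation
    so the weights average to 1. The paper's rule may differ.
    """
    boosted = [
        w * factor if rank1_mismatch(row, gallery_ids, qid) else w
        for w, row, qid in zip(weights, sims, query_ids)
    ]
    scale = len(boosted) / sum(boosted)
    return [w * scale for w in boosted]

def weighted_loss(weights, sims, positive_idx, temperature=0.1):
    """Per-query softmax cross-entropy over the gallery, scaled by each query's weight."""
    total = 0.0
    for w, row, pos in zip(weights, sims, positive_idx):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += w * (log_z - logits[pos])
    return total / len(weights)

# Toy batch: rows are text queries, columns are gallery images.
sims = [
    [0.9, 0.2, 0.1],  # query 0: rank-1 is its own image
    [0.3, 0.4, 0.6],  # query 1: rank-1 is image 2, a different identity
    [0.1, 0.2, 0.8],  # query 2: rank-1 is its own image
]
gallery_ids = [10, 11, 12]
query_ids = [10, 11, 12]
positive_idx = [0, 1, 2]

weights = [1.0, 1.0, 1.0]
new_weights = boost_weights(weights, sims, gallery_ids, query_ids)
loss_uniform = weighted_loss(weights, sims, positive_idx)
loss_boosted = weighted_loss(new_weights, sims, positive_idx)
```

In this toy batch only query 1 is misranked, so its weight grows relative to the others and the boosted loss is dominated by that pair. In a real training loop the similarities would come from the model and the weights would be refreshed as the ranking changes.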
Related papers
- ViLReF: An Expert Knowledge Enabled Vision-Language Retinal Foundation Model [19.915033191502328]
This work aims to develop a retinal foundation model, called ViLReF, by pre-training on a paired dataset comprising 451,956 retinal images and corresponding diagnostic text reports.
In our vision-language pre-training strategy, we leverage expert knowledge to facilitate the extraction of labels.
We employ a batch expansion module with dynamic memory queues, maintained by momentum encoders, to supply extra samples and compensate for the vacancies caused by eliminating false negatives.
arXiv Detail & Related papers (2024-08-20T14:27:03Z)
- Boosting Unconstrained Face Recognition with Targeted Style Adversary [10.428185253933004]
We present a simple yet effective method to expand the training data by interpolating between instance-level feature statistics across labeled and unlabeled sets.
Our method, dubbed Targeted Style Adversary (TSA), is motivated by two observations: (i) the input domain is reflected in feature statistics, and (ii) face recognition model performance is influenced by style information.
arXiv Detail & Related papers (2024-08-14T16:13:03Z)
- Curriculum Direct Preference Optimization for Diffusion and Consistency Models [110.08057135882356]
We propose a novel and enhanced version of DPO based on curriculum learning for text-to-image generation.
Our approach, Curriculum DPO, is compared against state-of-the-art fine-tuning approaches on three benchmarks.
arXiv Detail & Related papers (2024-05-22T13:36:48Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and insufficient annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- Accelerating exploration and representation learning with offline pre-training [52.6912479800592]
We show that exploration and representation learning can be improved by separately learning two different models from a single offline dataset.
We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward can significantly improve the sample efficiency on the challenging NetHack benchmark.
arXiv Detail & Related papers (2023-03-31T18:03:30Z)
- Robust Task-Oriented Dialogue Generation with Contrastive Pre-training and Adversarial Filtering [17.7709632238066]
Data artifacts incentivize machine learning models to learn non-transferable generalizations.
We investigate whether popular datasets such as MultiWOZ contain such data artifacts.
We propose a contrastive learning based framework to encourage the model to ignore these cues and focus on learning generalisable patterns.
arXiv Detail & Related papers (2022-05-20T03:13:02Z)
- Unpaired Referring Expression Grounding via Bidirectional Cross-Modal Matching [53.27673119360868]
Referring expression grounding is an important and challenging task in computer vision.
We propose a novel bidirectional cross-modal matching (BiCM) framework to address these challenges.
Our framework outperforms previous works by 6.55% and 9.94% on two popular grounding datasets.
arXiv Detail & Related papers (2022-01-18T01:13:19Z)
- A Simple Long-Tailed Recognition Baseline via Vision-Language Model [92.2866546058082]
The visual world naturally exhibits a long-tailed distribution of open classes, which poses great challenges to modern visual systems.
Recent advances in contrastive visual-language pretraining shed light on a new pathway for visual recognition.
We propose BALLAD to leverage contrastive vision-language models for long-tailed recognition.
arXiv Detail & Related papers (2021-11-29T17:49:24Z)
- Learning to Match Jobs with Resumes from Sparse Interaction Data using Multi-View Co-Teaching Network [83.64416937454801]
Job-resume interaction data is sparse and noisy, which affects the performance of job-resume match algorithms.
We propose a novel multi-view co-teaching network from sparse interaction data for job-resume matching.
Our model is able to outperform state-of-the-art methods for job-resume matching.
arXiv Detail & Related papers (2020-09-25T03:09:54Z)
- Dynamic Sampling for Deep Metric Learning [7.010669841466896]
Deep metric learning maps visually similar images onto nearby locations and visually dissimilar images apart from each other in an embedding manifold.
A dynamic sampling strategy is proposed to organize the training pairs in an easy-to-hard order to feed into the network.
It allows the network to learn general boundaries between categories from the easy training pairs in its early stages and to finalize the details of the model in the later stages, relying mainly on the hard training samples.
arXiv Detail & Related papers (2020-04-24T09:47:23Z)