Enhancing Zero-shot Counting via Language-guided Exemplar Learning
- URL: http://arxiv.org/abs/2402.05394v1
- Date: Thu, 8 Feb 2024 04:07:38 GMT
- Title: Enhancing Zero-shot Counting via Language-guided Exemplar Learning
- Authors: Mingjie Wang and Jun Zhou and Yong Dai and Eric Buys and Minglun Gong
- Abstract summary: The Class-Agnostic Counting (CAC) problem has garnered increasing attention owing to its intriguing generality and superior efficiency.
This paper proposes ExpressCount, a novel approach that enhances zero-shot object counting by delving deeply into language-guided exemplar learning.
ExpressCount is composed of an innovative Language-oriented Exemplar Perceptron and a downstream visual Zero-shot Counting pipeline.
- Score: 17.479926342093677
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the Class-Agnostic Counting (CAC) problem has garnered
increasing attention owing to its intriguing generality and superior efficiency
compared to Category-Specific Counting (CSC). This paper proposes ExpressCount,
a novel approach that enhances zero-shot object counting by delving deeply into
language-guided exemplar learning. Specifically, ExpressCount is composed of an
innovative Language-oriented Exemplar Perceptron and a downstream visual
Zero-shot Counting pipeline. The perceptron extracts accurate exemplar cues
from collaborative language-vision signals by inheriting rich semantic priors
from prevailing pre-trained Large Language Models (LLMs), whereas the counting
pipeline mines fine-grained features through dual-branch and cross-attention
schemes, contributing to high-quality similarity learning. Apart from building
a bridge between prevailing LLMs and visual counting tasks, expression-guided
exemplar estimation significantly advances zero-shot learning capabilities for
counting instances of arbitrary classes. Moreover, FSC-147-Express, a new
dataset annotated with meticulous linguistic expressions, opens a new avenue
for developing and validating language-based counting models. Extensive
experiments demonstrate the state-of-the-art performance of ExpressCount,
which even achieves accuracy on par with some CSC models.
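The two-stage design described in the abstract (a language-oriented exemplar perceptron feeding a dual-branch, cross-attention counting pipeline) can be illustrated with a minimal sketch. The module names, the 512-dimensional features, and the single cross-attention layers below are illustrative assumptions, not the authors' implementation; they only trace the high-level data flow.

```python
# Minimal sketch (assumed module names, 512-dim features, single-scale tokens);
# not the paper's implementation, only the data flow suggested by the abstract.
import torch
import torch.nn as nn


class ExemplarPerceptron(nn.Module):
    """Predicts exemplar cues from collaborative language-vision signals."""

    def __init__(self, dim=512):
        super().__init__()
        self.fuse = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.to_exemplar = nn.Linear(dim, dim)

    def forward(self, text_tokens, image_tokens):
        # Language queries attend over visual tokens to localize exemplar evidence.
        fused, _ = self.fuse(text_tokens, image_tokens, image_tokens)
        return self.to_exemplar(fused.mean(dim=1, keepdim=True))  # (B, 1, dim)


class CountingPipeline(nn.Module):
    """Dual-branch similarity learning: image tokens query exemplar tokens."""

    def __init__(self, dim=512):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.density_head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, image_tokens, exemplar_tokens):
        # Regions similar to the exemplar respond strongly under cross-attention.
        attended, _ = self.cross_attn(image_tokens, exemplar_tokens, exemplar_tokens)
        density = self.density_head(attended).relu()  # (B, N, 1) per-token density
        return density.sum(dim=(1, 2))                # predicted count per image


if __name__ == "__main__":
    B, N_IMG, N_TXT, D = 2, 196, 8, 512
    text = torch.randn(B, N_TXT, D)   # stand-in for LLM-encoded expression tokens
    image = torch.randn(B, N_IMG, D)  # stand-in for visual backbone patch tokens
    exemplar = ExemplarPerceptron(D)(text, image)
    count = CountingPipeline(D)(image, exemplar)
    print(count.shape)  # torch.Size([2])
```

In practice the text tokens would come from a frozen pre-trained LLM and the image tokens from a visual backbone; the random tensors above only exercise the shapes and the forward pass.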
Related papers
- Mind the Prompt: A Novel Benchmark for Prompt-based Class-Agnostic Counting [8.000723123087473]
Class-agnostic counting (CAC) is a recent task in computer vision that aims to estimate the number of instances of arbitrary object classes never seen during model training.
We introduce the Prompt-Aware Counting benchmark, which comprises two targeted tests, each accompanied by appropriate evaluation metrics.
arXiv Detail & Related papers (2024-09-24T10:35:42Z)
- Chain of Stance: Stance Detection with Large Language Models [3.528201746844624]
Stance detection is an active task in natural language processing (NLP).
We propose a new prompting method, called Chain of Stance (CoS).
arXiv Detail & Related papers (2024-08-03T16:30:51Z)
- From Classification to Generation: Insights into Crosslingual Retrieval Augmented ICL [8.065775937617417]
We introduce a novel approach that leverages cross-lingual retrieval-augmented in-context learning (CREA-ICL).
By extracting semantically similar prompts from high-resource languages, we aim to improve the zero-shot performance of multilingual pre-trained language models (MPLMs).
Though our approach yields steady improvements in classification tasks, it faces challenges in generation tasks.
arXiv Detail & Related papers (2023-11-11T15:40:21Z)
- Prompting Language-Informed Distribution for Compositional Zero-Shot Learning [73.49852821602057]
The compositional zero-shot learning (CZSL) task aims to recognize unseen compositional visual concepts.
We propose a model that prompts the language-informed distribution, a.k.a. PLID, for the task.
Experimental results on MIT-States, UT-Zappos, and C-GQA datasets show the superior performance of the PLID to the prior arts.
arXiv Detail & Related papers (2023-05-23T18:00:22Z)
- Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations [97.41375480696972]
We introduce Z-ICL, a new zero-shot method that closes the gap by constructing pseudo-demonstrations for a given test input.
Evaluation on nine classification datasets shows that Z-ICL outperforms previous zero-shot methods by a significant margin.
arXiv Detail & Related papers (2022-12-19T21:34:26Z)
- Nearest Neighbor Zero-Shot Inference [68.56747574377215]
kNN-Prompt is a technique that uses k-nearest-neighbor (kNN) retrieval augmentation for zero-shot inference with language models (LMs).
Fuzzy verbalizers leverage the sparse kNN distribution for downstream tasks by automatically associating each classification label with a set of natural language tokens.
Experiments show that kNN-Prompt is effective for domain adaptation with no further training, and that the benefits of retrieval increase with the size of the model used for kNN retrieval.
arXiv Detail & Related papers (2022-05-27T07:00:59Z)
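A toy sketch of the retrieval-augmented zero-shot scoring idea summarized in the kNN-Prompt entry above: the LM's next-token distribution is mixed with a distribution induced by kNN retrieval, and each label is scored by the total probability of its verbalizer tokens. The interpolation weight, token sets, and probabilities below are made-up illustrative values, not the paper's actual formulation or numbers.

```python
# Toy illustration (all numbers and token sets are made-up): combine an LM's
# next-token distribution with a kNN distribution from retrieved neighbors,
# then aggregate probability mass per label via a "fuzzy verbalizer" token set.

LM_PROBS = {"great": 0.30, "good": 0.20, "awful": 0.15, "bad": 0.10, "fine": 0.25}
KNN_PROBS = {"great": 0.50, "good": 0.25, "awful": 0.05, "bad": 0.05, "fine": 0.15}

# Hypothetical fuzzy verbalizers: each label maps to several surface tokens.
VERBALIZERS = {
    "positive": {"great", "good", "fine"},
    "negative": {"awful", "bad"},
}


def interpolate(lm, knn, lam=0.5):
    """Mix the two distributions token-wise: p = lam * p_knn + (1 - lam) * p_lm."""
    return {t: lam * knn.get(t, 0.0) + (1 - lam) * lm.get(t, 0.0) for t in lm}


def classify(probs, verbalizers):
    """Score each label by summing probability over its verbalizer tokens."""
    scores = {label: sum(probs.get(t, 0.0) for t in tokens)
              for label, tokens in verbalizers.items()}
    return max(scores, key=scores.get), scores


mixed = interpolate(LM_PROBS, KNN_PROBS, lam=0.5)
label, scores = classify(mixed, VERBALIZERS)
print(label, scores)  # positive {'positive': 0.825, 'negative': 0.175}
```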
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
- Iterative Correlation-based Feature Refinement for Few-shot Counting [35.27237393354539]
Few-shot counting aims to count objects of any class in an image given only a few exemplars of the same class.
Existing correlation-based few-shot counting approaches suffer from the coarseness and low semantic level of the correlation.
We propose an iterative framework to progressively refine the exemplar-related features based on the correlation between the image and exemplars.
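The refinement loop summarized in this entry can be pictured with a small sketch: correlate image features with exemplar features, pool correlation-weighted image context, and update the exemplars, repeating a few times. The feature shapes, the residual update, and the iteration count are assumptions for illustration, not the paper's exact method.

```python
# Minimal sketch (assumed shapes and update rule; not the paper's exact method):
# iteratively refine exemplar features using their correlation with image features.
import torch
import torch.nn.functional as F


def refine_exemplars(image_feats, exemplar_feats, num_iters=3):
    """image_feats: (N, D) flattened image tokens; exemplar_feats: (K, D)."""
    for _ in range(num_iters):
        # Cosine correlation between every image location and each exemplar.
        corr = F.normalize(image_feats, dim=-1) @ F.normalize(exemplar_feats, dim=-1).T  # (N, K)
        weights = corr.softmax(dim=0)             # where does each exemplar respond?
        pooled = weights.T @ image_feats          # (K, D) correlation-weighted image context
        exemplar_feats = exemplar_feats + pooled  # residual refinement step
    return exemplar_feats


img = torch.randn(196, 256)  # e.g. a 14x14 feature map with 256-dim features
ex = torch.randn(3, 256)     # 3 exemplar feature vectors
print(refine_exemplars(img, ex).shape)  # torch.Size([3, 256])
```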
arXiv Detail & Related papers (2022-01-22T03:27:11Z)
- On Cross-Lingual Retrieval with Multilingual Text Encoders [51.60862829942932]
We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments.
We evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., we learn to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
arXiv Detail & Related papers (2021-12-21T08:10:27Z)
- A Systematic Investigation of Commonsense Understanding in Large Language Models [23.430757316504316]
Large language models have shown impressive performance on many natural language processing (NLP) tasks in a zero-shot setting.
We ask whether these models exhibit commonsense understanding by evaluating models against four commonsense benchmarks.
arXiv Detail & Related papers (2021-10-31T22:20:36Z)
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this feature of our model improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)