Universal Few-shot Learning of Dense Prediction Tasks with Visual Token
Matching
- URL: http://arxiv.org/abs/2303.14969v1
- Date: Mon, 27 Mar 2023 07:58:42 GMT
- Title: Universal Few-shot Learning of Dense Prediction Tasks with Visual Token
Matching
- Authors: Donggyun Kim, Jinwoo Kim, Seongwoong Cho, Chong Luo, Seunghoon Hong
- Abstract summary: We propose Visual Token Matching (VTM) as a universal few-shot learner for arbitrary dense prediction tasks.
VTM flexibly adapts to any task with a tiny amount of task-specific parameters that modulate the matching algorithm.
We evaluate VTM on a challenging variant of the Taskonomy dataset and observe that it robustly few-shot learns various unseen dense prediction tasks.
- Score: 26.26540176172197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dense prediction tasks are a fundamental class of problems in computer
vision. As supervised methods suffer from high pixel-wise labeling cost, a
few-shot learning solution that can learn any dense task from a few labeled
images is desired. Yet, current few-shot learning methods target a restricted
set of tasks such as semantic segmentation, presumably due to challenges in
designing a general and unified model that is able to flexibly and efficiently
adapt to arbitrary tasks of unseen semantics. We propose Visual Token Matching
(VTM), a universal few-shot learner for arbitrary dense prediction tasks. It
employs non-parametric matching on patch-level embedded tokens of images and
labels, which encapsulates all tasks. VTM also flexibly adapts to any task
with a small number of task-specific parameters that modulate the matching
algorithm.
We implement VTM as a powerful hierarchical encoder-decoder architecture
involving ViT backbones where token matching is performed at multiple feature
hierarchies. We evaluate VTM on a challenging variant of the Taskonomy
dataset and observe that it robustly few-shot learns various unseen dense
prediction tasks. Surprisingly, it is competitive with fully supervised
baselines using only 10 labeled examples of novel tasks (0.004% of full
supervision) and sometimes outperforms them when using 0.1% of full
supervision. Code is available at
https://github.com/GitGyun/visual_token_matching.
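As a rough illustration of the token-matching idea described in the abstract, the following is a minimal PyTorch-style sketch, not the authors' implementation: query-image tokens attend to support-image tokens, and the matched support-label tokens are aggregated to predict the query's label tokens. The function name, tensor shapes, and scaling factor are illustrative assumptions.

```python
import torch

def token_matching(query_img_tokens, support_img_tokens, support_lbl_tokens):
    """Non-parametric matching sketch: each query image token attends to the
    support image tokens, and the matched support *label* tokens are
    aggregated to predict the query's label tokens.

    query_img_tokens:   (Nq, D) embedded patch tokens of the query image
    support_img_tokens: (Ns, D) embedded patch tokens of the support images
    support_lbl_tokens: (Ns, D) embedded patch tokens of the support labels
    """
    d = query_img_tokens.shape[-1]
    sim = query_img_tokens @ support_img_tokens.t() * d ** -0.5   # (Nq, Ns) similarities
    attn = sim.softmax(dim=-1)                                    # matching weights
    return attn @ support_lbl_tokens                              # (Nq, D) predicted label tokens

# toy usage: 196 query patches, 10 support images x 196 patches each, 256-dim tokens
q = torch.randn(196, 256)
s_img, s_lbl = torch.randn(10 * 196, 256), torch.randn(10 * 196, 256)
pred_lbl_tokens = token_matching(q, s_img, s_lbl)  # decoded into a dense map downstream
```

In the paper this matching is applied at multiple ViT feature hierarchies and is modulated by a small set of task-specific parameters; the sketch above omits both for brevity.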
Related papers
- Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model [83.85856356798531]
VistaLLM is a visual system that addresses coarse- and fine-grained vision-language tasks.
It employs a gradient-aware adaptive sampling technique to represent binary segmentation masks as sequences.
We also introduce a novel task, AttCoSeg, which boosts the model's reasoning and grounding capability over multiple input images.
arXiv Detail & Related papers (2023-12-19T18:53:01Z)
- Multi-Level Contrastive Learning for Dense Prediction Task [59.591755258395594]
We present Multi-Level Contrastive Learning for Dense Prediction Task (MCL), an efficient self-supervised method for learning region-level feature representation for dense prediction tasks.
Our method is motivated by the three key factors in detection: localization, scale consistency and recognition.
Our method consistently outperforms recent state-of-the-art methods on various datasets by significant margins.
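As a loose, hypothetical illustration of region-level contrastive learning in general (not MCL's actual multi-level formulation), an InfoNCE-style loss over pooled region features from two augmented views might look like the sketch below; the function name, shapes, and temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def region_info_nce(regions_a, regions_b, temperature=0.07):
    """InfoNCE-style loss over region-level features from two augmented views.
    regions_a, regions_b: (R, D) pooled features of R matched regions; the
    i-th region in view A is the positive for the i-th region in view B."""
    a = F.normalize(regions_a, dim=-1)
    b = F.normalize(regions_b, dim=-1)
    logits = a @ b.t() / temperature          # (R, R) cosine similarities
    targets = torch.arange(a.size(0))         # diagonal entries are positives
    return F.cross_entropy(logits, targets)

# toy usage: 32 matched regions with 128-dim pooled features
loss = region_info_nce(torch.randn(32, 128), torch.randn(32, 128))
```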
arXiv Detail & Related papers (2023-04-04T17:59:04Z)
- All in Tokens: Unifying Output Space of Visual Tasks via Soft Token [30.6086480249568]
We present a single unified model that simultaneously handles two typical visual tasks: instance segmentation and depth estimation.
We propose several new techniques that take into account the particularity of visual tasks.
We achieve 0.279 RMSE on the specific task of NYUv2 depth estimation, setting a new record on this benchmark.
arXiv Detail & Related papers (2023-01-05T18:55:20Z)
- Prompt Tuning with Soft Context Sharing for Vision-Language Models [42.61889428498378]
We propose SoftCPT, a novel method to jointly tune pre-trained vision-language models on multiple target few-shot tasks.
We show that SoftCPT significantly outperforms single-task prompt tuning methods.
arXiv Detail & Related papers (2022-08-29T10:19:10Z)
- On Steering Multi-Annotations per Sample for Multi-Task Learning [79.98259057711044]
The study of multi-task learning has drawn great attention from the community.
Despite the remarkable progress, the challenge of optimally learning different tasks simultaneously remains to be explored.
Previous works attempt to modify the gradients from different tasks, yet these methods rely on subjective assumptions about the relationships between tasks, and the modified gradients may be less accurate.
In this paper, we introduce Stochastic Task Allocation (STA), a mechanism that addresses this issue through task allocation, in which each sample is randomly allocated a subset of tasks.
For further progress, we propose Interleaved Stochastic Task Allocation (ISTA) to iteratively allocate all tasks to each sample.
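A minimal sketch of the per-sample allocation idea, assuming the loss is simply masked to each sample's allocated task subset (the function name and shapes are illustrative, not the authors' code):

```python
import torch

def stochastic_task_allocation_loss(per_task_losses, k):
    """per_task_losses: (B, T) loss of each of B samples on each of T tasks.
    Each sample is randomly allocated k of the T tasks; the losses of its
    remaining tasks are masked out before averaging."""
    B, T = per_task_losses.shape
    allocated = torch.rand(B, T).argsort(dim=1)[:, :k]     # k random tasks per sample
    mask = torch.zeros(B, T).scatter_(1, allocated, 1.0)
    return (per_task_losses * mask).sum() / mask.sum()

# toy usage: batch of 8 samples, 4 tasks, 2 tasks allocated per sample
loss = stochastic_task_allocation_loss(torch.rand(8, 4), k=2)
```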
arXiv Detail & Related papers (2022-03-06T11:57:18Z)
- Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model.
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta learning approaches.
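As a hypothetical sketch of the gradient-as-task-representation idea (not the paper's conditional neural process architecture), a task could be embedded via the gradient of a base model's loss on its support set; the helper name and toy model below are assumptions.

```python
import torch
import torch.nn as nn

def gradient_task_embedding(base_model, loss_fn, support_x, support_y):
    """Represent a task by the gradient of the base model's loss on the
    task's few-shot support set, flattened into a single vector."""
    loss = loss_fn(base_model(support_x), support_y)
    grads = torch.autograd.grad(loss, tuple(base_model.parameters()))
    return torch.cat([g.detach().flatten() for g in grads])

# toy usage: a tiny classifier and a 5-shot support set of 16-dim inputs
model = nn.Linear(16, 3)
task_vec = gradient_task_embedding(model, nn.CrossEntropyLoss(),
                                   torch.randn(5, 16), torch.randint(0, 3, (5,)))
```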
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
- Automated Self-Supervised Learning for Graphs [37.14382990139527]
This work aims to investigate how to automatically leverage multiple pretext tasks effectively.
We make use of a key principle of many real-world graphs, namely homophily, as guidance to effectively search over various self-supervised pretext tasks.
We propose the AutoSSL framework which can automatically search over combinations of various self-supervised tasks.
arXiv Detail & Related papers (2021-06-10T03:09:20Z)
- Adaptive Task Sampling for Meta-Learning [79.61146834134459]
The key idea of meta-learning for few-shot classification is to mimic the few-shot situations faced at test time.
We propose an adaptive task sampling method to improve the generalization performance.
arXiv Detail & Related papers (2020-07-17T03:15:53Z)
- Multi-Task Learning for Dense Prediction Tasks: A Survey [87.66280582034838]
Multi-task learning (MTL) techniques have shown promising results with respect to performance, computation, and/or memory footprint.
We provide a well-rounded view on state-of-the-art deep learning approaches for MTL in computer vision.
arXiv Detail & Related papers (2020-04-28T09:15:50Z)
- MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning [71.90902837008278]
We propose to incorporate neural architecture search (NAS) into general-purpose multi-task learning (GP-MTL).
In order to adapt to different task combinations, we disentangle the GP-MTL networks into single-task backbones.
We also propose a novel single-shot gradient-based search algorithm that closes the performance gap between the searched architectures and the final evaluation architecture.
arXiv Detail & Related papers (2020-03-31T09:49:14Z)