Universal Few-shot Learning of Dense Prediction Tasks with Visual Token
Matching
- URL: http://arxiv.org/abs/2303.14969v1
- Date: Mon, 27 Mar 2023 07:58:42 GMT
- Title: Universal Few-shot Learning of Dense Prediction Tasks with Visual Token
Matching
- Authors: Donggyun Kim, Jinwoo Kim, Seongwoong Cho, Chong Luo, Seunghoon Hong
- Abstract summary: We propose Visual Token Matching (VTM) as a universal few-shot learner for arbitrary dense prediction tasks.
VTM flexibly adapts to any task with a tiny amount of task-specific parameters that modulate the matching algorithm.
We evaluate VTM on a challenging variant of the Taskonomy dataset and observe that it robustly few-shot learns various unseen dense prediction tasks.
- Score: 26.26540176172197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dense prediction tasks are a fundamental class of problems in computer
vision. As supervised methods suffer from high pixel-wise labeling cost, a
few-shot learning solution that can learn any dense task from a few labeled
images is desired. Yet, current few-shot learning methods target a restricted
set of tasks such as semantic segmentation, presumably due to challenges in
designing a general and unified model that is able to flexibly and efficiently
adapt to arbitrary tasks of unseen semantics. We propose Visual Token Matching
(VTM), a universal few-shot learner for arbitrary dense prediction tasks. It
employs non-parametric matching on patch-level embedded tokens of images and
labels, which encapsulates all tasks. VTM also flexibly adapts to any task
with a small number of task-specific parameters that modulate the matching
algorithm.
We implement VTM as a powerful hierarchical encoder-decoder architecture
involving ViT backbones where token matching is performed at multiple feature
hierarchies. We evaluate VTM on a challenging variant of the Taskonomy
dataset and observe that it robustly few-shot learns various unseen dense
prediction tasks. Surprisingly, it is competitive with fully supervised
baselines using only 10 labeled examples of novel tasks (0.004% of full
supervision) and sometimes outperforms them when using 0.1% of full
supervision. Code is available at
https://github.com/GitGyun/visual_token_matching.
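As a rough illustration of the token-matching idea described in the abstract, the following is a minimal PyTorch-style sketch, not the authors' implementation: query-image tokens attend to support-image tokens, and the matched support-label tokens are aggregated to predict the query's label tokens. The function name, tensor shapes, and scaling factor are illustrative assumptions.

```python
import torch

def token_matching(query_img_tokens, support_img_tokens, support_lbl_tokens):
    """Non-parametric matching sketch: each query image token attends to the
    support image tokens, and the matched support *label* tokens are
    aggregated to predict the query's label tokens.

    query_img_tokens:   (Nq, D) embedded patch tokens of the query image
    support_img_tokens: (Ns, D) embedded patch tokens of the support images
    support_lbl_tokens: (Ns, D) embedded patch tokens of the support labels
    """
    d = query_img_tokens.shape[-1]
    sim = query_img_tokens @ support_img_tokens.t() * d ** -0.5   # (Nq, Ns) similarities
    attn = sim.softmax(dim=-1)                                    # matching weights
    return attn @ support_lbl_tokens                              # (Nq, D) predicted label tokens

# toy usage: 196 query patches, 10 support images x 196 patches each, 256-dim tokens
q = torch.randn(196, 256)
s_img, s_lbl = torch.randn(10 * 196, 256), torch.randn(10 * 196, 256)
pred_lbl_tokens = token_matching(q, s_img, s_lbl)  # decoded into a dense map downstream
```

In the paper this matching is applied at multiple ViT feature hierarchies and is modulated by a small set of task-specific parameters; the sketch above omits both for brevity.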
Related papers
- Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model [83.85856356798531]
VistaLLM is a visual system that addresses coarse- and fine-grained vision-language tasks.
It employs a gradient-aware adaptive sampling technique to represent binary segmentation masks as sequences.
We also introduce a novel task, AttCoSeg, which boosts the model's reasoning and grounding capability over multiple input images.
arXiv Detail & Related papers (2023-12-19T18:53:01Z)
- Multi-Level Contrastive Learning for Dense Prediction Task [59.591755258395594]
We present Multi-Level Contrastive Learning for Dense Prediction Task (MCL), an efficient self-supervised method for learning region-level feature representation for dense prediction tasks.
Our method is motivated by the three key factors in detection: localization, scale consistency and recognition.
Our method consistently outperforms recent state-of-the-art methods on various datasets by significant margins.
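As a loose, hypothetical illustration of region-level contrastive learning in general (not MCL's actual multi-level formulation), an InfoNCE-style loss over pooled region features from two augmented views might look like the sketch below; the function name, shapes, and temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def region_info_nce(regions_a, regions_b, temperature=0.07):
    """InfoNCE-style loss over region-level features from two augmented views.
    regions_a, regions_b: (R, D) pooled features of R matched regions; the
    i-th region in view A is the positive for the i-th region in view B."""
    a = F.normalize(regions_a, dim=-1)
    b = F.normalize(regions_b, dim=-1)
    logits = a @ b.t() / temperature          # (R, R) cosine similarities
    targets = torch.arange(a.size(0))         # diagonal entries are positives
    return F.cross_entropy(logits, targets)

# toy usage: 32 matched regions with 128-dim pooled features
loss = region_info_nce(torch.randn(32, 128), torch.randn(32, 128))
```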
arXiv Detail & Related papers (2023-04-04T17:59:04Z)
- All in Tokens: Unifying Output Space of Visual Tasks via Soft Token [30.6086480249568]
We present a single unified model that simultaneously handles two typical visual tasks: instance segmentation and depth estimation.
We propose several new techniques that take into account the particularity of visual tasks.
We achieve 0.279 RMSE on the specific task of NYUv2 depth estimation, setting a new record on this benchmark.
arXiv Detail & Related papers (2023-01-05T18:55:20Z)
- Prompt Tuning with Soft Context Sharing for Vision-Language Models [42.61889428498378]
We propose SoftCPT, a novel method to jointly tune pre-trained vision-language models on multiple target few-shot tasks.
We show that SoftCPT significantly outperforms single-task prompt tuning methods.
arXiv Detail & Related papers (2022-08-29T10:19:10Z)
- On Steering Multi-Annotations per Sample for Multi-Task Learning [79.98259057711044]
The study of multi-task learning has drawn great attention from the community.
Despite the remarkable progress, the challenge of optimally learning different tasks simultaneously remains to be explored.
Previous works attempt to modify the gradients from different tasks, yet these methods rely on subjective assumptions about the relationships between tasks, and the modified gradients may be less accurate.
In this paper, we introduce Stochastic Task Allocation (STA), a mechanism that addresses this issue through task allocation, in which each sample is randomly allocated a subset of tasks.
For further progress, we propose Interleaved Stochastic Task Allocation (ISTA) to iteratively allocate all tasks to each sample.
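A minimal sketch of the per-sample allocation idea, assuming the loss is simply masked to each sample's allocated task subset (the function name and shapes are illustrative, not the authors' code):

```python
import torch

def stochastic_task_allocation_loss(per_task_losses, k):
    """per_task_losses: (B, T) loss of each of B samples on each of T tasks.
    Each sample is randomly allocated k of the T tasks; the losses of its
    remaining tasks are masked out before averaging."""
    B, T = per_task_losses.shape
    allocated = torch.rand(B, T).argsort(dim=1)[:, :k]     # k random tasks per sample
    mask = torch.zeros(B, T).scatter_(1, allocated, 1.0)
    return (per_task_losses * mask).sum() / mask.sum()

# toy usage: batch of 8 samples, 4 tasks, 2 tasks allocated per sample
loss = stochastic_task_allocation_loss(torch.rand(8, 4), k=2)
```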
arXiv Detail & Related papers (2022-03-06T11:57:18Z)
- Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model.
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta learning approaches.
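As a hypothetical sketch of the gradient-as-task-representation idea (not the paper's conditional neural process architecture), a task could be embedded via the gradient of a base model's loss on its support set; the helper name and toy model below are assumptions.

```python
import torch
import torch.nn as nn

def gradient_task_embedding(base_model, loss_fn, support_x, support_y):
    """Represent a task by the gradient of the base model's loss on the
    task's few-shot support set, flattened into a single vector."""
    loss = loss_fn(base_model(support_x), support_y)
    grads = torch.autograd.grad(loss, tuple(base_model.parameters()))
    return torch.cat([g.detach().flatten() for g in grads])

# toy usage: a tiny classifier and a 5-shot support set of 16-dim inputs
model = nn.Linear(16, 3)
task_vec = gradient_task_embedding(model, nn.CrossEntropyLoss(),
                                   torch.randn(5, 16), torch.randint(0, 3, (5,)))
```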
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
- Automated Self-Supervised Learning for Graphs [37.14382990139527]
This work aims to investigate how to automatically leverage multiple pretext tasks effectively.
We make use of a key principle of many real-world graphs, namely homophily, as guidance to effectively search over various self-supervised pretext tasks.
We propose the AutoSSL framework which can automatically search over combinations of various self-supervised tasks.
arXiv Detail & Related papers (2021-06-10T03:09:20Z)
- Adaptive Task Sampling for Meta-Learning [79.61146834134459]
The key idea of meta-learning for few-shot classification is to mimic the few-shot situations faced at test time.
We propose an adaptive task sampling method to improve the generalization performance.
arXiv Detail & Related papers (2020-07-17T03:15:53Z)
- Multi-Task Learning for Dense Prediction Tasks: A Survey [87.66280582034838]
Multi-task learning (MTL) techniques have shown promising results with respect to performance, computation, and/or memory footprint.
We provide a well-rounded view on state-of-the-art deep learning approaches for MTL in computer vision.
arXiv Detail & Related papers (2020-04-28T09:15:50Z)
- MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning [71.90902837008278]
We propose to incorporate neural architecture search (NAS) into general-purpose multi-task learning (GP-MTL).
In order to adapt to different task combinations, we disentangle the GP-MTL networks into single-task backbones.
We also propose a novel single-shot gradient-based search algorithm that closes the performance gap between the searched architectures and the final evaluation architecture.
arXiv Detail & Related papers (2020-03-31T09:49:14Z)