Fugu-MT Paper Translation (Abstract): Exploring Effective Factors for Improving Visual In-Context Learning

Paper abstract: Exploring Effective Factors for Improving Visual In-Context Learning

arxiv url: http://arxiv.org/abs/2304.04748v1
Date: Mon, 10 Apr 2023 17:59:04 GMT
Status: Translation complete
System update date: 2023-04-11 14:05:34.486827
Title: Exploring Effective Factors for Improving Visual In-Context Learning
Title (reference translation): An Examination of Effective Factors for Improving Visual In-Context Learning
Authors: Yanpeng Sun, Qiang Chen, Jian Wang, Jingdong Wang, Zechao Li
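Abstract summary: In-Context Learning (ICL) aims to understand a new task from a few demonstrations (a.k.a. prompts) and to predict new inputs without tuning the model. This paper identifies prompt selection and prompt fusion as factors that directly affect the inference performance of visual in-context learning. It proposes prompt-SelF, a simple framework for visual in-context learning.
Reference score (independently computed attention score): 56.14208975380607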
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In-Context Learning (ICL) aims to understand a new task from a few demonstrations (a.k.a. prompts) and to predict new inputs without tuning the model. While it has been widely studied in NLP, it is still a relatively new area of research in computer vision. To reveal the factors influencing the performance of visual in-context learning, this paper shows that prompt selection and prompt fusion are two major factors that directly affect the inference performance of visual in-context learning. Prompt selection is the process of identifying the most appropriate prompt or example to help the model understand new tasks. This is important because providing the model with relevant prompts can help it learn more effectively and efficiently. Prompt fusion involves combining knowledge from different positions within the large-scale visual model. By doing this, the model can leverage the diverse knowledge stored in its different parts to improve its performance on new tasks. Based on these findings, we propose a simple framework, prompt-SelF, for visual in-context learning. Specifically, we first use a pixel-level retrieval method to select a suitable prompt, then use different prompt fusion methods to activate all the knowledge stored in the large-scale model, and finally ensemble the predictions obtained from the different prompt fusion methods to obtain the final prediction. We conduct extensive experiments on single-object segmentation and detection tasks to demonstrate the effectiveness of prompt-SelF. Remarkably, prompt-SelF outperforms OSLSM-based meta-learning in 1-shot segmentation for the first time. This indicates the great potential of visual in-context learning. The source code and models will be available at https://github.com/syp2ysy/prompt-SelF.
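Abstract (reference translation): In-Context Learning (ICL) understands a new task through a few demonstrations (a.k.a. prompts) and predicts new inputs without tuning the model. Although widely studied in NLP, it is a relatively new research area in computer vision. To clarify the factors that affect the performance of visual in-context learning, this paper shows that prompt selection and prompt fusion are the two main factors that directly influence its inference performance. Prompt selection is the process of identifying the most appropriate prompt or example to help the model understand a new task. This matters because supplying the model with relevant prompts lets it learn more effectively and efficiently. Prompt fusion combines knowledge from different positions within the large-scale vision model, so that the model can draw on the diverse knowledge stored in its different parts to improve performance on new tasks. Based on these findings, the paper proposes prompt-SelF, a simple framework for visual in-context learning. Specifically, it first selects a suitable prompt with a pixel-level retrieval method, then applies different prompt fusion methods to activate all the knowledge stored in the large-scale model, and finally ensembles the predictions from the different prompt fusion methods to obtain the final prediction. Extensive experiments on single-object segmentation and detection tasks demonstrate the effectiveness of prompt-SelF. Notably, prompt-SelF outperforms OSLSM-based meta-learning in 1-shot segmentation for the first time, which indicates the great potential of visual in-context learning. The source code and models will be available at https://github.com/syp2ysy/prompt-SelF.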
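
The three stages named in the abstract (prompt selection, prompt fusion, ensembling) can be pictured with a toy sketch. It is not the authors' implementation, and every name in it (`select_prompt`, `build_canvas`, `LAYOUTS`, `toy_model`) is hypothetical. It assumes that raw pixel values stand in for the spatial features used in pixel-level retrieval, that "fusion" means placing the prompt pair and the query at different positions of a 2x2 canvas, and that the large-scale model is any callable that fills in the blank cell.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Image = List[List[float]]               # toy H x W grid standing in for an image
Canvas = List[List[Optional[Image]]]    # 2x2 grid of cells; None marks the blank cell

def _flatten(img: Image) -> List[float]:
    return [v for row in img for v in row]

def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def select_prompt(query: Image, pool: Sequence[Tuple[Image, Image]]) -> int:
    """Return the index of the (image, label) pair most similar to the query.

    Assumption: similarity is cosine similarity over flattened pixels.
    """
    q = _flatten(query)
    scores = [_cosine(q, _flatten(img)) for img, _ in pool]
    return max(range(len(pool)), key=scores.__getitem__)

# Hypothetical layout names: where the prompt pair sits on the canvas.
LAYOUTS = ("prompt_top", "prompt_bottom")

def build_canvas(prompt_img: Image, prompt_lbl: Image,
                 query: Image, layout: str) -> Canvas:
    """Arrange the prompt pair and the query in a 2x2 grid."""
    prompt_row: List[Optional[Image]] = [prompt_img, prompt_lbl]
    query_row: List[Optional[Image]] = [query, None]
    if layout == "prompt_top":
        return [prompt_row, query_row]
    if layout == "prompt_bottom":
        return [query_row, prompt_row]
    raise ValueError(f"unknown layout: {layout}")

def ensemble(masks: Sequence[Image]) -> Image:
    """Average the masks per pixel, then threshold the mean at 0.5."""
    h, w = len(masks[0]), len(masks[0][0])
    n = len(masks)
    return [[1.0 if sum(m[i][j] for m in masks) / n >= 0.5 else 0.0
             for j in range(w)] for i in range(h)]

def prompt_self(query: Image, pool: Sequence[Tuple[Image, Image]],
                model: Callable[[Canvas], Image]) -> Tuple[int, Image]:
    """Select one prompt, predict once per layout, and ensemble the predictions."""
    idx = select_prompt(query, pool)
    img, lbl = pool[idx]
    masks = [model(build_canvas(img, lbl, query, layout)) for layout in LAYOUTS]
    return idx, ensemble(masks)

def toy_model(canvas: Canvas) -> Image:
    """Stand-in for the inpainting model: thresholds the query cell at 0.5."""
    query = next(row[0] for row in canvas if row[1] is None)
    return [[1.0 if v > 0.5 else 0.0 for v in r] for r in query]
```

Calling `prompt_self(query, pool, toy_model)` returns the index of the selected prompt and the ensembled mask. A real system would replace `toy_model` with a pretrained inpainting-style vision model, and the raw-pixel similarity with features from a feature extractor.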

Related papers