Fugu-MT Paper Translation (Summary): X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation

Paper summary: X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation

arxiv url: http://arxiv.org/abs/2203.08764v1
Date: Wed, 16 Mar 2022 17:23:26 GMT
Status: Translation complete
Last updated in system: 2022-03-17 13:46:26.897405
Title: X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation
Title (reference translation): X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation
Authors: Yinan He, Gengshi Huang, Siyu Chen, Jianing Teng, Wang Kun, Zhenfei Yin, Lu Sheng, Ziwei Liu, Yu Qiao, Jing Shao
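Abstract summary: This paper proposes a representation learning framework called X-Learner. X-Learner learns universal features for multiple vision tasks supervised by various sources. X-Learner achieves strong performance on a variety of tasks without extra annotations, modalities, or computational cost.
Reference score (independently calculated attention score): 71.51719469058666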
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: In computer vision, pre-training models based on large-scale supervised learning have been proven effective over the past few years. However, existing works mostly focus on learning from an individual task with a single data source (e.g., ImageNet for classification or COCO for detection). This restricted form limits their generalizability and usability because it lacks the vast semantic information available from various tasks and data sources. Here, we demonstrate that jointly learning from heterogeneous tasks and multiple data sources contributes to universal visual representation, leading to better transfer results on various downstream tasks. Thus, learning how to bridge the gaps among different tasks and data sources is the key, but it remains an open question. In this work, we propose a representation learning framework called X-Learner, which learns the universal feature of multiple vision tasks supervised by various sources in two stages, expansion and squeeze: 1) Expansion Stage: X-Learner learns the task-specific feature to alleviate task interference and enriches the representation through a reconciliation layer. 2) Squeeze Stage: X-Learner condenses the model to a reasonable size and learns a universal and generalizable representation for transfer to various tasks. Extensive experiments demonstrate that X-Learner achieves strong performance on different tasks without extra annotations, modalities, or computational costs compared to existing representation learning methods. Notably, a single X-Learner model shows remarkable gains of 3.0%, 3.3% and 1.8% over current pretrained models on 12 downstream datasets for classification, object detection and semantic segmentation.
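Abstract (reference translation): In computer vision, pre-trained models based on large-scale supervised learning have proven effective over the past few years. However, existing work mainly focuses on learning from an individual task with a single data source (e.g., ImageNet for classification or COCO for detection). This restricted form limits generality and usability because it lacks the vast semantic information available from various tasks and data sources. Here, we show that joint learning from heterogeneous tasks and multiple data sources contributes to universal visual representation and yields better transfer results on various downstream tasks. Learning how to bridge the gaps among different tasks and data sources is therefore the key, but it remains an open question. In this paper, we propose X-Learner, a representation learning framework that learns universal features for multiple vision tasks supervised by various sources, using an expansion stage and a squeeze stage. 1) Expansion stage: X-Learner learns task-specific features to alleviate task interference and enriches the representation through a reconciliation layer. 2) Squeeze stage: X-Learner condenses the model to a reasonable size and learns a universal, generalizable representation for transfer to various tasks. Experiments show that X-Learner achieves strong performance on a variety of tasks without extra annotations, modalities, or computational cost compared with existing representation learning methods. In particular, a single X-Learner model shows notable gains of 3.0%, 3.3%, and 1.8% over current pretrained models on 12 downstream datasets for classification, object detection, and semantic segmentation.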
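Illustrative sketch (not the authors' implementation): the abstract gives no architectural detail, so the toy below only mirrors the two-stage idea. It assumes each task keeps its own linear branch during expansion, and that the squeeze stage can be approximated by fitting one compact student to the element-wise mean of the task features. The reconciliation layer is not modelled. All names (`BRANCHES`, `INPUTS`, `expansion_features`, `squeeze_target`, `squeeze`, `distillation_error`) and all numeric values are hypothetical.

```python
"""Toy two-stage sketch: task-specific branches, then one distilled student."""

# Hypothetical task branches: each maps a 3-d input to a 2-d feature.
BRANCHES = {
    "classification": [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]],
    "detection": [[0.5, 0.5, 0.0], [-1.0, 0.0, 1.0]],
    "segmentation": [[0.0, -0.5, 1.0], [1.0, 1.0, 0.0]],
}
# Hypothetical inputs standing in for images.
INPUTS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, -0.5, 1.0]]


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def expansion_features(branches, x):
    """Expansion (toy): every task computes features in its own branch."""
    return {task: linear(w, x) for task, w in branches.items()}


def squeeze_target(features):
    """Assumed combination rule: element-wise mean of the task features."""
    vectors = list(features.values())
    return [sum(vals) / len(vals) for vals in zip(*vectors)]


def distillation_error(student, branches, inputs):
    """Mean squared error between student features and combined targets."""
    total, count = 0.0, 0
    for x in inputs:
        target = squeeze_target(expansion_features(branches, x))
        for p, t in zip(linear(student, x), target):
            total += (p - t) ** 2
            count += 1
    return total / count


def squeeze(branches, inputs, dim_out, steps=200, lr=0.05):
    """Squeeze (toy): fit one linear student to the targets by per-sample SGD."""
    dim_in = len(inputs[0])
    student = [[0.0] * dim_in for _ in range(dim_out)]
    for _ in range(steps):
        for x in inputs:
            target = squeeze_target(expansion_features(branches, x))
            pred = linear(student, x)
            for i in range(dim_out):
                grad_scale = 2.0 * (pred[i] - target[i])  # d(MSE)/d(pred[i])
                for j in range(dim_in):
                    student[i][j] -= lr * grad_scale * x[j]
    return student


if __name__ == "__main__":
    student = squeeze(BRANCHES, INPUTS, dim_out=2)
    print("distillation error:", distillation_error(student, BRANCHES, INPUTS))
```

Because the branches here are linear, a single linear student can reproduce their mean exactly, which is why the toy converges; a real system would use deep networks and a learned combination rather than a fixed mean.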

Related papers