Fugu-MT Paper Translation (Abstract): Online Bag-of-Visual-Words Generation for Unsupervised Representation Learning

Paper summary: Online Bag-of-Visual-Words Generation for Unsupervised Representation Learning

arxiv url: http://arxiv.org/abs/2012.11552v1
Date: Mon, 21 Dec 2020 18:31:21 GMT
Status: translation complete
System update date: 2021-04-27 06:43:22.552269
Title: Online Bag-of-Visual-Words Generation for Unsupervised Representation Learning
Title (reference translation): Online Visual-Word Generation for Unsupervised Representation Learning
Authors: Spyros Gidaris, Andrei Bursuc, Gilles Puy, Nikos Komodakis, Matthieu Cord, Patrick Pérez
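Abstract summary: This work proposes a teacher-student scheme that learns representations by training a convnet to reconstruct a bag-of-visual-words (BoW) representation of an image. The strategy performs online training of both the teacher network (whose role is to generate the BoW targets) and the student network (whose role is to learn representations), along with an online update of the visual-words vocabulary.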
Reference score (independently computed attention score): 59.29452780994169
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Learning image representations without human supervision is an important and active research field. Several recent approaches have successfully leveraged the idea of making such a representation invariant under different types of perturbations, especially via contrastive-based instance discrimination training. Although effective visual representations should indeed exhibit such invariances, there are other important characteristics, such as encoding contextual reasoning skills, for which alternative reconstruction-based approaches might be better suited. With this in mind, we propose a teacher-student scheme to learn representations by training a convnet to reconstruct a bag-of-visual-words (BoW) representation of an image, given as input a perturbed version of that same image. Our strategy performs online training of both the teacher network (whose role is to generate the BoW targets) and the student network (whose role is to learn representations), along with an online update of the visual-words vocabulary (used for the BoW targets). This idea effectively enables fully online BoW-guided unsupervised learning. Extensive experiments demonstrate the value of our BoW-based strategy, which surpasses previous state-of-the-art methods (including contrastive-based ones) in several applications. For instance, in downstream tasks such as Pascal object detection, Pascal classification, and Places205 classification, our method improves over all prior unsupervised approaches, thus establishing new state-of-the-art results that are even significantly better than those of supervised pre-training. We provide the implementation code at https://github.com/valeoai/obow.
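Abstract (reference translation): Learning image representations without human supervision is an important and active research field. Several recent approaches have successfully exploited the idea of making such representations invariant under different types of perturbations. Effective visual representations should indeed exhibit such invariances, but there are other important characteristics, such as encoding contextual reasoning skills, for which alternative reconstruction-based approaches are better suited. With this in mind, we propose a teacher-student scheme that learns representations by training a convnet to reconstruct the bag-of-visual-words (BoW) representation of an image, given a perturbed version of the same image as input. Our strategy performs online training of both the teacher network (whose role is to generate the BoW targets) and the student network (whose role is to learn representations), together with an online update of the visual-word vocabulary (used for the BoW targets). This idea enables fully online BoW-guided unsupervised learning. Experiments demonstrate the value of our BoW-based strategy, which surpasses previous state-of-the-art methods (including contrastive ones) in several applications. For example, in downstream tasks such as Pascal object detection, Pascal classification, and Places205 classification, it improves over prior unsupervised approaches and establishes new state-of-the-art results that are significantly better even than supervised pre-training. The implementation code is available at https://github.com/valeoai/obow.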
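The teacher-student idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The hard nearest-word assignment, the L1-normalized histogram, the cross-entropy loss, and the queue-style vocabulary replacement are all assumptions made for illustration, and the function names `bow_target`, `bow_loss`, and `update_vocabulary` are hypothetical. Random arrays stand in for the teacher's feature map and the student's prediction.

```python
import numpy as np

def bow_target(features, vocabulary):
    """Quantize local features (N, D) against a vocabulary (K, D).

    Assumption: hard nearest-word assignment and an L1-normalized histogram.
    """
    # Squared Euclidean distance between every feature and every visual word: (N, K)
    d2 = ((features[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the closest visual word per feature
    hist = np.bincount(words, minlength=vocabulary.shape[0]).astype(np.float64)
    return hist / hist.sum()

def bow_loss(student_logits, target):
    """Cross-entropy between the student's softmax prediction and the BoW target."""
    z = student_logits - student_logits.max()  # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return float(-(target * log_probs).sum())

def update_vocabulary(vocabulary, new_features, rng):
    """Hypothetical online update: drop the oldest words, append sampled new features."""
    n = min(len(new_features), len(vocabulary))
    picked = new_features[rng.choice(len(new_features), size=n, replace=False)]
    return np.concatenate([vocabulary[n:], picked], axis=0)

rng = np.random.default_rng(0)
vocabulary = rng.normal(size=(8, 4))      # K=8 visual words of dimension D=4
teacher_feats = rng.normal(size=(16, 4))  # stand-in for teacher features of the original image
student_logits = rng.normal(size=8)       # stand-in for the student's output on the perturbed image

target = bow_target(teacher_feats, vocabulary)
loss = bow_loss(student_logits, target)
vocabulary = update_vocabulary(vocabulary, teacher_feats, rng)
```

In a real training loop the student would be a convnet updated by gradient descent on this loss, and the teacher features would come from a second network; both are replaced by random arrays here to keep the sketch self-contained.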
