Fugu-MT Paper Translation (Summary): On the Viability of Monocular Depth Pre-training for Semantic Segmentation

Paper summary: On the Viability of Monocular Depth Pre-training for Semantic Segmentation

arxiv url: http://arxiv.org/abs/2203.13987v5
Date: Thu, 18 Jul 2024 05:36:22 GMT
Status: Translation complete
Last updated in system: 2024-07-23 02:31:08.849491
Title: On the Viability of Monocular Depth Pre-training for Semantic Segmentation
Authors: Dong Lao, Fengyu Yang, Daniel Wang, Hyoungseob Park, Samuel Lu, Alex Wong, Stefano Soatto
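Abstract summary: This work examines whether pre-training on geometric tasks is viable for downstream transfer to semantic tasks. Monocular depth is a viable form of pre-training for semantic segmentation, validated by improvements over common baselines.
Reference score (attention score computed by this site): 48.29060171161375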
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The question of whether pre-training on geometric tasks is viable for downstream transfer to semantic tasks is important for two reasons, one practical and the other scientific. If the answer is positive, we may be able to reduce pre-training cost and bias from human annotators significantly. If the answer is negative, it may shed light on the role of embodiment in the emergence of language and other cognitive functions in evolutionary history. To frame the question in a way that is testable with current means, we pre-train a model on a geometric task, and test whether that can be used to prime a notion of 'object' that enables inference of semantics as soon as symbols (labels) are assigned. We choose monocular depth prediction as the geometric task, and semantic segmentation as the downstream semantic task, and design a collection of empirical tests by exploring different forms of supervision, training pipelines, and data sources for both depth pre-training and semantic fine-tuning. We find that monocular depth is a viable form of pre-training for semantic segmentation, validated by improvements over common baselines. Based on the findings, we propose several possible mechanisms behind the improvements, including their relation to dataset size, resolution, architecture, in/out-of-domain source data, and validate them through a wide range of ablation studies. We also find that optical flow, which at first glance may seem as good as depth prediction since it optimizes the same photometric reprojection error, is considerably less effective, as it does not explicitly aim to infer the latent structure of the scene, but rather the raw phenomenology of temporally adjacent images.
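The abstract refers to the photometric reprojection error that self-supervised depth prediction and optical flow both optimize. The following is a minimal sketch of that quantity for a toy setting, not the authors' implementation. It assumes a pinhole camera, a pure translation between the two views (rotation fixed to identity), strictly positive depth, and nearest-neighbour sampling. The function names `reproject` and `photometric_error` and the tuple layouts for `K` and `t` are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: pinhole camera, identity rotation,
# nearest-neighbour sampling, depth assumed strictly positive.

def reproject(u, v, depth, K, t):
    """Map target pixel (u, v) with the given depth into the source view.

    K = (fx, fy, cx, cy) are pinhole intrinsics (hypothetical layout).
    t = (tx, ty, tz) is the target-to-source translation.
    """
    fx, fy, cx, cy = K
    # Back-project the pixel to a 3D point in the target camera frame.
    X = (u - cx) / fx * depth
    Y = (v - cy) / fy * depth
    Z = depth
    # Move the point into the source camera frame (translation only).
    Xs, Ys, Zs = X + t[0], Y + t[1], Z + t[2]
    # Project the point onto the source image plane.
    return fx * Xs / Zs + cx, fy * Ys / Zs + cy


def photometric_error(target, source, depth, K, t):
    """Mean absolute intensity difference between the target image and
    the source image sampled at the reprojected pixel locations.
    Pixels that reproject outside the source image are skipped."""
    h, w = len(target), len(target[0])
    total, count = 0.0, 0
    for v in range(h):
        for u in range(w):
            us, vs = reproject(u, v, depth[v][u], K, t)
            ui, vi = int(round(us)), int(round(vs))
            if 0 <= ui < w and 0 <= vi < h:
                total += abs(target[v][u] - source[vi][ui])
                count += 1
    return total / count if count else float("inf")
```

In this sketch the error is low only when the predicted depth explains how pixels move between the two views, which is the sense in which minimizing it forces a depth model to infer scene structure. A flow model could minimize the same error by predicting the pixel displacement directly, without an explicit depth variable.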

Related papers

Vision CNNs trained to estimate spatial latents learned similar ventral-stream-aligned representations [44.51229445138653]
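Studies of the functional role of the primate ventral visual stream have traditionally focused on object categorization. Here we explore an alternative hypothesis: is the ventral stream optimized for estimating spatial latents? We find that models trained to estimate just a few spatial latents achieve neural alignment scores comparable to those of models trained on hundreds of categories.
Paper reference translation (metadata) (2024-12-12T09:49:16Z)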
Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps [39.00415825387414]
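We propose a semantic correspondence estimation method that supplements discriminative features with 3D understanding. Compared with more complex 3D pipelines, our model requires only weak viewpoint information, and the simplicity of the spherical representation lets us inject informative geometric priors into the model during training. Using the SPair-71k dataset, we demonstrate that the method can distinguish between symmetric views and repeated parts across multiple object categories.
Paper reference translation (metadata) (2023-12-20T17:35:24Z)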
Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation [66.86987509942607]
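We evaluate how such a paradigm should be carried out in imitation learning. We consider a setting in which the pre-training corpus consists of multitask demonstrations. We argue that inverse dynamics modeling is well suited to this setting.
Paper reference translation (metadata) (2023-05-26T14:40:46Z)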
Self-Supervised Learning via Maximum Entropy Coding [57.56570417545023]
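We propose Maximum Entropy Coding (MEC), a principled objective that explicitly optimizes the structure of the representation. MEC learns more generalizable representations than previous methods based on specific pretext tasks. It consistently achieves state-of-the-art performance on a variety of downstream tasks, including not only ImageNet linear probing but also semi-supervised classification, object detection, instance segmentation, and object tracking.
Paper reference translation (metadata) (2022-10-20T17:58:30Z)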
Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
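We propose self-distillation as a regularization for the further pre-training stage. We empirically validate the effectiveness of self-distillation on a variety of benchmark datasets for image and text classification tasks.
Paper reference translation (metadata) (2022-09-30T02:25:12Z)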
Masked prediction tasks: a parameter identifiability view [49.533046139235466]
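We focus on the widely used self-supervised learning method of predicting masked tokens. Some prediction tasks yield identifiability, while others do not.
Paper reference translation (metadata) (2022-02-18T17:09:32Z)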
Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
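We propose an auxiliary training objective that improves the generalization capabilities of neural networks. We use pairs of minimally different examples with different labels, i.e. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task. Models trained with this technique show improved performance on out-of-distribution test sets.
Paper reference translation (metadata) (2020-04-20T02:47:49Z)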
Semantically-Guided Representation Learning for Self-Supervised Monocular Depth [40.49380547487908]
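We propose a new architecture that leverages pretrained semantic segmentation networks to guide self-supervised representation learning. Our method improves on the state of the art for self-supervised monocular depth prediction over all pixels, fine-grained details, and per semantic category.
Paper reference translation (metadata) (2020-02-27T18:40:10Z)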

The related papers list is generated automatically from the titles and abstracts of papers on this site.
