Fugu-MT Paper Translation (Summary): A Dynamic Feature Interaction Framework for Multi-task Visual Perception

Paper summary: A Dynamic Feature Interaction Framework for Multi-task Visual Perception

arXiv URL: http://arxiv.org/abs/2306.05061v1
Date: Thu, 8 Jun 2023 09:24:46 GMT
Status: Translation complete
Last updated in system: 2023-06-09 15:16:22.127510
Title: A Dynamic Feature Interaction Framework for Multi-task Visual Perception
Authors: Yuling Xi, Hao Chen, Ning Wang, Peng Wang, Yanning Zhang, Chunhua Shen, Yifan Liu
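Abstract summary: We devise an efficient unified framework for solving multiple common perception tasks: instance segmentation, semantic segmentation, monocular 3D detection, and depth estimation. The proposed framework, called D2BNet, demonstrates a unique approach to parameter-efficient prediction for multi-task perception.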
Reference score (attention score computed by the site): 100.98434079696268
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-task visual perception has a wide range of applications in scene understanding, such as autonomous driving. In this work, we devise an efficient unified framework to solve multiple common perception tasks, including instance segmentation, semantic segmentation, monocular 3D detection, and depth estimation. Simply sharing the same visual feature representations across these tasks impairs task performance, while independent task-specific feature extractors lead to parameter redundancy and latency. Thus, we design two feature-merge branches to learn feature bases that are useful to, and can therefore be shared by, multiple perception tasks. Each task then takes the corresponding feature basis as the input of its prediction head to fulfill its specific task. In particular, one feature-merge branch is designed for instance-level recognition and the other for dense prediction. To enhance inter-branch communication, the instance branch passes pixel-wise spatial information of each instance to the dense branch using efficient dynamic convolution weighting. Moreover, a simple but effective dynamic routing mechanism is proposed to isolate task-specific features and leverage common properties among tasks. Our proposed framework, termed D2BNet, demonstrates a unique approach to parameter-efficient prediction for multi-task perception. In addition, because the tasks benefit from co-training with each other, our solution achieves on-par results in partially labeled settings on nuScenes and outperforms previous works for 3D detection and depth estimation on the fully supervised Cityscapes dataset.
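The abstract names two interaction mechanisms (dynamic convolution weighting from the instance branch to the dense branch, and dynamic routing of shared feature bases to tasks) without giving implementation details. The Python sketch below is a minimal illustration under our own assumptions, not D2BNet's actual design: each instance is assumed to predict a 1x1 convolution kernel that is applied to shared features to produce a per-instance spatial map, those maps are assumed to be appended as extra channels of the dense branch, and routing is assumed to be a softmax-gated mix of feature bases. The functions `dynamic_instance_map`, `fuse_into_dense`, and `route_task_input` are hypothetical names.

```python
import math
from typing import List

Matrix = List[List[float]]  # one H x W spatial map


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dynamic_instance_map(features: List[Matrix], kernel: List[float], bias: float) -> Matrix:
    """Hypothetical per-instance 1x1 dynamic convolution.

    features: C shared channels, each H x W.
    kernel:   C weights assumed to be predicted for ONE instance.
    Returns an H x W map in (0, 1) indicating where that instance lies.
    """
    if len(kernel) != len(features):
        raise ValueError("kernel length must equal the number of channels")
    h, w = len(features[0]), len(features[0][0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = bias
            for c, weight in enumerate(kernel):
                s += weight * features[c][y][x]
            out[y][x] = sigmoid(s)
    return out


def fuse_into_dense(dense_features: List[Matrix], instance_maps: List[Matrix]) -> List[Matrix]:
    """Assumed fusion: append instance maps as extra dense-branch channels."""
    return list(dense_features) + list(instance_maps)


def route_task_input(bases: List[Matrix], gate_logits: List[float]) -> Matrix:
    """Assumed soft routing: a task mixes the shared bases with softmax gates."""
    m = max(gate_logits)  # subtract the max for numerical stability
    exps = [math.exp(g - m) for g in gate_logits]
    total = sum(exps)
    gates = [e / total for e in exps]
    h, w = len(bases[0]), len(bases[0][0])
    return [[sum(g * b[y][x] for g, b in zip(gates, bases)) for x in range(w)]
            for y in range(h)]


if __name__ == "__main__":
    shared = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
    inst = dynamic_instance_map(shared, kernel=[1.0, -1.0], bias=0.0)
    dense = fuse_into_dense(shared, [inst])
    task_in = route_task_input([shared[0], shared[1]], gate_logits=[2.0, 0.0])
    print(len(dense), [[round(v, 3) for v in row] for row in task_in])
```

In a trained network the kernels and gate logits would be produced by learned layers and computed with tensor operations; the explicit loops here only make the data flow between the branches easy to follow.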

Related papers