Fugu-MT 論文翻訳(概要): ViDaS Video Depth-aware Saliency Network

論文の概要: ViDaS Video Depth-aware Saliency Network

arxiv url: http://arxiv.org/abs/2305.11729v1
Date: Fri, 19 May 2023 15:04:49 GMT
ステータス: 翻訳完了
システム内更新日: 2023-05-22 14:00:34.719851
Title: ViDaS Video Depth-aware Saliency Network
Title（参考訳）: ViDaSビデオ深度対応サリエンシネットワーク
Authors: Ioanna Diamanti, Antigoni Tsiami, Petros Koutras and Petros Maragos
Abstract要約: 両ストリームの完全な畳み込みビデオ,Depth-Aware Saliency ネットワークである ViDaS を紹介する。ビデオのサリエンシ予測を通じて、注目度モデリングの問題に対処する。ネットワークは2つのビジュアルストリームで構成され、1つはRGBフレーム用、もう1つは奥行きフレーム用である。エンドツーエンドでトレーニングされ、アイトラッキングデータを備えたさまざまなデータベースで評価される。
参考スコア（独自算出の注目度）: 40.08270905030302
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce ViDaS, a two-stream, fully convolutional Video, Depth-Aware Saliency network to address the problem of attention modeling ``in-the-wild", via saliency prediction in videos. Contrary to existing visual saliency approaches using only RGB frames as input, our network employs also depth as an additional modality. The network consists of two visual streams, one for the RGB frames, and one for the depth frames. Both streams follow an encoder-decoder approach and are fused to obtain a final saliency map. The network is trained end-to-end and is evaluated in a variety of different databases with eye-tracking data, containing a wide range of video content. Although the publicly available datasets do not contain depth, we estimate it using three different state-of-the-art methods, to enable comparisons and a deeper insight. Our method outperforms in most cases state-of-the-art models and our RGB-only variant, which indicates that depth can be beneficial to accurately estimating saliency in videos displayed on a 2D screen. Depth has been widely used to assist salient object detection problems, where it has been proven to be very beneficial. Our problem though differs significantly from salient object detection, since it is not restricted to specific salient objects, but predicts human attention in a more general aspect. These two problems not only have different objectives, but also different ground truth data and evaluation metrics. To our best knowledge, this is the first competitive deep learning video saliency estimation approach that combines both RGB and Depth features to address the general problem of saliency estimation ``in-the-wild". The code will be publicly released.
Abstract（参考訳）: We introduce ViDaS, a two-stream, fully convolutional Video, Depth-Aware Saliency network to address the problem of attention modeling ``in-the-wild", via saliency prediction in videos. Contrary to existing visual saliency approaches using only RGB frames as input, our network employs also depth as an additional modality. The network consists of two visual streams, one for the RGB frames, and one for the depth frames. Both streams follow an encoder-decoder approach and are fused to obtain a final saliency map. The network is trained end-to-end and is evaluated in a variety of different databases with eye-tracking data, containing a wide range of video content. Although the publicly available datasets do not contain depth, we estimate it using three different state-of-the-art methods, to enable comparisons and a deeper insight. Our method outperforms in most cases state-of-the-art models and our RGB-only variant, which indicates that depth can be beneficial to accurately estimating saliency in videos displayed on a 2D screen. Depth has been widely used to assist salient object detection problems, where it has been proven to be very beneficial. Our problem though differs significantly from salient object detection, since it is not restricted to specific salient objects, but predicts human attention in a more general aspect. These two problems not only have different objectives, but also different ground truth data and evaluation metrics. To our best knowledge, this is the first competitive deep learning video saliency estimation approach that combines both RGB and Depth features to address the general problem of saliency estimation ``in-the-wild". コードは公開される予定だ。

関連論文リスト

DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation [66.7989548848166]
既存のアプローチでは、深度マップとRGBイメージをエンコードし、それらの間に特徴融合を行い、より堅牢な予測を可能にする。我々はDFormerv2という強力なRGBDエンコーダを提案し、ニューラルネットワークで深度情報をエンコードするのではなく、深度マップを幾何先行として明示的に利用する。我々のゴールは、すべての画像パッチトークンの深さと空間距離から幾何学的手がかりを抽出し、それを用いて、自己注意における注意重みを割り当てることである。
論文参考訳（メタデータ） (2025-04-07T03:06:07Z)
DepthLab: From Partial to Complete [80.58276388743306]
不足する値は、幅広いアプリケーションにわたる深度データにとって共通の課題である。この作業は、イメージ拡散プリエントを利用した基礎深度塗装モデルであるDepthLabと、このギャップを埋めるものだ。提案手法は,3Dシーンのインペイント,テキストから3Dシーン生成,DUST3Rによるスパースビュー再構成,LiDAR深度補完など,様々なダウンストリームタスクにおいて有用であることを示す。
論文参考訳（メタデータ） (2024-12-24T04:16:38Z)
DEAR: Depth-Enhanced Action Recognition [9.933324297265495]
本研究では,3次元特徴と深度マップをRGB特徴と組み合わせ,行動認識の精度を高める新しいアプローチを提案する。提案手法では,RGB機能エンコーダとは別個のブランチを通じて推定深度マップを処理し,そのシーンや動作を包括的に理解するために特徴を融合させる。
論文参考訳（メタデータ） (2024-08-28T10:08:38Z)
Learning Temporally Consistent Video Depth from Video Diffusion Priors [57.929828486615605]
本研究は,映像深度推定の課題に対処する。我々は予測タスクを条件付き生成問題に再構成する。これにより、既存のビデオ生成モデルに埋め込まれた事前の知識を活用することができる。
論文参考訳（メタデータ） (2024-06-03T16:20:24Z)
NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation [58.21817572577012]
ビデオ深度推定は時間的に一貫した深度を推定することを目的としている。プラグ・アンド・プレイ方式で様々な単一画像モデルから推定される不整合深さを安定化するNVDS+を導入する。このデータセットには、200万フレーム以上の14,203本のビデオが含まれている。
論文参考訳（メタデータ） (2023-07-17T17:57:01Z)
RGB-D Salient Object Detection with Ubiquitous Target Awareness [37.6726410843724]
我々は、新しい深度認識フレームワークを用いて、RGB-D有向物体検出問題を解くための最初の試みを行う。本稿では,RGB-D SODタスクにおける3つの課題を解決するために,ユビキタスターゲット意識(UTA)ネットワークを提案する。提案するUTAネットワークは深度フリーで,43FPSでリアルタイムに動作可能である。
論文参考訳（メタデータ） (2021-09-08T04:27:29Z)
DynOcc: Learning Single-View Depth from Dynamic Occlusion Cues [37.837552043766166]
In-the-wild シーンからなる第1の深度データセット DynOcc を導入する。提案手法は,これらのダイナミックシーンの手がかりを利用して,選択したビデオフレームのポイント間の深さ関係を推定する。 DynOccデータセットには、91Kフレームのさまざまなビデオセットから22Mの深さペアが含まれています。
論文参考訳（メタデータ） (2021-03-30T22:17:36Z)
Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [56.85837052421469]
コスト効率のよいセンサで得られたデータからシーン形状を推定することは、ロボットや自動運転車にとって鍵となる。本稿では,1枚のRGB画像から,低コストな能動深度センサによるスパース計測により,深度を推定する問題について検討する。 sparse networks (sans) は,深さ予測と完了という2つのタスクをmonodepthネットワークで実行可能にする,新しいモジュールである。
論文参考訳（メタデータ） (2021-03-30T21:22:26Z)
Accurate RGB-D Salient Object Detection via Collaborative Learning [101.82654054191443]
RGB-Dサリエンシ検出は、いくつかの課題シナリオにおいて素晴らしい能力を示している。本稿では,エッジ,深度,塩分濃度をより効率的に活用する新しい協調学習フレームワークを提案する。
論文参考訳（メタデータ） (2020-07-23T04:33:36Z)
Is Depth Really Necessary for Salient Object Detection? [50.10888549190576]
本稿では,RGB情報のみを推論の入力とする統合深度認識フレームワークの実現に向けた最初の試みを行う。 5つの公開RGB SODベンチマークの最先端のパフォーマンスを上回るだけでなく、5つのベンチマークのRGBDベースのメソッドを大きく上回っている。
論文参考訳（メタデータ） (2020-05-30T13:40:03Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。