Fugu-MT paper translation (abstract): Recognition and 3D Localization of Pedestrian Actions from Monocular Video

Paper summary: Recognition and 3D Localization of Pedestrian Actions from Monocular Video

arxiv url: http://arxiv.org/abs/2008.01162v1
Date: Mon, 3 Aug 2020 19:57:03 GMT
Status: translation complete
Last updated in system: 2022-11-03 06:51:16.634793
Title: Recognition and 3D Localization of Pedestrian Actions from Monocular Video
Authors: Jun Hayakawa, Behzad Dariush
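Abstract summary: This paper focuses on monocular pedestrian action recognition and 3D localization from an egocentric view. A challenge in addressing this problem in urban traffic scenes stems from the unpredictable behavior of pedestrians.
Reference score (independently computed attention score): 11.29865843123467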
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Understanding and predicting pedestrian behavior is an important and challenging area of research for realizing safe and effective navigation strategies in automated and advanced driver assistance technologies in urban scenes. This paper focuses on monocular pedestrian action recognition and 3D localization from an egocentric view for the purpose of predicting intention and forecasting future trajectory. A challenge in addressing this problem in urban traffic scenes is the unpredictable behavior of pedestrians, whose actions and intentions are constantly in flux and depend on the pedestrians' pose, their 3D spatial relations, and their interaction with other agents as well as with the environment. To partially address these challenges, we consider the importance of pose toward recognition and 3D localization of pedestrian actions. In particular, we propose an action recognition framework using a two-stream temporal relation network with inputs corresponding to the raw RGB image sequence of the tracked pedestrian as well as the pedestrian pose. The proposed method outperforms methods using a single-stream temporal relation network based on evaluations using the JAAD public dataset. The estimated pose and associated body key-points are also used as input to a network that estimates the 3D location of the pedestrian using a unique loss function. The evaluation of our 3D localization method on the KITTI dataset indicates an improvement in average localization error compared to existing state-of-the-art methods. Finally, we conduct qualitative tests of action recognition and 3D localization on HRI's H3D driving dataset.
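The two-stream idea in the abstract can be illustrated with a small sketch. The abstract does not say how the RGB stream and the pose stream are combined, so the weighted late fusion of class probabilities below is an assumption chosen for illustration, not the authors' method. The function names, the `rgb_weight` parameter, and the action labels are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_streams(rgb_logits, pose_logits, rgb_weight=0.5):
    """Weighted average of per-class probabilities from two streams.

    ASSUMPTION: late fusion is used here only as an example; the paper's
    actual fusion scheme is not described in the abstract.
    """
    if len(rgb_logits) != len(pose_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= rgb_weight <= 1.0:
        raise ValueError("rgb_weight must lie in [0, 1]")
    rgb_probs = softmax(rgb_logits)
    pose_probs = softmax(pose_logits)
    return [
        rgb_weight * r + (1.0 - rgb_weight) * p
        for r, p in zip(rgb_probs, pose_probs)
    ]


def predict_action(rgb_logits, pose_logits, labels, rgb_weight=0.5):
    """Return the label with the highest fused probability and that probability."""
    fused = fuse_two_streams(rgb_logits, pose_logits, rgb_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best], fused[best]


if __name__ == "__main__":
    # Hypothetical labels and scores, for illustration only.
    labels = ("standing", "walking", "crossing")
    print(predict_action([0.2, 1.5, 0.3], [0.1, 0.4, 2.0], labels))
```

Averaging probabilities rather than raw scores keeps the two streams on a common scale, which is one reason late fusion is a common baseline for multi-stream classifiers.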

Related papers

Social-Transmotion: Promptable Human Trajectory Prediction [65.80068316170613]
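Social-Transmotion is a generic transformer-based model that leverages diverse visual cues to predict human behavior. The approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY.
Paper reference translation (metadata) (2023-12-26T18:56:49Z)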
ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction [62.599588577671796]
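We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames. Unlike offline methods, it is directly applicable to scenarios with real-time constraints, such as robotics and mixed reality.
Paper reference translation (metadata) (2023-11-29T20:30:18Z)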
Pedestrian Crossing Action Recognition and Trajectory Prediction with 3D Human Keypoints [25.550524178542833]
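We propose a new multi-task learning framework for pedestrian crossing action recognition and trajectory prediction. It uses 3D human keypoints extracted from raw sensor data to capture rich information about human pose and activity. The proposed method achieves state-of-the-art performance across a wide range of evaluation metrics.
Paper reference translation (metadata) (2023-06-01T18:27:48Z)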
LocATe: End-to-end Localization of Actions in 3D with Transformers [91.28982770522329]
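LocATe is an end-to-end approach that jointly localizes and recognizes actions in 3D sequences. Unlike transformer-based object detection and classification models that take image or patch features as input, LocATe's transformer model can capture long-term correlations between actions in a sequence. We introduce BABEL-TAL-20 (BT20), a new, challenging, and more realistic benchmark dataset.
Paper reference translation (metadata) (2022-03-21T03:35:32Z)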
Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
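We propose a unified, learning-based approach to the 3D MOT problem. We use a Neural Message Passing Network for fully trainable data association. We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID switches.
Paper reference translation (metadata) (2021-04-23T17:59:28Z)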
Safety-Oriented Pedestrian Motion and Scene Occupancy Forecasting [91.69900691029908]
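We advocate predicting both individual motions and a scene occupancy map. We propose a Scene-Actor Graph Neural Network (SA-GNN) that preserves the relative spatial information of pedestrians. On two large-scale real-world datasets, we show that our scene occupancy predictions are more accurate and better calibrated than those of state-of-the-art motion forecasting methods.
Paper reference translation (metadata) (2021-01-07T06:08:21Z)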
Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
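We propose an effective training data generation process that fits 3D car models with dynamic parts to vehicles in real images. Our approach is fully automatic and requires no human interaction. We propose a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
Paper reference translation (metadata) (2020-12-15T03:03:38Z)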
PePScenes: A Novel Dataset and Baseline for Pedestrian Action Prediction in 3D [10.580548257913843]
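We propose a new pedestrian action prediction dataset created by adding per-frame 2D/3D bounding boxes and behavior annotations to nuScenes. We also propose a hybrid neural network architecture that incorporates various data modalities for predicting pedestrian crossing actions.
Paper reference translation (metadata) (2020-12-14T18:13:44Z)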
Graph-SIM: A Graph-based Spatiotemporal Interaction Modelling for Pedestrian Action Prediction [10.580548257913843]
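This paper proposes a new graph-based model for predicting pedestrian crossing actions. We introduce a new dataset that provides 3D bounding boxes and pedestrian behavior annotations for the existing nuScenes dataset. The proposed method achieves state-of-the-art performance, improving various metrics by more than 15% compared with existing methods.
Paper reference translation (metadata) (2020-12-03T18:28:27Z)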
A Real-Time Predictive Pedestrian Collision Warning Service for Cooperative Intelligent Transportation Systems Using 3D Pose Estimation [10.652350454373531]
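We propose a real-time predictive pedestrian collision warning service (P2CWS) for two tasks: pedestrian orientation recognition (100.53 FPS) and intention prediction (35.76 FPS). Thanks to the proposed site-independent features, the method generalizes across multiple sites. The proposed vision framework achieves 89.3% accuracy on the behavior recognition task of the TUD dataset without any training process.
Paper reference translation (metadata) (2020-09-23T00:55:12Z)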

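The related papers list is generated automatically from the titles and abstracts of papers on this site.

This is the information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any outcome arising from the use of this site (including all information and translations).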