Fugu-MT Paper Translation (Abstract): Optimization of Autonomous Driving Image Detection Based on RFAConv and Triplet Attention

Paper overview: Optimization of Autonomous Driving Image Detection Based on RFAConv and Triplet Attention

arxiv url: http://arxiv.org/abs/2407.09530v1
Date: Tue, 25 Jun 2024 08:59:33 GMT
Status: Translation complete
Last updated in system: 2024-07-22 13:18:53.352469
Title: Optimization of Autonomous Driving Image Detection Based on RFAConv and Triplet Attention
Authors: Zhipeng Ling, Qi Xin, Yiyu Lin, Guangze Su, Zuwei Shui
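Abstract summary: This paper proposes a holistic approach to enhancing the YOLOv8 model. The C2f_RFAConv module replaces the original module to improve feature extraction efficiency. The Triplet Attention mechanism strengthens feature focus for better target detection.
Reference score (independently computed attention score): 1.345669927504424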
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: YOLOv8 plays a crucial role in the realm of autonomous driving, owing to its high-speed target detection, precise identification and positioning, and versatile compatibility across multiple platforms. By processing video streams or images in real time, YOLOv8 rapidly and accurately identifies obstacles such as vehicles and pedestrians on roadways, offering essential visual data for autonomous driving systems. Moreover, YOLOv8 supports various tasks including instance segmentation, image classification, and pose estimation, thereby providing comprehensive visual perception for autonomous driving, ultimately enhancing driving safety and efficiency. Recognizing the significance of object detection in autonomous driving scenarios and the challenges faced by existing methods, this paper proposes a holistic approach to enhance the YOLOv8 model. The study introduces two pivotal modifications: the C2f_RFAConv module and the Triplet Attention mechanism. Firstly, the proposed modifications are elaborated upon in the methodological section. The C2f_RFAConv module replaces the original module to enhance feature extraction efficiency, while the Triplet Attention mechanism enhances feature focus. Subsequently, the experimental procedure delineates the training and evaluation process, encompassing training the original YOLOv8, integrating modified modules, and assessing performance improvements using metrics and PR curves. The results demonstrate the efficacy of the modifications, with the improved YOLOv8 model exhibiting significant performance enhancements, including increased mAP values and improvements in PR curves. Lastly, the analysis section elucidates the results and attributes the performance improvements to the introduced modules. C2f_RFAConv enhances feature extraction efficiency, while Triplet Attention improves feature focus for enhanced target detection.
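The abstract reports gains in mAP and in PR curves without saying how those metrics are computed. As a rough illustration only, the sketch below builds a precision-recall curve from ranked detections and integrates it with all-point interpolation, a common convention. The paper's actual evaluation protocol (IoU threshold, interpolation scheme) is not given here, so the function names, the toy detections, and the interpolation choice are all assumptions, not the authors' implementation.

```python
from typing import Dict, List, Tuple

# A detection is (confidence, is_true_positive). Matching detections to
# ground-truth boxes (for example by an IoU threshold) is assumed done already.
Detection = Tuple[float, bool]


def precision_recall_curve(
    detections: List[Detection], num_ground_truth: int
) -> Tuple[List[float], List[float]]:
    """Return (recalls, precisions) obtained by sweeping the confidence rank."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    recalls: List[float] = []
    precisions: List[float] = []
    true_positives = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        true_positives += int(is_tp)
        recalls.append(true_positives / num_ground_truth)
        precisions.append(true_positives / rank)
    return recalls, precisions


def average_precision(recalls: List[float], precisions: List[float]) -> float:
    """Area under the PR curve with all-point interpolation (an assumption)."""
    # Pad the curve so it spans recall 0.0 to 1.0.
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas over every recall increment.
    return sum((r[i] - r[i - 1]) * p[i] for i in range(1, len(r)))


def mean_average_precision(per_class: Dict[str, Tuple[List[Detection], int]]) -> float:
    """Mean of per-class AP values; keys are hypothetical class names."""
    aps = [
        average_precision(*precision_recall_curve(dets, n_gt))
        for dets, n_gt in per_class.values()
    ]
    return sum(aps) / len(aps)


# Toy data, invented purely for illustration.
toy = {
    "car": ([(0.9, True), (0.8, True), (0.7, False), (0.6, True)], 4),
    "pedestrian": ([(0.95, True), (0.5, False), (0.4, True)], 2),
}
ap_car = average_precision(*precision_recall_curve(*toy["car"]))
ap_pedestrian = average_precision(*precision_recall_curve(*toy["pedestrian"]))
toy_map = mean_average_precision(toy)
```

Under this convention, a model whose PR curve sits higher at every recall level has a larger AP for that class, and mAP averages AP over the classes. That is the sense in which "improvements in PR curves" and "increased mAP values" move together.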

Related papers

RAD: Retrieval-Augmented Decision-Making of Meta-Actions with Vision-Language Models in Autonomous Driving [10.984203470464687]
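Vision-language models (VLMs) often suffer from limitations such as inadequate spatial perception and hallucination. This paper proposes a retrieval-augmented decision-making (RAD) framework to improve the ability of VLMs to reliably generate meta-actions in autonomous driving scenes. The VLMs are fine-tuned on a dataset derived from the NuScenes dataset to improve their spatial perception and bird's-eye-view image understanding.
Paper reference translation (metadata) (2025-03-18T03:25:57Z)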
YOLOv12: A Breakdown of the Key Architectural Features [0.5639904484784127]
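YOLOv12 is a significant advance in single-stage, real-time object detection. It incorporates an optimized backbone (R-ELAN), 7x7 separable convolutions, and FlashAttention-driven area-based attention. It offers a scalable solution for both latency-sensitive and high-accuracy applications.
Paper reference translation (metadata) (2025-02-20T17:08:43Z)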
Research on vehicle detection based on improved YOLOv8 network [0.0]
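This paper proposes an improved YOLOv8 vehicle detection method. The improved model detects cars, persons, and motorcycles with accuracies of 98.3%, 89.1%, and 88.4%, respectively.
Paper reference translation (metadata) (2024-12-31T06:19:26Z)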
DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving [55.53171248839489]
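We propose DiFSD, an ego-centric fully sparse paradigm for end-to-end self-driving. Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction, and an iterative motion planner. Experiments on the nuScenes and Bench2Drive datasets demonstrate the superior planning performance and efficiency of DiFSD.
Paper reference translation (metadata) (2024-09-15T15:55:24Z)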
Research on target detection method of distracted driving behavior based on improved YOLOv8 [6.405098280736171]
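This study proposes an improved detection method based on the original YOLOv8 model that integrates the BOTNet module, the GAM attention mechanism, and the EIoU loss function. Experiments show an accuracy of 99.4%, with good results in both detection speed and accuracy.
Paper reference translation (metadata) (2024-07-02T00:43:41Z)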
MetaFollower: Adaptable Personalized Autonomous Car Following [63.90050686330677]
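We propose MetaFollower, an adaptable personalized car-following framework. First, model-agnostic meta-learning (MAML) is used to extract common driving knowledge from various car-following (CF) events. In addition, Long Short-Term Memory (LSTM) and the Intelligent Driver Model (IDM) are combined to reflect temporal heterogeneity with high interpretability.
Paper reference translation (metadata) (2024-06-23T15:30:40Z)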
CCDSReFormer: Traffic Flow Prediction with a Criss-Crossed Dual-Stream Enhanced Rectified Transformer Model [32.45713037210818]
We introduce the Criss-Crossed Dual-Stream Enhanced Rectified Transformer Model (CCDSReFormer). It includes three innovative modules: Enhanced Rectified Spatial Self-attention (ReSSA), Enhanced Rectified Delay Aware Self-attention (ReDASA), and Enhanced Rectified Temporal Self-attention (ReTSA). These modules aim to reduce computational needs through sparse attention, focus on local information for a better understanding of traffic dynamics, and fuse spatial and temporal insights through a unique learning method.
Paper reference translation (metadata) (2024-03-26T14:43:57Z)
VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness [56.87603097348203]
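VeCAF uses labels and natural language annotations to perform parametric data selection for finetuning a pretrained vision model (PVM). VeCAF incorporates the finetuning objective to select significant data points that effectively guide the PVM towards faster convergence. On ImageNet, VeCAF uses up to 3.3x fewer training batches than full finetuning to reach the target performance.
Paper reference translation (metadata) (2024-01-15T17:28:37Z)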
Exploring Driving Behavior for Autonomous Vehicles Based on Gramian Angular Field Vision Transformer [12.398902878803034]
This paper proposes the GAF-ViT model for analyzing driving behavior. The proposed ViT model consists of three main components: a Transformer Module, a Channel Attention Module, and a Multi-Channel ViT Module.
Paper reference translation (metadata) (2023-10-21T04:24:30Z)
Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features [69.47588461101925]
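This work proposes a method for adapting 3D object detectors to new driving environments. The approach enhances LiDAR-based detection models with spatially quantized historical features. Experiments on real-world datasets show significant improvements.
Paper reference translation (metadata) (2023-09-21T15:00:31Z)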
Cross-Domain Car Detection Model with Integrated Convolutional Block Attention Mechanism [3.3843451892622576]
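We propose a cross-domain car target detection model with an integrated convolutional block attention mechanism. Experiments show that the model's performance improved by 40% compared with the model without our framework.
Paper reference translation (metadata) (2023-05-31T17:28:13Z)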
StreamYOLO: Real-time Object Detection for Streaming Perception [84.2559631820007]
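We equip the model with the ability to predict the future, which significantly improves streaming perception results. We also consider driving scenes with multiple velocities and propose Velocity-awared streaming AP (VsAP). The method achieves state-of-the-art performance on the Argoverse-HD dataset, improving sAP and VsAP by 4.7% and 8.2%, respectively.
Paper reference translation (metadata) (2022-07-21T12:03:02Z)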
Enhancing Object Detection for Autonomous Driving by Optimizing Anchor Generation and Addressing Class Imbalance [0.0]
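This study proposes an enhanced 2D object detector based on Faster R-CNN. The modifications to Faster R-CNN do not increase computational cost and can easily be extended to optimize other anchor-based detection frameworks.
Paper reference translation (metadata) (2021-04-08T16:58:31Z)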
Pluggable Weakly-Supervised Cross-View Learning for Accurate Vehicle Re-Identification [53.6218051770131]
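Cross-view consistent feature representation is key to accurate vehicle ReID. Existing approaches use extensive extra viewpoint annotations to supervise cross-view learning. We propose a pluggable Weakly-supervised Cross-View Learning (WCVL) module for vehicle ReID.
Paper reference translation (metadata) (2021-03-09T11:51:09Z)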

The related papers list is generated automatically from the titles and abstracts of papers on this site.

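This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any outcome arising from the use of this site (including all information and translations).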