Fugu-MT Paper Translation (Summary): An Open-Source Two-Stage Computer Vision Pipeline for Fine-Grained Vehicle Classification using Vision Transformers

Paper summary: An Open-Source Two-Stage Computer Vision Pipeline for Fine-Grained Vehicle Classification using Vision Transformers

arxiv url: http://arxiv.org/abs/2606.05149v1
Date: Wed, 03 Jun 2026 17:53:33 GMT
Status: Translation complete
Last updated in system: 2026-06-04 20:44:18.945864
Title: An Open-Source Two-Stage Computer Vision Pipeline for Fine-Grained Vehicle Classification using Vision Transformers
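Title (reference translation): An open-source two-stage computer vision pipeline for fine-grained vehicle classification using Vision Transformers
Authors: Gandhimathi Padmanaban, Fred Feng
Abstract summary: Vehicle body type is a significant determinant of cyclist injury severity in overtaking crashes, yet no automated tools for classifying vehicles exist in the open literature. This paper presents an open-source two-stage computer vision pipeline that combines a pre-trained RT-DETR detector with a fine-tuned Vision Transformer. A confidence-based abstention mechanism withholds Stage 2 predictions when the softmax output falls below 0.60, producing unknown labels rather than silent misclassifications.
Reference score (the site's own attention metric): 0.0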
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Vehicle body type is a significant determinant of cyclist injury severity in overtaking crashes, yet automated tools for classifying vehicles into injury-risk-relevant categories from naturalistic roadway video do not exist in the open literature. Standard object detection benchmarks provide only coarse vehicle labels (car, truck, bus, motorcycle), while existing fine-grained recognition systems are trained on controlled imagery and lack evaluation for deployment robustness across recording sites. This paper presents an open-source two-stage computer vision pipeline combining a pre-trained RT-DETR detector for coarse vehicle localization with a fine-tuned Vision Transformer (ViT-Base/16) for six-category body-type classification: passenger car, SUV, pickup truck, minivan, large van, and commercial truck. A confidence-based abstention mechanism withholds Stage 2 predictions when softmax output falls below 0.60, producing unknown labels rather than silent misclassifications. Evaluated on 3,805 annotated overtaking events from a bicycle-lane corridor in Ann Arbor, Michigan (in-distribution), the pipeline achieved 0.94 accuracy with per-class F1 scores from 0.91 (minivan) to 0.97 (SUV). On an independent out-of-distribution evaluation of 311 events from an open cycling dataset without retraining, accuracy was 0.89. Three of four well-represented categories maintained F1 at or above 0.90 under domain shift. The largest degradation was observed for minivan (F1 = 0.72), driven by abstention rate rising from 2.4% to 25.0% rather than active misclassification, consistent with the mechanism propagating genuine model uncertainty. The full pipeline, including inference scripts, training code, evaluation utilities, and model weights, is released as open-source software to support reproducibility and reuse across roadside video archives and cycling safety research.
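The two-stage flow and the abstention rule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released code: the detector and classifier are stand-in callables (`detect` and `stage2_logits` are hypothetical names), the label order in `BODY_TYPES` is assumed, and "softmax output" is interpreted as the top-class softmax probability. In the actual pipeline, Stage 1 would be the pre-trained RT-DETR detector and Stage 2 the fine-tuned ViT-Base/16 applied to each detected vehicle crop.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Six body-type categories named in the abstract; this index order is an assumption.
BODY_TYPES = ["passenger car", "SUV", "pickup truck", "minivan", "large van", "commercial truck"]
ABSTAIN_THRESHOLD = 0.60  # threshold stated in the abstract
UNKNOWN = "unknown"

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_abstention(
    logits: Sequence[float], threshold: float = ABSTAIN_THRESHOLD
) -> Tuple[str, float]:
    """Return (label, confidence); abstain with UNKNOWN if top probability < threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    confidence = probs[best]
    if confidence < threshold:
        return UNKNOWN, confidence
    return BODY_TYPES[best], confidence


def run_pipeline(
    frame: object,
    detect: Callable[[object], List[Box]],                    # hypothetical Stage 1 (e.g. RT-DETR)
    stage2_logits: Callable[[object, Box], Sequence[float]],  # hypothetical Stage 2 (e.g. ViT on crop)
) -> List[Tuple[Box, str, float]]:
    """Stage 1 localizes vehicles; Stage 2 assigns a body type or abstains per box."""
    results = []
    for box in detect(frame):
        label, conf = classify_with_abstention(stage2_logits(frame, box))
        results.append((box, label, conf))
    return results


if __name__ == "__main__":
    # A peaked distribution passes the threshold; a flat one triggers abstention.
    print(classify_with_abstention([4.0, 0.5, 0.2, 0.1, 0.0, 0.0]))
    print(classify_with_abstention([1.0, 0.9, 0.8, 0.0, 0.0, 0.0]))
```

With the 0.60 threshold, a peaked distribution yields a body-type label while a flat one yields `unknown`, so low-confidence crops are flagged for review instead of being silently assigned a category.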
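Abstract (reference translation): Vehicle body type is a significant determinant of cyclist injury severity in overtaking crashes, yet automated tools that classify vehicles into injury-risk-relevant categories from naturalistic roadway video do not exist in the open literature. Standard object detection benchmarks provide only coarse vehicle labels (car, truck, bus, motorcycle), while existing fine-grained recognition systems are trained on controlled imagery and have not been evaluated for deployment robustness across recording sites. This paper proposes an open-source two-stage computer vision pipeline that combines a pre-trained RT-DETR detector with a fine-tuned Vision Transformer (ViT-Base/16). A confidence-based abstention mechanism withholds Stage 2 predictions when the softmax output falls below 0.60, producing unknown labels rather than silent misclassifications. On 3,805 annotated overtaking events from a bicycle-lane corridor in Ann Arbor, Michigan (in-distribution), the pipeline reached 0.94 accuracy, with per-class F1 scores ranging from 0.91 (minivan) to 0.97 (SUV). In an independent out-of-distribution evaluation of 311 events from an open cycling dataset, without retraining, accuracy was 0.89. Three of the four well-represented categories maintained F1 at or above 0.90 under domain shift. The largest degradation was observed for minivan (F1 = 0.72), driven by the abstention rate rising from 2.4% to 25.0% rather than by active misclassification. The full pipeline, including inference scripts, training code, evaluation utilities, and model weights, is released as open-source software to support reproducibility and reuse across roadside video archives and cycling safety research.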
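The abstract attributes the minivan F1 drop under domain shift mainly to abstentions rather than to misclassifications. The sketch below, using made-up toy labels rather than the paper's data, shows why the two are distinguishable. It assumes an accounting the abstract does not spell out: an `unknown` prediction counts as a false negative for the true class and a false positive for no class. Under that assumption, abstentions lower recall and F1 while leaving precision untouched; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

UNKNOWN = "unknown"  # label emitted when Stage 2 abstains (assumed string)


def per_class_scores(y_true: List[str], y_pred: List[str]) -> Dict[str, Tuple[float, float, float]]:
    """Return {class: (precision, recall, f1)} for each class present in y_true.

    An UNKNOWN prediction never equals a class label, so it adds a false
    negative for the true class and no false positive anywhere.
    """
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores


def abstention_rate(y_true: List[str], y_pred: List[str], cls: str) -> float:
    """Fraction of true `cls` events for which the classifier abstained."""
    n = sum(1 for t in y_true if t == cls)
    k = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == UNKNOWN)
    return k / n if n else 0.0


if __name__ == "__main__":
    # Toy labels (not from the paper): 8 minivans, 2 abstained; 8 SUVs all correct.
    y_true = ["minivan"] * 8 + ["SUV"] * 8
    y_pred = ["minivan"] * 6 + [UNKNOWN] * 2 + ["SUV"] * 8
    for cls, (p, r, f) in per_class_scores(y_true, y_pred).items():
        print(f"{cls}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
    print("minivan abstention rate:", abstention_rate(y_true, y_pred, "minivan"))
```

In the toy example, a 25% abstention rate with no misclassifications gives `minivan` precision 1.00, recall 0.75, and F1 of about 0.86; an equal number of outright misclassifications would instead also lower the precision of whichever class absorbed them.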
