Fugu-MT Paper Translation (Abstract): Ultralytics YOLO26: Unified Real-Time End-to-End Vision Models

Paper abstract: Ultralytics YOLO26: Unified Real-Time End-to-End Vision Models

arxiv url: http://arxiv.org/abs/2606.03748v1
Date: Tue, 02 Jun 2026 15:01:13 GMT
Status: Translation complete
Last updated in system: 2026-06-03 22:00:05.093852
Title: Ultralytics YOLO26: Unified Real-Time End-to-End Vision Models
Authors: Glenn Jocher, Jing Qiu, Mengyu Liu, Shuai Lyu, Fatih Cagatay Akyon, Muhammet Esat Kalfaoglu,
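Abstract summary: Most YOLO detectors still rely on non-maximum suppression at inference, carry heavy detection heads due to Distribution Focal Loss, and can leave the smallest objects without positive label assignments. We present Ultralytics YOLO26, a unified real-time vision model family that addresses these limitations.
Reference score (independently computed attention score): 6.526886874917011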
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Real-time vision demands models that are accurate, efficient, and simple to deploy across diverse hardware. The YOLO family has become widely deployed for this reason, yet most YOLO detectors still rely on non-maximum suppression at inference, carry heavy detection heads due to Distribution Focal Loss, require long training schedules, and can leave the smallest objects without positive label assignments. We present Ultralytics YOLO26, a unified real-time vision model family that addresses these limitations through coordinated architecture and training advances. YOLO26 uses a dual-head design for native NMS-free end-to-end inference and removes DFL entirely, yielding a lighter head with unconstrained regression range. Its training pipeline combines MuSGD, a hybrid Muon-SGD optimizer adapted from large language model training; Progressive Loss, which shifts supervision toward the inference-time head; and STAL, a label assignment strategy that guarantees positive coverage for small objects. Beyond detection, YOLO26 introduces task-specific head and loss designs for instance segmentation, pose estimation, and oriented detection, producing consistent gains across tasks and scales. The family spans five scales (n/s/m/l/x) and supports detection, instance segmentation, pose estimation, classification, and oriented detection in a single pipeline, with an open-vocabulary extension, YOLOE-26, for text-, visual-, and prompt-free inference. Across all scales, YOLO26 achieves 40.9-57.5 mAP on COCO at 1.7-11.8 ms T4 TensorRT latency, advancing the accuracy-latency Pareto front over prior real-time detectors, while YOLOE-26x reaches 40.6 AP on LVIS minival under text prompting. Code and models are available at https://github.com/ultralytics/ultralytics.
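
For readers unfamiliar with the post-processing step that the abstract says YOLO26 avoids, the following is a minimal sketch of classical greedy non-maximum suppression. It is not taken from the paper or from the Ultralytics code base; the function names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the 0.5 threshold are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical box layout for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    A box is kept only if it does not overlap any already-kept box
    by more than iou_threshold (an assumed, illustrative value).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

A detector with NMS-free end-to-end inference, as the abstract describes, is meant to emit one prediction per object directly, so a pass like `greedy_nms` and its threshold would not be needed at deployment time.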
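
The abstract's remark that removing DFL yields an "unconstrained regression range" can be illustrated with a small sketch of how a distribution-based box head is commonly decoded. This is an assumption about typical DFL-style heads, not a description of any specific YOLO26 or Ultralytics code: each box side is predicted as logits over a fixed set of bins, and the offset is the softmax-weighted mean of the bin indices. The name `dfl_decode` and the choice of 16 bins are hypothetical.

```python
import math
from typing import List


def dfl_decode(logits: List[float]) -> float:
    """Expected bin index under softmax(logits).

    The result always lies in [0, len(logits) - 1], so the decoded
    offset is bounded by the number of bins.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return sum(i * e / total for i, e in enumerate(exps))


# Illustrative values only: 16 bins, first uniform, then one dominant bin.
uniform_offset = dfl_decode([0.0] * 16)
saturated_offset = dfl_decode([0.0] * 15 + [1000.0])
```

Under these assumptions the decoded offset can never exceed the last bin index, however large the logits become, and the head must output one logit per bin for every box side. A head that regresses each offset directly has neither the bound nor the extra outputs, which is consistent with the lighter head the abstract describes.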

Related papers