3A-YOLO: New Real-Time Object Detectors with Triple Discriminative Awareness and Coordinated Representations
- URL: http://arxiv.org/abs/2412.07168v1
- Date: Tue, 10 Dec 2024 04:01:32 GMT
- Title: 3A-YOLO: New Real-Time Object Detectors with Triple Discriminative Awareness and Coordinated Representations
- Authors: Xuecheng Wu, Junxiao Xue, Liangyu Fu, Jiayu Nie, Danlei Huang, Xinyi Yin,
- Abstract summary: This work aims to leverage multiple attention mechanisms to hierarchically enhance the triple discriminative awareness of the YOLO detection head. We first propose a new head denoted TDA-YOLO Module, which unifiedly enhances the representation learning of scale-awareness, spatial-awareness, and task-awareness. Secondly, we steer the intermediate features to coordinately learn the inter-channel relationships and precise positional information.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent research on real-time object detectors (e.g., the YOLO series) has demonstrated the effectiveness of attention mechanisms for elevating model performance. Nevertheless, existing methods neglect to unifiedly deploy hierarchical attention mechanisms to construct a more discriminative YOLO head enriched with more useful intermediate features. To tackle this gap, this work aims to leverage multiple attention mechanisms to hierarchically enhance the triple discriminative awareness of the YOLO detection head and to complementarily learn coordinated intermediate representations, resulting in a new series of detectors denoted 3A-YOLO. Specifically, we first propose a new head, denoted the TDA-YOLO Module, which unifiedly enhances the representation learning of scale-awareness, spatial-awareness, and task-awareness. Secondly, we steer the intermediate features to coordinately learn the inter-channel relationships and precise positional information. Finally, we perform neck network improvements followed by introducing various tricks to boost the adaptability of 3A-YOLO. Extensive experiments across the COCO and VOC benchmarks indicate the effectiveness of our detectors.
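The "coordinated representations" the abstract describes (inter-channel relationships plus precise positional information) are in the spirit of coordinate attention: pooling a feature map along each spatial axis and using the pooled descriptors to reweight the features in a direction-aware way. Below is a minimal NumPy sketch of that idea; the weights `w_h` and `w_w` are illustrative random placeholders standing in for learned projections, and this is not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x, w_h, w_w):
    """Toy coordinate-attention reweighting for a feature map x of
    shape (C, H, W): pool along each spatial axis, project the pooled
    descriptors through per-axis weights, and modulate x with the
    resulting direction-aware attention maps."""
    C, H, W = x.shape
    pool_h = x.mean(axis=2)          # (C, H): row-wise (height) context
    pool_w = x.mean(axis=1)          # (C, W): column-wise (width) context
    attn_h = sigmoid(w_h @ pool_h)   # (C, H) attention along the height axis
    attn_w = sigmoid(w_w @ pool_w)   # (C, W) attention along the width axis
    # Broadcast the two 1-D attention maps back over the feature map
    return x * attn_h[:, :, None] * attn_w[:, None, :]

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))       # C=4 channels, 8x8 spatial grid
w_h = rng.standard_normal((4, 4)) * 0.1  # hypothetical learned projection
w_w = rng.standard_normal((4, 4)) * 0.1  # hypothetical learned projection
y = coordinate_attention(x, w_h, w_w)
print(y.shape)  # (4, 8, 8)
```

Because both attention maps pass through a sigmoid, the output is a per-position, per-channel damping of the input; a trained module would learn which rows and columns to emphasize.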
Related papers
- YOLO-DS: Fine-Grained Feature Decoupling via Dual-Statistic Synergy Operator for Object Detection [55.58092342624062]
We propose YOLO-DS, a framework built around a novel Dual-Statistic Synergy Operator (DSO). YOLO-DS decouples object features by jointly modeling the channel-wise mean and the peak-to-mean difference. On the MS-COCO benchmark, YOLO-DS consistently outperforms YOLOv8 across five model scales.
arXiv Detail & Related papers (2026-01-26T05:50:32Z) - YOLO-IOD: Towards Real Time Incremental Object Detection [57.862742461237055]
We introduce YOLO-IOD, a real-time Incremental Object Detection (IOD) framework that is constructed upon the pretrained YOLO-World model. YOLO-IOD encompasses three principal components: 1) Conflict-Aware Pseudo-Label Refinement (CPR), which mitigates the foreground-background confusion. We also introduce Cross-Stage Asymmetric Knowledge Distillation (CAKD), which addresses the misaligned knowledge distillation conflict.
arXiv Detail & Related papers (2025-12-28T15:35:26Z) - YOLOA: Real-Time Affordance Detection via LLM Adapter [96.61111291833544]
Affordance detection aims to jointly address the fundamental "what-where-how" challenge in embodied AI. We introduce YOLO Affordance (YOLOA), a real-time affordance detection model that jointly handles object detection and affordance learning. Experiments on our relabeled ADG-Det and IIT-Heat benchmarks demonstrate that YOLOA achieves state-of-the-art accuracy while maintaining real-time performance.
arXiv Detail & Related papers (2025-12-03T03:53:31Z) - TACR-YOLO: A Real-time Detection Framework for Abnormal Human Behaviors Enhanced with Coordinate and Task-Aware Representations [0.0]
We propose TACR-YOLO, a new real-time framework for abnormal behavior detection under special scenarios. We introduce a Coordinate Attention Module to enhance small object detection, a Task-Aware Attention Module to deal with classification-regression conflicts, and a Strengthen Neck Network for refined multi-scale fusion. We also optimize anchor box sizes using K-means clustering and deploy DIoU-Loss to improve bounding box regression.
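The DIoU loss mentioned above is a standard, well-documented formulation: it extends IoU loss with a penalty on the normalized distance between box centers, which gives a useful gradient even when boxes do not overlap. A self-contained sketch (plain Python, boxes as `(x1, y1, x2, y2)` corner coordinates; not code from any of the listed papers):

```python
def diou_loss(box_a, box_b):
    """Distance-IoU loss: DIoU = 1 - IoU + d^2 / c^2, where d is the
    distance between box centers and c is the diagonal length of the
    smallest box enclosing both inputs."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection-over-union term
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    iou = inter / union if union > 0 else 0.0
    # Squared distance between the two box centers
    d2 = (((ax1 + ax2) - (bx1 + bx2)) ** 2
          + ((ay1 + ay2) - (by1 + by2)) ** 2) / 4.0
    # Squared diagonal of the smallest enclosing box
    cx1, cy1 = min(ax1, bx1), min(ay1, by1)
    cx2, cy2 = max(ax2, bx2), max(ay2, by2)
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2
    return 1.0 - iou + (d2 / c2 if c2 > 0 else 0.0)

print(diou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes -> 0.0
print(round(diou_loss((0, 0, 2, 2), (1, 1, 3, 3)), 3))  # 0.968
```

Unlike plain `1 - IoU`, the center-distance penalty keeps the loss informative for fully disjoint boxes, which is why it is a common drop-in for bounding box regression in YOLO variants.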
arXiv Detail & Related papers (2025-08-15T13:45:21Z) - DGE-YOLO: Dual-Branch Gathering and Attention for Accurate UAV Object Detection [0.46040036610482665]
We present DGE-YOLO, an enhanced YOLO-based detection framework designed to effectively fuse multi-modal information. Specifically, we introduce a dual-branch architecture for modality-specific feature extraction, enabling the model to process both infrared and visible images. To further enrich semantic representation, we propose an Efficient Multi-scale Attention (EMA) mechanism that enhances feature learning across spatial scales.
arXiv Detail & Related papers (2025-06-29T14:19:18Z) - YOLO-RS: Remote Sensing Enhanced Crop Detection Methods [0.32985979395737786]
Existing target detection methods show poor performance when dealing with small targets in remote sensing images.
YOLO-RS is based on the latest YOLOv11, which significantly enhances the detection of small targets.
Experiments validate the effectiveness and application potential of YOLO-RS in the task of detecting small targets in remote sensing images.
arXiv Detail & Related papers (2025-04-15T13:13:22Z) - Evaluating and Improving Graph-based Explanation Methods for Multi-Agent Coordination [1.1137087573421256]
Graph Neural Networks (GNNs) have been adopted and shown to be highly effective in multi-robot and multi-agent learning.
We investigate and characterize the suitability of existing GNN explanation methods for explaining multi-agent coordination.
We propose an attention entropy regularization term that renders GAT-based policies more amenable to existing graph-based explainers.
arXiv Detail & Related papers (2025-02-14T03:25:45Z) - CLDA-YOLO: Visual Contrastive Learning Based Domain Adaptive YOLO Detector [10.419327930845922]
Unsupervised domain adaptive (UDA) algorithms can markedly enhance the performance of object detectors under conditions of domain shifts.
We present an unsupervised domain adaptive YOLO detector based on visual contrastive learning (CLDA-YOLO).
arXiv Detail & Related papers (2024-12-16T14:25:52Z) - Reframing the Relationship in Out-of-Distribution Detection [4.182518087792777]
We introduce a novel approach that integrates the agent paradigm into the Out-of-distribution (OOD) detection task.
Our proposed method, Concept Matching with Agent (CMA), employs neutral prompts as agents to augment the CLIP-based OOD detection process.
Our extensive experimental results showcase the superior performance of CMA over both zero-shot and training-required methods.
arXiv Detail & Related papers (2024-05-27T02:27:28Z) - YOLO-World: Real-Time Open-Vocabulary Object Detection [87.08732047660058]
We introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities.
Our method excels in detecting a wide range of objects in a zero-shot manner with high efficiency.
YOLO-World achieves 35.4 AP with 52.0 FPS on V100, which outperforms many state-of-the-art methods in terms of both accuracy and speed.
arXiv Detail & Related papers (2024-01-30T18:59:38Z) - Spatio-Temporal Domain Awareness for Multi-Agent Collaborative Perception [18.358998861454477]
Multi-agent collaborative perception, as a potential application of vehicle-to-everything communication, could significantly improve the perception performance of autonomous vehicles over single-agent perception.
We propose SCOPE, a novel collaborative perception framework that aggregates awareness characteristics across agents in an end-to-end manner.
arXiv Detail & Related papers (2023-07-26T03:00:31Z) - Weakly-supervised HOI Detection via Prior-guided Bi-level Representation Learning [66.00600682711995]
Human object interaction (HOI) detection plays a crucial role in human-centric scene understanding and serves as a fundamental building-block for many vision tasks.
One generalizable and scalable strategy for HOI detection is to use weak supervision, learning from image-level annotations only.
This is inherently challenging due to ambiguous human-object associations, large search space of detecting HOIs and highly noisy training signal.
We develop a CLIP-guided HOI representation capable of incorporating the prior knowledge at both image level and HOI instance level, and adopt a self-taught mechanism to prune incorrect human-object associations.
arXiv Detail & Related papers (2023-03-02T14:41:31Z) - Towards End-to-end Semi-supervised Learning for One-stage Object Detection [88.56917845580594]
This paper focuses on the semi-supervised learning for the advanced and popular one-stage detection network YOLOv5.
We propose a novel teacher-student learning recipe called OneTeacher with two innovative designs, namely Multi-view Pseudo-label Refinement (MPR) and Decoupled Semi-supervised Optimization (DSO)
In particular, MPR improves the quality of pseudo-labels via augmented-view refinement and global-view filtering, and DSO handles the joint optimization conflicts via structure tweaks and task-specific pseudo-labeling.
arXiv Detail & Related papers (2023-02-22T11:35:40Z) - A lightweight and accurate YOLO-like network for small target detection in Aerial Imagery [94.78943497436492]
We present YOLO-S, a simple, fast and efficient network for small target detection.
YOLO-S exploits a small feature extractor based on Darknet20, as well as skip connection, via both bypass and concatenation.
YOLO-S has an 87% decrease of parameter size and almost one half FLOPs of YOLOv3, making practical the deployment for low-power industrial applications.
arXiv Detail & Related papers (2022-04-05T16:29:49Z) - Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues.
No human annotations are involved in our framework during the whole training process.
Our framework reports significant performance compared with existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z) - Hierarchically Decoupled Spatial-Temporal Contrast for Self-supervised Video Representation Learning [6.523119805288132]
We present a novel technique for self-supervised video representation learning by: (a) decoupling the learning objective into two contrastive subtasks respectively emphasizing spatial and temporal features, and (b) performing it hierarchically to encourage multi-scale understanding.
arXiv Detail & Related papers (2020-11-23T08:05:39Z) - Adversarial Self-Supervised Learning for Semi-Supervised 3D Action
Recognition [123.62183172631443]
We present Adversarial Self-Supervised Learning (ASSL), a novel framework that tightly couples SSL and the semi-supervised scheme.
Specifically, we design an effective SSL scheme to improve the discrimination capability of learned representations for 3D action recognition.
arXiv Detail & Related papers (2020-07-12T08:01:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.