YOLOA: Real-Time Affordance Detection via LLM Adapter
- URL: http://arxiv.org/abs/2512.03418v1
- Date: Wed, 03 Dec 2025 03:53:31 GMT
- Authors: Yuqi Ji, Junjie Ke, Lihuo He, Jun Liu, Kaifan Zhang, Yu-Kun Lai, Guiguang Ding, Xinbo Gao
- Abstract summary: Affordance detection aims to jointly address the fundamental "what-where-how" challenge in embodied AI. We introduce YOLO Affordance (YOLOA), a real-time affordance detection model that jointly handles object detection and affordance learning. Experiments on our relabeled ADG-Det and IIT-Heat benchmarks demonstrate that YOLOA achieves state-of-the-art accuracy while maintaining real-time performance.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Affordance detection aims to jointly address the fundamental "what-where-how" challenge in embodied AI by understanding "what" an object is, "where" the object is located, and "how" it can be used. However, most affordance learning methods focus solely on "how" objects can be used while neglecting the "what" and "where" aspects. Other affordance detection methods treat object detection and affordance learning as two independent tasks, lacking effective interaction and real-time capability. To overcome these limitations, we introduce YOLO Affordance (YOLOA), a real-time affordance detection model that jointly handles these two tasks via a large language model (LLM) adapter. Specifically, YOLOA employs a lightweight detector consisting of object detection and affordance learning branches refined through the LLM Adapter. During training, the LLM Adapter interacts with object and affordance preliminary predictions to refine both branches by generating more accurate class priors, box offsets, and affordance gates. Experiments on our relabeled ADG-Det and IIT-Heat benchmarks demonstrate that YOLOA achieves state-of-the-art accuracy (52.8 / 73.1 mAP on ADG-Det / IIT-Heat) while maintaining real-time performance (up to 89.77 FPS, and up to 846.24 FPS for the lightweight variant). This indicates that YOLOA achieves an excellent trade-off between accuracy and efficiency.
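The refinement step described in the abstract (the LLM Adapter produces class priors, box offsets, and affordance gates that adjust the detector's preliminary predictions) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation; all function names, shapes, and values here are hypothetical.

```python
# Hypothetical sketch of LLM-adapter refinement: adapter-produced class priors
# bias the preliminary class logits, box offsets nudge the preliminary box, and
# affordance gates (in [0, 1]) scale each affordance channel. Shapes and
# semantics are assumptions, not the paper's code.

def refine_predictions(class_logits, box, affordance, prior, offset, gate):
    """Apply adapter outputs to the detector's preliminary predictions."""
    # Class priors additively bias the logits toward adapter-suggested classes.
    refined_logits = [l + p for l, p in zip(class_logits, prior)]
    # Box offsets shift the preliminary (x, y, w, h) box.
    refined_box = [b + o for b, o in zip(box, offset)]
    # Affordance gates suppress or keep each affordance channel.
    refined_affordance = [a * g for a, g in zip(affordance, gate)]
    return refined_logits, refined_box, refined_affordance

logits, box, aff = refine_predictions(
    class_logits=[0.2, 1.5, -0.3],    # preliminary logits for 3 object classes
    box=[10.0, 20.0, 50.0, 40.0],     # preliminary x, y, w, h
    affordance=[0.9, 0.4],            # e.g. "grasp", "pour" scores
    prior=[0.0, 0.8, -0.5],           # adapter favours class 1
    offset=[1.0, -2.0, 0.0, 0.0],     # adapter shifts the box left-up
    gate=[1.0, 0.1],                  # adapter suppresses "pour"
)
```

The gating form (multiplicative for affordances, additive for logits and boxes) is one plausible reading of "class priors, box offsets, and affordance gates"; the paper itself should be consulted for the actual interaction.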
Related papers
- YOLO-DS: Fine-Grained Feature Decoupling via Dual-Statistic Synergy Operator for Object Detection [55.58092342624062]
We propose YOLO-DS, a framework built around a novel Dual-Statistic Synergy Operator (DSO). YOLO-DS decouples object features by jointly modeling the channel-wise mean and the peak-to-mean difference. On the MS-COCO benchmark, YOLO-DS consistently outperforms YOLOv8 across five model scales.
arXiv Detail & Related papers (2026-01-26T05:50:32Z)
- YOLO-IOD: Towards Real Time Incremental Object Detection [57.862742461237055]
We introduce YOLO-IOD, a real-time Incremental Object Detection (IOD) framework that is constructed upon the pretrained YOLO-World model. YOLO-IOD encompasses three principal components: 1) Conflict-Aware Pseudo-Label Refinement (CPR), which mitigates the foreground-background confusion. We also introduce Cross-Stage Asymmetric Knowledge Distillation (CAKD), which addresses the misaligned knowledge distillation conflict.
arXiv Detail & Related papers (2025-12-28T15:35:26Z)
- Anomaly-Aware YOLO: A Frugal yet Robust Approach to Infrared Small Target Detection [1.9116784879310027]
Anomaly-Aware YOLO (AA-YOLO) integrates a statistical anomaly detection test into its detection head. By treating small targets as unexpected patterns against the background, AA-YOLO effectively controls the false alarm rate.
arXiv Detail & Related papers (2025-10-06T12:13:56Z)
- Teach YOLO to Remember: A Self-Distillation Approach for Continual Object Detection [5.6148728159802035]
Real-time object detectors like YOLO achieve exceptional performance when trained on large datasets for multiple epochs. In real-world scenarios where data arrives incrementally, neural networks suffer from catastrophic forgetting. We introduce YOLO LwF, a self-distillation approach tailored for YOLO-based continual object detection.
arXiv Detail & Related papers (2025-03-06T18:31:41Z)
- CLDA-YOLO: Visual Contrastive Learning Based Domain Adaptive YOLO Detector [10.419327930845922]
Unsupervised domain adaptive (UDA) algorithms can markedly enhance the performance of object detectors under conditions of domain shift. We present CLDA-YOLO, an unsupervised domain adaptive YOLO detector based on visual contrastive learning.
arXiv Detail & Related papers (2024-12-16T14:25:52Z)
- YOLO-World: Real-Time Open-Vocabulary Object Detection [87.08732047660058]
We introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities.
Our method excels in detecting a wide range of objects in a zero-shot manner with high efficiency.
YOLO-World achieves 35.4 AP with 52.0 FPS on V100, which outperforms many state-of-the-art methods in terms of both accuracy and speed.
arXiv Detail & Related papers (2024-01-30T18:59:38Z)
- YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection [63.36722419180875]
We provide an efficient and performant object detector, termed YOLO-MS. We train our YOLO-MS on the MS COCO dataset from scratch without relying on any other large-scale datasets. Our work can also serve as a plug-and-play module for other YOLO models.
arXiv Detail & Related papers (2023-08-10T10:12:27Z)
- A lightweight and accurate YOLO-like network for small target detection in Aerial Imagery [94.78943497436492]
We present YOLO-S, a simple, fast and efficient network for small target detection.
YOLO-S exploits a small feature extractor based on Darknet20, as well as skip connection, via both bypass and concatenation.
YOLO-S reduces parameter count by 87% and nearly halves the FLOPs relative to YOLOv3, making deployment practical for low-power industrial applications.
arXiv Detail & Related papers (2022-04-05T16:29:49Z)