When Language Model Guides Vision: Grounding DINO for Cattle Muzzle Detection
- URL: http://arxiv.org/abs/2509.06427v2
- Date: Mon, 13 Oct 2025 09:28:02 GMT
- Title: When Language Model Guides Vision: Grounding DINO for Cattle Muzzle Detection
- Authors: Rabin Dulal, Lihong Zheng, Muhammad Ashad Kabir
- Abstract summary: Grounding DINO is a vision-language model capable of detecting muzzles without task-specific training or annotated data. Our model achieves a mean Average Precision (mAP)@0.5 of 76.8%, demonstrating promising performance without requiring annotated data.
- Score: 0.48429188360918735
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Muzzle patterns are among the most effective biometric traits for cattle identification. Fast and accurate detection of the muzzle region as the region of interest is critical to automatic visual cattle identification. Earlier approaches relied on manual detection, which is labor-intensive and inconsistent. Recently, automated methods using supervised models like YOLO have become popular for muzzle detection. Although effective, these methods require extensive annotated datasets and depend heavily on their training data, limiting their performance on new or unseen cattle. To address these limitations, this study proposes a zero-shot muzzle detection framework based on Grounding DINO, a vision-language model capable of detecting muzzles without any task-specific training or annotated data. This approach leverages natural language prompts to guide detection, enabling scalable and flexible muzzle localization across diverse breeds and environments. Our model achieves a mean Average Precision (mAP)@0.5 of 76.8%, demonstrating promising performance without requiring annotated data. To our knowledge, this is the first research to provide a real-world, industry-oriented, and annotation-free solution for cattle muzzle detection. The framework offers a practical alternative to supervised methods, promising improved adaptability and ease of deployment in livestock monitoring applications.
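The reported mAP@0.5 means a predicted box counts as a true positive only when its intersection-over-union (IoU) with a ground-truth muzzle box is at least 0.5. A minimal, self-contained sketch of that matching criterion (the box coordinates below are illustrative, not taken from the paper):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred, gt, threshold=0.5):
    """Under mAP@0.5, a detection matches a ground-truth box only if IoU >= 0.5."""
    return iou(pred, gt) >= threshold

# Hypothetical predicted muzzle box vs. a ground-truth box.
pred_box = (40, 50, 120, 130)
gt_box = (50, 60, 130, 140)
print(round(iou(pred_box, gt_box), 2))       # → 0.62
print(is_true_positive(pred_box, gt_box))    # → True
```

Full mAP additionally averages precision over recall levels across confidence thresholds; the IoU test above is only the matching rule that decides which detections count as correct.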
Related papers
- Automated Re-Identification of Holstein-Friesian Cattle in Dense Crowds [2.3843187053931456]
We propose a new detect-segment-identify pipeline that leverages the Open-Vocabulary Weight-free Localisation and Segment Anything models. Our methodology overcomes detection breakdown in dense animal groupings, resulting in 98.93% accuracy. We show that unsupervised contrastive learning can build on this to yield 94.82% Re-ID accuracy on our test data.
arXiv Detail & Related papers (2026-02-17T19:25:50Z) - DetectAnyLLM: Towards Generalizable and Robust Detection of Machine-Generated Text Across Domains and Models [60.713908578319256]
We propose Direct Discrepancy Learning (DDL) to optimize the detector with task-oriented knowledge. Built upon this, we introduce DetectAnyLLM, a unified detection framework that achieves state-of-the-art MGTD performance. MIRAGE samples human-written texts from 10 corpora across 5 text domains, which are then re-generated or revised using 17 cutting-edge LLMs.
arXiv Detail & Related papers (2025-09-15T10:59:57Z) - Learning Using Privileged Information for Litter Detection [0.6390468088226494]
This study presents a novel approach that combines privileged information with deep learning object detection. We evaluate our method across five widely used object detection models. Our results suggest that this methodology offers a practical solution for litter detection.
arXiv Detail & Related papers (2025-08-06T06:46:14Z) - Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture [81.93945602120453]
We introduce an approach that is both general and parameter-efficient for face forgery detection. We design a forgery-style mixture formulation that augments the diversity of forgery source domains. We show that the designed model achieves state-of-the-art generalizability with significantly fewer trainable parameters.
arXiv Detail & Related papers (2024-08-23T01:53:36Z) - Detect Any Deepfakes: Segment Anything Meets Face Forgery Detection and Localization [30.317619885984005]
We introduce the well-trained vision segmentation foundation model, i.e., Segment Anything Model (SAM) in face forgery detection and localization.
Based on SAM, we propose the Detect Any Deepfakes (DADF) framework with the Multiscale Adapter.
The proposed framework seamlessly integrates end-to-end forgery localization and detection optimization.
arXiv Detail & Related papers (2023-06-29T16:25:04Z) - Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability [70.72426887518517]
Out-of-distribution (OOD) detection is an indispensable aspect of secure AI when deploying machine learning models in real-world applications.
We propose a novel method, Unleashing Mask, which aims to restore the OOD discriminative capabilities of the well-trained model with ID data.
Our method uses a mask to identify the memorized atypical samples, then fine-tunes the model or prunes it with the introduced mask to forget them.
arXiv Detail & Related papers (2023-06-06T14:23:34Z) - Augment and Criticize: Exploring Informative Samples for Semi-Supervised Monocular 3D Object Detection [64.65563422852568]
We improve the challenging monocular 3D object detection problem with a general semi-supervised framework.
We introduce a novel, simple, yet effective 'Augment and Criticize' framework that explores abundant informative samples from unlabeled data.
The two new detectors, dubbed 3DSeMo_DLE and 3DSeMo_FLEX, achieve state-of-the-art results, with improvements of over 3.5% AP_3D/BEV (Easy) on KITTI.
arXiv Detail & Related papers (2023-03-20T16:28:15Z) - MAPS: A Noise-Robust Progressive Learning Approach for Source-Free Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require accessing the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv Detail & Related papers (2023-02-09T12:06:08Z) - Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax [88.11979569564427]
We provide the first systematic analysis of the underperformance of state-of-the-art models on long-tail distributions.
We propose a novel balanced group softmax (BAGS) module for balancing the classifiers within the detection frameworks through group-wise training.
Extensive experiments on the recent long-tail, large-vocabulary object recognition benchmark LVIS show that our proposed BAGS significantly improves detector performance.
arXiv Detail & Related papers (2020-06-18T10:24:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.