8-Calves Image dataset
- URL: http://arxiv.org/abs/2503.13777v2
- Date: Fri, 25 Apr 2025 22:31:53 GMT
- Title: 8-Calves Image dataset
- Authors: Xuyang Fang, Sion Hannuna, Neill Campbell
- Abstract summary: 8-Calves is a benchmark for evaluating object detection and identity preservation in temporally consistent environments. The dataset consists of a 1-hour video of eight Holstein Friesian calves with unique coat patterns and 900 static frames.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce the 8-Calves dataset, a benchmark for evaluating object detection and identity preservation in occlusion-rich, temporally consistent environments. Comprising a 1-hour video (67,760 frames) of eight Holstein Friesian calves with unique coat patterns and 900 static frames, the dataset emphasizes real-world challenges such as prolonged occlusions, motion blur, and pose variation. By fine-tuning 28 object detectors (YOLO variants, transformers) and evaluating 23 pretrained backbones (ResNet, ConvNextV2, ViTs), we expose critical architectural trade-offs: smaller models (e.g., ConvNextV2 Nano, 15.6M parameters) excel in efficiency and retrieval accuracy, while pure vision transformers lag in occlusion-heavy settings. The dataset's structured design (fixed camera views, natural motion, and verified identities) provides a reproducible testbed for object detection challenges (mAP50:95: 56.5-66.4%), bridging synthetic simplicity and domain-specific complexity. The dataset and benchmark code are publicly available at https://huggingface.co/datasets/tonyFang04/8-calves. Limitations include partial labeling and detector bias, addressed in later sections.
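For readers who want to try the benchmark, the sketch below shows one plausible way to fetch the dataset from Hugging Face and fine-tune a small YOLO detector on it. Only the repository id comes from the abstract; the config file name `8-calves.yaml`, the choice of checkpoint, and the training settings are illustrative assumptions, so consult the dataset card for the actual layout.

```python
# A minimal sketch, assuming the dataset repo can be pulled with
# huggingface_hub and trained with Ultralytics YOLO. The file
# "8-calves.yaml" is a hypothetical dataset config; the repo id is real.
from huggingface_hub import snapshot_download  # pip install huggingface_hub
from ultralytics import YOLO                   # pip install ultralytics

# Download the dataset repository from Hugging Face.
data_dir = snapshot_download(repo_id="tonyFang04/8-calves", repo_type="dataset")

# Fine-tune a small pretrained detector on the dataset.
model = YOLO("yolov8n.pt")
model.train(data=f"{data_dir}/8-calves.yaml", epochs=50, imgsz=640)

# Ultralytics reports mAP50:95, the metric range quoted in the abstract.
metrics = model.val()
print(f"mAP50:95 = {metrics.box.map:.3f}")
```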
Related papers
- Event-Based Crossing Dataset (EBCD) [0.9961452710097684]
Event-based vision revolutionizes traditional image sensing by capturing intensity variations rather than static frames.
The Event-Based Crossing dataset is tailored for pedestrian and vehicle detection in dynamic outdoor environments.
This dataset facilitates an extensive assessment of object detection performance under varying conditions of sparsity and noise suppression.
arXiv Detail & Related papers (2025-03-21T19:20:58Z)
- AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification [39.350429734981184]
We introduce AG-VPReID, a new large-scale dataset for aerial-ground video-based person re-identification (ReID).
This dataset comprises 6,632 subjects, 32,321 tracklets and over 9.6 million frames captured by drones (altitudes ranging from 15-120m), CCTV, and wearable cameras.
We propose AG-VPReID-Net, an end-to-end framework composed of three complementary streams.
arXiv Detail & Related papers (2025-03-11T07:38:01Z)
- YOLO Evolution: A Comprehensive Benchmark and Architectural Review of YOLOv12, YOLO11, and Their Previous Versions [0.0]
This study represents the first comprehensive experimental evaluation of YOLOv3 to the latest version, YOLOv12. The challenges considered include varying object sizes, diverse aspect ratios, and small-sized objects of a single class. Our analysis highlights the distinctive strengths and limitations of each YOLO version.
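As a rough illustration of this kind of cross-version benchmark, the sketch below validates several pretrained YOLO checkpoints with the Ultralytics API and prints mAP50:95. The checkpoint names and the bundled `coco8.yaml` toy dataset are stand-ins, not the study's actual versions or data.

```python
# A hedged sketch of a cross-version YOLO comparison using the Ultralytics
# API; "coco8.yaml" is Ultralytics' bundled toy dataset, used here only as
# a stand-in for the study's benchmark data.
from ultralytics import YOLO

for weights in ["yolov8n.pt", "yolov10n.pt", "yolo11n.pt"]:
    model = YOLO(weights)                    # downloads the checkpoint if needed
    metrics = model.val(data="coco8.yaml")   # evaluate all models on the same data
    print(f"{weights}: mAP50:95 = {metrics.box.map:.3f}")
```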
arXiv Detail & Related papers (2024-10-31T20:45:00Z)
- Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens [53.99177152562075]
Scaling up autoregressive models in vision has not proven as beneficial as in large language models.
We focus on two critical factors: whether models use discrete or continuous tokens, and whether tokens are generated in a random or fixed order using BERT- or GPT-like transformer architectures.
Our results show that while all models scale effectively in terms of validation loss, their evaluation performance -- measured by FID, GenEval score, and visual quality -- follows different trends.
arXiv Detail & Related papers (2024-10-17T17:59:59Z)
- Optimizing YOLO Architectures for Optimal Road Damage Detection and Classification: A Comparative Study from YOLOv7 to YOLOv10 [0.0]
This paper presents a comprehensive workflow for road damage detection using deep learning models.
To accommodate hardware limitations, large images are cropped, and lightweight models are utilized.
The proposed approach employs multiple model architectures, including a custom YOLOv7 model with Coordinate Attention layers and a Tiny YOLOv7 model.
arXiv Detail & Related papers (2024-10-10T22:55:12Z)
- You Only Look at Once for Real-time and Generic Multi-Task [20.61477620156465]
A-YOLOM is an adaptive, real-time, and lightweight multi-task model.
We develop an end-to-end multi-task model with a unified and streamlined segmentation structure.
We achieve competitive results on the BDD100k dataset.
arXiv Detail & Related papers (2023-10-02T21:09:43Z)
- OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images [59.51657161097337]
OOD-CV-v2 is a benchmark dataset that includes out-of-distribution examples of 10 object categories in terms of pose, shape, texture, context, and weather conditions.
In addition to this novel dataset, we contribute extensive experiments using popular baseline methods.
arXiv Detail & Related papers (2023-04-17T20:39:25Z)
- MOSE: A New Dataset for Video Object Segmentation in Complex Scenes [106.64327718262764]
Video object segmentation (VOS) aims at segmenting a particular object throughout the entire video clip sequence.
The state-of-the-art VOS methods have achieved excellent performance (e.g., 90+% J&F) on existing datasets.
We collect a new VOS dataset called coMplex video Object SEgmentation (MOSE) to study tracking and segmenting objects in complex environments.
arXiv Detail & Related papers (2023-02-03T17:20:03Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
We introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis.
The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order and ambidextrous grasp labels for a parallel-jaw and vacuum gripper.
We also provide a real dataset consisting of over 2.3k fully annotated high-quality RGBD images, divided into 5 levels of difficulties and an unseen object set to evaluate different object and layout properties.
arXiv Detail & Related papers (2022-08-08T08:15:34Z)
- Global Context Vision Transformers [78.5346173956383]
We propose global context vision transformer (GC ViT), a novel architecture that enhances parameter and compute utilization for computer vision.
We address the lack of inductive bias in ViTs and propose to leverage modified fused inverted residual blocks in our architecture.
Our proposed GC ViT achieves state-of-the-art results across image classification, object detection and semantic segmentation tasks.
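For context, a fused inverted residual block (in the fused-MBConv style) replaces the usual 1x1 expansion plus depthwise convolution with a single 3x3 convolution. The PyTorch sketch below shows a generic version of the block; GC ViT's modified variant differs in its details.

```python
# A generic fused inverted residual (fused-MBConv style) block, sketched to
# illustrate the building block the abstract refers to; not GC ViT's exact design.
import torch
import torch.nn as nn

class FusedInvertedResidual(nn.Module):
    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        hidden = dim * expansion
        self.block = nn.Sequential(
            # Single 3x3 conv fuses the usual 1x1 expansion and depthwise conv
            nn.Conv2d(dim, hidden, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.GELU(),
            # 1x1 projection back to the input width
            nn.Conv2d(hidden, dim, kernel_size=1, bias=False),
            nn.BatchNorm2d(dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.block(x)  # residual connection

x = torch.randn(1, 64, 56, 56)
print(FusedInvertedResidual(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```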
arXiv Detail & Related papers (2022-06-20T18:42:44Z)
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To combine the strengths of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
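For intuition, item response theory models the probability that a subject (here, a pretrained model) answers a test item correctly as a function of the subject's ability and the item's difficulty and discrimination. A minimal sketch of the two-parameter logistic (2PL) form, with illustrative values not taken from the paper:

```python
# The 2PL item response function: ability theta, item discrimination a,
# item difficulty b. Values below are illustrative only.
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A highly discriminative item (a=2) separates subjects near its difficulty:
for theta in (-1.0, 0.0, 1.0):
    print(theta, round(p_correct(theta, a=2.0, b=0.0), 3))
```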
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- LIGHTS: LIGHT Specularity Dataset for specular detection in Multi-view [12.612981566441908]
We propose a novel physically-based rendered LIGHT Specularity (LIGHTS) dataset for the evaluation of the specular highlight detection task.
Our dataset consists of 18 high-quality architectural scenes, where each scene is rendered with multiple views.
In total we have 2,603 views with an average of 145 views per scene.
arXiv Detail & Related papers (2021-01-26T13:26:49Z)
- The Devil is in Classification: A Simple Framework for Long-tail Object Detection and Instance Segmentation [93.17367076148348]
We investigate the performance drop of the state-of-the-art two-stage instance segmentation model Mask R-CNN on the recent long-tail LVIS dataset.
We unveil that a major cause is the inaccurate classification of object proposals.
We propose a simple calibration framework to more effectively alleviate classification head bias with a bi-level class balanced sampling approach.
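As a simplified illustration of the class-balanced sampling idea underlying the paper's bi-level scheme, the sketch below weights each training example inversely to its class frequency so tail classes are drawn as often as head classes; the actual method additionally balances at both the image and instance levels.

```python
# A minimal sketch of inverse-frequency (class-balanced) sampling; the
# labels are toy data, and the paper's bi-level scheme is richer than this.
from collections import Counter
from torch.utils.data import WeightedRandomSampler

labels = [0, 0, 0, 0, 0, 0, 1, 1, 2]         # toy long-tailed label list
counts = Counter(labels)
weights = [1.0 / counts[y] for y in labels]  # rare classes get larger weights

sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
print(list(sampler))  # sampled indices, with tail classes oversampled
```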
arXiv Detail & Related papers (2020-07-23T12:49:07Z)
- Benchmarking Unsupervised Object Representations for Video Sequences [111.81492107649889]
We compare the perceptual abilities of four object-centric approaches: ViMON, OP3, TBA and SCALOR.
Our results suggest that the architectures with unconstrained latent representations learn more powerful representations in terms of object detection, segmentation and tracking.
Our benchmark may provide fruitful guidance towards learning more robust object-centric video representations.
arXiv Detail & Related papers (2020-06-12T09:37:24Z)
- End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
The main ingredient of the new framework, called DEtection TRansformer or DETR, is a set-based global loss that forces unique predictions via bipartite matching.
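To make the set prediction idea concrete, the sketch below runs Hungarian (bipartite) matching between a fixed set of predicted boxes and the ground-truth boxes, the step that precedes DETR's set-based loss. The toy L1 cost stands in for DETR's combined classification and box costs.

```python
# A toy sketch of bipartite matching between predictions and targets;
# DETR's real matching cost combines class probabilities and box terms.
import numpy as np
from scipy.optimize import linear_sum_assignment

pred_boxes = np.random.rand(5, 4)   # 5 predicted boxes (cx, cy, w, h)
gt_boxes = np.random.rand(2, 4)     # 2 ground-truth boxes

# Pairwise L1 cost between every prediction and every target.
cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)

# Hungarian matching assigns exactly one prediction to each target.
pred_idx, gt_idx = linear_sum_assignment(cost)
print(list(zip(pred_idx, gt_idx)))
```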
arXiv Detail & Related papers (2020-05-26T17:06:38Z)
- SVIRO: Synthetic Vehicle Interior Rear Seat Occupancy Dataset and Benchmark [11.101588888002045]
We release SVIRO, a synthetic dataset of scenes in the passenger compartment of ten different vehicles.
We analyze machine learning-based approaches for their generalization capacities and reliability when trained on a limited number of variations.
arXiv Detail & Related papers (2020-01-10T14:44:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.