Related papers: 8-Calves Image dataset

Related papers

A Domain-Adapted Lightweight Ensemble for Resource-Efficient Few-Shot Plant Disease Classification [0.0]
We present a few-shot learning approach that combines domain-adapted MobileNetV2 and MobileNetV3 models as feature extractors.<n>For the classification task, the fused features are passed through a Bi-LSTM classifier enhanced with attention mechanisms.<n>It consistently improved performance across 1 to 15 shot scenarios, reaching 98.23+-0.33% at 15 shot.<n> Notably, it also outperformed the previous SOTA accuracy of 96.4% on six diseases from PlantVillage, achieving 99.72% with only 15-shot learning.
arXiv Detail & Related papers (2025-12-15T15:17:29Z)
Silhouette-based Gait Foundation Model [56.27974816297294]
Building a unified gait foundation model requires addressing two longstanding barriers: Scalability and Generalization.<n>We introduce FoundationGait, the first scalable, self-supervised pretraining framework for gait understanding.
arXiv Detail & Related papers (2025-11-30T01:53:41Z)
A Comprehensive Evaluation of YOLO-based Deer Detection Performance on Edge Devices [6.486957474966142]
The escalating economic losses in agriculture due to deer intrusion, estimated to be in the hundreds of millions of dollars annually in the U.S., highlight the inadequacy of traditional mitigation strategies.<n>This underscores a critical need for intelligent, autonomous solutions capable of real-time deer detection and deterrence.<n>This study presents a comprehensive evaluation of state-of-the-art deep learning models for deer detection in challenging real-world scenarios.
arXiv Detail & Related papers (2025-09-24T17:01:50Z)
Maize Seedling Detection Dataset (MSDD): A Curated High-Resolution RGB Dataset for Seedling Maize Detection and Benchmarking with YOLOv9, YOLO11, YOLOv12 and Faster-RCNN [0.28647133890966986]
Stand counting determines how many plants germinated, guiding timely decisions such as replanting or adjusting inputs.<n>We introduce MSDD, a high-quality aerial image dataset for maize seedling stand counting, with applications in early-season crop monitoring, yield prediction, and in-field management.<n> MSDD contains three classes-single, double, and triple plants-capturing diverse growth stages, planting setups, soil types, lighting conditions, camera angles, and densities, ensuring robustness for real-world use.
arXiv Detail & Related papers (2025-09-18T17:41:59Z)
What You Have is What You Track: Adaptive and Robust Multimodal Tracking [72.92244578461869]
We present the first comprehensive study on tracker performance with temporally incomplete multimodal data.<n>Our model achieves SOTA performance across 9 benchmarks, excelling in both conventional complete and missing modality settings.
arXiv Detail & Related papers (2025-07-08T11:40:21Z)
Practical Manipulation Model for Robust Deepfake Detection [55.2480439325792]
We develop a more real-world degradation model in the area of image super-resolution.<n>We extend the space of pseudo-fakes by using Poisson blending, more diverse masks, generator artifacts, and distractors.<n>We show clear increases of $3.51%$ and $6.21%$ AUC on the DFDC and DFDCP datasets, respectively.
arXiv Detail & Related papers (2025-06-05T15:06:16Z)
Event-Based Crossing Dataset (EBCD) [0.9961452710097684]
Event-based vision revolutionizes traditional image sensing by capturing intensity variations rather than static frames. Event-Based Crossing dataset is a dataset tailored for pedestrian and vehicle detection in dynamic outdoor environments. This dataset facilitates an extensive assessment of object detection performance under varying conditions of sparsity and noise suppression.
arXiv Detail & Related papers (2025-03-21T19:20:58Z)
AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification [39.350429734981184]
We introduce AG-VPReID, a new large-scale dataset for aerial-ground video-based person re-identification (ReID) This dataset comprises 6,632 subjects, 32,321 tracklets and over 9.6 million frames captured by drones (altitudes ranging from 15-120m), CCTV, and wearable cameras. We propose AG-VPReID-Net, an end-to-end framework composed of three complementary streams.
arXiv Detail & Related papers (2025-03-11T07:38:01Z)
DFCon: Attention-Driven Supervised Contrastive Learning for Robust Deepfake Detection [0.3818645814949463]
This report presents our approach for the IEEE SP Cup 2025: Deepfake Face Detection in the Wild (DFWild-Cup)<n>Our methodology employs advanced backbone models, including MaxViT, CoAtNet, and EVA-02, fine-tuned using supervised contrastive loss to enhance feature separation.<n>The proposed system addresses the challenges of detecting deepfakes in real-world conditions and achieves a commendable accuracy of 95.83% on the validation dataset.
arXiv Detail & Related papers (2025-01-28T04:46:50Z)
YOLO Evolution: A Comprehensive Benchmark and Architectural Review of YOLOv12, YOLO11, and Their Previous Versions [0.0]
This study represents the first comprehensive experimental evaluation of YOLOv3 to the latest version, YOLOv12.<n>The challenges considered include varying object sizes, diverse aspect ratios, and small-sized objects of a single class.<n>Our analysis highlights the distinctive strengths and limitations of each YOLO version.
arXiv Detail & Related papers (2024-10-31T20:45:00Z)
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens [53.99177152562075]
Scaling up autoregressive models in vision has not proven as beneficial as in large language models. We focus on two critical factors: whether models use discrete or continuous tokens, and whether tokens are generated in a random or fixed order using BERT- or GPT-like transformer architectures. Our results show that while all models scale effectively in terms of validation loss, their evaluation performance -- measured by FID, GenEval score, and visual quality -- follows different trends.
arXiv Detail & Related papers (2024-10-17T17:59:59Z)
Optimizing YOLO Architectures for Optimal Road Damage Detection and Classification: A Comparative Study from YOLOv7 to YOLOv10 [0.0]
This paper presents a comprehensive workflow for road damage detection using deep learning models. To accommodate hardware limitations, large images are cropped, and lightweight models are utilized. The proposed approach employs multiple model architectures, including a custom YOLOv7 model with Coordinate Attention layers and a Tiny YOLOv7 model.
arXiv Detail & Related papers (2024-10-10T22:55:12Z)
You Only Look at Once for Real-time and Generic Multi-Task [20.61477620156465]
A-YOLOM is an adaptive, real-time, and lightweight multi-task model. We develop an end-to-end multi-task model with a unified and streamlined segmentation structure. We achieve competitive results on the BDD100k dataset.
arXiv Detail & Related papers (2023-10-02T21:09:43Z)
DeepSeaNet: Improving Underwater Object Detection using EfficientDet [0.0]
This project involves implementing and evaluating various object detection models on an annotated underwater dataset. The dataset comprises annotated image sequences of fish, crabs, starfish, and other aquatic animals captured in Limfjorden water with limited visibility. I compare the results of YOLOv3 (31.10% mean Average Precision (mAP)), YOLOv4 (83.72% mAP), YOLOv5 (97.6%), YOLOv8 (98.20%), EfficientDet (98.56% mAP) and Detectron2 (95.20% mAP) on the same dataset.
arXiv Detail & Related papers (2023-05-26T13:41:35Z)
OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images [59.51657161097337]
OOD-CV-v2 is a benchmark dataset that includes out-of-distribution examples of 10 object categories in terms of pose, shape, texture, context and the weather conditions. In addition to this novel dataset, we contribute extensive experiments using popular baseline methods.
arXiv Detail & Related papers (2023-04-17T20:39:25Z)
MOSE: A New Dataset for Video Object Segmentation in Complex Scenes [106.64327718262764]
Video object segmentation (VOS) aims at segmenting a particular object throughout the entire video clip sequence. The state-of-the-art VOS methods have achieved excellent performance (e.g., 90+% J&F) on existing datasets. We collect a new VOS dataset called coMplex video Object SEgmentation (MOSE) to study the tracking and segmenting objects in complex environments.
arXiv Detail & Related papers (2023-02-03T17:20:03Z)
MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
We introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis. The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order and ambidextrous grasp labels for a parallel-jaw and vacuum gripper. We also provide a real dataset consisting of over 2.3k fully annotated high-quality RGBD images, divided into 5 levels of difficulties and an unseen object set to evaluate different object and layout properties.
arXiv Detail & Related papers (2022-08-08T08:15:34Z)
Global Context Vision Transformers [78.5346173956383]
We propose global context vision transformer (GC ViT), a novel architecture that enhances parameter and compute utilization for computer vision. We address the lack of the inductive bias in ViTs, and propose to leverage a modified fused inverted residual blocks in our architecture. Our proposed GC ViT achieves state-of-the-art results across image classification, object detection and semantic segmentation tasks.
arXiv Detail & Related papers (2022-06-20T18:42:44Z)
X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning. To take the power of both worlds, we propose a novel X-model. X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
SODA10M: Towards Large-Scale Object Detection Benchmark for Autonomous Driving [94.11868795445798]
We release a Large-Scale Object Detection benchmark for Autonomous driving, named as SODA10M, containing 10 million unlabeled images and 20K images labeled with 6 representative object categories. To improve diversity, the images are collected every ten seconds per frame within 32 different cities under different weather conditions, periods and location scenes. We provide extensive experiments and deep analyses of existing supervised state-of-the-art detection models, popular self-supervised and semi-supervised approaches, and some insights about how to develop future models.
arXiv Detail & Related papers (2021-06-21T13:55:57Z)
Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples. We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models. We also observe span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets. This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets. We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
LIGHTS: LIGHT Specularity Dataset for specular detection in Multi-view [12.612981566441908]
We propose a novel physically-based rendered LIGHT Specularity (SLIGHT) dataset for the evaluation of the specular highlight detection task. Our dataset consists of 18 high quality architectural scenes, where each scene is rendered with multiple views. In total we have 2,603 views with an average of 145 views per scene.
arXiv Detail & Related papers (2021-01-26T13:26:49Z)
Tracklets Predicting Based Adaptive Graph Tracking [51.352829280902114]
We present an accurate and end-to-end learning framework for multi-object tracking, namely textbfTPAGT. It re-extracts the features of the tracklets in the current frame based on motion predicting, which is the key to solve the problem of features inconsistent.
arXiv Detail & Related papers (2020-10-18T16:16:49Z)
The Devil is in Classification: A Simple Framework for Long-tail Object Detection and Instance Segmentation [93.17367076148348]
We investigate performance drop of the state-of-the-art two-stage instance segmentation model Mask R-CNN on the recent long-tail LVIS dataset. We unveil that a major cause is the inaccurate classification of object proposals. We propose a simple calibration framework to more effectively alleviate classification head bias with a bi-level class balanced sampling approach.
arXiv Detail & Related papers (2020-07-23T12:49:07Z)
Benchmarking Unsupervised Object Representations for Video Sequences [111.81492107649889]
We compare the perceptual abilities of four object-centric approaches: ViMON, OP3, TBA and SCALOR. Our results suggest that the architectures with unconstrained latent representations learn more powerful representations in terms of object detection, segmentation and tracking. Our benchmark may provide fruitful guidance towards learning more robust object-centric video representations.
arXiv Detail & Related papers (2020-06-12T09:37:24Z)
End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components. The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss.
arXiv Detail & Related papers (2020-05-26T17:06:38Z)
Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction. We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data. Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
SVIRO: Synthetic Vehicle Interior Rear Seat Occupancy Dataset and Benchmark [11.101588888002045]
We release SVIRO, a synthetic dataset for sceneries in the passenger compartment of ten different vehicles. We analyze machine learning-based approaches for their generalization capacities and reliability when trained on a limited number of variations.
arXiv Detail & Related papers (2020-01-10T14:44:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.