CVB: A Video Dataset of Cattle Visual Behaviors
- URL: http://arxiv.org/abs/2305.16555v2
- Date: Mon, 3 Jul 2023 07:11:17 GMT
- Title: CVB: A Video Dataset of Cattle Visual Behaviors
- Authors: Ali Zia, Renuka Sharma, Reza Arablouei, Greg Bishop-Hurley, Jody
McNally, Neil Bagnall, Vivien Rolland, Brano Kusy, Lars Petersson, Aaron
Ingham
- Abstract summary: Existing datasets for cattle behavior recognition are mostly small, lack well-defined labels, or are collected in unrealistic controlled environments.
We introduce a new dataset, called Cattle Visual Behaviors (CVB), that consists of 502 video clips, each fifteen seconds long, captured in natural lighting conditions, and annotated with eleven visually perceptible behaviors of grazing cattle.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Existing image/video datasets for cattle behavior recognition are mostly
small, lack well-defined labels, or are collected in unrealistic controlled
environments. This limits the utility of machine learning (ML) models learned
from them. Therefore, we introduce a new dataset, called Cattle Visual
Behaviors (CVB), that consists of 502 video clips, each fifteen seconds long,
captured in natural lighting conditions, and annotated with eleven visually
perceptible behaviors of grazing cattle. We use the Computer Vision Annotation
Tool (CVAT) to collect our annotations. To make the procedure more efficient,
we perform an initial detection and tracking of cattle in the videos using
appropriate pre-trained models. The results are corrected by domain experts
along with cattle behavior labeling in CVAT. The pre-hoc detection and tracking
step significantly reduces the manual annotation time and effort. Moreover, we
convert CVB to the atomic visual action (AVA) format and train and evaluate the
popular SlowFast action recognition model on it. The associated preliminary
results confirm that we can localize the cattle and recognize their frequently
occurring behaviors with confidence. By creating and sharing CVB, our aim is to
develop improved models capable of recognizing all important behaviors
accurately and to assist other researchers and practitioners in developing and
evaluating new ML models for cattle behavior classification using video data.
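As a rough illustration of the pre-hoc detection step (the paper does not name the exact pre-trained models it uses), the sketch below pre-annotates cattle with a COCO-pretrained Faster R-CNN from torchvision and writes AVA-style rows for later correction in CVAT. The file name, one-keyframe-per-second rate, score threshold, and constant track id are all illustrative assumptions, not the paper's settings.

```python
# Sketch: pre-annotate cattle in a clip with a COCO-pretrained detector
# and dump AVA-style rows for later correction in CVAT.
# The detector choice and 1 Hz keyframe rate are assumptions.
import csv
import cv2
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

COW_COCO_ID = 21  # "cow" in the COCO label map used by torchvision models

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

def detect_cattle(frame_bgr, score_thresh=0.7):
    """Return [x1, y1, x2, y2] cow boxes, normalized to [0, 1] as in AVA."""
    h, w = frame_bgr.shape[:2]
    img = to_tensor(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    with torch.no_grad():
        out = model([img])[0]
    boxes = []
    for box, label, score in zip(out["boxes"], out["labels"], out["scores"]):
        if label.item() == COW_COCO_ID and score.item() >= score_thresh:
            x1, y1, x2, y2 = box.tolist()
            boxes.append([x1 / w, y1 / h, x2 / w, y2 / h])
    return boxes

with open("cvb_preannotations.csv", "w", newline="") as f:
    writer = csv.writer(f)
    cap = cv2.VideoCapture("clip_0001.mp4")  # illustrative file name
    fps = int(cap.get(cv2.CAP_PROP_FPS) or 25)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % fps == 0:  # one keyframe per second
            for box in detect_cattle(frame):
                # AVA row: video_id, timestamp (s), x1, y1, x2, y2,
                # action_id (-1 = unlabeled, left to the experts), track_id
                writer.writerow(["clip_0001", idx // fps, *box, -1, 0])
        idx += 1
    cap.release()
```

In the actual pipeline a tracker links detections across frames before the domain experts correct boxes and assign the eleven behavior labels; the constant track id here merely stands in for that step.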
Related papers
- Accelerometer-Based Multivariate Time-Series Dataset for Calf Behavior Classification [0.7366868731714773]
This is a ready-to-use dataset for classifying pre-weaned calf behaviour from acceleration time series.
30 dairy calves were equipped with a 3D accelerometer sensor attached to a neck collar from one week after birth for 13 weeks; a toy windowing baseline for such data is sketched below.
arXiv Detail & Related papers (2024-08-20T08:11:54Z)
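The windowing-plus-features recipe below is a common baseline for collar accelerometer data, not the dataset's published pipeline; the column names, sampling rate, and file name are assumptions.

```python
# Toy baseline for collar-accelerometer behavior classification:
# fixed-length windows + summary features + a random forest.
# Column names ("ax", "ay", "az", "behavior") are assumed, not documented.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("calf_accelerometer.csv")  # illustrative file name

WIN = 250  # samples per window, e.g. 10 s at an assumed 25 Hz

def windows(df, win=WIN):
    X, y = [], []
    for start in range(0, len(df) - win, win):
        seg = df.iloc[start:start + win]
        feats = []
        for axis in ("ax", "ay", "az"):
            v = seg[axis].to_numpy()
            feats += [v.mean(), v.std(), v.min(), v.max()]
        X.append(feats)
        y.append(seg["behavior"].mode()[0])  # majority label in the window
    return np.array(X), np.array(y)

X, y = windows(df)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, stratify=y)
clf = RandomForestClassifier(n_estimators=200).fit(Xtr, ytr)
print("window-level accuracy:", clf.score(Xte, yte))
```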
- From Forest to Zoo: Great Ape Behavior Recognition with ChimpBehave [0.0]
We introduce ChimpBehave, a novel dataset featuring over 2 hours of video (approximately 193,000 video frames) of zoo-housed chimpanzees.
ChimpBehave is meticulously annotated with bounding boxes and behavior labels for action recognition.
We benchmark our dataset using a state-of-the-art CNN-based action recognition model.
arXiv Detail & Related papers (2024-05-30T13:11:08Z)
- Early Action Recognition with Action Prototypes [62.826125870298306]
We propose a novel model that learns a prototypical representation of the full action for each class.
We decompose the video into short clips, where a visual encoder extracts features from each clip independently.
A decoder then aggregates the clip features in an online fashion to produce the final class prediction; a generic sketch of this pattern follows below.
arXiv Detail & Related papers (2023-12-11T18:31:13Z)
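The prototype matching is the paper's own contribution; the sketch below only illustrates the generic encode-per-clip, aggregate-online pattern it builds on, with a GRU aggregator chosen purely for illustration.

```python
# Generic encode-per-clip / aggregate-online pattern (the GRU aggregator
# is an illustrative choice; the paper matches against learned action
# prototypes rather than using a plain classifier head).
import torch
import torch.nn as nn

class OnlineClipAggregator(nn.Module):
    def __init__(self, feat_dim=512, num_classes=11):  # e.g. CVB's 11 behaviors
        super().__init__()
        self.gru = nn.GRUCell(feat_dim, feat_dim)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, clip_feats):
        # clip_feats: (num_clips, batch, feat_dim), one feature per short clip
        h = torch.zeros(clip_feats.shape[1], clip_feats.shape[2])
        preds = []
        for feat in clip_feats:         # clips arrive one at a time
            h = self.gru(feat, h)       # update the running video state
            preds.append(self.head(h))  # a prediction is available early,
        return torch.stack(preds)       # after every observed clip

feats = torch.randn(8, 2, 512)          # 8 clips, batch of 2 videos
early_logits = OnlineClipAggregator()(feats)
print(early_logits.shape)               # (8, 2, 11): one prediction per step
```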
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- TempNet: Temporal Attention Towards the Detection of Animal Behaviour in Videos [63.85815474157357]
We propose an efficient computer vision- and deep learning-based method for the detection of biological behaviours in videos.
TempNet uses an encoder bridge and residual blocks to maintain model performance with its two-stage encoder, which processes spatial information first and temporal information second.
We demonstrate its application to the detection of sablefish (Anoplopoma fimbria) startle events.
arXiv Detail & Related papers (2022-11-17T23:55:12Z)
- Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize a well-pretrained language model to generate good semantic targets for efficient transfer learning; a minimal sketch of this idea follows below.
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
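A minimal sketch of the underlying idea, using Hugging Face's CLIP text tower to turn class names into semantic classification targets; the paper's actual model and training recipe are not reproduced here, and the prompts and labels are illustrative.

```python
# Sketch: text-encoder embeddings of class names as semantic targets
# for video classification (CLIP text tower via Hugging Face transformers;
# not the paper's exact setup).
import torch
from transformers import CLIPTokenizer, CLIPTextModelWithProjection

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
txt = CLIPTextModelWithProjection.from_pretrained("openai/clip-vit-base-patch32")

classes = ["grazing", "walking", "drinking"]  # illustrative labels
inputs = tok([f"a video of a cow {c}" for c in classes],
             padding=True, return_tensors="pt")
with torch.no_grad():
    targets = txt(**inputs).text_embeds       # (num_classes, 512)
targets = targets / targets.norm(dim=-1, keepdim=True)

# Any video encoder emitting 512-d features can now be trained to
# maximize cosine similarity with the correct class target:
video_feats = torch.randn(4, 512)             # stand-in video features
video_feats = video_feats / video_feats.norm(dim=-1, keepdim=True)
logits = 100.0 * video_feats @ targets.T      # temperature-scaled similarities
print(logits.argmax(dim=-1))                  # predicted class indices
```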
- SuperAnimal pretrained pose estimation models for behavioral analysis [42.206265576708255]
Quantification of behavior is critical in applications ranging from neuroscience to veterinary medicine and animal conservation.
We present a series of technical innovations that enable a new method, collectively called SuperAnimal, to develop unified foundation models.
arXiv Detail & Related papers (2022-03-14T18:46:57Z)
- Auxiliary Learning for Self-Supervised Video Representation via Similarity-based Knowledge Distillation [2.6519061087638014]
We propose a novel approach to complement self-supervised pretraining via an auxiliary pretraining phase, based on knowledge similarity distillation, auxSKD.
Our method deploys a teacher network that iteratively distils its knowledge to the student model by capturing the similarity information between segments of unlabelled video data.
We also introduce a novel pretext task, Video Segment Pace Prediction (VSPP), which requires our model to predict the playback speed of a randomly selected segment of the input video to provide more reliable self-supervised representations; a toy version of this pretext task is sketched below.
arXiv Detail & Related papers (2021-12-07T21:50:40Z)
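The knowledge-distillation component is auxSKD's own; the toy below sketches only a VSPP-style pretext task, subsampling a random segment at rate r and training a classifier to recover r. The clip shapes and the tiny 3D CNN are chosen for illustration.

```python
# Toy playback-speed pretext task in the spirit of VSPP:
# subsample a random segment of the clip at rate r and predict r.
# Shapes and the minimal 3D CNN are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

SPEEDS = [1, 2, 4]                       # candidate playback rates

def speed_example(video):                # video: (C, T, H, W)
    label = torch.randint(len(SPEEDS), ())
    r = SPEEDS[label]
    start = torch.randint(video.shape[1] - 8 * r + 1, ())
    clip = video[:, start:start + 8 * r:r]  # 8 frames sampled at stride r
    return clip, label

encoder = nn.Sequential(                 # minimal 3D CNN "student"
    nn.Conv3d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(),
    nn.Linear(16, len(SPEEDS)),
)

video = torch.randn(3, 64, 32, 32)       # one fake clip
clip, label = speed_example(video)
loss = F.cross_entropy(encoder(clip.unsqueeze(0)), label.unsqueeze(0))
loss.backward()                          # self-supervised signal, no labels needed
```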
- PreViTS: Contrastive Pretraining with Video Tracking Supervision [53.73237606312024]
PreViTS is a self-supervised learning (SSL) framework for selecting clips containing the same object.
PreViTS spatially constrains the frame regions to learn from and trains the model to locate meaningful objects.
We train a momentum contrastive (MoCo) encoder on VGG-Sound and Kinetics-400 datasets with PreViTS.
arXiv Detail & Related papers (2021-12-01T19:49:57Z)
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that the model achieves comparable performance while using far fewer trainable parameters and achieving high speed in training and inference; a toy version of the sparse spatial attention is sketched below.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
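As a toy illustration of the sparse spatial half of that idea (the paper's full model also has a segmented linear temporal branch), top-k sparsified attention over skeleton joints might look like this; the dimensions and k are arbitrary.

```python
# Toy top-k sparse attention over skeleton joints, in the spirit of
# STAR's sparse spatial attention (dimensions and k are arbitrary).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseJointAttention(nn.Module):
    def __init__(self, dim=64, topk=4):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))
        self.topk = topk

    def forward(self, x):                    # x: (batch, joints, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / x.shape[-1] ** 0.5
        kth = scores.topk(self.topk, dim=-1).values[..., -1:]  # k-th largest
        scores = scores.masked_fill(scores < kth, float("-inf"))
        return F.softmax(scores, dim=-1) @ v  # each joint attends to <= topk others

x = torch.randn(2, 25, 64)                    # batch of 2 skeletons, 25 joints
print(SparseJointAttention()(x).shape)        # (2, 25, 64)
```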