Related papers: Real-Time Cattle Interaction Recognition via Triple-stream Network

URL: http://arxiv.org/abs/2209.02241v1
Date: Tue, 6 Sep 2022 06:31:09 GMT
Title: Real-Time Cattle Interaction Recognition via Triple-stream Network
Authors: Yang Yang, Mizuka Komatsu, Kenji Oyama, Takenao Ohkawa
Abstract summary: The cattle localization network outputs high-quality interaction proposals from every detected cattle and feeds them into the interaction recognition network, which has a triple-stream architecture.
Score: 3.3843451892622576
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In beef cattle stockbreeding, computer vision-based approaches have been widely employed to monitor cattle conditions (e.g. physical, physiological, and health conditions). To this end, accurate and effective recognition of cattle actions is a prerequisite. Most existing models are confined to individual behavior and use video-based methods to extract spatial-temporal features for recognizing the actions of each animal. However, cattle are social animals, and their interactions often reflect important conditions such as estrus; video-based methods also neglect the real-time capability of the model. We therefore tackle the challenging task of recognizing interactions between cattle in real time from a single frame. The pipeline of our method includes two main modules: a Cattle Localization Network and an Interaction Recognition Network. At every moment, the cattle localization network outputs high-quality interaction proposals from every detected cattle and feeds them into the interaction recognition network, which has a triple-stream architecture. This triple-stream network allows us to fuse different features relevant to recognizing interactions. Specifically, the three kinds of features are a visual feature that extracts the appearance representation of interaction proposals, a geometric feature that reflects the spatial relationship between cattle, and a semantic feature that captures our prior knowledge of the relationship between the individual actions and interactions of cattle. In addition, to address the shortage of labeled data, we pre-train the model with self-supervised learning. Qualitative and quantitative evaluations demonstrate that our framework is an effective method for recognizing cattle interactions in real time.
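
The two-module pipeline described in the abstract can be illustrated with a rough sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not say how proposals are formed or how the three features are encoded or fused. Here a proposal is taken to be the union box of a pair of detected cattle, the geometric feature is a hypothetical encoding (normalized center offset, log size ratio, IoU), the semantic feature is a one-hot encoding of the two individual actions, and fusion is plain concatenation. All function names are hypothetical.

```python
import math
from itertools import combinations

# A box is (x1, y1, x2, y2) in pixel coordinates, with x2 > x1 and y2 > y1.

def union_box(a, b):
    """Smallest box covering both a and b (assumed form of an interaction proposal)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def iou(a, b):
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def interaction_proposals(boxes):
    """Pair every two detections; returns (index_a, index_b, union_box) tuples."""
    return [(i, j, union_box(boxes[i], boxes[j]))
            for i, j in combinations(range(len(boxes)), 2)]

def geometric_feature(a, b):
    """Hypothetical spatial encoding of box b relative to box a."""
    aw, ah = a[2] - a[0], a[3] - a[1]
    bw, bh = b[2] - b[0], b[3] - b[1]
    acx, acy = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bcx, bcy = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return [(bcx - acx) / aw, (bcy - acy) / ah,
            math.log(bw / aw), math.log(bh / ah), iou(a, b)]

def semantic_feature(action_a, action_b, num_actions):
    """Stand-in for prior knowledge: one-hot of each animal's individual action."""
    vec = [0.0] * (2 * num_actions)
    vec[action_a] = 1.0
    vec[num_actions + action_b] = 1.0
    return vec

def fuse(visual, geometric, semantic):
    """Concatenate the three streams (the simplest possible fusion; an assumption)."""
    return list(visual) + list(geometric) + list(semantic)

if __name__ == "__main__":
    boxes = [(0.0, 0.0, 10.0, 10.0), (5.0, 0.0, 15.0, 10.0)]
    i, j, proposal = interaction_proposals(boxes)[0]
    visual = [0.1, 0.2, 0.3, 0.4]  # placeholder for a CNN embedding of the proposal crop
    fused = fuse(visual,
                 geometric_feature(boxes[i], boxes[j]),
                 semantic_feature(0, 2, num_actions=3))
    print(proposal, len(fused))
```

In a real system the fused vector would be passed to a classifier over interaction classes, and the visual stream would come from a backbone applied to the proposal region; both are omitted here because the abstract gives no details about them.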

Related papers

Interaction-via-Actions: Cattle Interaction Detection with Joint Learning of Action-Interaction Latent Space [18.635930702079563]
This paper introduces a method and application for automatically detecting behavioral interactions between grazing cattle from a single image. We propose CattleAct, a data-efficient method for interaction detection by decomposing interactions into the combinations of actions by individual cattle. On top of the proposed method, we develop a practical working system integrating video and GPS inputs.
arXiv Detail & Related papers (2025-12-18T03:42:54Z)
Beyond Proximity: A Keypoint-Trajectory Framework for Classifying Affiliative and Agonistic Social Networks in Dairy Cattle [0.764671395172401]
We present a pose-based computational framework for the classification of interaction in a commercial dairy barn. Rather than relying on pixel-level appearance or simple distance measures, the proposed method encodes interaction motion signatures from keypoint trajectories. The results establish a proof-of-concept for automated, vision-based inference of social interactions suitable for constructing interaction-aware social networks.
arXiv Detail & Related papers (2025-12-17T01:01:51Z)
Cattle-CLIP: A Multimodal Framework for Cattle Behaviour Recognition [5.45546363077543]
Cattle-CLIP is a multimodal deep learning framework for cattle behaviour recognition. It is adapted from the large-scale image-language model CLIP by adding a temporal integration module. Experiments show that Cattle-CLIP achieves 96.1% overall accuracy across six behaviours in a supervised setting.
arXiv Detail & Related papers (2025-10-10T09:43:12Z)
Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding. Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction. Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
Joint Engagement Classification using Video Augmentation Techniques for Multi-person Human-robot Interaction [22.73774398716566]
We present a novel framework for identifying a parent-child dyad's joint engagement. Using a dataset of parent-child dyads reading storybooks together with a social robot at home, we first train RGB frame- and skeleton-based joint engagement recognition models. Second, we demonstrate experimental results on the use of trained models in the robot-parent-child interaction context.
arXiv Detail & Related papers (2022-12-28T23:52:55Z)
Dual-stream spatiotemporal networks with feature sharing for monitoring animals in the home cage [0.9937939233206224]
We introduce a feature-sharing approach that jointly the streams at regular intervals throughout the network. We achieve a prediction accuracy of 86.47% using an ensemble of Inception-based networks. Future work will investigate the effectiveness of sharing in behavioural classification in the unsupervised anomaly detection domain.
arXiv Detail & Related papers (2022-06-01T16:32:25Z)
Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data. Our method achieves the best or competitive performance with the state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z)
Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that current fixed-sized-temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input. We study how we can better handle between classes of actions, by enhancing their feature differences over different layers of the architecture. The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z)
Beyond Tracking: Using Deep Learning to Discover Novel Interactions in Biological Swarms [3.441021278275805]
We propose training deep network models to predict system-level states directly from generic graphical features from the entire view. Because the resulting predictive models are not based on human-understood predictors, we use explanatory modules. This represents an example of augmented intelligence in behavioral ecology -- knowledge co-creation in a human-AI team.
arXiv Detail & Related papers (2021-08-20T22:50:41Z)
Learning Asynchronous and Sparse Human-Object Interaction in Videos [56.73059840294019]
Asynchronous-Sparse Interaction Graph Networks (ASSIGN) is able to automatically detect the structure of interaction events associated with entities in a video scene. ASSIGN is tested on human-object interaction recognition and shows superior performance in segmenting and labeling of human sub-activities and object affordances from raw videos.
arXiv Detail & Related papers (2021-03-03T23:43:55Z)
Attention-Oriented Action Recognition for Real-Time Human-Robot Interaction [11.285529781751984]
We propose an attention-oriented multi-level network framework to meet the need for real-time interaction. Specifically, a Pre-Attention network is employed to roughly focus on the interactor in the scene at low resolution. The other compact CNN receives the extracted skeleton sequence as input for action recognition.
arXiv Detail & Related papers (2020-07-02T12:41:28Z)
Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs. Our network predicts interaction points, which directly localize and classify the interaction. Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding. At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network. With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.