Occlusion-Resistant Instance Segmentation of Piglets in Farrowing Pens
Using Center Clustering Network
- URL: http://arxiv.org/abs/2206.01942v1
- Date: Sat, 4 Jun 2022 08:43:30 GMT
- Authors: Endai Huang, Axiu Mao, Yongjian Wu, Haiming Gan, Maria Camila
Ceballos, Thomas D. Parsons, Junhui Hou, Kai Liu
- Abstract summary: We propose a novel Center Clustering Network for instance segmentation, dubbed CClusnet-Inseg.
CClusnet-Inseg uses each pixel to predict object centers and trace these centers to form masks based on clustering results.
In all, 4,600 images were extracted from six videos collected from six farrowing pens to train and validate our method.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computer vision enables the development of new approaches to monitor the
behavior, health, and welfare of animals. Instance segmentation is a
high-precision method in computer vision for detecting individual animals of
interest. This method can be used for in-depth analysis of animals, such as
examining their subtle interactive behaviors, from videos and images. However,
existing deep-learning-based instance segmentation methods have been mostly
developed based on public datasets, which largely omit heavy occlusion
problems; therefore, these methods have limitations in real-world applications
involving object occlusions, such as the farrowing pen systems used on pig farms,
in which farrowing crates often occlude the sow and piglets. In this paper, we
propose a novel occlusion-resistant Center Clustering Network for instance
segmentation, dubbed CClusnet-Inseg. Specifically, CClusnet-Inseg uses each
pixel to predict object centers and traces these centers to form masks based on
clustering results. The method consists of a network that outputs a segmentation
map and a center offset vector map, the Density-Based Spatial Clustering of
Applications with Noise (DBSCAN) algorithm, the Centers-to-Mask (C2M) and
Remain-Centers-to-Mask (RC2M) algorithms, and a pseudo-occlusion generator
(POG). In all, 4,600 images were
extracted from six videos collected from six farrowing pens to train and
validate our method. CClusnet-Inseg achieves a mean average precision (mAP) of
83.6; it outperformed YOLACT++ and Mask R-CNN, which had mAP values of 81.2 and
74.7, respectively. We conduct comprehensive ablation studies to demonstrate
the advantages and effectiveness of core modules of our method. In addition, we
apply CClusnet-Inseg to multi-object tracking for animal monitoring, and the
predicted object center, a joint output of the network, can serve as an
occlusion-resistant representation of an object's location.
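The pipeline described in the abstract (each pixel votes for its object's center, the votes are clustered with DBSCAN, and pixels sharing a cluster form one instance mask) can be sketched as follows. This is a toy illustration, not the authors' implementation: the synthetic pixel data, the `eps`/`min_pts` values, and the minimal DBSCAN are all assumptions.

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster label per point (-1 = noise)."""
    n = len(points)
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i] or len(neighbors[i]) < min_pts:
            continue
        stack, visited[i] = [i], True
        while stack:                          # grow the cluster from core point i
            j = stack.pop()
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # only core points expand further
                for k in neighbors[j]:
                    if not visited[k]:
                        visited[k] = True
                        stack.append(k)
        cluster += 1
    return labels

rng = np.random.default_rng(0)
# Toy stand-in for the network output: every foreground pixel votes for its
# object's center (pixel coordinates plus a predicted center offset vector).
pixels = np.array([(y, x) for y in range(10) for x in range(10)], dtype=float)
true_center = np.where(pixels.sum(1)[:, None] < 10, [2.0, 2.0], [8.0, 8.0])
votes = true_center + rng.normal(0.0, 0.2, true_center.shape)

labels = dbscan(votes, eps=1.0, min_pts=5)
# Centers-to-Mask idea: pixels whose center votes land in the same cluster
# are grouped into one instance mask.
masks = {c: pixels[labels == c] for c in set(labels) if c >= 0}
```

Because occluded fragments of one animal still vote for the same physical center, their votes cluster together, which is what makes the center representation occlusion-resistant.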
Related papers
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
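Using PCA to localize object regions, as this summary describes, can be sketched roughly as follows: project per-pixel features onto the first principal component and threshold the resulting saliency map. The synthetic feature map and the mean threshold are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
H, W, C = 8, 8, 16
feats = rng.normal(0.0, 0.1, (H, W, C))          # background features
feats[2:6, 2:6] += rng.normal(2.0, 0.1, (4, 4, C))  # "object" region stands out

X = feats.reshape(-1, C)
X = X - X.mean(axis=0)                           # center features per channel
_, _, vt = np.linalg.svd(X, full_matrices=False)
saliency = np.abs(X @ vt[0]).reshape(H, W)       # projection onto first PC
mask = saliency > saliency.mean()                # coarse object localization
```

The first principal direction captures the dominant variation in the features, which here separates object pixels from background.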
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Multi-spectral Class Center Network for Face Manipulation Detection and Localization [52.569170436393165]
We propose a novel Multi-Spectral Class Center Network (MSCCNet) for face manipulation detection and localization.
Based on the features of different frequency bands, the MSCC module collects multi-spectral class centers and computes pixel-to-class relations.
Applying multi-spectral class-level representations suppresses the semantic information of visual concepts, which is insensitive to the manipulated regions of forged images.
arXiv Detail & Related papers (2023-05-18T08:09:20Z)
- Robust Representation Learning by Clustering with Bisimulation Metrics for Visual Reinforcement Learning with Distractions [9.088460902782547]
Clustering with Bisimulation Metrics (CBM) learns robust representations by grouping visual observations in the latent space.
CBM alternates between two steps: (1) grouping observations by measuring their bisimulation distances to the learned prototypes; (2) learning a set of prototypes according to the current cluster assignments.
Experiments demonstrate that CBM significantly improves the sample efficiency of popular visual RL algorithms.
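The two-step alternation CBM describes is structurally similar to Lloyd-style clustering, and can be sketched as below. This is a hedged analogy: Euclidean distance stands in for the learned bisimulation metric, and the 2-D observations and fixed prototype initialization are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy "latent observations" drawn from two underlying behaviors.
obs = np.vstack([rng.normal(0.0, 0.3, (30, 2)),
                 rng.normal(3.0, 0.3, (30, 2))])
protos = np.array([[0.5, 0.5], [2.5, 2.5]])      # initial prototypes (assumed)

for _ in range(10):
    # Step 1: group observations by distance to the learned prototypes.
    dist = np.linalg.norm(obs[:, None] - protos[None], axis=-1)
    assign = dist.argmin(axis=1)
    # Step 2: update each prototype from its current cluster assignment.
    for k in range(len(protos)):
        if (assign == k).any():
            protos[k] = obs[assign == k].mean(axis=0)
```

After a few iterations the prototypes settle on the centers of the two observation groups; in CBM the distance would instead be a bisimulation metric, so observations group by behavioral equivalence rather than visual similarity.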
arXiv Detail & Related papers (2023-02-12T13:27:34Z)
- Beyond the Prototype: Divide-and-conquer Proxies for Few-shot Segmentation [63.910211095033596]
Few-shot segmentation aims to segment unseen-class objects given only a handful of densely labeled samples.
We propose a simple yet versatile framework in the spirit of divide-and-conquer.
Our proposed approach, named divide-and-conquer proxies (DCP), allows for the development of appropriate and reliable information.
arXiv Detail & Related papers (2022-04-21T06:21:14Z)
- Multiscale Convolutional Transformer with Center Mask Pretraining for Hyperspectral Image Classification [14.33259265286265]
We propose a novel multi-scale convolutional embedding module for hyperspectral images (HSI) to realize effective extraction of spatial-spectral information.
Similar to masked autoencoders, our pre-training method masks only the token corresponding to the central pixel in the encoder, and inputs the remaining tokens into the decoder to reconstruct the spectral information of the central pixel.
arXiv Detail & Related papers (2022-03-09T14:42:26Z)
- SMAC-Seg: LiDAR Panoptic Segmentation via Sparse Multi-directional Attention Clustering [1.1470070927586016]
We present a learnable sparse multi-directional attention clustering to segment multi-scale foreground instances.
SMAC-Seg is a real-time clustering-based approach, which removes the complex proposal network to segment instances.
Our experimental results show that SMAC-Seg achieves state-of-the-art performance among all real-time deployable networks.
arXiv Detail & Related papers (2021-08-31T02:25:01Z)
- DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows [52.31831255787147]
We introduce a novel technique, DAAIN, to detect out-of-distribution (OOD) inputs and adversarial attacks (AA).
Our approach monitors the inner workings of a neural network and learns a density estimator of the activation distribution.
Our model can be trained on a single GPU making it compute efficient and deployable without requiring specialized accelerators.
arXiv Detail & Related papers (2021-05-30T22:07:13Z)
- Robust Instance Segmentation through Reasoning about Multi-Object Occlusion [9.536947328412198]
We propose a deep network for multi-object instance segmentation that is robust to occlusion.
Our work builds on Compositional Networks, which learn a generative model of neural feature activations to locate occluders.
In particular, we obtain feed-forward predictions of the object classes and their instance and occluder segmentations.
arXiv Detail & Related papers (2020-12-03T17:41:55Z)
- Joint Object Contour Points and Semantics for Instance Segmentation [1.2117737635879038]
We propose Mask Point R-CNN aiming at promoting the neural network's attention to the object boundary.
Specifically, we innovatively extend the original human keypoint detection task to the contour point detection of any object.
As a consequence, the model will be more sensitive to the edges of the object and can capture more geometric features.
arXiv Detail & Related papers (2020-08-02T11:11:28Z)
- Spatial and spectral deep attention fusion for multi-channel speech separation using deep embedding features [60.20150317299749]
Multi-channel deep clustering (MDC) has achieved good performance for speech separation.
We propose a deep attention fusion method to dynamically control the weights of the spectral and spatial features and combine them deeply.
Experimental results show that the proposed method outperforms the MDC baseline and even surpasses the ideal binary mask (IBM).
arXiv Detail & Related papers (2020-02-05T03:49:39Z)
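The dynamic weighting of spectral and spatial features described in the last entry can be sketched as a softmax-gated fusion. The shapes, the random features, and the single projection vector `w` standing in for a learned attention network are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(3)
T, D = 5, 8                                   # time frames, embedding dim
spectral = rng.normal(size=(T, D))            # spectral embedding features
spatial = rng.normal(size=(T, D))             # spatial embedding features

# Attention scores from a hypothetical learned projection w (one scalar
# score per frame per feature stream).
w = rng.normal(size=(D,))
scores = np.stack([spectral @ w, spatial @ w], axis=-1)   # (T, 2)
alpha = softmax(scores)                                   # per-frame weights
fused = alpha[:, :1] * spectral + alpha[:, 1:] * spatial  # dynamic fusion
```

The softmax makes the two stream weights sum to one per frame, so the model can lean on spatial cues when they are reliable and fall back to spectral cues otherwise.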
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.