Detection and Identification of Penguins Using Appearance and Motion Features
- URL: http://arxiv.org/abs/2603.03603v1
- Date: Wed, 04 Mar 2026 00:25:55 GMT
- Title: Detection and Identification of Penguins Using Appearance and Motion Features
- Authors: Kasumi Seko, Hiroki Kinoshita, Raj Rajeshwar Malinda, Hiroaki Kawashima,
- Abstract summary: We propose a framework that enhances both detection and identification performance by integrating appearance and motion features.<n>For detection, we adapted YOLO11 to process consecutive frames to overcome the lack of temporal consistency in single-frame detectors.<n>For identification, we introduce a tracklet-based contrastive learning approach applied after tracking.
- Score: 2.7123995549185325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In animal facilities, continuous surveillance of penguins is essential yet technically challenging due to their homogeneous visual characteristics, rapid and frequent posture changes, and substantial environmental noise such as water reflections. In this study, we propose a framework that enhances both detection and identification performance by integrating appearance and motion features. For detection, we adapted YOLO11 to process consecutive frames to overcome the lack of temporal consistency in single-frame detectors. This approach leverages motion cues to detect targets even when distinct visual features are obscured. Our evaluation shows that fine-tuning the model with two-frame inputs improves mAP@0.5 from 0.922 to 0.933, outperforming the baseline, and successfully recovers individuals that are indistinguishable in static images. For identification, we introduce a tracklet-based contrastive learning approach applied after tracking. Through qualitative visualization, we demonstrate that the method produces coherent feature embeddings, bringing samples from the same individual closer in the feature space, suggesting the potential for mitigating ID switching.
Related papers
- Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent [58.90049897180927]
We introduce an automated framework for detecting unintended reliance on visual features in vision models.<n>A self-reflective agent generates and tests hypotheses about visual attributes that a model may rely on.<n>We evaluate our approach on a novel benchmark of 130 models designed to exhibit diverse visual attribute dependencies.
arXiv Detail & Related papers (2025-10-24T17:59:02Z) - Rethinking Contrastive Learning in Graph Anomaly Detection: A Clean-View Perspective [54.605073936695575]
Graph anomaly detection aims to identify unusual patterns in graph-based data, with wide applications in fields such as web security and financial fraud detection.<n>Existing methods rely on contrastive learning, assuming that a lower similarity between a node and its local subgraph indicates abnormality.<n>The presence of interfering edges invalidates this assumption, since it introduces disruptive noise that compromises the contrastive learning process.<n>We propose a Clean-View Enhanced Graph Anomaly Detection framework (CVGAD), which includes a multi-scale anomaly awareness module to identify key sources of interference in the contrastive learning process.
arXiv Detail & Related papers (2025-05-23T15:05:56Z) - An Active Inference Model of Covert and Overt Visual Attention [0.0]
This paper introduces a model of covert and overt visual attention through the framework of active inference.<n>The model determines visual sensory precisions based on both current environmental beliefs and sensory input.
arXiv Detail & Related papers (2025-05-06T09:26:00Z) - Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing pose serious risks for generative models.<n>In this paper, we investigate how detection performance varies across model backbones, types, and datasets.<n>We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z) - UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization.
We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z) - Learning to search for and detect objects in foveal images using deep
learning [3.655021726150368]
This study employs a fixation prediction model that emulates human objective-guided attention of searching for a given class in an image.
The foveated pictures at each fixation point are then classified to determine whether the target is present or absent in the scene.
We present a novel dual task model capable of performing fixation prediction and detection simultaneously, allowing knowledge transfer between the two tasks.
arXiv Detail & Related papers (2023-04-12T09:50:25Z) - Spatiotemporal Multi-scale Bilateral Motion Network for Gait Recognition [3.1240043488226967]
In this paper, motivated by optical flow, the bilateral motion-oriented features are proposed.
We develop a set of multi-scale temporal representations that force the motion context to be richly described at various levels of temporal resolution.
arXiv Detail & Related papers (2022-09-26T01:36:22Z) - Automated Mobility Context Detection with Inertial Signals [7.71058263701836]
The primary goal of this paper is the investigation of context detection for remote monitoring of daily motor functions.
We aim to understand whether inertial signals sampled with wearable accelerometers, provide reliable information to classify gait-related activities as either indoor or outdoor.
arXiv Detail & Related papers (2022-05-16T09:34:43Z) - ReDFeat: Recoupling Detection and Description for Multimodal Feature
Learning [51.07496081296863]
We recouple independent constraints of detection and description of multimodal feature learning with a mutual weighting strategy.
We propose a detector that possesses a large receptive field and is equipped with learnable non-maximum suppression layers.
We build a benchmark that contains cross visible, infrared, near-infrared and synthetic aperture radar image pairs for evaluating the performance of features in feature matching and image registration tasks.
arXiv Detail & Related papers (2022-05-16T04:24:22Z) - Persistent Animal Identification Leveraging Non-Visual Markers [71.14999745312626]
We aim to locate and provide a unique identifier for each mouse in a cluttered home-cage environment through time.
This is a very challenging problem due to (i) the lack of distinguishing visual features for each mouse, and (ii) the close confines of the scene with constant occlusion.
Our approach achieves 77% accuracy on this animal identification problem, and is able to reject spurious detections when the animals are hidden.
arXiv Detail & Related papers (2021-12-13T17:11:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.