AlphaChimp: Tracking and Behavior Recognition of Chimpanzees
- URL: http://arxiv.org/abs/2410.17136v2
- Date: Sun, 10 Nov 2024 13:45:57 GMT
- Title: AlphaChimp: Tracking and Behavior Recognition of Chimpanzees
- Authors: Xiaoxuan Ma, Yutang Lin, Yuan Xu, Stephan P. Kaufhold, Jack Terwilliger, Andres Meza, Yixin Zhu, Federico Rossano, Yizhou Wang
- Abstract summary: We develop an end-to-end approach that simultaneously detects chimpanzee positions and estimates behavior categories from videos.
AlphaChimp achieves 10% higher tracking accuracy and a 20% improvement in behavior recognition compared to state-of-the-art methods.
Our approach bridges the gap between computer vision and primatology, enhancing technical capabilities and deepening our understanding of primate communication and sociality.
- Score: 29.14013458574676
- Abstract: Understanding non-human primate behavior is crucial for improving animal welfare, modeling social behavior, and gaining insights into both distinctly human and shared behaviors. Despite recent advances in computer vision, automated analysis of primate behavior remains challenging due to the complexity of their social interactions and the lack of specialized algorithms. Existing methods often struggle with the nuanced behaviors and frequent occlusions characteristic of primate social dynamics. This study aims to develop an effective method for automated detection, tracking, and recognition of chimpanzee behaviors in video footage. Here we show that our proposed method, AlphaChimp, an end-to-end approach that simultaneously detects chimpanzee positions and estimates behavior categories from videos, significantly outperforms existing methods in behavior recognition. AlphaChimp achieves approximately 10% higher tracking accuracy and a 20% improvement in behavior recognition compared to state-of-the-art methods, particularly excelling in the recognition of social behaviors. This superior performance stems from AlphaChimp's innovative architecture, which integrates temporal feature fusion with a Transformer-based self-attention mechanism, enabling more effective capture and interpretation of complex social interactions among chimpanzees. Our approach bridges the gap between computer vision and primatology, enhancing technical capabilities and deepening our understanding of primate communication and sociality. We release our code and models and hope this will facilitate future research in animal social dynamics. This work contributes to ethology, cognitive science, and artificial intelligence, offering new perspectives on social intelligence.
Related papers
- Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation [70.52558242336988]
We focus on predicting engagement in dyadic interactions by scrutinizing verbal and non-verbal cues, aiming to detect signs of disinterest or confusion.
In this work, we collect a dataset featuring 34 participants engaged in casual dyadic conversations, each providing self-reported engagement ratings at the end of each conversation.
We introduce a novel fusion strategy using Large Language Models (LLMs) to integrate multiple behavior modalities into a "multimodal transcript".
arXiv Detail & Related papers (2024-09-13T18:28:12Z)
- Computer Vision for Primate Behavior Analysis in the Wild [61.08941894580172]
Video-based behavioral monitoring has great potential for transforming how we study animal cognition and behavior.
There is still a fairly large gap between the exciting prospects and what can actually be achieved in practice today.
arXiv Detail & Related papers (2024-01-29T18:59:56Z)
- Social Motion Prediction with Cognitive Hierarchies [19.71780279070757]
We introduce a new benchmark, a novel formulation, and a cognition-inspired framework.
We present Wusi, a 3D multi-person motion dataset under the context of team sports.
We develop a cognitive hierarchy framework to predict strategic human social interactions.
arXiv Detail & Related papers (2023-11-08T14:51:17Z)
- ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors [32.72634137202146]
ChimpACT features videos of a group of over 20 chimpanzees residing at the Leipzig Zoo, Germany.
ChimpACT is both comprehensive and challenging, consisting of 163 videos with a cumulative 160,500 frames.
arXiv Detail & Related papers (2023-10-25T08:11:02Z)
- Machine Psychology [54.287802134327485]
We argue that a fruitful direction for research is engaging large language models in behavioral experiments inspired by psychology.
We highlight theoretical perspectives, experimental paradigms, and computational analysis techniques that this approach brings to the table.
It paves the way for a "machine psychology" for generative artificial intelligence (AI) that goes beyond performance benchmarks.
arXiv Detail & Related papers (2023-03-24T13:24:41Z)
- CNN-Based Action Recognition and Pose Estimation for Classifying Animal Behavior from Videos: A Survey [0.0]
Action recognition, classifying activities performed by one or more subjects in a trimmed video, forms the basis of many techniques.
Deep learning models for human action recognition have progressed over the last decade.
Recent interest in research that incorporates deep learning-based action recognition for classification has increased.
arXiv Detail & Related papers (2023-01-15T20:54:44Z)
- Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations are: the most often used nonverbal cue, computational method, interaction environment, and sensing approach are speaking activity, support vector machines, and meetings composed of 3-4 persons equipped with microphones and cameras, respectively.
arXiv Detail & Related papers (2022-07-20T13:37:57Z)
- The world seems different in a social context: a neural network analysis of human experimental data [57.729312306803955]
We show that it is possible to replicate human behavioral data in both individual and social task settings by modifying the precision of prior and sensory signals.
An analysis of the neural activation traces of the trained networks provides evidence that information is coded in fundamentally different ways in the network in the individual and in the social conditions.
arXiv Detail & Related papers (2022-03-03T17:19:12Z)
- Hierarchical principles of embodied reinforcement learning: A review [11.613306236691427]
We show that all important cognitive mechanisms have been implemented independently in isolated computational architectures.
We expect our results to guide the development of more sophisticated cognitively inspired hierarchical methods.
arXiv Detail & Related papers (2020-12-18T10:19:38Z)
- Continuous Emotion Recognition via Deep Convolutional Autoencoder and Support Vector Regressor [70.2226417364135]
It is crucial that the machine should be able to recognize the emotional state of the user with high accuracy.
Deep neural networks have been used with great success in recognizing emotions.
We present a new model for continuous emotion recognition based on facial expression recognition.
arXiv Detail & Related papers (2020-01-31T17:47:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.