Open-Source Tools for Behavioral Video Analysis: Setup, Methods, and
Development
- URL: http://arxiv.org/abs/2204.02842v1
- Date: Wed, 6 Apr 2022 14:06:43 GMT
- Title: Open-Source Tools for Behavioral Video Analysis: Setup, Methods, and
Development
- Authors: Kevin Luxem, Jennifer J. Sun, Sean P. Bradley, Keerthi Krishnan, Talmo
D. Pereira, Eric A. Yttri, Jan Zimmermann, and Mark Laubach
- Abstract summary: Methods for video analysis are transforming behavioral quantification to be more precise, scalable, and reproducible.
Open-source tools for video analysis have led to new experimental approaches to understand behavior.
We review currently available open source tools for video analysis, how to set them up in a lab that is new to video recording methods, and some issues that should be addressed.
- Score: 2.248500763940652
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recently developed methods for video analysis, especially models for pose
estimation and behavior classification, are transforming behavioral
quantification to be more precise, scalable, and reproducible in fields such as
neuroscience and ethology. These tools overcome long-standing limitations of
manual scoring of video frames and traditional "center of mass" tracking
algorithms to enable video analysis at scale. The expansion of open-source
tools for video acquisition and analysis has led to new experimental approaches
to understand behavior. Here, we review currently available open source tools
for video analysis, how to set them up in a lab that is new to video recording
methods, and some issues that should be addressed by developers and advanced
users, including the need to openly share datasets and code, how to compare
algorithms and their parameters, and the need for documentation and
community-wide standards. We hope to encourage more widespread use and
continued development of the tools. They have tremendous potential for
accelerating scientific progress for understanding the brain and behavior.
Related papers
- Bitbox: Behavioral Imaging Toolbox for Computational Analysis of Behavior from Videos [3.215663456741252]
Computational measurement of human behavior from video has recently become feasible due to major advances in AI. Bitbox is an open-source toolkit designed to make advanced computational analysis directly usable by behavioral scientists and clinical researchers. It provides a standardized interface for extracting high-level behavioral measurements from video, leveraging multiple face, head, and body processors.
arXiv Detail & Related papers (2025-12-19T14:53:42Z) - VideoScoop: A Non-Traditional Domain-Independent Framework For Video Analysis [0.0]
Video Situation Analysis (VSA) is done manually with a human in the loop, which is error-prone and labor-intensive. This report proposes a general-purpose VSA framework that overcomes the above limitations. Video contents are extracted once using state-of-the-art Video Content Extraction technologies.
arXiv Detail & Related papers (2025-12-01T15:09:46Z) - Computational frame analysis revisited: On LLMs for studying news coverage [1.4528491369411618]
Generative LLMs like GPT and Claude are increasingly being used as content analytical tools. We systematically evaluate them against their computational predecessors. We conclude by endorsing a methodologically pluralistic approach and put forth a roadmap for computational frame analysis for researchers going forward.
arXiv Detail & Related papers (2025-11-21T19:52:46Z) - From Videos to Indexed Knowledge Graphs -- Framework to Marry Methods for Multimodal Content Analysis and Understanding [1.1645023309093054]
We present a framework that enables efficiently prototyping pipelines for multi-modal content analysis. We craft a candidate recipe for a pipeline, marrying a set of pre-trained models, to convert videos into a temporal semi-structured data format. We translate this structure further to a frame-level indexed knowledge graph representation that is query-able and supports continual learning.
arXiv Detail & Related papers (2025-10-01T23:20:15Z) - Improving Video Diffusion Transformer Training by Multi-Feature Fusion and Alignment from Self-Supervised Vision Encoders [59.98236644320787]
We show that training video diffusion models can benefit from aligning the intermediate features of the video generator with feature representations of pre-trained vision encoders. We present Align4Gen which provides a novel multi-feature fusion and alignment method integrated into video diffusion model training.
arXiv Detail & Related papers (2025-09-11T15:39:27Z) - Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding [63.82450803014141]
Long-form video understanding presents significant challenges due to extensive temporal-spatial complexity. We propose the Deep Video Discovery agent to leverage an agentic search strategy over segmented video clips. Our DVD agent achieves SOTA performance, significantly surpassing prior works by a large margin on the challenging LVBench dataset.
arXiv Detail & Related papers (2025-05-23T16:37:36Z) - PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding [126.15907330726067]
We build a Perception Language Model (PLM) in a fully open and reproducible framework for transparent research in image and video understanding.
We analyze standard training pipelines without distillation from models and explore large-scale synthetic data to identify critical data gaps.
arXiv Detail & Related papers (2025-04-17T17:59:56Z) - Watch and Learn: Leveraging Expert Knowledge and Language for Surgical Video Understanding [1.024113475677323]
The lack of datasets hinders the development of accurate and comprehensive workflow analysis solutions.
We introduce a novel approach for addressing the sparsity and heterogeneity of data inspired by the human learning procedure of watching experts and understanding their explanations.
We present the first comprehensive solution for dense video captioning (DVC) of surgical videos, addressing this task despite the absence of existing datasets in the surgical domain.
arXiv Detail & Related papers (2025-03-14T13:36:13Z) - Understanding Long Videos via LLM-Powered Entity Relation Graphs [51.13422967711056]
GraphVideoAgent is a framework that maps and monitors the evolving relationships between visual entities throughout the video sequence.
Our approach demonstrates remarkable effectiveness when tested against industry benchmarks.
arXiv Detail & Related papers (2025-01-27T10:57:24Z) - psifx -- Psychological and Social Interactions Feature Extraction Package [3.560429497877327]
psifx is a plug-and-play multi-modal feature extraction toolkit.
It aims to facilitate and democratize the use of state-of-the-art machine learning techniques for human sciences research.
arXiv Detail & Related papers (2024-07-14T16:20:42Z) - Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs [20.168429351519055]
Video understanding is a crucial next step for multimodal large language models (MLLMs).
We propose VideoNIAH (Video Needle In A Haystack), a benchmark construction framework through synthetic video generation.
We conduct a comprehensive evaluation of both proprietary and open-source models, uncovering significant differences in their video understanding capabilities.
arXiv Detail & Related papers (2024-06-13T17:50:05Z) - A Review of Machine Learning Methods Applied to Video Analysis Systems [3.518774226658318]
The paper provides a survey of the development of machine-learning techniques for video analysis.
We provide summaries of the development of self-supervised learning, semi-supervised learning, active learning, and zero-shot learning for applications in video analysis.
arXiv Detail & Related papers (2023-12-08T20:24:03Z) - Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating
Video-based Large Language Models [81.84810348214113]
Video-based large language models (Video-LLMs) have been recently introduced, targeting both fundamental improvements in perception and comprehension, and a diverse range of user inquiries.
To guide the development of such a model, the establishment of a robust and comprehensive evaluation system becomes crucial.
This paper proposes Video-Bench, a new comprehensive benchmark along with a toolkit specifically designed for evaluating Video-LLMs.
arXiv Detail & Related papers (2023-11-27T18:59:58Z) - What and How of Machine Learning Transparency: Building Bespoke
Explainability Tools with Interoperable Algorithmic Components [77.87794937143511]
This paper introduces a collection of hands-on training materials for explaining data-driven predictive models.
These resources cover the three core building blocks of this technique: interpretable representation composition, data sampling and explanation generation.
arXiv Detail & Related papers (2022-09-08T13:33:25Z) - Video Manipulations Beyond Faces: A Dataset with Human-Machine Analysis [60.13902294276283]
We present VideoSham, a dataset consisting of 826 videos (413 real and 413 manipulated).
Many of the existing deepfake datasets focus exclusively on two types of facial manipulations -- swapping with a different subject's face or altering the existing face.
Our analysis shows that state-of-the-art manipulation detection algorithms only work for a few specific attacks and do not scale well on VideoSham.
arXiv Detail & Related papers (2022-07-26T17:39:04Z) - PosePipe: Open-Source Human Pose Estimation Pipeline for Clinical
Research [0.0]
We develop a human pose estimation pipeline that facilitates running state-of-the-art algorithms on data acquired in clinical context.
Our goal in this work is not to train new algorithms, but to advance the use of cutting-edge human pose estimation algorithms for clinical and translation research.
arXiv Detail & Related papers (2022-03-16T17:54:37Z) - Ada-VSR: Adaptive Video Super-Resolution with Meta-Learning [56.676110454594344]
Adaptive Video Super-Resolution (Ada-VSR) uses external as well as internal information through meta-transfer learning and internal learning, respectively.
A model trained using our approach can quickly adapt to a specific video condition with only a few gradient updates, which reduces the inference time significantly.
arXiv Detail & Related papers (2021-08-05T19:59:26Z) - DRIFT: A Toolkit for Diachronic Analysis of Scientific Literature [0.7349727826230862]
We open source DRIFT, which allows researchers to track research trends and development over the years.
The analysis methods are collated from well-cited research works, with a few of our own methods added for good measure.
To demonstrate the utility and efficacy of our tool, we perform a case study on the cs.CL corpus of the arXiv repository and draw inferences from the analysis methods.
arXiv Detail & Related papers (2021-07-02T17:33:25Z) - Non-Adversarial Video Synthesis with Learned Priors [53.26777815740381]
We focus on the problem of generating videos from latent noise vectors, without any reference input frames.
We develop a novel approach that jointly optimizes the input latent space, the weights of a recurrent neural network, and a generator through non-adversarial learning.
Our approach generates superior quality videos compared to the existing state-of-the-art methods.
arXiv Detail & Related papers (2020-03-21T02:57:33Z) - Comprehensive Instructional Video Analysis: The COIN Dataset and
Performance Evaluation [100.68317848808327]
We present a large-scale dataset named "COIN" for COmprehensive INstructional video analysis.
The COIN dataset contains 11,827 videos of 180 tasks in 12 domains related to our daily life.
With a newly developed toolbox, all the videos are annotated efficiently with a series of step labels and the corresponding temporal boundaries.
arXiv Detail & Related papers (2020-03-20T16:59:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.