Instrumental genesis through interdisciplinary collaboration -- reflections on the emergence of a visualisation framework for video annotation data
- URL: http://arxiv.org/abs/2305.18825v1
- Date: Tue, 30 May 2023 08:21:46 GMT
- Title: Instrumental genesis through interdisciplinary collaboration -- reflections on the emergence of a visualisation framework for video annotation data
- Authors: Olivier Aubert (LS2N, Nantes Univ, LS2N - équipe DUKe), Thomas Scherer, Jasper Stratil
- Abstract summary: This paper presents, discusses and reflects on the development of a visualization framework for the analysis of the temporal dynamics of audiovisual expressivity.
It is described through the collaboration and communication processes between computer science scholars and humanities scholars.
The main focus of this paper is the process of iterative development of visualizations as interactive interfaces generated with the open-source software Advene.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents, discusses and reflects on the development of a visualization
framework for the analysis of the temporal dynamics of audiovisual
expressivity. The main focus lies on the instrumental genesis process (Rabardel
1995; Longchamp 2012), a concept that seeks to express and analyze the co-evolution of instruments and the practices they make possible, which underlies this development. It is described through the collaboration and communication
processes between computer science scholars and humanities scholars in finding
new ways of visualizing complex datasets for exploration and presentation in
the realm of film-studies research. It draws on the outcome and concrete usage
of the visualizations in publications and presentations of a research group,
the AdA project, which investigates the audiovisual rhetorics of affect in
audiovisual media on the financial crisis (2007-). These film analyses are
based on theoretical assumptions about the process of film-viewing, the relation between the viewer's perception and the temporally unfolding audiovisual images, and
a methodical approach that draws on 'steps' in the research process such as
segmentation, description and qualification, called eMAEX (Kappelhoff et al.
2011-2016) to reconstruct these experiential figurations (Bakels et al. 2020a,
2020b). The main focus of this paper is the process of iterative development of
visualizations as interactive interfaces generated with the open-source
software Advene, which were an integral part of the research process. In this
regard, the timeline visualization is not only of interest for visual
argumentation in (digital) humanities publications, but also for the creation
of annotations as well as the exploration of this data. In the first part of
the paper we describe this interdisciplinary collaboration as instrumental
genesis on a general level, as an evolving and iterative process. In the second
part we focus on the specific challenge of designing a visualization framework
for the temporal dynamics of audiovisual aesthetics. Lastly, we zoom out by
reflecting on experiences and insights that might be of interest for the wider
digital humanities community.
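The abstract's technical core is video annotation data structured along the eMAEX steps (segmentation, description, qualification) and rendered as timeline visualizations in Advene. The sketch below is a minimal illustration of that idea, not Advene's actual API: the annotation model, field names, sample values, and the use of matplotlib for the timeline view are all assumptions made for this example.

```python
# Minimal sketch of video annotation data and a timeline view.
# NOTE: this is NOT the Advene API; the Annotation fields, the sample
# data, and the matplotlib rendering are illustrative assumptions.
from dataclasses import dataclass

import matplotlib.pyplot as plt


@dataclass
class Annotation:
    """One annotated video segment, loosely following the eMAEX steps."""
    begin: float         # segmentation: segment start (seconds)
    end: float           # segmentation: segment end (seconds)
    track: str           # annotation type, e.g. "camera movement"
    description: str     # description: free-text account of the segment
    qualification: str   # qualification: controlled-vocabulary label

    @property
    def duration(self) -> float:
        return self.end - self.begin


# Hypothetical annotations on two tracks of a single video.
annotations = [
    Annotation(0.0, 12.5, "camera movement", "slow push-in on speaker", "dynamic"),
    Annotation(12.5, 30.0, "camera movement", "static medium shot", "static"),
    Annotation(5.0, 18.0, "music", "rising string ostinato", "tension"),
]


def plot_timeline(annotations: list[Annotation]) -> None:
    """Draw each track as a row of time-aligned bars, mimicking a
    timeline visualization of annotation data."""
    tracks = sorted({a.track for a in annotations})
    fig, ax = plt.subplots(figsize=(8, 1 + len(tracks)))
    for row, track in enumerate(tracks):
        spans = [(a.begin, a.duration) for a in annotations if a.track == track]
        ax.broken_barh(spans, (row - 0.4, 0.8))  # (start, width) bars per row
    ax.set_yticks(range(len(tracks)))
    ax.set_yticklabels(tracks)
    ax.set_xlabel("time (s)")
    fig.tight_layout()
    plt.show()


plot_timeline(annotations)
```

Time-aligned bars of this kind are what make temporal dynamics comparable across annotation tracks: the eye can pick out where, for instance, camera movement and music shift together.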
Related papers
- Video-to-Audio Generation with Hidden Alignment [27.11625918406991]
We offer insights into the video-to-audio generation paradigm, focusing on vision encoders, auxiliary embeddings, and data augmentation techniques.
We demonstrate our model exhibits state-of-the-art video-to-audio generation capabilities.
arXiv Detail & Related papers (2024-07-10T08:40:39Z) - Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense
Interactions through Masked Modeling [24.346868432774453]
Humans possess a remarkable ability to integrate auditory and visual information, enabling a deeper understanding of the surrounding environment.
This early fusion of audio and visual cues, demonstrated through cognitive psychology and neuroscience research, offers promising potential for developing multimodal perception models.
We address training early fusion architectures by leveraging the masked reconstruction framework, previously successful in unimodal settings, to train audio-visual encoders with early fusion.
We propose an attention-based fusion module that captures interactions between local audio and visual representations, enhancing the model's ability to capture fine-grained interactions.
arXiv Detail & Related papers (2023-12-02T03:38:49Z) - Learning in Audio-visual Context: A Review, Analysis, and New
Perspective [88.40519011197144]
This survey aims to systematically organize and analyze studies of the audio-visual field.
We introduce several key findings that have inspired our computational studies.
We propose a new perspective on audio-visual scene understanding, then discuss and analyze the feasible future direction of the audio-visual learning area.
arXiv Detail & Related papers (2022-08-20T02:15:44Z) - Recent Advances and Challenges in Deep Audio-Visual Correlation Learning [7.273353828127817]
This paper focuses on state-of-the-art (SOTA) models used to learn correlations between audio and video.
We also discuss the definitions and paradigms of several tasks in AI multimedia.
arXiv Detail & Related papers (2022-02-28T10:43:01Z) - From Show to Tell: A Survey on Image Captioning [48.98681267347662]
Connecting Vision and Language plays an essential role in Generative Intelligence.
Research in image captioning has not reached a conclusive answer yet.
This work aims at providing a comprehensive overview and categorization of image captioning approaches.
arXiv Detail & Related papers (2021-07-14T18:00:54Z) - A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications.
Deep-learning-based approaches have been dedicated to video segmentation and have delivered compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z) - An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and
Separation [57.68765353264689]
Speech enhancement and speech separation are two related tasks.
Traditionally, these tasks have been tackled using signal processing and machine learning techniques.
Deep learning has been exploited to achieve strong performance.
arXiv Detail & Related papers (2020-08-21T17:24:09Z) - Survey on Visual Sentiment Analysis [87.20223213370004]
This paper reviews pertinent publications and tries to present an exhaustive overview of the field of Visual Sentiment Analysis.
The paper also describes principles of design of general Visual Sentiment Analysis systems from three main points of view.
A formalization of the problem is discussed, considering different levels of granularity, as well as the components that can affect the sentiment toward an image in different ways.
arXiv Detail & Related papers (2020-04-24T10:15:22Z) - Image Segmentation Using Deep Learning: A Survey [58.37211170954998]
Image segmentation is a key topic in image processing and computer vision.
There has been a substantial amount of work aimed at developing image segmentation approaches using deep learning models.
arXiv Detail & Related papers (2020-01-15T21:37:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.