Visually-aware Acoustic Event Detection using Heterogeneous Graphs
- URL: http://arxiv.org/abs/2207.07935v1
- Date: Sat, 16 Jul 2022 13:09:25 GMT
- Title: Visually-aware Acoustic Event Detection using Heterogeneous Graphs
- Authors: Amir Shirian, Krishna Somandepalli, Victor Sanchez, Tanaya Guha
- Abstract summary: Perception of auditory events is inherently multimodal, relying on both audio and visual cues.
We employ heterogeneous graphs to capture the spatial and temporal relationships between the modalities.
We show efficient modelling of intra- and inter-modality relationships at both spatial and temporal scales.
- Score: 39.90352230010103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Perception of auditory events is inherently multimodal, relying on both
audio and visual cues. A large number of existing multimodal approaches process
each modality using modality-specific models and then fuse the embeddings to
encode the joint information. In contrast, we employ heterogeneous graphs to
explicitly capture the spatial and temporal relationships between the
modalities and represent detailed information about the underlying signal.
We use heterogeneous graphs to address the task of visually-aware acoustic
event classification; they serve as a compact, efficient and scalable way to
represent data. Through heterogeneous graphs, we show efficient modelling of
intra- and inter-modality relationships at both spatial and temporal scales.
Our model can easily be adapted to different scales of events through relevant
hyperparameters. Experiments on AudioSet, a large benchmark, show that our
model achieves state-of-the-art performance.
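To make the graph construction concrete, here is a minimal sketch of a heterogeneous audio-visual graph using PyTorch Geometric's HeteroData. This illustrates the general technique, not the authors' implementation; the node types, feature dimensions, and edge relations below are assumptions.

```python
# Minimal sketch of a heterogeneous audio-visual graph. Per-segment
# audio embeddings and per-frame visual embeddings are assumed; this
# is not the paper's actual code.
import torch
from torch_geometric.data import HeteroData

T = 10                                   # number of aligned segments (assumed)
data = HeteroData()
data['audio'].x = torch.randn(T, 128)    # e.g. pretrained audio features
data['video'].x = torch.randn(T, 512)    # e.g. CNN frame features

# Intra-modality temporal edges: connect consecutive segments.
idx = torch.arange(T - 1)
temporal = torch.stack([idx, idx + 1])
data['audio', 'follows', 'audio'].edge_index = temporal
data['video', 'follows', 'video'].edge_index = temporal

# Inter-modality edges: link each audio segment to its co-occurring frame.
sync = torch.stack([torch.arange(T), torch.arange(T)])
data['audio', 'syncs_with', 'video'].edge_index = sync
data['video', 'syncs_with', 'audio'].edge_index = sync
```

A heterogeneous GNN (e.g. torch_geometric.nn.HeteroConv) can then pass messages over each typed edge separately and pool the node embeddings for event classification; the hyperparameters mentioned above would govern how many segments and edges the graph contains.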
Related papers
- TimeGraphs: Graph-based Temporal Reasoning [64.18083371645956]
TimeGraphs is a novel approach that characterizes dynamic interactions as a hierarchical temporal graph.
Our approach models the interactions using a compact graph-based representation, enabling adaptive reasoning across diverse time scales.
We evaluate TimeGraphs on multiple datasets with complex, dynamic agent interactions, including a football simulator, the Resistance game, and the MOMA human activity dataset.
arXiv Detail & Related papers (2024-01-06T06:26:49Z)
- Unified and Dynamic Graph for Temporal Character Grouping in Long Videos [31.192044026127032]
Video temporal character grouping locates the moments when major characters appear within a video, according to their identities.
Recent works have evolved from unsupervised clustering to graph-based supervised clustering.
We present a unified and dynamic graph (UniDG) framework for temporal character grouping.
arXiv Detail & Related papers (2023-08-27T13:22:55Z)
- Heterogeneous Graph Learning for Acoustic Event Classification [22.526665796655205]
Graphs for audiovisual data are constructed manually, which is difficult and sub-optimal.
We develop a new model, the heterogeneous graph crossmodal network (HGCN), which learns the crossmodal edges.
Our proposed model can adapt to various spatial and temporal scales owing to its parametric construction, while the learnable crossmodal edges effectively connect the relevant nodes.
arXiv Detail & Related papers (2023-03-05T13:06:53Z)
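One way to picture learnable crossmodal edges (a hypothetical sketch; HGCN's actual parameterization is not given in the summary above) is to score every audio-video node pair with a learned bilinear form and keep the top-k links per audio node:

```python
# Hypothetical sketch of learnable crossmodal edges: score every
# audio-video node pair with a learned bilinear form and keep the
# top-k links per audio node. Not HGCN's actual implementation.
import torch
import torch.nn as nn

class CrossmodalEdgeLearner(nn.Module):
    def __init__(self, d_audio=128, d_video=512, k=3):
        super().__init__()
        self.score = nn.Bilinear(d_audio, d_video, 1)
        self.k = k

    def forward(self, audio, video):
        # audio: (Ta, d_audio), video: (Tv, d_video)
        Ta, Tv = audio.size(0), video.size(0)
        a = audio.unsqueeze(1).expand(Ta, Tv, -1).reshape(Ta * Tv, -1)
        v = video.unsqueeze(0).expand(Ta, Tv, -1).reshape(Ta * Tv, -1)
        scores = self.score(a, v).view(Ta, Tv)
        weights, cols = scores.topk(self.k, dim=1)    # strongest video nodes
        rows = torch.arange(Ta).unsqueeze(1).expand_as(cols)
        edge_index = torch.stack([rows.reshape(-1), cols.reshape(-1)])
        return edge_index, weights.reshape(-1)        # (2, Ta*k), (Ta*k,)

learner = CrossmodalEdgeLearner()
edge_index, edge_weight = learner(torch.randn(10, 128), torch.randn(12, 512))
```

- DyTed: Disentangled Representation Learning for Discrete-time Dynamic Graph [59.583555454424]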
We propose a novel disenTangled representation learning framework for discrete-time Dynamic graphs, namely DyTed.
We specially design a temporal-clips contrastive learning task together with a structure contrastive learning to effectively identify the time-invariant and time-varying representations respectively.
arXiv Detail & Related papers (2022-10-19T14:34:12Z)
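The temporal-clips contrastive task can be pictured with a small InfoNCE-style sketch (purely illustrative; DyTed's actual losses, shapes, and temperature are not given in the summary, so everything here is assumed):

```python
# Illustrative InfoNCE-style contrastive loss for time-invariant node
# representations across two temporal clips. Assumed shapes/names only.
import torch
import torch.nn.functional as F

def temporal_clip_contrastive(z_a, z_b, tau=0.1):
    """z_a, z_b: (N, d) time-invariant embeddings of the same N nodes
    taken from two different temporal clips of the dynamic graph."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / tau            # (N, N) cosine similarities
    targets = torch.arange(z_a.size(0))     # node i matches itself across clips
    return F.cross_entropy(logits, targets)

loss = temporal_clip_contrastive(torch.randn(32, 64), torch.randn(32, 64))
```

- Representing Videos as Discriminative Sub-graphs for Action Recognition [165.54738402505194]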
We introduce a new design of sub-graphs to represent and encode the discriminative patterns of each action in the videos.
We present the MUlti-scale Sub-graph LEarning (MUSLE) framework that builds space-time graphs and clusters them into compact sub-graphs at each scale.
arXiv Detail & Related papers (2022-01-11T16:15:25Z)
- Learning Spatial-Temporal Graphs for Active Speaker Detection [26.45877018368872]
SPELL is a framework that learns long-range multimodal graphs to encode the inter-modal relationship between audio and visual data.
We first construct a graph from a video so that each node corresponds to one person.
We demonstrate that learning a graph-based representation, owing to its explicit spatial and temporal structure, significantly improves the overall performance.
arXiv Detail & Related papers (2021-12-02T18:29:07Z)
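As a loose illustration of this construction (details assumed, not taken from SPELL's implementation), each detected person in each frame can become a node, with temporal edges tracking a person across frames and spatial edges connecting people within a frame:

```python
# Hypothetical person-graph construction for active speaker detection:
# one node per (person, frame), temporal edges for the same person
# across consecutive appearances, spatial edges within a frame.
import itertools
import networkx as nx

def build_person_graph(tracks):
    """tracks: dict person_id -> sorted list of frame indices where they appear."""
    g = nx.Graph()
    frames = {}                                        # frame -> nodes in it
    for pid, frame_ids in tracks.items():
        for t in frame_ids:
            g.add_node((pid, t))
            frames.setdefault(t, []).append((pid, t))
        for t0, t1 in zip(frame_ids, frame_ids[1:]):   # temporal edges
            g.add_edge((pid, t0), (pid, t1), kind="temporal")
    for nodes in frames.values():                      # spatial edges per frame
        for u, v in itertools.combinations(nodes, 2):
            g.add_edge(u, v, kind="spatial")
    return g

g = build_person_graph({"alice": [0, 1, 2], "bob": [1, 2]})
print(g.number_of_nodes(), g.number_of_edges())        # 5 nodes, 5 edges
```

- Attention Bottlenecks for Multimodal Fusion [90.75885715478054]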
Machine perception models are typically modality-specific and optimised for unimodal benchmarks.
We introduce a novel transformer-based architecture that uses 'fusion bottlenecks' for modality fusion at multiple layers.
We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks.
arXiv Detail & Related papers (2021-06-30T22:44:12Z)
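A compact sketch of the bottleneck idea as described (an interpretation, not the paper's released code; dimensions, head counts, and the averaging rule are assumptions): a few shared bottleneck tokens are appended to each modality's token sequence, so cross-modal information must pass through them:

```python
# Illustrative one-layer attention-bottleneck fusion: each modality
# attends over [its own tokens + shared bottleneck tokens]; the updated
# bottlenecks from both passes are averaged for the next layer.
import torch
import torch.nn as nn

class BottleneckFusionLayer(nn.Module):
    def __init__(self, dim=256, heads=4, n_bottlenecks=4):
        super().__init__()
        self.attn_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.n = n_bottlenecks

    def forward(self, audio, video, fsn):
        # audio: (B, Ta, d), video: (B, Tv, d), fsn: (B, n, d) bottlenecks
        xa = torch.cat([audio, fsn], dim=1)
        xa, _ = self.attn_a(xa, xa, xa)
        xv = torch.cat([video, fsn], dim=1)
        xv, _ = self.attn_v(xv, xv, xv)
        audio, fsn_a = xa[:, :-self.n], xa[:, -self.n:]
        video, fsn_v = xv[:, :-self.n], xv[:, -self.n:]
        return audio, video, (fsn_a + fsn_v) / 2

layer = BottleneckFusionLayer()
a, v, f = layer(torch.randn(2, 10, 256), torch.randn(2, 8, 256),
                torch.randn(2, 4, 256))
```

- Graph Pattern Loss based Diversified Attention Network for Cross-Modal Retrieval [10.420129873840578]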
Cross-modal retrieval aims to enable a flexible retrieval experience by combining multimedia data such as image, video, text, and audio.
A core goal of unsupervised approaches is to mine the correlations among different object representations to achieve satisfactory retrieval performance without requiring expensive labels.
We propose a Graph Pattern Loss based Diversified Attention Network (GPLDAN) for unsupervised cross-modal retrieval.
arXiv Detail & Related papers (2021-06-25T10:53:07Z)
- Hawkes Processes on Graphons [85.6759041284472]
We study Hawkes processes and their variants that are associated with Granger causality graphs.
We can generate the corresponding Hawkes processes and simulate event sequences.
We learn the proposed model by minimizing the hierarchical optimal transport distance between the generated event sequences and the observed ones.
arXiv Detail & Related papers (2021-02-04T17:09:50Z)
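As background for "generate the corresponding Hawkes processes and simulate event sequences", here is a generic univariate Hawkes simulator using Ogata's thinning algorithm; it is a textbook sketch with assumed parameters, not the paper's graphon-based construction:

```python
# Generic simulation of a univariate Hawkes process with exponential
# kernel via Ogata's thinning algorithm. Parameters are illustrative.
import math
import random

def simulate_hawkes(mu=0.5, alpha=0.8, beta=1.5, horizon=20.0, seed=0):
    """Intensity: lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while t < horizon:
        # Dominating rate: lambda is non-increasing between events,
        # so lambda(t) + alpha bounds it until the next accepted event.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events) + alpha
        t += rng.expovariate(lam_bar)            # candidate next event time
        if t >= horizon:
            break
        lam_t = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        if rng.random() <= lam_t / lam_bar:      # accept with prob lambda(t)/lam_bar
            events.append(t)
    return events

print(simulate_hawkes())
```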
This list is automatically generated from the titles and abstracts of the papers on this site.