Visual Semantic Multimedia Event Model for Complex Event Detection in
Video Streams
- URL: http://arxiv.org/abs/2009.14525v1
- Date: Wed, 30 Sep 2020 09:22:23 GMT
- Title: Visual Semantic Multimedia Event Model for Complex Event Detection in
Video Streams
- Authors: Piyush Yadav, Edward Curry
- Abstract summary: Middleware systems such as complex event processing (CEP) mine patterns from data streams and send notifications to users in a timely fashion.
We present a visual event specification method to enable complex multimedia event processing by creating a structured knowledge representation from low-level media streams.
- Score: 5.53329677986653
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimedia data is highly expressive and has traditionally been very
difficult for a machine to interpret. Middleware systems such as complex event
processing (CEP) mine patterns from data streams and send notifications to
users in a timely fashion. Presently, CEP systems have inherent limitations in
processing multimedia streams due to their data complexity and the lack of an
underlying structured data model. In this work, we present a visual event
specification method to enable complex multimedia event processing by creating
a semantic knowledge representation derived from low-level media streams. The
method enables the detection of high-level semantic concepts from the media
streams using an ensemble of pattern detection capabilities. The semantic model
is aligned with the deep learning models of a multimedia CEP engine, giving
end-users the flexibility to build rules using spatiotemporal event calculus.
This enhances the CEP capability to detect patterns from media streams and
bridges the semantic gap between highly expressive, knowledge-centric user
queries and the low-level features of the multimedia data. We have built a
small traffic event ontology prototype to validate the approach and its
performance. The paper's contribution is threefold: i) a knowledge graph
representation for multimedia streams, ii) a hierarchical event network to
detect visual patterns from media streams, and iii) complex pattern rules for
multimedia event reasoning using event calculus.
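As a concrete illustration of the kind of rule an end-user could express, below is a minimal Python sketch, not the authors' implementation: detections from a media stream form a structured representation, and a toy spatiotemporal rule fires a complex event when a car and a person overlap in two consecutive frames. The `Detection` schema, the `overlaps` predicate, and the `CarNearPerson` rule are hypothetical stand-ins for the paper's knowledge-graph and event-calculus machinery.

```python
from dataclasses import dataclass

# Hypothetical structured representation of one video frame: each detection
# becomes a node with a label and a bounding box (x1, y1, x2, y2).
@dataclass
class Detection:
    label: str
    box: tuple  # (x1, y1, x2, y2)

def overlaps(a: Detection, b: Detection) -> bool:
    """Spatial predicate: do two bounding boxes intersect?"""
    ax1, ay1, ax2, ay2 = a.box
    bx1, by1, bx2, by2 = b.box
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2

def detect_complex_event(stream):
    """Toy spatiotemporal rule: fire a complex event when a 'car' and a
    'person' overlap in two consecutive frames."""
    prev_hit = False
    for t, frame in enumerate(stream):
        cars = [d for d in frame if d.label == "car"]
        people = [d for d in frame if d.label == "person"]
        hit = any(overlaps(c, p) for c in cars for p in people)
        if hit and prev_hit:
            yield {"event": "CarNearPerson", "frame": t}
        prev_hit = hit

# Toy stream: two frames in which a car and a person overlap.
stream = [
    [Detection("car", (0, 0, 10, 10)), Detection("person", (8, 8, 12, 12))],
    [Detection("car", (1, 1, 11, 11)), Detection("person", (9, 9, 13, 13))],
]
print(list(detect_complex_event(stream)))  # [{'event': 'CarNearPerson', 'frame': 1}]
```

The same pattern generalizes: spatial predicates over object nodes combined with temporal conditions over frame sequences yield complex-event rules.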
Related papers
- A New Hybrid Intelligent Approach for Multimodal Detection of Suspected Disinformation on TikTok [0.0]
This study introduces a hybrid framework that combines the computational power of deep learning with the interpretability of fuzzy logic to detect suspected disinformation in TikTok videos.
The methodology comprises two core components: a multimodal feature analyser that extracts and evaluates data from text, audio, and video; and a multimodal disinformation detector based on fuzzy logic.
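A minimal sketch of how such a fuzzy-logic detector could combine per-modality suspicion scores, assuming Mamdani-style rules with min for AND; the membership functions and the two rules below are invented for illustration and are not the paper's actual rule base.

```python
def high(x):   # membership in the fuzzy set "high suspicion"
    return max(0.0, min(1.0, (x - 0.4) / 0.4))

def low(x):    # membership in the fuzzy set "low suspicion"
    return 1.0 - high(x)

def fuse(text_score, audio_score, video_score):
    """Combine rule strengths from per-modality scores in [0, 1]."""
    # Rule 1: text high AND video high -> suspected disinformation
    r1 = min(high(text_score), high(video_score))
    # Rule 2: all three modalities low -> not suspected
    r2 = min(low(text_score), low(audio_score), low(video_score))
    # Defuzzify with a simple ratio of the two rule strengths.
    return r1 / (r1 + r2) if (r1 + r2) > 0 else 0.5

print(fuse(0.9, 0.3, 0.8))  # 1.0 -> high suspicion driven by text + video
```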
arXiv Detail & Related papers (2025-02-09T12:37:48Z)
- Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning [56.873534081386]
A new benchmark, HIREST, is presented, covering video retrieval, moment retrieval, moment segmentation, and step-captioning.
We propose a query-centric audio-visual cognition network to construct a reliable multi-modal representation for the latter three tasks.
The network identifies user-preferred content and thus attains a query-centric audio-visual representation shared across the three tasks.
arXiv Detail & Related papers (2024-12-18T06:43:06Z)
- Grounding Partially-Defined Events in Multimodal Data [61.0063273919745]
We introduce a multimodal formulation for partially-defined events and cast the extraction of these events as a three-stage span retrieval task.
We propose a benchmark for this task, MultiVENT-G, which consists of 14.5 hours of densely annotated current-event videos and 1,168 text documents containing 22.8K labeled event-centric entities.
Results illustrate the challenges that abstract event understanding poses and demonstrate the promise of event-centric video-language systems.
arXiv Detail & Related papers (2024-10-07T17:59:48Z)
- Detecting Misinformation in Multimedia Content through Cross-Modal Entity Consistency: A Dual Learning Approach [10.376378437321437]
We propose MultiMD, a Multimedia Misinformation Detection framework that detects misinformation in video content by leveraging cross-modal entity consistency.
Our results demonstrate that MultiMD outperforms state-of-the-art baseline models.
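A hedged sketch of the cross-modal entity-consistency idea: compare entities mentioned in the text with entities recognized in the video and use their agreement as a misinformation feature. The entity sets below stand in for the outputs of real NER and visual-recognition models.

```python
def entity_consistency(text_entities, video_entities):
    """Jaccard overlap between textual and visual entity sets."""
    text_entities, video_entities = set(text_entities), set(video_entities)
    union = text_entities | video_entities
    return len(text_entities & video_entities) / len(union) if union else 1.0

text_entities = {"flood", "new york", "rescue boat"}   # e.g. from an NER model
video_entities = {"beach", "surfer"}                   # e.g. from a video tagger
score = entity_consistency(text_entities, video_entities)
print(score)  # 0.0 -> text and video disagree, a misinformation signal
```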
arXiv Detail & Related papers (2024-08-16T16:14:36Z)
- MMUTF: Multimodal Multimedia Event Argument Extraction with Unified Template Filling [4.160176518973659]
We introduce a unified template filling model that connects the textual and visual modalities via textual prompts.
Our system surpasses the current SOTA on textual event argument extraction (EAE) by +7% F1 and generally outperforms the second-best systems on multimedia EAE.
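A toy sketch of the template-filling idea, assuming an invented event template and candidate arguments: an event type maps to a textual prompt whose slots are filled with the highest-scoring textual or visual candidates.

```python
# Invented template and candidates for illustration only.
TEMPLATES = {
    "Attack": "<attacker> attacked <target> using <instrument>",
}

def fill_template(event_type, candidates):
    """Greedily fill each slot with the highest-scoring candidate."""
    prompt = TEMPLATES[event_type]
    for slot, options in candidates.items():
        best = max(options, key=lambda c: c[1])[0]  # (argument, score) pairs
        prompt = prompt.replace(f"<{slot}>", best)
    return prompt

candidates = {
    "attacker": [("the soldiers", 0.9), ("a crowd", 0.2)],   # textual spans
    "target": [("the convoy", 0.8)],
    "instrument": [("rifles", 0.7)],                         # e.g. visual label
}
print(fill_template("Attack", candidates))
# -> "the soldiers attacked the convoy using rifles"
```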
arXiv Detail & Related papers (2024-06-18T09:14:17Z)
- Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
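A minimal sketch of early fusion in a single stream, with invented shapes: per-modality skeleton features (e.g. joint, motion, bone) are projected into a shared space, concatenated, and passed through one backbone rather than one stream per modality.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 16, 64                        # frames, feature dim per modality

joint  = rng.standard_normal((T, D))
motion = rng.standard_normal((T, D))
bone   = rng.standard_normal((T, D))

# Shared projection: one set of weights, i.e. a single stream.
W = rng.standard_normal((D, D)) / np.sqrt(D)

# Early fusion: concatenate modality tokens, then a single encoder pass.
tokens = np.concatenate([joint @ W, motion @ W, bone @ W], axis=0)  # (3T, D)
encoded = np.tanh(tokens)              # stand-in for a transformer backbone
representation = encoded.mean(axis=0)  # one unified multi-modal embedding
print(representation.shape)            # (64,)
```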
arXiv Detail & Related papers (2023-11-06T13:56:57Z)
- Support-set based Multi-modal Representation Enhancement for Video Captioning [121.70886789958799]
We propose a Support-set based Multi-modal Representation Enhancement (SMRE) model to mine rich information in a semantic subspace shared between samples.
Specifically, we propose a Support-set Construction (SC) module to construct a support-set to learn underlying connections between samples and obtain semantic-related visual elements.
During this process, we design a Semantic Space Transformation (SST) module to constrain relative distances and govern multi-modal interactions in a self-supervised way.
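A hedged sketch of support-set construction, assuming cosine similarity and a fixed pool size: each sample's representation is enhanced with the pooled features of its most similar neighbours in the batch, approximating a shared semantic subspace.

```python
import numpy as np

def build_support_set(features, k=2):
    """features: (N, D) array; returns (N, D) support-enhanced features."""
    norm = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = norm @ norm.T                       # cosine similarity, (N, N)
    np.fill_diagonal(sim, -np.inf)            # exclude the sample itself
    enhanced = np.empty_like(features)
    for i in range(len(features)):
        support = np.argsort(sim[i])[-k:]     # indices of k nearest samples
        enhanced[i] = 0.5 * features[i] + 0.5 * features[support].mean(axis=0)
    return enhanced

batch = np.random.default_rng(1).standard_normal((4, 8))
print(build_support_set(batch).shape)  # (4, 8)
```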
arXiv Detail & Related papers (2022-05-19T03:40:29Z)
- Reliable Shot Identification for Complex Event Detection via Visual-Semantic Embedding [72.9370352430965]
We propose a visual-semantic guided loss method for event detection in videos.
Motivated by curriculum learning, we introduce a negative elastic-net regularization term to start training the classifier with instances of high reliability.
An alternating optimization algorithm is developed to solve the proposed challenging non-convex, non-smooth problem.
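A rough sketch of the curriculum idea with an elastic-net-style penalty, using invented reliability scores: training starts with the most reliable shots and admits harder ones as epochs progress; the penalty mixes L1 and L2 terms on the selection weights.

```python
import numpy as np

def curriculum_weights(reliability, epoch, total_epochs):
    """Admit the top fraction of shots, growing linearly with the epoch."""
    keep = max(1, int(len(reliability) * (epoch + 1) / total_epochs))
    order = np.argsort(reliability)[::-1]      # most reliable first
    w = np.zeros_like(reliability)
    w[order[:keep]] = 1.0
    return w

def elastic_net(w, alpha=0.1, l1_ratio=0.5):
    """Elastic-net-style penalty on the selection weights."""
    return alpha * (l1_ratio * np.abs(w).sum() + (1 - l1_ratio) * (w ** 2).sum())

reliability = np.array([0.9, 0.2, 0.7, 0.4])   # toy per-shot reliability
for epoch in range(4):
    w = curriculum_weights(reliability, epoch, 4)
    print(epoch, w, round(elastic_net(w), 2))
```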
arXiv Detail & Related papers (2021-10-12T11:46:56Z)
- METEOR: Learning Memory and Time Efficient Representations from Multi-modal Data Streams [19.22829945777267]
We present METEOR, a novel MEmory and Time Efficient Online Representation learning technique.
We show that METEOR preserves the quality of the representations while reducing memory usage by around 80% compared to conventional memory-intensive embeddings.
arXiv Detail & Related papers (2020-07-23T08:18:02Z)
- Multimodal Categorization of Crisis Events in Social Media [81.07061295887172]
We present a new multimodal fusion method that leverages both images and texts as input.
In particular, we introduce a cross-attention module that can filter uninformative and misleading components from weak modalities.
We show that our method outperforms the unimodal approaches and strong multimodal baselines by a large margin on three crisis-related tasks.
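A minimal sketch of cross-attention between modalities, with invented dimensions: text tokens act as queries over image-region features, so regions that do not support the text receive low attention weight and are effectively filtered.

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: (Tq, D), (Tk, D), (Tk, D) -> (Tq, D)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)          # (Tq, Tk)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over regions
    return weights @ values

rng = np.random.default_rng(2)
text_tokens = rng.standard_normal((5, 32))   # e.g. tweet token embeddings
img_regions = rng.standard_normal((9, 32))   # e.g. image-region features
fused = cross_attention(text_tokens, img_regions, img_regions)
print(fused.shape)  # (5, 32): text attended over informative image regions
```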
arXiv Detail & Related papers (2020-04-10T06:31:30Z)