Related papers: I-ViSE: Interactive Video Surveillance as an Edge Service using Unsupervised Feature Queries

URL: http://arxiv.org/abs/2003.04169v1
Date: Mon, 9 Mar 2020 14:26:45 GMT
Title: I-ViSE: Interactive Video Surveillance as an Edge Service using Unsupervised Feature Queries
Authors: Seyed Yahya Nikouei, Yu Chen, Alexander Aved, Erik Blasch
Abstract summary: This paper proposes an Interactive Video Surveillance as an Edge service (I-ViSE) based on unsupervised feature queries. An I-ViSE prototype is built following the edge-fog computing paradigm, and the experimental results verify that the I-ViSE scheme meets the design goal of scene recognition in less than two seconds.
Score: 70.69741666849046
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Situation AWareness (SAW) is essential for many mission critical applications. However, SAW is very challenging when trying to immediately identify objects of interest or zoom in on suspicious activities from thousands of video frames. This work aims at developing a queryable system to instantly select interesting content. While face recognition technology is mature, in many scenarios like public safety monitoring, the features of objects of interest may be much more complicated than face features. In addition, human operators may not always be able to provide a descriptive, simple, and accurate query. More often, only rough, general descriptions of certain suspicious objects or accidents are available. This paper proposes an Interactive Video Surveillance as an Edge service (I-ViSE) based on unsupervised feature queries. Adopting unsupervised methods that do not reveal any private information, the I-ViSE scheme utilizes general features of a human body and color of clothes. An I-ViSE prototype is built following the edge-fog computing paradigm, and the experimental results verify that the I-ViSE scheme meets the design goal of scene recognition in less than two seconds.

Related papers

Interacted Object Grounding in Spatio-Temporal Human-Object Interactions [70.8859442754261]
We introduce a new open-world benchmark: Grounding Interacted Objects (GIO). An object grounding task is proposed that expects vision systems to discover interacted objects. We propose a 4D question-answering framework (4D-QA) to discover interacted objects from diverse videos.
arXiv Detail & Related papers (2024-12-27T09:08:46Z)
Privacy-Preserving Video Anomaly Detection: A Survey [10.899433437231139]
Video Anomaly Detection (VAD) aims to automatically analyze patterns in surveillance videos collected from open spaces to detect anomalous events that may cause harm without physical contact. The lack of transparency in video transmission and usage raises public concerns about privacy and ethics, limiting the real-world application of VAD. Recently, researchers have focused on privacy concerns in VAD by conducting systematic studies from various perspectives including data, features, and systems. This article systematically reviews progress in P2VAD for the first time, defining its scope and providing an intuitive taxonomy.
arXiv Detail & Related papers (2024-11-21T20:29:59Z)
Detect2Interact: Localizing Object Key Field in Visual Question Answering (VQA) with LLMs [5.891295920078768]
We introduce an advanced approach for fine-grained object visual key field detection. First, we use the segment anything model (SAM) to generate detailed spatial maps of objects in images. Next, we use Vision Studio to extract semantic object descriptions. Third, we employ GPT-4's common sense knowledge, bridging the gap between an object's semantics and its spatial map.
arXiv Detail & Related papers (2024-04-01T14:53:36Z)
SHIELD : An Evaluation Benchmark for Face Spoofing and Forgery Detection with Multimodal Large Language Models [63.946809247201905]
We introduce a new benchmark, namely SHIELD, to evaluate the ability of MLLMs on face spoofing and forgery detection. We design true/false and multiple-choice questions to evaluate multimodal face data in these two face security tasks. The results indicate that MLLMs hold substantial potential in the face security domain.
arXiv Detail & Related papers (2024-02-06T17:31:36Z)
MVSA-Net: Multi-View State-Action Recognition for Robust and Deployable Trajectory Generation [6.032808648673282]
The learn-from-observation (LfO) paradigm is a human-inspired mode for a robot to learn to perform a task simply by watching it being performed. We present multi-view SA-Net, which generalizes the SA-Net model to allow the perception of multiple viewpoints of the task activity.
arXiv Detail & Related papers (2023-11-14T18:53:28Z)
Understanding Policy and Technical Aspects of AI-Enabled Smart Video Surveillance to Address Public Safety [2.2427353485837545]
This paper identifies the privacy concerns and requirements that need to be addressed when designing AI-enabled smart video surveillance. We propose the first end-to-end AI-enabled privacy-preserving smart video surveillance system that holistically combines computer vision analytics, statistical data analytics, cloud-native services, and end-user applications.
arXiv Detail & Related papers (2023-02-08T19:54:35Z)
Multi-modal Transformers Excel at Class-agnostic Object Detection [105.10403103027306]
We argue that existing methods lack a top-down supervision signal governed by human-understandable semantics. We develop an efficient and flexible MViT architecture using multi-scale feature processing and deformable self-attention. We show the significance of MViT proposals in a diverse range of applications.
arXiv Detail & Related papers (2021-11-22T18:59:29Z)
One-Shot Object Affordance Detection in the Wild [76.46484684007706]
Affordance detection refers to identifying the potential action possibilities of objects in an image. We devise a One-Shot Affordance Detection Network (OSAD-Net) that estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images. With complex scenes and rich annotations, our PADv2 dataset can be used as a test bed to benchmark affordance detection methods.
arXiv Detail & Related papers (2021-08-08T14:53:10Z)
ASOD60K: Audio-Induced Salient Object Detection in Panoramic Videos [79.05486554647918]
We propose PV-SOD, a new task that aims to segment salient objects from panoramic videos. In contrast to existing fixation-level or object-level saliency detection tasks, we focus on multi-modal salient object detection (SOD). We collect the first large-scale dataset, named ASOD60K, which contains 4K-resolution video frames annotated with a six-level hierarchy.
arXiv Detail & Related papers (2021-07-24T15:14:20Z)
Exploit Clues from Views: Self-Supervised and Regularized Learning for Multiview Object Recognition [66.87417785210772]
This work investigates the problem of multiview self-supervised learning (MV-SSL). A novel surrogate task for self-supervised learning is proposed by pursuing "object invariant" representation. Experiments show that the recognition and retrieval results using view invariant prototype embedding (VISPE) outperform other self-supervised learning methods.
arXiv Detail & Related papers (2020-03-28T07:06:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.