Efficient Pipelines for Vision-Based Context Sensing
- URL: http://arxiv.org/abs/2011.00427v1
- Date: Sun, 1 Nov 2020 05:09:13 GMT
- Title: Efficient Pipelines for Vision-Based Context Sensing
- Authors: Xiaochen Liu
- Abstract summary: Vision sources are emerging worldwide, with cameras installed at roadsides, indoors, and on mobile platforms.
However, vision data collection and analytics are still highly manual today.
There are three major challenges for today's vision-based context sensing systems.
- Score: 0.24366811507669117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Context awareness is an essential part of mobile and ubiquitous computing.
Its goal is to unveil situational information about mobile users like locations
and activities. The sensed context can enable many services like navigation,
AR, and smart shopping. Such context can be sensed in different ways,
including visual sensors. Vision sources are emerging worldwide: cameras can
be installed at roadsides, indoors, and on mobile platforms. This trend
provides a huge amount of vision data that could be used for context sensing.
However, vision data collection and analytics are still highly manual today.
It is hard to deploy cameras at large scale for data collection, and
organizing and labeling context from the data is labor-intensive. In recent
years, advanced vision algorithms and deep neural networks have been used to
help analyze vision data, but this approach is limited by data quality,
labeling effort, and dependency on hardware resources. In summary, today's
vision-based context sensing systems face three major challenges: collecting
and labeling data at large scale, processing large data volumes efficiently
with limited hardware resources, and extracting accurate context from vision
data. The thesis explores a design space that consists of three dimensions:
sensing task, sensor types, and task locations. Our prior work explores
several points in this design space. We make contributions by (1) developing
efficient and scalable solutions for different points in the design space of
vision-based sensing tasks; (2) achieving state-of-the-art accuracy in those
applications; and (3) developing guidelines for designing such sensing
systems.
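The three-dimensional design space named in the abstract can be pictured as a simple enumeration of task, sensor, and execution-location choices. The sketch below is only an illustrative reading of the abstract, not code from the thesis; the concrete dimension values are assumptions chosen for the example.

```python
from itertools import product

# Hypothetical example values for the three design-space dimensions named in
# the abstract (sensing task, sensor types, task locations); the thesis does
# not enumerate them here, so these are illustrative assumptions only.
sensing_tasks = ["localization", "activity recognition", "object tracking"]
sensor_types = ["roadside camera", "in-house camera", "mobile camera"]
task_locations = ["on-device", "edge server", "cloud"]

# Each combination is one point in the design space that a vision-based
# context sensing system could occupy.
for task, sensor, location in product(sensing_tasks, sensor_types, task_locations):
    print(f"task={task}, sensor={sensor}, runs at={location}")
```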
Related papers
- Towards Mobile Sensing with Event Cameras on High-agility Resource-constrained Devices: A Survey [21.038748549750395]
This paper surveys the literature over the period 2014-2024.
It provides a comprehensive overview of event-based mobile sensing systems.
We discuss key applications of event cameras in mobile sensing, including visual odometry, object tracking, optical flow estimation, and 3D reconstruction.
arXiv Detail & Related papers (2025-03-29T02:28:32Z) - InScope: A New Real-world 3D Infrastructure-side Collaborative Perception Dataset for Open Traffic Scenarios [13.821143687548494]
This paper introduces a new 3D infrastructure-side collaborative perception dataset, abbreviated as InScope.
InScope encapsulates a 20-day capture duration with 303 tracking trajectories and 187,787 3D bounding boxes annotated by experts.
arXiv Detail & Related papers (2024-07-31T13:11:14Z) - VisionKG: Unleashing the Power of Visual Datasets via Knowledge Graph [2.3143591448419074]
Vision Knowledge Graph (VisionKG) is a novel resource that interlinks, organizes and manages visual datasets via knowledge graphs and Semantic Web technologies.
VisionKG currently contains 519 million RDF triples that describe approximately 40 million entities.
arXiv Detail & Related papers (2023-09-24T11:19:13Z) - Vision-Based Environmental Perception for Autonomous Driving [4.138893879750758]
Visual perception plays an important role in autonomous driving.
Recent deep learning-based methods offer better reliability and processing speed.
A monocular camera uses image data from a single viewpoint to estimate object depth.
Simultaneous Localization and Mapping (SLAM) can establish a model of the road environment.
arXiv Detail & Related papers (2022-12-22T01:59:58Z) - CXTrack: Improving 3D Point Cloud Tracking with Contextual Information [59.55870742072618]
3D single object tracking plays an essential role in many applications, such as autonomous driving.
We propose CXTrack, a novel transformer-based network for 3D object tracking.
We show that CXTrack achieves state-of-the-art tracking performance while running at 29 FPS.
arXiv Detail & Related papers (2022-11-12T11:29:01Z) - MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
We introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis.
The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order and ambidextrous grasp labels for a parallel-jaw and vacuum gripper.
We also provide a real dataset consisting of over 2.3k fully annotated high-quality RGBD images, divided into 5 difficulty levels and an unseen object set to evaluate different object and layout properties.
arXiv Detail & Related papers (2022-08-08T08:15:34Z) - Deep Depth Completion: A Survey [26.09557446012222]
We provide a comprehensive literature review that helps readers better grasp the research trends and clearly understand the current advances.
We investigate the related studies from the design aspects of network architectures, loss functions, benchmark datasets, and learning strategies.
We present a quantitative comparison of model performance on two widely used benchmark datasets, including an indoor and an outdoor dataset.
arXiv Detail & Related papers (2022-05-11T08:24:00Z) - KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D [67.50776195828242]
KITTI-360 is a suburban driving dataset which comprises richer input modalities, comprehensive semantic instance annotations and accurate localization.
For efficient annotation, we created a tool to label 3D scenes with bounding primitives, resulting in over 150k semantic and instance annotated images and 1B annotated 3D points.
We established benchmarks and baselines for several tasks relevant to mobile perception, encompassing problems from computer vision, graphics, and robotics on the same dataset.
arXiv Detail & Related papers (2021-09-28T00:41:29Z) - REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter [52.117388513480435]
We present a new dataset named REGRAD to support the modeling of relationships among objects and grasps.
Our dataset is collected in both forms of 2D images and 3D point clouds.
Users are free to import their own object models to generate as much data as they want.
arXiv Detail & Related papers (2021-04-29T05:31:21Z) - Urban Sensing based on Mobile Phone Data: Approaches, Applications and Challenges [67.71975391801257]
Much of the concern in mobile data analysis relates to human beings and their behaviours.
This work aims to review the methods and techniques that have been implemented to discover knowledge from mobile phone data.
arXiv Detail & Related papers (2020-08-29T15:14:03Z) - PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding [107.02479689909164]
In this work, we aim at facilitating research on 3D representation learning.
We measure the effect of unsupervised pre-training on a large source set of 3D scenes.
arXiv Detail & Related papers (2020-07-21T17:59:22Z)