Szloca: towards a framework for full 3D tracking through a single camera in context of interactive arts
- URL: http://arxiv.org/abs/2206.12958v1
- Date: Sun, 26 Jun 2022 20:09:47 GMT
- Title: Szloca: towards a framework for full 3D tracking through a single camera in context of interactive arts
- Authors: Sahaj Garg
- Abstract summary: This research presents a novel way and a framework towards obtaining data and virtual representations of objects/people.
The model does not rely on complex training of computer vision systems but combines prior computer vision research and adds the capacity to represent z-depth.
- Score: 1.0878040851638
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Realtime virtual data of objects and human presence in a large area holds a valuable key to enabling many experiences and applications in various industries. With the exponential rise in the technological development of artificial intelligence, computer vision has expanded the possibilities of tracking and classifying things through video inputs alone, surpassing the limitations of the most popular and common hardware setups traditionally used to detect human pose and position, such as low field of view and limited tracking capacity. The benefits of using computer vision in application development are large, as it augments traditional input sources (like video streams) and can be integrated into many environments and platforms. In the context of new-media interactive arts based on physical movements and spanning large areas or galleries, this research presents a novel way and a framework towards obtaining data and virtual representations of objects/people, such as three-dimensional positions, skeletons/poses, and masks, from a single RGB camera. Looking at the state of the art through some recent developments and building on prior research in the field of computer vision, the paper also proposes an original method to obtain three-dimensional position data from monocular images. The model does not rely on complex training of computer vision systems but combines prior computer vision research and adds the capacity to represent z-depth, i.e., to represent a world position in three axes from a 2D input source.
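The abstract does not spell out how the z-depth itself is recovered, so the following is only a minimal, hypothetical sketch of the general idea of representing a world position in three axes from a 2D input: standard pinhole back-projection of a 2D keypoint (e.g., from an off-the-shelf pose detector), given the camera intrinsics K and a per-point depth estimate z obtained elsewhere. All names and the example intrinsics here are our assumptions, not the paper's.

```python
# Hypothetical illustration of lifting a 2D detection to a 3D position.
# Assumes a known intrinsic matrix K and an externally estimated depth z
# (e.g., from object scale or a monocular depth estimator); this is NOT
# the paper's exact method, whose details are not given in the abstract.
import numpy as np

def backproject(u: float, v: float, z: float, K: np.ndarray) -> np.ndarray:
    """Lift pixel (u, v) with depth z (metres) to camera-space XYZ."""
    fx, fy = K[0, 0], K[1, 1]   # focal lengths in pixels
    cx, cy = K[0, 2], K[1, 2]   # principal point
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# Example: a 640x480 camera with an assumed 500 px focal length.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
print(backproject(400.0, 300.0, 2.5, K))  # camera-space position, metres
```

A camera-to-world extrinsic transform would then map this camera-space point into the shared coordinate frame of an installation space.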
Related papers
- Diffusion Models in 3D Vision: A Survey [11.116658321394755]
We review the state-of-the-art approaches that leverage diffusion models for 3D visual tasks.
These approaches include 3D object generation, shape completion, point cloud reconstruction, and scene understanding.
We discuss potential solutions, including improving computational efficiency, enhancing multimodal fusion, and exploring the use of large-scale pretraining.
arXiv Detail & Related papers (2024-10-07T04:12:23Z)
- Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image [70.02187124865627]
Open-vocabulary 3D object detection (OV-3DDet) aims to localize and recognize both seen and previously unseen object categories within any new 3D scene.
We leverage a vision foundation model to provide image-wise guidance for discovering novel classes in 3D scenes.
We demonstrate significant improvements in accuracy and generalization, highlighting the potential of foundation models in advancing open-vocabulary 3D object detection.
arXiv Detail & Related papers (2024-07-07T04:50:04Z)
- Deep Models for Multi-View 3D Object Recognition: A Review [16.500711021549947]
Multi-view 3D representations for object recognition have thus far demonstrated the most promising results for achieving state-of-the-art performance.
This review paper comprehensively covers recent progress in multi-view 3D object recognition methods for 3D classification and retrieval tasks.
arXiv Detail & Related papers (2024-04-23T16:54:31Z)
- Recent Trends in 3D Reconstruction of General Non-Rigid Scenes [104.07781871008186]
Reconstructing models of the real world, including 3D geometry, appearance, and motion of real scenes, is essential for computer graphics and computer vision.
It enables the synthesis of photorealistic novel views, useful for the movie industry and AR/VR applications.
This state-of-the-art report (STAR) offers the reader a comprehensive summary of state-of-the-art techniques with monocular and multi-view inputs.
arXiv Detail & Related papers (2024-03-22T09:46:11Z)
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- Multiview Compressive Coding for 3D Reconstruction [77.95706553743626]
We introduce a simple framework that operates on 3D points of single objects or whole scenes.
Our model, Multiview Compressive Coding, learns to compress the input appearance and geometry to predict the 3D structure.
arXiv Detail & Related papers (2023-01-19T18:59:52Z)
- State of the Art in Dense Monocular Non-Rigid 3D Reconstruction [100.9586977875698]
3D reconstruction of deformable (or non-rigid) scenes from a set of monocular 2D image observations is a long-standing and actively researched area of computer vision and graphics.
This survey focuses on state-of-the-art methods for dense non-rigid 3D reconstruction of various deformable objects and composite scenes from monocular videos or sets of monocular views.
arXiv Detail & Related papers (2022-10-27T17:59:53Z)
- 3D shape sensing and deep learning-based segmentation of strawberries [5.634825161148484]
We evaluate modern sensing technologies including stereo and time-of-flight cameras for 3D perception of shape in agriculture.
We propose a novel 3D deep neural network which exploits the organised nature of information originating from the camera-based 3D sensors.
arXiv Detail & Related papers (2021-11-26T18:43:10Z)
- KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D [67.50776195828242]
KITTI-360 is a suburban driving dataset which comprises richer input modalities, comprehensive semantic instance annotations and accurate localization.
For efficient annotation, we created a tool to label 3D scenes with bounding primitives, resulting in over 150k semantic and instance annotated images and 1B annotated 3D points.
We established benchmarks and baselines for several tasks relevant to mobile perception, encompassing problems from computer vision, graphics, and robotics on the same dataset.
arXiv Detail & Related papers (2021-09-28T00:41:29Z)
- SAILenv: Learning in Virtual Visual Environments Made Simple [16.979621213790015]
We present a novel platform that allows researchers to experiment visual recognition in virtual 3D scenes.
A few lines of code are needed to interface every algorithm with the virtual world, and non-3D-graphics experts can easily customize the 3D environment itself.
Our framework yields pixel-level semantic and instance labeling and depth; to the best of our knowledge, it is the only one that provides motion-related information directly inherited from the 3D engine.
arXiv Detail & Related papers (2020-07-16T09:50:23Z)