Extraction Of Cumulative Blobs From Dynamic Gestures
- URL: http://arxiv.org/abs/2501.04002v1
- Date: Tue, 07 Jan 2025 18:59:28 GMT
- Title: Extraction Of Cumulative Blobs From Dynamic Gestures
- Authors: Rishabh Naulakha, Shubham Gaur, Dhairya Lodha, Mehek Tulsyan, Utsav Kotecha
- Abstract summary: Gesture recognition is based on computer-vision (CV) technology that allows a computer to interpret human motions as commands.
A simple night-vision camera can be used as the motion-capture camera.
The video stream from the camera is fed into a Raspberry Pi running a Python program that uses the OpenCV module.
- Abstract: Gesture recognition is a perceptual user interface based on computer-vision (CV) technology that allows a computer to interpret human motions as commands, letting users communicate with a computer without physical input devices and making the mouse and keyboard superfluous. The main weakness of gesture recognition is lighting conditions, because gesture control relies on computer vision and therefore on cameras. These cameras interpret gestures in 2D and 3D, so the extracted information can vary depending on the light source; as a result, such systems cannot work in a dark environment. A simple night-vision camera can serve as the motion-capture camera: it emits infrared light that is invisible to humans but clearly visible to a camera without an infrared filter, which largely overcomes the limitation of systems that fail in the dark. The video stream from the camera is fed into a Raspberry Pi running a Python program that uses the OpenCV module to detect, isolate, and track the path of the dynamic gesture. A machine-learning algorithm then recognizes the drawn pattern and controls the GPIOs of the Raspberry Pi accordingly to perform some activity.
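The pipeline described in the abstract (threshold the IR-lit blob, accumulate its centroid path, classify the drawn pattern, drive a GPIO) can be illustrated with a short sketch. This is not the authors' code: the camera index, the brightness threshold, the pin number `PIN_LED`, and the `recognize_pattern()` stand-in are all assumptions, since the paper does not specify its machine-learning model or parameters. It assumes OpenCV 4 and the standard `RPi.GPIO` library on a Raspberry Pi.

```python
# Minimal sketch (assumptions noted above), not the paper's implementation.
import cv2
import RPi.GPIO as GPIO

PIN_LED = 17                           # hypothetical output pin
GPIO.setmode(GPIO.BCM)
GPIO.setup(PIN_LED, GPIO.OUT)

def recognize_pattern(path):
    """Placeholder for the paper's ML step: label the accumulated gesture path."""
    # In practice, resample the path and feed it to a trained classifier.
    return "gesture" if len(path) > 50 else None

cap = cv2.VideoCapture(0)              # night-vision (IR) camera stream
path = []                              # cumulative blob centroids

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (11, 11), 0)
    # The IR-illuminated hand appears bright even in the dark; threshold it out.
    _, mask = cv2.threshold(blur, 200, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        blob = max(contours, key=cv2.contourArea)   # largest bright blob
        m = cv2.moments(blob)
        if m["m00"] > 0:
            cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
            path.append((cx, cy))                   # extend the cumulative path
    if recognize_pattern(path):
        GPIO.output(PIN_LED, GPIO.HIGH)             # drive a GPIO on a match
        path.clear()
    cv2.imshow("mask", mask)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
GPIO.cleanup()
cv2.destroyAllWindows()
```

In this sketch the "cumulative blob" is simply the list of per-frame centroids; any path classifier (nearest-neighbour on a resampled path, a small neural network, etc.) could replace the placeholder.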
Related papers
- Turn-by-Turn Indoor Navigation for the Visually Impaired [0.0]
Navigating indoor environments presents significant challenges for visually impaired individuals.
This paper introduces a novel system that provides turn-by-turn navigation inside buildings using only a smartphone equipped with a camera.
Preliminary evaluations demonstrate the system's effectiveness in accurately guiding users through complex indoor spaces.
arXiv Detail & Related papers (2024-10-25T20:16:38Z)
- Helpful DoggyBot: Open-World Object Fetching using Legged Robots and Vision-Language Models [63.89598561397856]
We present a system for quadrupedal mobile manipulation in indoor environments.
It uses a front-mounted gripper for object manipulation and a low-level controller, trained in simulation using egocentric depth, for agile skills.
We evaluate our system in two unseen environments without any real-world data collection or training.
arXiv Detail & Related papers (2024-09-25T20:13:41Z)
- ChatCam: Empowering Camera Control through Conversational AI [67.31920821192323]
ChatCam is a system that navigates camera movements through conversations with users.
To achieve this, we propose CineGPT, a GPT-based autoregressive model for text-conditioned camera trajectory generation.
We also develop an Anchor Determinator to ensure precise camera trajectory placement.
arXiv Detail & Related papers (2024-04-07T17:31:53Z)
- PathFinder: Attention-Driven Dynamic Non-Line-of-Sight Tracking with a Mobile Robot [3.387892563308912]
We introduce a novel approach to process a sequence of dynamic successive frames in a line-of-sight (LOS) video using an attention-based neural network.
We validate the approach on in-the-wild scenes using a drone for video capture, thus demonstrating low-cost NLOS imaging in dynamic capture environments.
arXiv Detail & Related papers (2023-05-29T10:57:59Z)
- Pedestrian detection with high-resolution event camera [0.0]
Event cameras (DVS) are a potentially interesting technology for addressing the above-mentioned problems.
In this paper, we compare two methods of processing event data by means of deep learning for the task of pedestrian detection.
We used a representation in the form of video frames, convolutional neural networks and asynchronous sparse convolutional neural networks.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-04T06:44:13Z)
- A real-time algorithm for human action recognition in RGB and thermal video [1.5749416770494706]
We present a deep learning based algorithm for human action recognition for both RGB and thermal cameras.
It is able to detect and track humans and recognize four basic actions in real time on a notebook with an NVIDIA GPU.
arXiv Detail & Related papers (2022-01-27T18:57:14Z)
- PRNU Based Source Camera Identification for Webcam and Smartphone Videos [137.6408511310322]
This communication is about an application of image forensics in which camera sensor fingerprints are used to identify the source camera (SCI: Source Camera Identification) in webcam/smartphone videos.
arXiv Detail & Related papers (2022-01-20T09:46:20Z)
- AirPose: Multi-View Fusion Network for Aerial 3D Human Pose and Shape Estimation [51.17610485589701]
We present a novel markerless 3D human motion capture (MoCap) system for unstructured, outdoor environments.
AirPose estimates human pose and shape using images captured by multiple uncalibrated flying cameras.
AirPose itself calibrates the cameras relative to the person instead of relying on any pre-calibration.
arXiv Detail & Related papers (2021-08-10T08:23:21Z)
- MotionInput v2.0 supporting DirectX: A modular library of open-source gesture-based machine learning and computer vision methods for interacting and controlling existing software with a webcam [11.120698968989108]
MotionInput v2.0 maps human motion gestures to input operations for existing applications and games.
Three use case areas assisted the development of the modules: creativity software, office and clinical software, and gaming software.
arXiv Detail & Related papers (2021-08-10T08:23:21Z)
- Learning Neural Representation of Camera Pose with Matrix Representation of Pose Shift via View Synthesis [105.37072293076767]
How to effectively represent camera pose is an essential problem in 3D computer vision.
We propose an approach to learn neural representations of camera poses and 3D scenes.
We conduct extensive experiments on synthetic and real datasets.
arXiv Detail & Related papers (2021-04-04T00:40:53Z)