What You Can Learn by Staring at a Blank Wall
- URL: http://arxiv.org/abs/2108.13027v1
- Date: Mon, 30 Aug 2021 07:30:19 GMT
- Title: What You Can Learn by Staring at a Blank Wall
- Authors: Prafull Sharma, Miika Aittala, Yoav Y. Schechner, Antonio Torralba,
Gregory W. Wornell, William T. Freeman, Fredo Durand
- Abstract summary: We present a passive non-line-of-sight method that infers the number of people or activity of a person from the observation of a blank wall in an unknown room.
Our technique analyzes complex imperceptible changes in indirect illumination in a video of the wall to reveal a signal that is correlated with motion in the hidden part of a scene.
- Score: 92.68037992130559
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a passive non-line-of-sight method that infers the number of
people or activity of a person from the observation of a blank wall in an
unknown room. Our technique analyzes complex imperceptible changes in indirect
illumination in a video of the wall to reveal a signal that is correlated with
motion in the hidden part of a scene. We use this signal to classify between
zero, one, or two moving people, or the activity of a person in the hidden
scene. We train two convolutional neural networks using data collected from 20
different scenes, and achieve an accuracy of $\approx94\%$ for both tasks in
unseen test environments and real-time online settings. Unlike other passive
non-line-of-sight methods, the technique does not rely on known occluders or
controllable light sources, and generalizes to unknown rooms with no
re-calibration. We analyze the generalization and robustness of our method with
both real and synthetic data, and study the effect of the scene parameters on
the signal quality.
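The abstract sketches a pipeline: extract a motion-correlated signal from imperceptible illumination changes on the wall, then classify it with a small CNN. The snippet below is a minimal, hypothetical Python sketch of that idea, not the authors' released code; the function `wall_signal`, the `gain` value, and the `SignalClassifier` architecture are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn


def wall_signal(frames: np.ndarray, gain: float = 50.0) -> np.ndarray:
    """Amplify imperceptible temporal changes in indirect illumination.

    frames: (T, H, W) grayscale video of the blank wall, values in [0, 1].
    Returns a (T, W) space-time image: per-frame deviation from the temporal
    mean, averaged over the vertical axis and amplified by `gain`.
    """
    mean_img = frames.mean(axis=0, keepdims=True)   # static appearance of the wall
    deviation = frames - mean_img                   # motion-correlated residual
    return np.clip(gain * deviation.mean(axis=1), -1.0, 1.0)


class SignalClassifier(nn.Module):
    """Small CNN over the space-time signal (illustrative architecture only)."""

    def __init__(self, num_classes: int = 3):       # e.g. {zero, one, two people}
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):                           # x: (B, 1, T, W)
        return self.head(self.features(x).flatten(1))


# Usage sketch on stand-in data: a 10 s clip at 15 fps, 64x64 wall crops.
frames = np.random.rand(150, 64, 64).astype(np.float32)
signal = wall_signal(frames)                        # (150, 64)
logits = SignalClassifier()(torch.from_numpy(signal.astype(np.float32))[None, None])
print(logits.shape)                                 # torch.Size([1, 3])
```

Subtracting the temporal mean and amplifying the residual is one simple way to expose the slow, low-amplitude illumination changes the abstract describes; the paper's actual signal extraction and network design may differ.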
Related papers
- How Video Meetings Change Your Expression [29.898716559065672]
Given two unpaired sets of videos of people, we seek to automatically find temporal patterns that are distinctive of each set.
We tackle the problem through the lens of generative domain translation.
We demonstrate that our method can discover behavioral differences between conversing face-to-face (F2F) and on video calls (VCs).
arXiv Detail & Related papers (2024-06-03T03:15:02Z)
- Self-Supervised Feature Learning for Long-Term Metric Visual Localization [16.987148593917905]
We present a novel self-supervised feature learning framework for metric visual localization.
We use a sequence-based image matching algorithm to generate image correspondences without ground-truth labels.
We can then sample image pairs to train a deep neural network that learns sparse features with associated descriptors and scores without ground-truth pose supervision.
arXiv Detail & Related papers (2022-11-30T21:15:05Z)
- Online Deep Clustering with Video Track Consistency [85.8868194550978]
We propose an unsupervised clustering-based approach to learn visual features from video object tracks.
We show that exploiting an unsupervised, class-agnostic, yet noisy track generator yields better accuracy than relying on costly and precise track annotations.
arXiv Detail & Related papers (2022-06-07T08:11:00Z)
- Visual-Tactile Multimodality for Following Deformable Linear Objects Using Reinforcement Learning [15.758583731036007]
We study the problem of using vision and tactile inputs together to complete the task of following deformable linear objects.
We create a Reinforcement Learning agent using different sensing modalities and investigate how its behaviour can be boosted.
Our experiments show that the use of both vision and tactile inputs, together with proprioception, allows the agent to complete the task in up to 92% of cases.
arXiv Detail & Related papers (2022-03-31T21:59:08Z)
- Robots Autonomously Detecting People: A Multimodal Deep Contrastive Learning Method Robust to Intraclass Variations [6.798578739481274]
We present a novel multimodal person detection architecture to address the mobile robot problem of person detection under intraclass variations.
We present a two-stage training approach using 1) a unique pretraining method we define as Temporal Invariant Multimodal Contrastive Learning (TimCLR), and 2) a Multimodal Faster R-CNN (MFRCNN) detector.
arXiv Detail & Related papers (2022-03-01T02:36:17Z)
- JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting [53.28477676794658]
Unsupervised motion retargeting in videos has seen substantial advancements through the use of deep neural networks.
We introduce JOKR - a JOint Keypoint Representation that handles both the source and target videos, without requiring any object prior or data collection.
We evaluate our method both qualitatively and quantitatively, and demonstrate that our method handles various cross-domain scenarios, such as different animals, different flowers, and humans.
arXiv Detail & Related papers (2021-06-17T17:32:32Z)
- Data Augmentation for Object Detection via Differentiable Neural Rendering [71.00447761415388]
It is challenging to train a robust object detector when annotated data is scarce.
Existing approaches to tackle this problem include semi-supervised learning that interpolates labeled data from unlabeled data.
We introduce an offline data augmentation method for object detection, which semantically interpolates the training data with novel views.
arXiv Detail & Related papers (2021-03-04T06:31:06Z)
- STaR: Self-supervised Tracking and Reconstruction of Rigid Objects in Motion with Neural Rendering [9.600908665766465]
We present STaR, a novel method that performs Self-supervised Tracking and Reconstruction of dynamic scenes with rigid motion from multi-view RGB videos without any manual annotation.
We show that our method can render photorealistic novel views, where novelty is measured on both spatial and temporal axes.
arXiv Detail & Related papers (2020-12-22T23:45:28Z)
- A Flow Base Bi-path Network for Cross-scene Video Crowd Understanding in Aerial View [93.23947591795897]
In this paper, we strive to tackle the challenges and automatically understand the crowd from the visual data collected from drones.
To alleviate the background noise generated in cross-scene testing, a double-stream crowd counting model is proposed.
To tackle the crowd density estimation problem under extreme dark environments, we introduce synthetic data generated by the game Grand Theft Auto V (GTAV).
arXiv Detail & Related papers (2020-09-29T01:48:24Z)
- Learning Invariant Representations for Reinforcement Learning without Reconstruction [98.33235415273562]
We study how representation learning can accelerate reinforcement learning from rich observations, such as images, without relying either on domain knowledge or pixel-reconstruction.
Bisimulation metrics quantify behavioral similarity between states in continuous MDPs (a standard definition is sketched after this list).
We demonstrate the effectiveness of our method at disregarding task-irrelevant information using modified visual MuJoCo tasks.
arXiv Detail & Related papers (2020-06-18T17:59:35Z)
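For reference on the last entry above: a standard fixed-point form of the bisimulation metric, in generic notation rather than the notation of that paper, is shown below.

```latex
% Action-maximizing form of the bisimulation metric; W_1 is the
% 1-Wasserstein distance computed under the ground metric d itself.
\[
  d(s_i, s_j) \;=\; \max_{a \in \mathcal{A}} \Big(
    \lvert r(s_i, a) - r(s_j, a) \rvert
    \;+\; \gamma\, W_1\!\big(P(\cdot \mid s_i, a),\, P(\cdot \mid s_j, a);\, d\big)
  \Big)
\]
```

States with similar rewards and similar transition distributions under every action end up close under d; representation-learning methods of this kind train an encoder so that latent-space distances approximate d.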
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.