nvTorchCam: An Open-source Library for Camera-Agnostic Differentiable Geometric Vision
- URL: http://arxiv.org/abs/2410.12074v1
- Date: Tue, 15 Oct 2024 21:24:31 GMT
- Title: nvTorchCam: An Open-source Library for Camera-Agnostic Differentiable Geometric Vision
- Authors: Daniel Lichy, Hang Su, Abhishek Badki, Jan Kautz, Orazio Gallo
- Score: 54.047353679741086
- Abstract: We introduce nvTorchCam, an open-source library under the Apache 2.0 license, designed to make deep learning algorithms camera model-independent. nvTorchCam abstracts critical camera operations such as projection and unprojection, allowing developers to implement algorithms once and apply them across diverse camera models--including pinhole, fisheye, and 360 equirectangular panoramas, which are commonly used in automotive and real estate capture applications. Built on PyTorch, nvTorchCam is fully differentiable and supports GPU acceleration and batching for efficient computation. Furthermore, deep learning models trained for one camera type can be directly transferred to other camera types without requiring additional modification. In this paper, we provide an overview of nvTorchCam, its functionality, and present various code examples and diagrams to demonstrate its usage. Source code and installation instructions can be found on the nvTorchCam GitHub page at https://github.com/NVlabs/nvTorchCam.
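The camera-agnostic pattern the abstract describes can be sketched as follows. This is an illustrative PyTorch sketch, not nvTorchCam's actual API: the class and function names (`Camera`, `Pinhole`, `warp_depth_point`) are hypothetical, chosen only to show how abstracting `project`/`unproject` lets one implementation of a geometric operation run unchanged across camera models.

```python
import torch

class Camera:
    """Hypothetical abstract camera (illustrative, not the nvTorchCam API)."""

    def project(self, pts: torch.Tensor) -> torch.Tensor:
        """Map (..., 3) camera-space points to (..., 2) pixel coordinates."""
        raise NotImplementedError

    def unproject(self, pix: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        """Map (..., 2) pixel coordinates plus depth back to (..., 3) points."""
        raise NotImplementedError

class Pinhole(Camera):
    """Standard pinhole model with focal lengths (fx, fy) and principal point (cx, cy)."""

    def __init__(self, fx, fy, cx, cy):
        self.fx, self.fy, self.cx, self.cy = fx, fy, cx, cy

    def project(self, pts):
        x, y, z = pts.unbind(-1)
        return torch.stack((self.fx * x / z + self.cx,
                            self.fy * y / z + self.cy), dim=-1)

    def unproject(self, pix, depth):
        u, v = pix.unbind(-1)
        x = (u - self.cx) / self.fx * depth
        y = (v - self.cy) / self.fy * depth
        return torch.stack((x, y, depth), dim=-1)

def warp_depth_point(src_cam, dst_cam, pix, depth, rel_pose):
    """Camera-agnostic warp: lift a pixel to 3D, apply a rigid transform
    (4x4 matrix rel_pose), and reproject into the destination camera.
    Works for any Camera subclass -- pinhole, fisheye, equirectangular --
    and stays differentiable because it is plain tensor arithmetic."""
    pts = src_cam.unproject(pix, depth)               # (..., 3) in src frame
    pts = pts @ rel_pose[:3, :3].T + rel_pose[:3, 3]  # rotate + translate
    return dst_cam.project(pts)                       # (..., 2) in dst image
```

In this sketch, `warp_depth_point` never inspects which camera model it was given; adding a new model only requires implementing `project` and `unproject`, which is the transferability property the abstract claims.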
Related papers
- Generating 3D-Consistent Videos from Unposed Internet Photos [68.944029293283]
We train a scalable, 3D-aware video model without any 3D annotations such as camera parameters.
Our results suggest that we can scale up scene-level 3D learning using only 2D data such as videos and multiview internet photos.
arXiv Detail & Related papers (2024-11-20T18:58:31Z)
- Training-free Camera Control for Video Generation [19.526135830699882]
We propose a training-free and robust solution to offer camera movement control for off-the-shelf video diffusion models.
Our method does not require any supervised finetuning on camera-annotated datasets or self-supervised training via data augmentation.
arXiv Detail & Related papers (2024-06-14T15:33:00Z)
- CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation [117.16677556874278]
We introduce CamCo, which allows fine-grained Camera pose Control for image-to-video generation.
To enhance 3D consistency in the videos produced, we integrate an epipolar attention module in each attention block.
Our experiments show that CamCo significantly improves 3D consistency and camera control capabilities compared to previous models.
arXiv Detail & Related papers (2024-06-04T17:27:19Z)
- CameraCtrl: Enabling Camera Control for Text-to-Video Generation [86.36135895375425]
Controllability plays a crucial role in video generation since it allows users to create desired content.
Existing models largely overlooked the precise control of camera pose that serves as a cinematic language.
We introduce CameraCtrl, enabling accurate camera pose control for text-to-video (T2V) models.
arXiv Detail & Related papers (2024-04-02T16:52:41Z)
- PyTorchVideo: A Deep Learning Library for Video Understanding [71.89124881732015]
PyTorchVideo is an open-source deep-learning library for video understanding tasks.
It covers a full stack of video understanding tools including multimodal data loading, transformations, and models.
The library is based on PyTorch and can be used by any training framework.
arXiv Detail & Related papers (2021-11-18T18:59:58Z)
- FLEX: Parameter-free Multi-view 3D Human Motion Reconstruction [70.09086274139504]
Multi-view algorithms strongly depend on camera parameters, in particular, the relative positions among the cameras.
We introduce FLEX, an end-to-end parameter-free multi-view model.
We demonstrate results on the Human3.6M and KTH Multi-view Football II datasets.
arXiv Detail & Related papers (2021-05-05T09:08:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.