Lester: rotoscope animation through video object segmentation and
tracking
- URL: http://arxiv.org/abs/2402.09883v1
- Date: Thu, 15 Feb 2024 11:15:54 GMT
- Title: Lester: rotoscope animation through video object segmentation and
tracking
- Authors: Ruben Tous
- Abstract summary: Lester is a novel method to automatically synthesise retro-style 2D animations from videos.
Video frames are processed with the Segment Anything Model (SAM) and the resulting masks are tracked through subsequent frames with DeAOT.
- Results show that the method exhibits excellent temporal consistency and can correctly process videos with different poses and appearances.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This article introduces Lester, a novel method to automatically synthesise
retro-style 2D animations from videos. The method approaches the challenge
mainly as an object segmentation and tracking problem. Video frames are
processed with the Segment Anything Model (SAM) and the resulting masks are
tracked through subsequent frames with DeAOT, a method of hierarchical
propagation for semi-supervised video object segmentation. The geometry of the
masks' contours is simplified with the Douglas-Peucker algorithm. Finally,
facial traits, pixelation and a basic shadow effect can be optionally added.
The results show that the method exhibits excellent temporal consistency and
can correctly process videos with different poses and appearances, dynamic
shots, partial shots and diverse backgrounds. The proposed method provides a
simpler and more deterministic approach than diffusion-model-based
video-to-video translation pipelines, which suffer from temporal consistency
problems and do not cope well with pixelated and schematic outputs. The method
is also much more practical than techniques based on 3D human pose estimation,
which require custom handcrafted 3D models and are very limited with respect to
the type of scenes they can process.
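The contour-simplification step mentioned in the abstract can be sketched with a minimal pure-Python Douglas-Peucker implementation. This is an illustrative sketch, not the paper's code: the contour data and epsilon value are invented for the example, and in practice OpenCV's `cv2.approxPolyDP` provides the same algorithm for mask contours.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through segment endpoints a and b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Parallelogram area divided by base length gives the point-line distance.
    return abs(dx * (ay - py) - dy * (ax - px)) / math.hypot(dx, dy)

def douglas_peucker(points, epsilon):
    """Simplify a polyline, keeping only points farther than epsilon
    from the chord between the current segment's endpoints."""
    if len(points) < 3:
        return list(points)
    # Find the point farthest from the segment joining the endpoints.
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax <= epsilon:
        # Everything is close to the chord: collapse to the endpoints.
        return [points[0], points[-1]]
    # Otherwise split at the farthest point and recurse on both halves.
    left = douglas_peucker(points[:index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right

# A noisy, nearly straight contour collapses to its two endpoints,
# while a genuine corner survives simplification.
noisy = [(0, 0), (1, 0.1), (2, -0.1), (3, 0.05), (4, 0)]
corner = [(0, 0), (2, 0), (2, 2)]
print(douglas_peucker(noisy, 0.5))   # endpoints only
print(douglas_peucker(corner, 0.5))  # corner retained
```

Larger epsilon values discard more contour detail, which is what produces the schematic, retro-style look the abstract describes.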
Related papers
- Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models [54.35214051961381]
3D meshes are widely used in computer vision and graphics for their efficiency in animation and minimal memory use in movies, games, AR, and VR.
However, creating temporally consistent and realistic textures for meshes remains labor-intensive for professional artists.
We present Tex4D, which integrates inherent geometry from mesh sequences with video diffusion models to produce consistent textures.
arXiv Detail & Related papers (2024-10-14T17:59:59Z) - Controllable Longer Image Animation with Diffusion Models [12.565739255499594]
We introduce an open-domain controllable image animation method using motion priors with video diffusion models.
Our method achieves precise control over the direction and speed of motion in the movable region by extracting the motion field information from videos.
We propose an efficient long-duration video generation method based on noise reschedule specifically tailored for image animation tasks.
arXiv Detail & Related papers (2024-05-27T16:08:00Z) - Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos [47.97168047776216]
We introduce a new method for learning a generative model of articulated 3D animal motions from raw, unlabeled online videos.
Our model learns purely from a collection of unlabeled web video clips, leveraging semantic correspondences distilled from self-supervised image features.
arXiv Detail & Related papers (2023-12-21T06:44:18Z) - MAS: Multi-view Ancestral Sampling for 3D motion generation using 2D diffusion [57.90404618420159]
We introduce Multi-view Ancestral Sampling (MAS), a method for 3D motion generation.
MAS works by simultaneously denoising multiple 2D motion sequences representing different views of the same 3D motion.
We demonstrate MAS on 2D pose data acquired from videos depicting professional basketball maneuvers.
arXiv Detail & Related papers (2023-10-23T09:05:18Z) - Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from
a Single Image [59.18564636990079]
We study the problem of synthesizing a long-term dynamic video from only a single image.
Existing methods either hallucinate inconsistent perpetual views or struggle with long camera trajectories.
We present Make-It-4D, a novel method that can generate a consistent long-term dynamic video from a single image.
arXiv Detail & Related papers (2023-08-20T12:53:50Z) - Blocks2World: Controlling Realistic Scenes with Editable Primitives [5.541644538483947]
We present Blocks2World, a novel method for 3D scene rendering and editing.
Our technique begins by extracting 3D parallelepipeds from various objects in a given scene using convex decomposition.
The next stage involves training a conditioned model that learns to generate images from the 2D-rendered convex primitives.
arXiv Detail & Related papers (2023-07-07T21:38:50Z) - FlowCam: Training Generalizable 3D Radiance Fields without Camera Poses
via Pixel-Aligned Scene Flow [26.528667940013598]
Reconstruction of 3D neural fields from posed images has emerged as a promising method for self-supervised representation learning.
A key challenge preventing the deployment of these 3D scene learners on large-scale video data is their dependence on precise camera poses from structure-from-motion.
We propose a method that jointly reconstructs camera poses and 3D neural scene representations online and in a single forward pass.
arXiv Detail & Related papers (2023-05-31T20:58:46Z) - BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown
Objects [89.2314092102403]
We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence.
Our method works for arbitrary rigid objects, even when visual texture is largely absent.
arXiv Detail & Related papers (2023-03-24T17:13:49Z) - Learning 3D Photography Videos via Self-supervised Diffusion on Single
Images [105.81348348510551]
3D photography renders a static image into a video with appealing 3D visual effects.
Existing approaches typically first conduct monocular depth estimation, then render the input frame to subsequent frames with various viewpoints.
We present a novel task: out-animation, which extends the space and time of input objects.
arXiv Detail & Related papers (2023-02-21T16:18:40Z) - NeuralDiff: Segmenting 3D objects that move in egocentric videos [92.95176458079047]
We study the problem of decomposing the observed 3D scene into a static background and a dynamic foreground.
This task is reminiscent of the classic background subtraction problem, but is significantly harder because all parts of the scene, static and dynamic, generate a large apparent motion.
In particular, we consider egocentric videos and further separate the dynamic component into objects and the actor that observes and moves them.
arXiv Detail & Related papers (2021-10-19T12:51:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.