Transformer-based Image and Video Inpainting: Current Challenges and Future Directions
- URL: http://arxiv.org/abs/2407.00226v1
- Date: Fri, 28 Jun 2024 20:42:36 GMT
- Title: Transformer-based Image and Video Inpainting: Current Challenges and Future Directions
- Authors: Omar Elharrouss, Rafat Damseh, Abdelkader Nasreddine Belkacem, Elarbi Badidi, Abderrahmane Lakas
- Abstract summary: Inpainting is a viable solution for various applications, including photographic restoration, video editing, and medical imaging.
Convolutional neural networks (CNNs) and generative adversarial networks (GANs) have significantly advanced the inpainting task.
Visual transformers have recently been exploited and offer further improvements for image and video inpainting.
- Score: 5.2088618044533215
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Image inpainting is currently a hot topic within the field of computer vision. It offers a viable solution for various applications, including photographic restoration, video editing, and medical imaging. Deep learning advancements, notably convolutional neural networks (CNNs) and generative adversarial networks (GANs), have significantly enhanced the inpainting task with an improved capability to fill missing or damaged regions in an image or video through the incorporation of contextually appropriate details. These advancements have improved other aspects, including efficiency, information preservation, and achieving both realistic textures and structures. Recently, visual transformers have been exploited and offer some improvements to image or video inpainting. Transformer-based architectures, initially designed for natural language processing, have also been integrated into computer vision tasks. These methods utilize self-attention mechanisms that excel at capturing long-range dependencies within data; therefore, they are particularly effective for tasks requiring a comprehensive understanding of the global context of an image or video. In this paper, we provide a comprehensive review of current image and video inpainting approaches, with a specific focus on transformer-based techniques, with the goal of highlighting the significant improvements and providing a guideline for new researchers in the field of image or video inpainting using visual transformers. We categorize the transformer-based techniques by their architectural configurations, types of damage, and performance metrics. Furthermore, we present an organized synthesis of the current challenges and suggest directions for future research in the field of image or video inpainting.
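As an illustration of the self-attention mechanism the abstract credits with capturing long-range dependencies, the following is a minimal numpy sketch of single-head scaled dot-product attention over a sequence of image-patch tokens. All names, shapes, and values here are hypothetical and chosen only for illustration; they do not come from the surveyed papers.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    Each output token is a weighted mix of *all* input tokens, which is
    why attention captures long-range dependencies: a patch in one corner
    of an image can directly attend to a patch in the opposite corner.
    """
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (n_tokens, n_tokens)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V

rng = np.random.default_rng(0)
n_tokens, d = 16, 8                  # e.g. 16 image patches, 8-dim embeddings
tokens = rng.normal(size=(n_tokens, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(tokens, Wq, Wk, Wv)
print(out.shape)  # (16, 8): one attended vector per input token
```

In an inpainting setting, tokens from masked (missing) regions can attend to visible tokens anywhere in the image, which is the global-context advantage the abstract contrasts with the local receptive fields of CNNs.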
Related papers
- A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships [0.5639904484784127]
Transformer-based models have transformed the landscape of natural language processing (NLP).
These models are renowned for their ability to capture long-range dependencies and contextual information.
We discuss potential research directions and applications of transformer-based models in computer vision.
arXiv Detail & Related papers (2024-08-27T16:22:18Z)
- Visual Prompt Tuning for Generative Transfer Learning [26.895321693202284]
We present a recipe for learning vision transformers by generative knowledge transfer.
We base our framework on state-of-the-art generative vision transformers that represent an image as a sequence of visual tokens fed to autoregressive or non-autoregressive transformers.
To adapt to a new domain, we employ prompt tuning, which prepends learnable tokens, called prompts, to the image token sequence.
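The prompt-tuning step described above can be sketched in a few lines of numpy: learnable prompt vectors are concatenated in front of the (frozen) image token sequence before it enters the transformer. This is a minimal illustration under assumed shapes; the variable names and dimensions are hypothetical, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                    # embedding dimension (hypothetical)
image_tokens = rng.normal(size=(16, d))  # tokens from a frozen pretrained model
n_prompts = 4

# Learnable prompt tokens. In practice these are the only parameters
# updated by gradient descent; here they are just randomly initialised.
prompts = rng.normal(size=(n_prompts, d))

# Prompt tuning: prepend the prompts to the image token sequence, then
# feed the extended sequence to the frozen transformer.
extended = np.concatenate([prompts, image_tokens], axis=0)
print(extended.shape)  # (20, 8): 4 prompt tokens + 16 image tokens
```

Because only the small prompt matrix is trained, adaptation to a new domain is cheap relative to fine-tuning the full transformer.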
arXiv Detail & Related papers (2022-10-03T14:56:05Z)
- A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective [71.03621840455754]
Graph Neural Networks (GNNs) have gained momentum in graph representation learning.
Graph Transformers embed a graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation.
This paper presents a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective.
arXiv Detail & Related papers (2022-09-27T08:10:14Z)
- Vision Transformers: State of the Art and Research Challenges [26.462994554165697]
This paper presents a comprehensive overview of the literature on different architecture designs and training tricks for vision transformers.
Our goal is to provide a systematic review with the open research opportunities.
arXiv Detail & Related papers (2022-07-07T02:01:56Z)
- Transformers in Medical Imaging: A Survey [88.03790310594533]
Transformers have been successfully applied to several computer vision problems, achieving state-of-the-art results.
Medical imaging has also witnessed growing interest in Transformers, which can capture global context, in contrast to CNNs with local receptive fields.
We provide a review of the applications of Transformers in medical imaging covering various aspects, ranging from recently proposed architectural designs to unsolved issues.
arXiv Detail & Related papers (2022-01-24T18:50:18Z)
- Restormer: Efficient Transformer for High-Resolution Image Restoration [118.9617735769827]
Convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data.
Transformers have shown significant performance gains on natural language and high-level vision tasks.
Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks.
arXiv Detail & Related papers (2021-11-18T18:59:10Z)
- Deep Neural Network-based Enhancement for Image and Video Streaming Systems: A Survey and Future Directions [20.835654670825782]
Deep learning has led to unprecedented performance in generating high-quality images from low-quality ones.
We present state-of-the-art content delivery systems that employ neural enhancement as a key component in achieving both fast response time and high visual quality.
arXiv Detail & Related papers (2021-06-07T15:42:36Z)
- Transformers in Vision: A Survey [101.07348618962111]
Transformers enable modeling long-range dependencies between input sequence elements and support parallel processing of sequences.
Transformers require minimal inductive biases in their design and are naturally suited as set functions.
This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline.
arXiv Detail & Related papers (2021-01-04T18:57:24Z)
- A Survey on Visual Transformer [126.56860258176324]
Transformer is a type of deep neural network mainly based on the self-attention mechanism.
In this paper, we review these vision transformer models by categorizing them in different tasks and analyzing their advantages and disadvantages.
arXiv Detail & Related papers (2020-12-23T09:37:54Z)
- State of the Art on Neural Rendering [141.22760314536438]
We focus on approaches that combine classic computer graphics techniques with deep generative models to obtain controllable and photo-realistic outputs.
This report is focused on the many important use cases for the described algorithms such as novel view synthesis, semantic photo manipulation, facial and body reenactment, relighting, free-viewpoint video, and the creation of photo-realistic avatars for virtual and augmented reality telepresence.
arXiv Detail & Related papers (2020-04-08T04:36:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.