Learning Continuous Image Representation with Local Implicit Image Function
- URL: http://arxiv.org/abs/2012.09161v2
- Date: Thu, 1 Apr 2021 13:33:26 GMT
- Title: Learning Continuous Image Representation with Local Implicit Image Function
- Authors: Yinbo Chen, Sifei Liu, Xiaolong Wang
- Abstract summary: We propose the LIIF representation, which takes an image coordinate and the 2D deep features around that coordinate as inputs and predicts the RGB value at the coordinate as output.
To generate the continuous representation for images, we train an encoder with the LIIF representation via a self-supervised super-resolution task.
The learned continuous representation can be rendered at arbitrary resolution and even extrapolates to x30 higher resolution.
- Score: 21.27344998709831
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How to represent an image? While the visual world is presented in a
continuous manner, machines store and see images in a discrete way, as 2D
arrays of pixels. In this paper, we seek to learn a continuous representation
for images. Inspired by recent progress in 3D reconstruction with implicit
neural representations, we propose the Local Implicit Image Function (LIIF),
which takes an image coordinate and the 2D deep features around that
coordinate as inputs and predicts the RGB value at the coordinate as output.
Since coordinates are continuous, LIIF can be rendered at arbitrary
resolution. To generate the continuous representation for images, we train an
encoder with the LIIF representation via a self-supervised super-resolution
task. The learned continuous representation can be rendered at arbitrary
resolution and even extrapolates to x30 higher resolution, beyond the scales
seen in training. We further show that the LIIF representation builds a bridge
between discrete and continuous representations in 2D: it naturally supports
learning tasks with size-varied image ground truths and significantly
outperforms methods that resize the ground truths.
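For concreteness, the decoding step described above can be sketched in a few
lines of PyTorch: an MLP takes the latent code nearest to a query coordinate
together with the query's offset from that code's position, and outputs RGB.
This is a minimal illustration, not the paper's implementation; the class
name, the plain nearest-neighbor lookup, and the small MLP are assumptions,
and the actual method additionally uses feature unfolding, a local ensemble,
and cell decoding.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LIIFDecoderSketch(nn.Module):
    """Minimal LIIF-style decoder: RGB = MLP(nearest feature, relative coord)."""

    def __init__(self, feat_dim: int, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, feat: torch.Tensor, coord: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) encoder feature map; coord: (B, Q, 2) queries
        # in [-1, 1], stored (y, x); grid_sample expects (x, y), hence flip.
        B, C, H, W = feat.shape
        grid = coord.flip(-1).unsqueeze(1)                     # (B, 1, Q, 2)
        # Nearest latent code for each query coordinate.
        z = F.grid_sample(feat, grid, mode='nearest',
                          align_corners=False)                 # (B, C, 1, Q)
        z = z.squeeze(2).permute(0, 2, 1)                      # (B, Q, C)
        # Center coordinate of each latent code, for the relative offset.
        ys = torch.linspace(-1 + 1 / H, 1 - 1 / H, H, device=feat.device)
        xs = torch.linspace(-1 + 1 / W, 1 - 1 / W, W, device=feat.device)
        centers = torch.stack(torch.meshgrid(ys, xs, indexing='ij'))  # (2, H, W)
        centers = centers.unsqueeze(0).expand(B, -1, -1, -1)
        v = F.grid_sample(centers, grid, mode='nearest',
                          align_corners=False).squeeze(2).permute(0, 2, 1)
        rel = coord - v                                        # relative coordinate
        return self.mlp(torch.cat([z, rel], dim=-1))           # (B, Q, 3)
```

Training is self-supervised, as the abstract states: a ground-truth image is
downsampled to form the encoder input, and the ground truth's own pixels
supply (coordinate, RGB) supervision pairs, so the same model serves any
upscaling factor at test time.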
Related papers
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch (a minimal sketch follows this entry).
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
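The batch-wise correlation can be pictured as self-attention in which whole
images, rather than patches, are the tokens. A hedged sketch under that
assumption (module name and dimensions are illustrative, not the paper's
code):

```python
import torch
import torch.nn as nn

class CrossImageAttentionSketch(nn.Module):
    """Correlate per-image descriptors across a batch via self-attention."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, desc: torch.Tensor) -> torch.Tensor:
        # desc: (B, dim), one global descriptor per image in the batch.
        tokens = desc.unsqueeze(0)           # (1, B, dim): batch as sequence
        out, _ = self.attn(tokens, tokens, tokens)
        return out.squeeze(0)                # (B, dim) correlated descriptors
```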
- SAIR: Learning Semantic-aware Implicit Representation [23.842761556556216]
Implicit representation of an image can map arbitrary coordinates in the continuous domain to their corresponding color values.
Existing implicit representation approaches focus only on building a continuous appearance mapping.
We learn a semantic-aware implicit representation (SAIR); that is, the implicit representation of each pixel relies on both its appearance and its semantic information (a minimal sketch follows this entry).
arXiv Detail & Related papers (2023-10-13T17:52:16Z)
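The sentence above reduces to conditioning the implicit function on two
feature vectors instead of one. A minimal sketch under that reading
(dimensions, names, and the plain MLP are assumptions):

```python
import torch
import torch.nn as nn

class SemanticAwareDecoderSketch(nn.Module):
    """Predict RGB at a coordinate from appearance AND semantic features."""

    def __init__(self, app_dim: int = 64, sem_dim: int = 64, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 + app_dim + sem_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, coord, app_feat, sem_feat):
        # coord: (B, Q, 2); app_feat: (B, Q, app_dim); sem_feat: (B, Q, sem_dim)
        return self.mlp(torch.cat([coord, app_feat, sem_feat], dim=-1))
```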
- Dynamic Implicit Image Function for Efficient Arbitrary-Scale Image Representation [24.429100808481394]
We propose the Dynamic Implicit Image Function (DIIF), a fast and efficient method for representing images at arbitrary resolution.
We propose a coordinate grouping and slicing strategy, which enables the neural network to perform decoding from coordinate slices to pixel value slices.
With dynamic coordinate slicing, DIIF significantly reduces the computational cost of arbitrary-scale super-resolution (see the sketch after this entry).
arXiv Detail & Related papers (2023-06-21T15:04:34Z)
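The grouping-and-slicing idea can be illustrated generically: decode
fixed-size slices of the query coordinates so peak cost stays bounded at
large scale factors. The sketch below shows only this generic slicing; DIIF's
dynamic coordinate slicing additionally adapts the decoding to the scale
factor, which is not modeled here.

```python
import torch

def decode_in_slices(decoder, feat, coords, slice_size: int = 16384):
    """Decode (B, Q, 2) query coordinates in slices of at most slice_size."""
    outs = []
    for i in range(0, coords.shape[1], slice_size):
        # Each slice of coordinates maps to the matching slice of pixel values.
        outs.append(decoder(feat, coords[:, i:i + slice_size]))
    return torch.cat(outs, dim=1)            # (B, Q, 3) pixel values
```

Here `decoder` could be any coordinate-to-RGB module, for example the
LIIF-style sketch above.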
- CLIP$^2$: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data [80.42480679542697]
We propose Contrastive Language-Image-Point Cloud Pretraining (CLIP$^2$) to learn a transferable 3D point cloud representation in realistic scenarios.
Specifically, we exploit naturally existing correspondences between 2D and 3D scenarios and build well-aligned, instance-based text-image-point proxies from those complex scenarios (a generic contrastive sketch follows this entry).
arXiv Detail & Related papers (2023-03-22T09:32:45Z)
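The pretraining objective is contrastive over aligned text, image, and
point-cloud embeddings. A hedged sketch of the generic symmetric InfoNCE form
over the three modality pairs (the temperature and the exact pairing are
assumptions, not the paper's specification):

```python
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, t: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE between two batches of aligned embeddings (N, D)."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / t                   # (N, N) similarity matrix
    labels = torch.arange(a.shape[0], device=a.device)
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2

def triplet_contrastive_loss(txt, img, pts):
    # Sum the pairwise contrastive terms over the three modalities.
    return info_nce(txt, img) + info_nce(txt, pts) + info_nce(img, pts)
```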
- Single Image Super-Resolution via a Dual Interactive Implicit Neural Network [5.331665215168209]
We introduce a novel implicit neural network for the task of single image super-resolution at arbitrary scale factors.
We demonstrate the efficacy and flexibility of our approach against the state of the art on publicly available benchmark datasets.
arXiv Detail & Related papers (2022-10-23T02:05:19Z)
- Adaptive Local Implicit Image Function for Arbitrary-scale Super-resolution [61.95533972380704]
The local implicit image function (LIIF) represents an image as a continuous function, where pixel values are predicted from the corresponding coordinates given as inputs.
LIIF can be adopted for arbitrary-scale image super-resolution tasks, resulting in a single effective and efficient model for various up-scaling factors.
We propose a novel adaptive local implicit image function (A-LIIF) to alleviate this problem.
arXiv Detail & Related papers (2022-08-07T11:23:23Z)
- CompNVS: Novel View Synthesis with Scene Completion [83.19663671794596]
We propose a generative pipeline performing on a sparse grid-based neural scene representation to complete unobserved scene parts.
We process encoded image features in 3D space with a geometry completion network and a subsequent texture inpainting network to extrapolate the missing area.
Photorealistic image sequences can finally be obtained via consistency-relevant differentiable rendering.
arXiv Detail & Related papers (2022-07-23T09:03:13Z)
- Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of the 3D shape, pose, and texture of each object from an input RGB image (sketched after this entry).
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z)
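The per-object regression can be read as a recurrent module that emits one
latent per step, split into shape, pose, and texture parts. A sketch under
that reading (all dimensions and the GRU choice are assumptions):

```python
import torch
import torch.nn as nn

class RecurrentObjectEncoderSketch(nn.Module):
    """Emit one (shape, pose, texture) latent per object from an image feature."""

    def __init__(self, img_dim=512, z_shape=64, z_pose=6, z_tex=16, n_obj=3):
        super().__init__()
        self.n_obj, self.dims = n_obj, (z_shape, z_pose, z_tex)
        self.rnn = nn.GRUCell(img_dim, img_dim)
        self.head = nn.Linear(img_dim, sum(self.dims))

    def forward(self, img_feat: torch.Tensor):
        # img_feat: (B, img_dim) global feature of the input RGB image.
        h, outs = torch.zeros_like(img_feat), []
        for _ in range(self.n_obj):          # one recurrent step per object
            h = self.rnn(img_feat, h)
            outs.append(self.head(h).split(self.dims, dim=-1))
        return outs                          # list of (shape, pose, texture)
```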
- Self-Supervised 2D Image to 3D Shape Translation with Disentangled Representations [92.89846887298852]
We present a framework to translate between 2D image views and 3D object shapes.
We propose SIST, a Self-supervised Image to Shape Translation framework.
arXiv Detail & Related papers (2020-03-22T22:44:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.