Image Transformation Sequence Retrieval with General Reinforcement
Learning
- URL: http://arxiv.org/abs/2307.06630v1
- Date: Thu, 13 Jul 2023 08:56:20 GMT
- Title: Image Transformation Sequence Retrieval with General Reinforcement
Learning
- Authors: Enrique Mas-Candela, Antonio Ríos-Vila, Jorge Calvo-Zaragoza
- Abstract summary: We present the novel Image Transformation Sequence Retrieval (ITSR) task, in which a model must retrieve the sequence of transformations between two images that act as source and target, respectively.
We propose a solution to ITSR using a general model-based Reinforcement Learning method, Monte Carlo Tree Search (MCTS), combined with a deep neural network.
Our experiments provide a benchmark in both synthetic and real domains, where the proposed approach is compared with supervised training.
- Score: 6.423239719448169
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this work, the novel Image Transformation Sequence Retrieval (ITSR) task
is presented, in which a model must retrieve the sequence of transformations
between two given images that act as source and target, respectively. Given
certain characteristics of the challenge such as the multiplicity of a correct
sequence or the correlation between consecutive steps of the process, we
propose a solution to ITSR using a general model-based Reinforcement Learning
method such as Monte Carlo Tree Search (MCTS), which is combined with a deep neural
network. Our experiments provide a benchmark in both synthetic and real
domains, where the proposed approach is compared with supervised training. The
results report that a model trained with MCTS is able to outperform its
supervised counterpart in both the simplest and the most complex cases. Our
work draws interesting conclusions about the nature of ITSR and its associated
challenges.
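To make the task definition concrete, the following is a minimal sketch of ITSR on a toy problem. It is not the authors' method: it uses exhaustive breadth-first search instead of MCTS and a neural network, the "images" are 2x2 grids, and the three transformations and the function name `retrieve_sequences` are hypothetical choices made only for illustration. It also shows the multiplicity of correct sequences mentioned in the abstract.

```python
from collections import deque

# Hypothetical toy transformations; images are tuples of tuples (rows).
def rotate90(img):
    # Clockwise rotation: reverse the row order, then transpose.
    return tuple(zip(*img[::-1]))

def flip_h(img):
    # Mirror each row left to right.
    return tuple(row[::-1] for row in img)

def flip_v(img):
    # Reverse the row order (top to bottom).
    return img[::-1]

TRANSFORMS = {"rotate90": rotate90, "flip_h": flip_h, "flip_v": flip_v}

def retrieve_sequences(source, target, max_len=3):
    """Return every sequence of at most max_len transformations
    that maps source to target, found by breadth-first search."""
    found = []
    queue = deque([(source, [])])
    while queue:
        img, seq = queue.popleft()
        if img == target:
            found.append(seq)
        if len(seq) < max_len:
            for name, fn in TRANSFORMS.items():
                queue.append((fn(img), seq + [name]))
    return found

source = ((1, 2), (3, 4))
target = rotate90(rotate90(source))  # a 180-degree rotation
solutions = retrieve_sequences(source, target, max_len=2)
for seq in solutions:
    print(seq)
```

In this toy setting, more than one sequence reaches the same target (two rotations, or a horizontal flip combined with a vertical flip), which is why a single ground-truth label per pair can be an awkward fit for supervised training. Exhaustive search scales exponentially with sequence length, so it is only usable at this scale.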
Related papers
- Causal Image Modeling for Efficient Visual Understanding [41.87857129429512]
We introduce the Adventurer series models where we treat images as sequences of patch tokens and employ uni-directional language models to learn visual representations.
This modeling paradigm allows us to process images in a recurrent formulation with linear complexity relative to the sequence length.
In detail, we introduce two simple designs that seamlessly integrate image inputs into the causal inference framework.
arXiv Detail & Related papers (2024-10-10T04:14:52Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- Symmetrical Bidirectional Knowledge Alignment for Zero-Shot Sketch-Based Image Retrieval [69.46139774646308]
This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR).
It aims to use sketches from unseen categories as queries to match the images of the same category.
We propose a novel Symmetrical Bidirectional Knowledge Alignment for zero-shot sketch-based image retrieval (SBKA).
arXiv Detail & Related papers (2023-12-16T04:50:34Z)
- ICF-SRSR: Invertible scale-Conditional Function for Self-Supervised Real-world Single Image Super-Resolution [60.90817228730133]
Single image super-resolution (SISR) is a challenging problem that aims to up-sample a given low-resolution (LR) image to a high-resolution (HR) counterpart.
Recent approaches are trained on simulated LR images degraded by simplified down-sampling operators.
We propose a novel Invertible scale-Conditional Function (ICF) which can scale an input image and then restore the original input with different scale conditions.
arXiv Detail & Related papers (2023-07-24T12:42:45Z)
- A Unifying Multi-sampling-ratio CS-MRI Framework With Two-grid-cycle Correction and Geometric Prior Distillation [7.643154460109723]
We propose a unifying deep unfolding multi-sampling-ratio CS-MRI framework that merges the advantages of model-based and deep learning-based methods.
Inspired by the multigrid algorithm, we first embed the CS-MRI-based optimization algorithm into a correction-distillation scheme.
We employ a condition module to adaptively learn the step length and noise level from the compressive sampling ratio in every stage.
arXiv Detail & Related papers (2022-05-14T13:36:27Z)
- Universal Generative Modeling for Calibration-free Parallel MR Imaging [13.875986147033002]
We present an unsupervised deep learning framework for calibration-free parallel MRI.
We make use of the merits of both wavelet transform and the adaptive iteration strategy in a unified framework.
We train a powerful noise conditional score network by forming wavelet tensor as the network input.
arXiv Detail & Related papers (2022-01-25T10:05:39Z)
- Self-supervised Correlation Mining Network for Person Image Generation [9.505343361614928]
Person image generation aims to perform non-rigid deformation on source images.
We propose a Self-supervised Correlation Mining Network (SCM-Net) to rearrange the source images in the feature space.
For improving the fidelity of cross-scale pose transformation, we propose a graph based Body Structure Retaining Loss.
arXiv Detail & Related papers (2021-11-26T03:57:46Z)
- One Network to Solve Them All: A Sequential Multi-Task Joint Learning Network Framework for MR Imaging Pipeline [12.684219884940056]
A sequential multi-task joint learning network model is proposed to train a combined end-to-end pipeline.
The proposed framework is verified on the MRB dataset, where it achieves superior performance over other SOTA methods in terms of both reconstruction and segmentation.
arXiv Detail & Related papers (2021-05-14T05:55:27Z)
- IMAGINE: Image Synthesis by Image-Guided Model Inversion [79.4691654458141]
We introduce an inversion based method, denoted as IMAge-Guided model INvErsion (IMAGINE), to generate high-quality and diverse images.
We leverage the knowledge of image semantics from a pre-trained classifier to achieve plausible generations.
IMAGINE enables the synthesis procedure to simultaneously 1) enforce semantic specificity constraints during the synthesis, 2) produce realistic images without generator training, and 3) give users intuitive control over the generation process.
arXiv Detail & Related papers (2021-04-13T02:00:24Z)
- MOGAN: Morphologic-structure-aware Generative Learning from a Single Image [59.59698650663925]
Recently proposed generative models complete training based on only one image.
We introduce a MOrphologic-structure-aware Generative Adversarial Network named MOGAN that produces random samples with diverse appearances.
Our approach focuses on internal features including the maintenance of rational structures and variation on appearance.
arXiv Detail & Related papers (2021-03-04T12:45:23Z)
- Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.