Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning
- URL: http://arxiv.org/abs/2512.08873v1
- Date: Tue, 09 Dec 2025 18:05:59 GMT
- Title: Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning
- Authors: Jing Jie Tan, Anissa Mokraoui, Ban-Hoe Kwan, Danny Wee-Kiat Ng, Yan-Chai Hum,
- Abstract summary: The proposed SOLI approach presents a solution specifically designed for lightweight, low-resolution image captioning. It employs a Siamese network architecture to optimize latent embeddings, enhancing the efficiency and accuracy of the image-to-text translation process. By focusing on a dual-pathway neural network structure, SOLI minimizes computational overhead without sacrificing performance.
- Score: 1.872675437352477
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image captioning is essential in many fields, including assisting visually impaired individuals, improving content management systems, and enhancing human-computer interaction. However, a recent challenge in this domain is dealing with low-resolution images (LRIs). While performance can be improved by using larger models such as transformers for encoding, these models are typically heavyweight, demanding significant computational resources and memory, which makes retraining difficult. To address this, the proposed SOLI (Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning) approach presents a solution specifically designed for lightweight, low-resolution image captioning. It employs a Siamese network architecture to optimize latent embeddings, enhancing the efficiency and accuracy of the image-to-text translation process. By focusing on a dual-pathway neural network structure, SOLI minimizes computational overhead without sacrificing performance, making it an ideal choice for training in resource-constrained scenarios.
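The abstract does not specify SOLI's layers, how input pairs are formed, or which loss is used. As a rough illustration of the general Siamese idea only (not the authors' method), the sketch below encodes two inputs with one encoder whose weights are shared by both branches and scores the pair with a cosine-distance loss. Everything concrete here is an assumption: the pair is a degraded (low-resolution) signal and its original, both flattened to equal-length vectors; `make_encoder`, `encode`, `degrade` and `siamese_loss` are hypothetical helpers; the encoder is a single dense layer with `tanh`.

```python
import math
import random


def make_encoder(in_dim, out_dim, seed=0):
    """Hypothetical encoder: one dense layer (out_dim x in_dim weight matrix).

    The same matrix is used by both Siamese branches (weight sharing).
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def encode(weights, x):
    """Project a flattened input to a latent embedding: tanh(W x)."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def cosine(a, b, eps=1e-12):
    """Cosine similarity, guarded against zero-length vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / max(norm, eps)


def degrade(x, factor=2):
    """Hypothetical low-resolution proxy: average-pool by `factor`, then repeat
    each average back so the output keeps the input length."""
    out = []
    for i in range(0, len(x), factor):
        block = x[i:i + factor]
        out.extend([sum(block) / len(block)] * len(block))
    return out


def siamese_loss(weights, x_a, x_b):
    """Assumed objective: cosine distance between the two branch embeddings."""
    z_a = encode(weights, x_a)  # branch 1
    z_b = encode(weights, x_b)  # branch 2: same `weights`, no extra parameters
    return 1.0 - cosine(z_a, z_b)


if __name__ == "__main__":
    hr = [math.sin(i / 2.0) for i in range(16)]  # stand-in for a flattened image
    lr = degrade(hr)                              # its degraded counterpart
    w = make_encoder(in_dim=16, out_dim=8)
    print(f"loss(LR, HR) = {siamese_loss(w, lr, hr):.4f}")
```

Because both branches call `encode` with the same `weights`, pairing adds a second forward pass but no second set of parameters, which fits the abstract's emphasis on low overhead. Training would minimize this loss with respect to `weights`; that loop is omitted.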
Related papers
- A Tree-guided CNN for image super-resolution [50.30242741813306]
We design a tree-guided CNN for image super-resolution (TSRNet). It uses a tree architecture to guide a deep network, enhancing the effect of key nodes to amplify the relation between hierarchical information. To prevent insufficiency of the obtained structural information, cosine transform techniques are used in TSRNet to improve super-resolution performance.
(arXiv, 2025-06-03T08:05:11Z)
- Striving for Faster and Better: A One-Layer Architecture with Auto Re-parameterization for Low-Light Image Enhancement [50.93686436282772]
We aim to explore the limits of image enhancers in terms of both visual quality and computational efficiency. By rethinking the task demands, we build an explicit connection: visual quality corresponds to model learning, and computational efficiency corresponds to structure design. Ultimately, this achieves efficient low-light image enhancement using only a single convolutional layer while maintaining excellent visual quality.
(arXiv, 2025-02-27T08:20:03Z)
- Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations [6.113035634680655]
Current deep learning-based low-light image enhancement methods often struggle with high-resolution images.
We introduce a novel approach termed CoLIE, which redefines the enhancement process by mapping the 2D coordinates of an underexposed image to its illumination component.
(arXiv, 2024-07-17T11:51:52Z)
- ESTISR: Adapting Efficient Scene Text Image Super-resolution for Real-Scenes [25.04435367653037]
Scene text image super-resolution (STISR) has yielded remarkable improvements in accurately recognizing scene text.
We propose a novel Efficient Scene Text Image Super-resolution (ESTISR) Network for resource-limited deployment platforms.
ESTISR consistently outperforms current methods in terms of actual running time and peak memory consumption.
(arXiv, 2023-06-04T19:14:44Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has seen a wide range of neural network designs, from CNNs to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights on the fly by a small amount proportional to their magnitude.
(arXiv, 2023-03-16T21:06:13Z)
- Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text Spotting [49.33891486324731]
We propose a novel cost-efficient Dynamic Low-resolution Distillation (DLD) text spotting framework.
It aims to infer images at different small but still recognizable resolutions, achieving a better balance between accuracy and efficiency.
The proposed method can be optimized end-to-end and adopted in any current text spotting framework to improve practicality.
(arXiv, 2022-07-14T06:49:59Z)
- SDWNet: A Straight Dilated Network with Wavelet Transformation for Image Deblurring [23.86692375792203]
Image deblurring is a computer vision problem that aims to recover a sharp image from a blurred image.
Our model uses dilated convolution to obtain a large receptive field while keeping high spatial resolution.
We propose a novel module using the wavelet transform, which effectively helps the network to recover clear high-frequency texture details.
(arXiv, 2021-10-12T07:58:10Z)
- Spatially-Adaptive Image Restoration using Distortion-Guided Networks [51.89245800461537]
We present a learning-based solution for restoring images suffering from spatially-varying degradations.
We propose SPAIR, a network design that harnesses distortion-localization information and dynamically adjusts to difficult regions in the image.
(arXiv, 2021-08-19T11:02:25Z)
- Online Exemplar Fine-Tuning for Image-to-Image Translation [32.556050882376965]
Existing techniques to solve exemplar-based image-to-image translation within deep convolutional neural networks (CNNs) generally require a training phase to optimize the network parameters.
We propose, for the first time, a framework that solves exemplar-based translation through online optimization given an input image pair.
Our framework does not require the offline training phase, which has been the main challenge for existing methods; instead, it relies on pre-trained networks to enable online optimization.
(arXiv, 2020-11-18T15:13:16Z)
- Deep Adaptive Inference Networks for Single Image Super-Resolution [72.7304455761067]
Single image super-resolution (SISR) has witnessed tremendous progress in recent years owing to the deployment of deep convolutional neural networks (CNNs).
In this paper, we take a step forward to address this issue by leveraging adaptive inference networks for deep SISR (AdaDSR).
AdaDSR consists of an SISR backbone and a lightweight adapter module that takes image features and a resource constraint as input and predicts a map of local network depth.
(arXiv, 2020-04-08T10:08:20Z)
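The TSRNet entry above mentions cosine transform techniques without saying which variant is used or where. For readers unfamiliar with the transform itself, here is a minimal orthonormal 1-D DCT-II and its inverse in pure Python, plus the separable 2-D version (rows, then columns). It illustrates the transform only, not how TSRNet applies it; the helper names are hypothetical. SciPy's `scipy.fft.dct(x, type=2, norm="ortho")` computes the same 1-D transform.

```python
import math


def dct_ii(x):
    """Orthonormal 1-D DCT-II: X[k] = a_k * sum_i x[i] * cos(pi * (i + 0.5) * k / n)."""
    n = len(x)
    out = []
    for k in range(n):
        a_k = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(a_k * sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                             for i, v in enumerate(x)))
    return out


def idct_ii(c):
    """Inverse of dct_ii (the orthonormal DCT-III, i.e. the transposed matrix)."""
    n = len(c)
    return [sum((math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
                * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
                for k in range(n))
            for i in range(n)]


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_ii(r) for r in block]
    cols = [dct_ii(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


if __name__ == "__main__":
    smooth = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7]
    # For a smooth ramp, most of the energy lands in the first two coefficients.
    print([round(c, 3) for c in dct_ii(smooth)])
```

Because the transform is orthonormal, it preserves energy and `idct_ii(dct_ii(x))` recovers `x` up to rounding, which is what makes frequency-domain processing lossless to enter and leave.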
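The one-layer low-light enhancement entry above relies on re-parameterization to end up with a single convolutional layer. Its "auto" variant is not described in the summary, so this sketch shows only the generic structural re-parameterization idea (popularized by RepVGG) that such methods build on: parallel linear branches used during training, here a 3x3 convolution, a 1x1 convolution and an identity shortcut on a single-channel image, collapse into one 3x3 kernel for inference because convolution is linear. All function names and weights are hypothetical.

```python
import random


def conv3x3(img, k):
    """'Same'-size 2-D cross-correlation of a single-channel image with a 3x3
    kernel, using zero padding."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(3):
                for dx in range(3):
                    yy, xx = y + dy - 1, x + dx - 1
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += k[dy][dx] * img[yy][xx]
            out[y][x] = acc
    return out


def multi_branch(img, k3, k1):
    """Training-time block: 3x3 conv + 1x1 conv (scalar weight k1) + identity."""
    a = conv3x3(img, k3)
    return [[a[y][x] + k1 * img[y][x] + img[y][x] for x in range(len(img[0]))]
            for y in range(len(img))]


def fuse(k3, k1):
    """Inference-time kernel: a 1x1 conv and an identity shortcut are both
    3x3 kernels that are zero except at the centre, so they add to k3[1][1]."""
    fused = [row[:] for row in k3]  # copy so the training kernel is untouched
    fused[1][1] += k1 + 1.0
    return fused


if __name__ == "__main__":
    rng = random.Random(0)
    img = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(5)]
    k3 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
    slow = multi_branch(img, k3, k1=0.5)
    fast = conv3x3(img, fuse(k3, k1=0.5))
    # Maximum absolute difference between the two paths (floating-point noise only).
    print(max(abs(s - f) for rs, rf in zip(slow, fast) for s, f in zip(rs, rf)))
```

Practical versions also fold batch-normalization statistics and biases into the fused kernel and handle multiple channels; those steps are omitted here.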
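The ISS-P entry above describes shrinking unimportant weights by a small amount proportional to their magnitude at each iteration, rather than zeroing them. A minimal sketch of one such soft-shrinkage step on a flat weight list follows. It is a hypothetical reading of the summary, not the paper's algorithm: "unimportant" is assumed to mean the smallest-magnitude fraction `p` of weights, and the shrink rate `alpha` is an assumed hyperparameter. A real method would interleave these steps with gradient updates.

```python
def soft_shrink_step(weights, p, alpha):
    """One soft-shrinkage step (hypothetical reading of the ISS-P summary).

    The fraction `p` of weights with the smallest magnitude is treated as
    unimportant; each of those is scaled by (1 - alpha), i.e. moved toward zero
    by alpha * |w|, an amount proportional to its own magnitude. Nothing is
    set to exactly zero, so a weight can recover if later updates enlarge it.
    """
    n_shrink = int(len(weights) * p)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    unimportant = set(order[:n_shrink])
    return [w * (1.0 - alpha) if i in unimportant else w
            for i, w in enumerate(weights)]


if __name__ == "__main__":
    w = [0.9, -0.04, 0.3, 0.02, -0.6, 0.05]
    for step in range(20):
        # A real training loop would apply a gradient update here first.
        w = soft_shrink_step(w, p=0.5, alpha=0.2)
    print([round(v, 4) for v in w])
```

The contrast with hard magnitude pruning is that the unimportant set is recomputed every step and its members decay gradually, so the sparse structure can still change during training.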
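The SDWNet entry above uses dilated convolution to obtain a large receptive field while keeping high spatial resolution. Standard receptive-field arithmetic makes this concrete: for stacked stride-1 convolutions, a layer with kernel size k and dilation d widens the field by (k - 1)·d pixels, while "same" padding keeps the output size unchanged. The dilation rates below are illustrative, not SDWNet's actual configuration, and the helper names are hypothetical.

```python
def receptive_field(layers):
    """Receptive field (in pixels) of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs; each layer widens the
    field by (kernel_size - 1) * dilation.
    """
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d
    return rf


def dilated_conv1d(x, kernel, dilation):
    """'Same'-length 1-D dilated cross-correlation with zero padding (odd kernel).

    Kernel taps are spaced `dilation` samples apart, so the output keeps the
    input length (no downsampling) while each tap reaches further away.
    """
    half = (len(kernel) - 1) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - half) * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    plain = [(3, 1)] * 3
    dilated = [(3, 1), (3, 2), (3, 4)]
    print(receptive_field(plain), receptive_field(dilated))  # 7 15
```

With the same three 3x3 layers and the same parameter count, doubling the dilation per layer roughly doubles the receptive field (7 to 15 pixels) without any pooling, which is the trade-off the entry describes.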