Object-Based Image Coding: A Learning-Driven Revisit
- URL: http://arxiv.org/abs/2003.08033v1
- Date: Wed, 18 Mar 2020 04:00:17 GMT
- Title: Object-Based Image Coding: A Learning-Driven Revisit
- Authors: Qi Xia, Haojie Liu and Zhan Ma
- Abstract summary: A fundamental issue is how to efficiently process arbitrary-shaped objects at a fine granularity.
We have proposed an object segmentation network for image layer decomposition, and parallel convolution-based neural image compression networks to process masked foreground objects and background scene separately.
All components are optimized in an end-to-end learning framework to intelligently weigh their contributions for visually pleasant reconstruction.
- Score: 30.550019759674477
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object-Based Image Coding (OBIC), extensively studied about two
decades ago, promised broad applications in both ultra-low-bitrate
communication and high-level semantic content understanding, but it has rarely
been used in practice because objects with arbitrary shapes lack an efficient
compact representation. A fundamental issue is how to efficiently process
arbitrary-shaped objects at a fine granularity (e.g., feature-element- or
pixel-wise). To address this, we propose element-wise masking and compression,
devising an object segmentation network for image layer decomposition and
parallel convolution-based neural image compression networks that process
masked foreground objects and the background scene separately. All components
are optimized in an end-to-end learning framework that intelligently weighs
their (e.g., object and background) contributions toward a visually pleasant
reconstruction. Comprehensive experiments on the PASCAL VOC dataset at very
low bitrates (e.g., $\lesssim$0.1 bits per pixel - bpp) demonstrate noticeable
subjective quality improvement compared with JPEG2K, HEVC-based BPG, and
another learned image compression method. All relevant materials are made
publicly accessible at https://njuvision.github.io/Neural-Object-Coding/.
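The pipeline the abstract describes — segment, mask element-wise, code foreground and background with separate codecs, then recombine — can be sketched as below. This is a minimal illustrative sketch: the function names are hypothetical, and the uniform-quantization "codec" is a stand-in for the paper's learned convolutional compression networks, not the authors' implementation.

```python
import numpy as np

def decompose_layers(image, mask):
    """Split an image into element-wise masked foreground and background layers.

    image: (H, W, C) float array in [0, 1]
    mask:  (H, W) binary array, 1 = foreground object
    """
    m = mask[..., None].astype(image.dtype)
    return image * m, image * (1.0 - m)

def toy_codec(layer, levels=8):
    """Placeholder for a learned neural codec: uniform quantization.

    Only illustrates that each layer is coded independently; the paper
    uses convolution-based neural compression networks here.
    """
    return np.round(layer * (levels - 1)) / (levels - 1)

def obic_pipeline(image, mask):
    fg, bg = decompose_layers(image, mask)   # image layer decomposition
    fg_hat = toy_codec(fg, levels=32)        # foreground branch (finer)
    bg_hat = toy_codec(bg, levels=8)         # background branch (coarser)
    m = mask[..., None].astype(image.dtype)
    return fg_hat * m + bg_hat * (1.0 - m)   # masked recombination

# tiny smoke test
img = np.random.rand(4, 4, 3)
msk = np.zeros((4, 4)); msk[1:3, 1:3] = 1
rec = obic_pipeline(img, msk)
print(rec.shape)  # (4, 4, 3)
```

In the actual method the per-branch rate allocation is not hand-set as here but learned end-to-end, so the framework itself weighs the object and background contributions to the reconstruction.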
Related papers
- PixelHacker: Image Inpainting with Structural and Semantic Consistency [28.984953143157107]
Inpainting is a fundamental research area between image editing and image generation.
Recent state-of-the-art (SOTA) methods have explored novel attention mechanisms, lightweight architectures, and context-aware modeling.
We design a simple yet effective inpainting paradigm called latent categories guidance, and propose a diffusion-based model named PixelHacker.
arXiv Detail & Related papers (2025-04-29T05:28:36Z)
- Extremely low-bitrate Image Compression Semantically Disentangled by LMMs from a Human Perception Perspective [2.542077227403488]
Inspired by the human progressive perception mechanism, we propose a Semantically Disentangled Image Compression framework.
We leverage LMMs to extract essential semantic components, including overall descriptions, detailed object descriptions, and semantic segmentation masks.
We propose a training-free Object Restoration model with Attention Guidance (ORAG) built on pre-trained ControlNet to restore object details conditioned by object-level text descriptions and semantic masks.
arXiv Detail & Related papers (2025-03-01T08:27:11Z)
- Rethinking Image-to-Video Adaptation: An Object-centric Perspective [61.833533295978484]
We propose a novel and efficient image-to-video adaptation strategy from the object-centric perspective.
Inspired by human perception, we integrate a proxy task of object discovery into image-to-video transfer learning.
arXiv Detail & Related papers (2024-07-09T13:58:10Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Joint Perceptual Learning for Enhancement and Object Detection in Underwater Scenarios [41.34564703212461]
We propose a bilevel optimization formulation for jointly learning underwater object detection and image enhancement.
Our method outputs visually favorable images and achieves higher detection accuracy.
arXiv Detail & Related papers (2023-07-07T11:54:06Z)
- Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture [43.83887661156133]
This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations.
We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images.
arXiv Detail & Related papers (2023-01-19T18:59:01Z)
- SemAug: Semantically Meaningful Image Augmentations for Object Detection Through Language Grounding [5.715548995729382]
We propose an effective technique for image augmentation by injecting contextually meaningful knowledge into the scenes.
Our method of semantically meaningful image augmentation for object detection via language grounding, SemAug, starts by identifying semantically appropriate new objects to place in the scene.
arXiv Detail & Related papers (2022-08-15T19:00:56Z)
- TopicFM: Robust and Interpretable Feature Matching with Topic-assisted [8.314830611853168]
We propose an architecture for image matching which is efficient, robust, and interpretable.
We introduce a novel feature matching module called TopicFM, which coarsely groups the same spatial structures across images into topics.
Our method performs matching only in co-visible regions, reducing computation.
arXiv Detail & Related papers (2022-07-01T10:39:14Z)
- Modeling Image Composition for Complex Scene Generation [77.10533862854706]
We present a method that achieves state-of-the-art results on layout-to-image generation tasks.
After compressing RGB images into patch tokens, we propose the Transformer with Focal Attention (TwFA) for exploring dependencies of object-to-object, object-to-patch and patch-to-patch.
arXiv Detail & Related papers (2022-06-02T08:34:25Z)
- ELLIPSDF: Joint Object Pose and Shape Optimization with a Bi-level Ellipsoid and Signed Distance Function Description [9.734266860544663]
This paper proposes an expressive yet compact model for joint object pose and shape optimization.
It infers an object-level map from multi-view RGB-D camera observations.
Our approach is evaluated on the large-scale real-world ScanNet dataset and compared against state-of-the-art methods.
arXiv Detail & Related papers (2021-08-01T03:07:31Z)
- Deep ensembles based on Stochastic Activation Selection for Polyp Segmentation [82.61182037130406]
This work deals with medical image segmentation and in particular with accurate polyp detection and segmentation during colonoscopy examinations.
The basic architecture for image segmentation consists of an encoder and a decoder.
We compare several variants of the DeepLab architecture obtained by varying the decoder backbone.
arXiv Detail & Related papers (2021-04-02T02:07:37Z)
- Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning [60.75687261314962]
We introduce pixel-level pretext tasks for learning dense feature representations.
A pixel-to-propagation consistency task produces better results than state-of-the-art approaches.
Results demonstrate the strong potential of defining pretext tasks at the pixel level.
arXiv Detail & Related papers (2020-11-19T18:59:45Z)
- KiU-Net: Overcomplete Convolutional Architectures for Biomedical Image and Volumetric Segmentation [71.79090083883403]
"Traditional" encoder-decoder based approaches perform poorly in detecting smaller structures and are unable to segment boundary regions precisely.
We propose KiU-Net which has two branches: (1) an overcomplete convolutional network Kite-Net which learns to capture fine details and accurate edges of the input, and (2) U-Net which learns high level features.
The proposed method achieves better performance than recent methods, with the additional benefits of fewer parameters and faster convergence.
arXiv Detail & Related papers (2020-10-04T19:23:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.