Object-Based Image Coding: A Learning-Driven Revisit
- URL: http://arxiv.org/abs/2003.08033v1
- Date: Wed, 18 Mar 2020 04:00:17 GMT
- Title: Object-Based Image Coding: A Learning-Driven Revisit
- Authors: Qi Xia, Haojie Liu and Zhan Ma
- Abstract summary: A fundamental issue is how to efficiently process arbitrary-shaped objects at a fine granularity.
We have proposed an object segmentation network for image layer decomposition, and parallel convolution-based neural image compression networks to process masked foreground objects and background scene separately.
All components are optimized in an end-to-end learning framework to intelligently weigh their contributions for visually pleasant reconstruction.
- Score: 30.550019759674477
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object-Based Image Coding (OBIC), extensively studied about two
decades ago, promised broad applications in both ultra-low-bitrate
communication and high-level semantic content understanding, but it has rarely
been used in practice because objects with arbitrary shapes lack an efficient
compact representation. A fundamental issue is how to efficiently process
arbitrary-shaped objects at a fine granularity (e.g., feature-element- or
pixel-wise). To address this, we propose element-wise masking and compression,
devising an object segmentation network for image layer decomposition and
parallel convolution-based neural image compression networks that process
masked foreground objects and the background scene separately. All components
are optimized in an end-to-end learning framework that intelligently weighs
their (e.g., object and background) contributions toward a visually pleasant
reconstruction. Comprehensive experiments on the PASCAL VOC dataset at very
low bitrates (e.g., $\lesssim$0.1 bits per pixel - bpp) demonstrate noticeable
subjective quality improvement compared with JPEG2K, HEVC-based BPG, and
another learned image compression method. All relevant materials are made
publicly accessible at https://njuvision.github.io/Neural-Object-Coding/.
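The pipeline the abstract describes — segment, mask element-wise, code foreground and background with separate codecs, then recombine — can be sketched as below. This is a minimal illustrative sketch: the function names are hypothetical, and the uniform-quantization "codec" is a stand-in for the paper's learned convolutional compression networks, not the authors' implementation.

```python
import numpy as np

def decompose_layers(image, mask):
    """Split an image into element-wise masked foreground and background layers.

    image: (H, W, C) float array in [0, 1]
    mask:  (H, W) binary array, 1 = foreground object
    """
    m = mask[..., None].astype(image.dtype)
    return image * m, image * (1.0 - m)

def toy_codec(layer, levels=8):
    """Placeholder for a learned neural codec: uniform quantization.

    Only illustrates that each layer is coded independently; the paper
    uses convolution-based neural compression networks here.
    """
    return np.round(layer * (levels - 1)) / (levels - 1)

def obic_pipeline(image, mask):
    fg, bg = decompose_layers(image, mask)   # image layer decomposition
    fg_hat = toy_codec(fg, levels=32)        # foreground branch (finer)
    bg_hat = toy_codec(bg, levels=8)         # background branch (coarser)
    m = mask[..., None].astype(image.dtype)
    return fg_hat * m + bg_hat * (1.0 - m)   # masked recombination

# tiny smoke test
img = np.random.rand(4, 4, 3)
msk = np.zeros((4, 4)); msk[1:3, 1:3] = 1
rec = obic_pipeline(img, msk)
print(rec.shape)  # (4, 4, 3)
```

In the actual method the per-branch rate allocation is not hand-set as here but learned end-to-end, so the framework itself weighs the object and background contributions to the reconstruction.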
Related papers
- PixelHacker: Image Inpainting with Structural and Semantic Consistency [28.984953143157107]
Inpainting is a fundamental research area between image editing and image generation.
Recent state-of-the-art (SOTA) methods have explored novel attention mechanisms, lightweight architectures, and context-aware modeling.
We design a simple yet effective inpainting paradigm called latent categories guidance, and propose a diffusion-based model named PixelHacker.
arXiv Detail & Related papers (2025-04-29T05:28:36Z)
- Extremely low-bitrate Image Compression Semantically Disentangled by LMMs from a Human Perception Perspective [2.542077227403488]
Inspired by the human progressive perception mechanism, we propose a Semantically Disentangled Image Compression framework.
We leverage LMMs to extract essential semantic components, including overall descriptions, detailed object descriptions, and semantic segmentation masks.
We propose a training-free Object Restoration model with Attention Guidance (ORAG) built on pre-trained ControlNet to restore object details conditioned by object-level text descriptions and semantic masks.
arXiv Detail & Related papers (2025-03-01T08:27:11Z)
- Rethinking Image-to-Video Adaptation: An Object-centric Perspective [61.833533295978484]
We propose a novel and efficient image-to-video adaptation strategy from the object-centric perspective.
Inspired by human perception, we integrate a proxy task of object discovery into image-to-video transfer learning.
arXiv Detail & Related papers (2024-07-09T13:58:10Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Joint Perceptual Learning for Enhancement and Object Detection in Underwater Scenarios [41.34564703212461]
We propose a bilevel optimization formulation for jointly learning underwater object detection and image enhancement.
Our method outputs visually favorable images and achieves higher detection accuracy.
arXiv Detail & Related papers (2023-07-07T11:54:06Z)
- Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture [43.83887661156133]
This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations.
We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images.
arXiv Detail & Related papers (2023-01-19T18:59:01Z)
- SemAug: Semantically Meaningful Image Augmentations for Object Detection Through Language Grounding [5.715548995729382]
We propose an effective technique for image augmentation by injecting contextually meaningful knowledge into the scenes.
Our method of semantically meaningful image augmentation for object detection via language grounding, SemAug, starts by identifying semantically appropriate new objects to place in the scene.
arXiv Detail & Related papers (2022-08-15T19:00:56Z)
- TopicFM: Robust and Interpretable Feature Matching with Topic-assisted [8.314830611853168]
We propose an architecture for image matching which is efficient, robust, and interpretable.
We introduce a novel feature matching module called TopicFM, which coarsely groups the same spatial structures across images into topics.
Our method performs matching only in co-visible regions, reducing computation.
arXiv Detail & Related papers (2022-07-01T10:39:14Z)
- Modeling Image Composition for Complex Scene Generation [77.10533862854706]
We present a method that achieves state-of-the-art results on layout-to-image generation tasks.
After compressing RGB images into patch tokens, we propose the Transformer with Focal Attention (TwFA) for exploring dependencies of object-to-object, object-to-patch and patch-to-patch.
arXiv Detail & Related papers (2022-06-02T08:34:25Z)
- ELLIPSDF: Joint Object Pose and Shape Optimization with a Bi-level Ellipsoid and Signed Distance Function Description [9.734266860544663]
This paper proposes an expressive yet compact model for joint object pose and shape optimization.
It infers an object-level map from multi-view RGB-D camera observations.
Our approach is evaluated on the large-scale real-world ScanNet dataset and compared against state-of-the-art methods.
arXiv Detail & Related papers (2021-08-01T03:07:31Z)
- Deep ensembles based on Stochastic Activation Selection for Polyp Segmentation [82.61182037130406]
This work deals with medical image segmentation and in particular with accurate polyp detection and segmentation during colonoscopy examinations.
The basic architecture for image segmentation consists of an encoder and a decoder.
We compare several variants of the DeepLab architecture obtained by varying the decoder backbone.
arXiv Detail & Related papers (2021-04-02T02:07:37Z)
- Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning [60.75687261314962]
We introduce pixel-level pretext tasks for learning dense feature representations.
A pixel-to-propagation consistency task produces better results than state-of-the-art approaches.
Results demonstrate the strong potential of defining pretext tasks at the pixel level.
arXiv Detail & Related papers (2020-11-19T18:59:45Z)
- KiU-Net: Overcomplete Convolutional Architectures for Biomedical Image and Volumetric Segmentation [71.79090083883403]
"Traditional" encoder-decoder based approaches perform poorly in detecting smaller structures and are unable to segment boundary regions precisely.
We propose KiU-Net which has two branches: (1) an overcomplete convolutional network Kite-Net which learns to capture fine details and accurate edges of the input, and (2) U-Net which learns high level features.
The proposed method achieves better performance than recent methods, with the additional benefits of fewer parameters and faster convergence.
arXiv Detail & Related papers (2020-10-04T19:23:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.