Leveraging Color Channel Independence for Improved Unsupervised Object Detection
- URL: http://arxiv.org/abs/2412.15150v1
- Date: Thu, 19 Dec 2024 18:28:37 GMT
- Title: Leveraging Color Channel Independence for Improved Unsupervised Object Detection
- Authors: Bastian Jäckl, Yannick Metz, Udo Schlegel, Daniel A. Keim, Maximilian T. Fischer
- Abstract summary: We challenge the common assumption that RGB images are the optimal color space for unsupervised learning in computer vision.
We show that models improve when required to predict additional color channels.
The use of composite color spaces can be implemented with essentially no computational overhead.
- Score: 7.030688465389997
- Abstract: Object-centric architectures can learn to extract distinct object representations from visual scenes, enabling downstream applications on the object level. Like autoencoder-based image models, object-centric approaches have been trained with an unsupervised reconstruction loss on images encoded in the RGB color space. In our work, we challenge the common assumption that RGB images are the optimal color space for unsupervised learning in computer vision. We discuss conceptually and empirically that other color spaces, such as HSV, bear essential characteristics for object-centric representation learning, like robustness to lighting conditions. We further show that models improve when required to predict additional color channels. Specifically, we propose to transform the prediction targets to the RGB-S space, which extends RGB with HSV's saturation component and leads to markedly better reconstruction and disentanglement on five common evaluation datasets. The use of composite color spaces can be implemented with essentially no computational overhead, is agnostic to the model's architecture, and is universally applicable across a wide range of visual computing tasks and training types. Our findings encourage further investigation of color spaces in computer vision tasks beyond object-centric learning.
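A minimal sketch of the proposed target transform, assuming image tensors normalized to [0, 1]; the function name `rgb_to_rgbs` and the exact saturation normalization are illustrative choices, not taken from the paper:

```python
import torch

def rgb_to_rgbs(rgb: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Extend RGB with HSV's saturation channel (RGB-S).

    rgb: float tensor in [0, 1] with shape (B, 3, H, W).
    Returns a (B, 4, H, W) tensor with channels R, G, B, S.
    """
    c_max = rgb.amax(dim=1, keepdim=True)
    c_min = rgb.amin(dim=1, keepdim=True)
    # HSV saturation: (max - min) / max, conventionally 0 for black pixels.
    saturation = (c_max - c_min) / (c_max + eps)
    return torch.cat([rgb, saturation], dim=1)
```

Only the reconstruction target changes; the encoder input can remain RGB, so the sole architectural change is a fourth output channel in the decoder, which is consistent with the abstract's claim of negligible overhead.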
Related papers
- Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer [10.982521876026281]
We introduce a diffusion-based framework to address the RGB-D semantic segmentation problem.
We demonstrate that utilizing a Deformable Attention Transformer as the encoder to extract features from depth images effectively captures the characteristics of invalid regions in depth measurements.
arXiv Detail & Related papers (2024-09-23T15:23:01Z)
- SEL-CIE: Knowledge-Guided Self-Supervised Learning Framework for CIE-XYZ Reconstruction from Non-Linear sRGB Images [7.932206255996779]
The CIE-XYZ color space is a device-independent linear space used as part of the camera pipeline.
Images are usually saved in a non-linear state, and recovering CIE-XYZ images with conventional methods is not always possible.
This paper proposes a framework for using SSL methods alongside paired data to reconstruct CIE-XYZ images and re-render sRGB images, outperforming existing approaches.
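For context, the deterministic half of that pipeline has a standard closed form; the sketch below applies the IEC 61966-2-1 sRGB linearization followed by the D65 sRGB-to-XYZ matrix. The learned SSL mapping is needed precisely because real camera renderings deviate from this idealized conversion.

```python
import numpy as np

# Standard sRGB-to-CIE-XYZ matrix (D65 white point, IEC 61966-2-1).
M_SRGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])

def srgb_to_xyz(srgb: np.ndarray) -> np.ndarray:
    """srgb: float array in [0, 1], shape (..., 3). Returns linear CIE-XYZ."""
    # Undo the piecewise sRGB gamma to recover linear RGB.
    linear = np.where(srgb <= 0.04045, srgb / 12.92, ((srgb + 0.055) / 1.055) ** 2.4)
    return linear @ M_SRGB_TO_XYZ.T
```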
arXiv Detail & Related papers (2024-05-20T17:20:41Z)
- Rethinking RGB Color Representation for Image Restoration Models [55.81013540537963]
We augment the representation to hold structural information of local neighborhoods at each pixel.
Replacing the representation space underlying per-pixel losses facilitates the training of image restoration models.
Our space consistently improves overall metrics by reconstructing both color and local structures.
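As an illustration only, and not necessarily the paper's construction, one way to give a per-pixel loss access to local structure is to compare each pixel's k x k neighborhood rather than the pixel alone:

```python
import torch
import torch.nn.functional as F

def neighborhood_l1(pred: torch.Tensor, target: torch.Tensor, k: int = 3) -> torch.Tensor:
    """L1 loss in a space where each pixel carries its k x k neighborhood.

    pred, target: (B, C, H, W) image tensors.
    """
    def to_patches(x: torch.Tensor) -> torch.Tensor:
        # (B, C*k*k, H*W): every column stacks one pixel's local neighborhood.
        return F.unfold(x, kernel_size=k, padding=k // 2)

    return (to_patches(pred) - to_patches(target)).abs().mean()
```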
arXiv Detail & Related papers (2024-02-05T06:38:39Z)
- Learning-based Relational Object Matching Across Views [63.63338392484501]
We propose a learning-based approach which combines local keypoints with novel object-level features for matching object detections between RGB images.
We train our object-level matching features based on appearance and inter-frame and cross-frame spatial relations between objects in an associative graph neural network.
arXiv Detail & Related papers (2023-05-03T19:36:51Z)
- ColorSense: A Study on Color Vision in Machine Visual Recognition [57.916512479603064]
We collect 110,000 non-trivial human annotations of foreground and background color labels from visual recognition benchmarks.
We validate the use of our datasets by demonstrating that the level of color discrimination has a dominant effect on the performance of machine perception models.
Our findings suggest that object recognition tasks such as classification and localization are susceptible to color vision bias.
arXiv Detail & Related papers (2022-12-16T18:51:41Z)
- Scale Invariant Semantic Segmentation with RGB-D Fusion [12.650574326251023]
We propose a neural network architecture for scale-invariant semantic segmentation using RGB-D images.
We incorporate depth information into the RGB data for pixel-wise semantic segmentation, addressing objects at different scales in outdoor scenes.
Our model is compact and can be easily applied to other RGB models.
arXiv Detail & Related papers (2022-04-10T12:54:27Z)
- Colored Point Cloud to Image Alignment [15.828285556159026]
We introduce a differential optimization method that aligns a colored point cloud to a given color image via iterative geometric and color matching.
We find the transformation between the camera image and the point cloud colors by iterating between matching the relative location of the point cloud and matching colors.
arXiv Detail & Related papers (2021-10-07T08:12:56Z)
- Semantic-embedded Unsupervised Spectral Reconstruction from Single RGB Images in the Wild [48.44194221801609]
We propose a new lightweight and end-to-end learning-based framework to tackle this challenge.
We progressively spread the differences between input RGB images and re-projected RGB images from recovered HS images via effective camera spectral response function estimation.
Our method significantly outperforms state-of-the-art unsupervised methods and even exceeds the latest supervised method under some settings.
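A hedged sketch of the re-projection constraint that description implies; the band count, tensor shapes, and the name `reprojection_loss` are assumptions, and the paper estimates the camera spectral response function rather than taking it as given:

```python
import torch

def reprojection_loss(hs: torch.Tensor, rgb: torch.Tensor, csr: torch.Tensor) -> torch.Tensor:
    """Penalize the gap between the input RGB and the RGB re-rendered from recovered HS.

    hs:  (B, C_bands, H, W) recovered hyperspectral image.
    rgb: (B, 3, H, W) input RGB image.
    csr: (3, C_bands) estimated camera spectral response function.
    """
    rgb_hat = torch.einsum('bchw,rc->brhw', hs, csr)  # integrate bands into R, G, B
    return (rgb_hat - rgb).abs().mean()
```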
arXiv Detail & Related papers (2021-08-15T05:19:44Z)
- Siamese Network for RGB-D Salient Object Detection and Beyond [113.30063105890041]
A novel framework is proposed to learn from both RGB and depth inputs through a shared network backbone.
Comprehensive experiments using five popular metrics show that the designed framework yields a robust RGB-D saliency detector.
We also link JL-DCF to the RGB-D semantic segmentation field, showing its capability of outperforming several semantic segmentation models.
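A minimal sketch of the shared-backbone (Siamese) idea, assuming generic `backbone` and `head` modules; JL-DCF's joint-learning and densely-cooperative fusion components are more elaborate than this:

```python
import torch
import torch.nn as nn

class SiameseRGBD(nn.Module):
    """Process RGB and depth with one shared (Siamese) backbone, then fuse."""

    def __init__(self, backbone: nn.Module, head: nn.Module):
        super().__init__()
        self.backbone = backbone  # shared weights for both modalities
        self.head = head

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        f_rgb = self.backbone(rgb)                            # (B, F, h, w)
        f_depth = self.backbone(depth.expand(-1, 3, -1, -1))  # assumes depth is (B, 1, H, W)
        return self.head(torch.cat([f_rgb, f_depth], dim=1))
```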
arXiv Detail & Related papers (2020-08-26T06:01:05Z)
- Dynamic Object Removal and Spatio-Temporal RGB-D Inpainting via Geometry-Aware Adversarial Learning [9.150245363036165]
Dynamic objects have a significant impact on a robot's perception of its environment.
In this work, we address this problem by synthesizing plausible color, texture and geometry in regions occluded by dynamic objects.
We optimize our architecture using adversarial training to synthesize fine, realistic textures, which enables it to hallucinate color and depth structure in occluded regions online.
arXiv Detail & Related papers (2020-08-12T01:23:21Z)
- Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation [67.88276573341734]
We propose a new method for unseen object instance segmentation by learning RGB-D feature embeddings from synthetic data.
A metric learning loss function is utilized to learn to produce pixel-wise feature embeddings.
We further improve the segmentation accuracy with a new two-stage clustering algorithm.
arXiv Detail & Related papers (2020-07-30T00:23:07Z)
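A hedged sketch of a pull-push metric loss on pixel embeddings, in the spirit of that description; this discriminative-loss style is a common stand-in, and the paper's exact formulation and two-stage clustering may differ:

```python
import torch

def pixel_metric_loss(emb: torch.Tensor, labels: torch.Tensor,
                      pull_margin: float = 0.1, push_margin: float = 1.0) -> torch.Tensor:
    """emb: (D, H, W) per-pixel embeddings; labels: (H, W) integer instance ids."""
    d = emb.shape[0]
    emb = emb.reshape(d, -1)        # (D, N) flattened pixels
    labels = labels.reshape(-1)
    means, pull = [], emb.new_zeros(())
    for inst in labels.unique():
        inst_emb = emb[:, labels == inst]
        mu = inst_emb.mean(dim=1, keepdim=True)
        means.append(mu)
        # Pull: pixels of one instance move toward their mean embedding.
        pull = pull + ((inst_emb - mu).norm(dim=0) - pull_margin).clamp(min=0).mean()
    means = torch.cat(means, dim=1)  # (D, n_instances)
    n = means.shape[1]
    # Push: means of different instances repel until they are push_margin apart.
    hinge = (push_margin - torch.cdist(means.T, means.T)).clamp(min=0).triu(diagonal=1)
    push = hinge.sum() / max(n * (n - 1) / 2, 1)
    return pull / n + push
```

Clustering the resulting embeddings (e.g., with mean shift) then yields instance masks for unseen objects.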
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.