Grasp-Oriented Fine-grained Cloth Segmentation without Real Supervision
- URL: http://arxiv.org/abs/2110.02903v1
- Date: Wed, 6 Oct 2021 16:31:20 GMT
- Title: Grasp-Oriented Fine-grained Cloth Segmentation without Real Supervision
- Authors: Ruijie Ren, Mohit Gurnani Rajesh, Jordi Sanchez-Riera, Fan Zhang,
Yurun Tian, Antonio Agudo, Yiannis Demiris, Krystian Mikolajczyk and Francesc
Moreno-Noguer
- Abstract summary: This paper tackles the problem of fine-grained region detection in deformed clothes using only a depth image.
We define up to 6 semantic regions of varying extent, including edges on the neckline, sleeve cuffs, and hem, plus top and bottom grasping points.
We introduce a U-net based network to segment and label these parts.
We show that training our network solely with synthetic data and the proposed DA yields results competitive with models trained on real data.
- Score: 66.56535902642085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatically detecting graspable regions from a single depth image is a key
ingredient in cloth manipulation. The large variability of cloth deformations
has motivated most of the current approaches to focus on identifying specific
grasping points rather than semantic parts, as the appearance and depth
variations of local regions are smaller and easier to model than the larger
ones. However, tasks like cloth folding or assisted dressing require
recognising larger segments, such as semantic edges that carry more information
than points. The first goal of this paper is therefore to tackle the problem of
fine-grained region detection in deformed clothes using only a depth image. As
a proof of concept, we implement an approach for T-shirts, and define up to 6
semantic regions of varying extent, including edges on the neckline, sleeve
cuffs, and hem, plus top and bottom grasping points. We introduce a U-net based
network to segment and label these parts. The second contribution of our work
is concerned with the level of supervision that we require to train the
proposed network. While most approaches learn to detect grasping points by
combining real and synthetic annotations, in this work we defy the limitations
of the synthetic data, and propose a multilayered domain adaptation (DA)
strategy that does not use real annotations at all. We thoroughly evaluate our
approach on real depth images of a T-shirt annotated with fine-grained labels.
We show that training our network solely with synthetic data and the proposed
DA yields results competitive with models trained on real data.
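The abstract gives only the high-level design: a U-Net style segmenter over a single depth channel, six semantic regions, and a multilayered domain adaptation (DA) strategy that needs no real annotations. The following is a minimal sketch of how such a setup could be wired together; every concrete choice (channel widths, a 7-class head assuming 6 regions plus background, patch discriminators per decoder layer, the 0.1 loss weight) is an assumption for illustration, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): a small U-Net-style network mapping a
# single-channel depth image to per-pixel logits over 6 cloth regions + background,
# with feature discriminators at several decoder depths for a multilayered DA loss.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions with BatchNorm and ReLU, as in a standard U-Net stage.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class DepthUNet(nn.Module):
    def __init__(self, num_classes=7):  # 6 semantic regions + background (assumed)
        super().__init__()
        self.enc1, self.enc2, self.enc3 = conv_block(1, 32), conv_block(32, 64), conv_block(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.up2, self.dec2 = nn.ConvTranspose2d(128, 64, 2, stride=2), conv_block(128, 64)
        self.up1, self.dec1 = nn.ConvTranspose2d(64, 32, 2, stride=2), conv_block(64, 32)
        self.head = nn.Conv2d(32, num_classes, 1)

    def forward(self, depth):
        e1 = self.enc1(depth)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(e3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        # Expose intermediate features so DA discriminators can attach at several layers.
        return self.head(d1), [e3, d2, d1]

class LayerDiscriminator(nn.Module):
    # Small patch discriminator predicting synthetic-vs-real from one feature map.
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 1, 3, padding=1),
        )

    def forward(self, feat):
        return self.net(feat)

# Usage sketch: segmentation loss on a labelled synthetic batch, adversarial
# feature alignment on an unlabeled real batch (generator/discriminator
# alternation and loss weighting are omitted / assumed).
model = DepthUNet()
discs = nn.ModuleList([LayerDiscriminator(c) for c in (128, 64, 32)])
ce, bce = nn.CrossEntropyLoss(), nn.BCEWithLogitsLoss()

syn_depth = torch.randn(2, 1, 128, 128)                   # synthetic depth with labels
syn_labels = torch.zeros(2, 128, 128, dtype=torch.long)   # dummy placeholder labels
real_depth = torch.randn(2, 1, 128, 128)                  # real depth, no labels

syn_logits, _ = model(syn_depth)
_, real_feats = model(real_depth)

seg_loss = ce(syn_logits, syn_labels)
da_loss = torch.tensor(0.0)
for disc, feat in zip(discs, real_feats):
    pred = disc(feat)                                       # synthetic-vs-real logits
    da_loss = da_loss + bce(pred, torch.ones_like(pred))    # push real features toward "synthetic"
total_loss = seg_loss + 0.1 * da_loss                       # DA weight is an assumption
```

In this sketch, supervised cross-entropy is computed only on synthetic depth images, while the per-layer discriminators supply alignment signals on unlabeled real depth, which is the spirit of the synthetic-only supervision plus multilayered DA described above.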
Related papers
- 3D Medical Image Segmentation with Sparse Annotation via Cross-Teaching
between 3D and 2D Networks [26.29122638813974]
We propose a framework that can robustly learn from sparse annotation using the cross-teaching of both 3D and 2D networks.
Our experimental results on the MMWHS dataset demonstrate that our method outperforms the state-of-the-art (SOTA) semi-supervised segmentation methods.
arXiv Detail & Related papers (2023-07-30T15:26:17Z) - ReFit: A Framework for Refinement of Weakly Supervised Semantic
Segmentation using Object Border Fitting for Medical Images [4.945138408504987]
Weakly Supervised Semantic Segmentation (WSSS), relying only on image-level supervision, is a promising approach to reduce the need for dense pixel-level annotations when training segmentation networks.
We propose our novel ReFit framework, which deploys state-of-the-art class activation maps combined with various post-processing techniques.
By applying our method to WSSS predictions, we achieved up to 10% improvement over the current state-of-the-art WSSS methods for medical imaging.
arXiv Detail & Related papers (2023-03-14T12:46:52Z) - Progressively Dual Prior Guided Few-shot Semantic Segmentation [57.37506990980975]
The few-shot semantic segmentation task aims at segmenting query images given only a few annotated support samples.
We propose a progressively dual prior guided few-shot semantic segmentation network.
arXiv Detail & Related papers (2022-11-20T16:19:47Z) - Semi-supervised domain adaptation with CycleGAN guided by a downstream
task loss [4.941630596191806]
Domain adaptation is of huge interest as labeling is an expensive and error-prone task.
Image-to-image approaches can be used to mitigate the shift in the input.
We propose a "task aware" version of a GAN in an image-to-image domain adaptation approach.
arXiv Detail & Related papers (2022-08-18T13:13:30Z) - Improving Semi-Supervised and Domain-Adaptive Semantic Segmentation with
Self-Supervised Depth Estimation [94.16816278191477]
We present a framework for semi-supervised and domain-adaptive semantic segmentation.
It is enhanced by self-supervised monocular depth estimation trained only on unlabeled image sequences.
We validate the proposed model on the Cityscapes dataset.
arXiv Detail & Related papers (2021-08-28T01:33:38Z) - Every Annotation Counts: Multi-label Deep Supervision for Medical Image
Segmentation [85.0078917060652]
We propose a semi-weakly supervised segmentation algorithm to overcome this barrier.
Our approach is based on a new formulation of deep supervision and student-teacher model.
With our novel training regime for segmentation, which flexibly makes use of images that are fully labeled, marked with bounding boxes, annotated only with global labels, or not labeled at all, we are able to cut the requirement for expensive labels by 94.22%.
arXiv Detail & Related papers (2021-04-27T14:51:19Z) - Semantic Segmentation with Generative Models: Semi-Supervised Learning
and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z) - Context-aware Attentional Pooling (CAP) for Fine-grained Visual
Classification [2.963101656293054]
Deep convolutional neural networks (CNNs) have shown a strong ability in mining discriminative object pose and parts information for image recognition.
We propose a novel context-aware attentional pooling (CAP) that effectively captures subtle changes via sub-pixel gradients.
We evaluate our approach using six state-of-the-art (SotA) backbone networks and eight benchmark datasets.
arXiv Detail & Related papers (2021-01-17T10:15:02Z) - Phase Consistent Ecological Domain Adaptation [76.75730500201536]
We focus on the task of semantic segmentation, where annotated synthetic data are aplenty, but annotating real data is laborious.
The first criterion, inspired by visual psychophysics, is that the map between the two image domains be phase-preserving.
The second criterion aims to leverage ecological statistics, or regularities in the scene which are manifest in any image of it, regardless of the characteristics of the illuminant or the imaging sensor.
arXiv Detail & Related papers (2020-04-10T06:58:03Z) - Manifold-driven Attention Maps for Weakly Supervised Segmentation [9.289524646688244]
We propose a manifold-driven attention-based network to enhance visually salient regions.
Our method generates superior attention maps directly during inference without the need for extra computation.
arXiv Detail & Related papers (2020-04-07T00:03:28Z)