Unsupervised Segmentation in Real-World Images via Spelke Object
Inference
- URL: http://arxiv.org/abs/2205.08515v1
- Date: Tue, 17 May 2022 17:39:24 GMT
- Title: Unsupervised Segmentation in Real-World Images via Spelke Object
Inference
- Authors: Honglin Chen, Rahul Venkatesh, Yoni Friedman, Jiajun Wu, Joshua B.
Tenenbaum, Daniel L. K. Yamins, Daniel M. Bear
- Abstract summary: Excitatory-Inhibitory Segment Extraction Network (EISEN) learns from optical flow estimates to extract pairwise affinity graphs for static scenes.
We show that EISEN achieves a substantial improvement in the state of the art for self-supervised segmentation on challenging synthetic and real-world robotic image datasets.
- Score: 44.79376336842088
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised category-agnostic segmentation of real-world images into
objects is a challenging open problem in computer vision. Here, we show how to
learn static grouping priors from motion self-supervision, building on the
cognitive science notion of Spelke Objects: groupings of stuff that move
together. We introduce Excitatory-Inhibitory Segment Extraction Network
(EISEN), which learns from optical flow estimates to extract pairwise affinity
graphs for static scenes. EISEN then produces segments from affinities using a
novel graph propagation and competition mechanism. Correlations between
independent sources of motion (e.g. robot arms) and objects they move are
resolved into separate segments through a bootstrapping training process. We
show that EISEN achieves a substantial improvement in the state of the art for
self-supervised segmentation on challenging synthetic and real-world robotic
image datasets. We also present an ablation analysis illustrating the
importance of each element of the EISEN architecture.
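The propagation-and-competition idea behind EISEN can be illustrated with a toy sketch. This is a minimal stand-in, not the authors' implementation: activation spreads from seed nodes along a pairwise affinity matrix (excitation), and each node is then claimed by the strongest competing segment (inhibition). The affinity matrix, seeds, and iteration scheme below are all illustrative assumptions.

```python
import numpy as np

def segment_from_affinities(affinity, seeds, n_iters=10):
    """Toy label propagation with winner-take-all competition.

    affinity : (N, N) symmetric pairwise affinity matrix in [0, 1]
    seeds    : list of node indices, one per putative segment
    Returns an (N,) array of segment labels.
    """
    n = affinity.shape[0]
    k = len(seeds)
    # One activation map per seed; seeds start fully active.
    act = np.zeros((k, n))
    for s_idx, s in enumerate(seeds):
        act[s_idx, s] = 1.0
    for _ in range(n_iters):
        # Excitatory propagation: activation spreads along high affinities.
        act = act @ affinity
        # Competition: each node is claimed by the strongest segment only.
        winners = act.argmax(axis=0)
        mask = np.zeros_like(act)
        mask[winners, np.arange(n)] = 1.0
        act = act * mask
        # Renormalize so activations stay bounded.
        act = act / np.maximum(act.max(axis=1, keepdims=True), 1e-8)
    return act.argmax(axis=0)

# Two clusters of 3 nodes with strong intra-cluster affinity.
A = np.array([
    [1.0, 0.9, 0.8, 0.1, 0.0, 0.0],
    [0.9, 1.0, 0.9, 0.0, 0.1, 0.0],
    [0.8, 0.9, 1.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 1.0, 0.9, 0.8],
    [0.0, 0.1, 0.0, 0.9, 1.0, 0.9],
    [0.0, 0.0, 0.1, 0.8, 0.9, 1.0],
])
labels = segment_from_affinities(A, seeds=[0, 3])
print(labels.tolist())  # [0, 0, 0, 1, 1, 1]: two segments recovered
```

In EISEN the affinities themselves are learned from optical flow supervision; here they are hand-written to keep the competition dynamics visible.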
Related papers

- SOHES: Self-supervised Open-world Hierarchical Entity Segmentation [82.45303116125021]
This work presents Self-supervised Open-world Hierarchical Entities (SOHES), a novel approach that eliminates the need for human annotations.
We produce abundant high-quality pseudo-labels through visual feature clustering, and rectify the noise in pseudo-labels via a teacher-student mutual-learning procedure.
Using raw images as the sole training data, our method achieves unprecedented performance in self-supervised open-world segmentation.
arXiv Detail & Related papers (2024-04-18T17:59:46Z)
- EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation [5.476136494434766]
We introduce EiCue, a technique providing semantic and structural cues through an eigenbasis derived from a semantic similarity matrix.
We guide our model to learn object-level representations with intra- and inter-image object-feature consistency.
Experiments on the COCO-Stuff, Cityscapes, and Potsdam-3 datasets demonstrate state-of-the-art unsupervised semantic segmentation (USS) results.
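The idea of extracting grouping cues from an eigenbasis of a similarity matrix can be sketched with a toy spectral example. This is a hedged illustration of the general technique, not EAGLE's EiCue module: the features, Laplacian construction, and two-cluster setup below are illustrative assumptions.

```python
import numpy as np

def eigen_cues(features, k=2):
    """Toy spectral grouping cues from a feature similarity matrix.

    features : (N, D) per-patch feature vectors
    Returns the first k nontrivial Laplacian eigenvectors, which act
    as soft structural cues for grouping.
    """
    # Cosine similarity matrix, clipped to be nonnegative.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    W = np.clip(f @ f.T, 0.0, None)
    d = W.sum(axis=1)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(W)) - (W * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)
    # Skip the trivial (near-zero) eigenvector; the next ones carry structure.
    return vecs[:, 1:1 + k]

# Two feature clusters: the Fiedler vector's sign separates them.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
cue = eigen_cues(feats, k=1)[:, 0]
groups = (cue > 0).astype(int)
print(groups.tolist())
```

The sign of the second eigenvector bipartitions the graph; EAGLE uses such eigenvectors as cues for learning object-level representations rather than as the final segmentation.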
arXiv Detail & Related papers (2024-03-03T11:24:16Z)
- Self-trained Panoptic Segmentation [0.0]
Panoptic segmentation is an important computer vision task which combines semantic and instance segmentation.
Recent advancements in self-supervised learning approaches have shown great potential in leveraging synthetic and unlabelled data to generate pseudo-labels.
The aim of this work is to develop a framework to perform embedding-based self-supervised panoptic segmentation using self-training in a synthetic-to-real domain adaptation problem setting.
arXiv Detail & Related papers (2023-11-17T17:06:59Z)
- LOCATE: Self-supervised Object Discovery via Flow-guided Graph-cut and Bootstrapped Self-training [13.985488693082981]
We propose a self-supervised object discovery approach that leverages motion and appearance information to produce high-quality object segmentation masks.
We demonstrate the effectiveness of our approach, named LOCATE, on multiple standard video object segmentation, image saliency detection, and object segmentation benchmarks.
arXiv Detail & Related papers (2023-08-22T07:27:09Z)
- Optical Flow boosts Unsupervised Localization and Segmentation [22.625511865323183]
We propose a new loss term formulation that uses optical flow in unlabeled videos to encourage self-supervised ViT features to become closer to each other.
We use the proposed loss function to finetune vision transformers that were originally trained on static images.
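A flow-guided feature loss of this flavor can be sketched in a few lines. This is a toy approximation, not the paper's actual loss: it builds a target affinity from flow similarity and penalizes feature dissimilarity wherever the flow says pixels move together. The Gaussian kernel, cosine similarity, and shapes below are illustrative assumptions.

```python
import numpy as np

def flow_consistency_loss(features, flow, sigma=1.0):
    """Toy flow-guided feature loss.

    features : (N, D) per-pixel feature vectors
    flow     : (N, 2) per-pixel optical flow vectors
    Pixels with similar flow get a high target affinity; the loss
    grows when their features disagree.
    """
    # Target affinity from flow similarity (Gaussian on flow distance).
    d2 = ((flow[:, None, :] - flow[None, :, :]) ** 2).sum(-1)
    target = np.exp(-d2 / (2 * sigma ** 2))
    # Feature cosine similarity.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    # Penalize feature dissimilarity where flow says pixels move together.
    return float((target * (1.0 - sim)).mean())

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))
flow_together = np.tile([1.0, 0.0], (6, 1))   # all pixels share one motion
aligned = np.tile(feats[0], (6, 1))           # identical features everywhere
print(flow_consistency_loss(aligned, flow_together))       # ~0: nothing to pull
print(flow_consistency_loss(feats, flow_together) > 0.0)   # misaligned: positive
```

Minimizing such a term during fine-tuning pushes features of co-moving pixels together, which is the mechanism the abstract describes.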
arXiv Detail & Related papers (2023-07-25T16:45:35Z)
- The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos [59.12750806239545]
We show that a video has different views of the same scene related by moving components, and the right region segmentation and region flow would allow mutual view synthesis.
Our model starts with two separate pathways: an appearance pathway that outputs feature-based region segmentation for a single image, and a motion pathway that outputs motion features for a pair of images.
By training the model to minimize view synthesis errors based on segment flow, our appearance and motion pathways learn region segmentation and flow estimation automatically without building them up from low-level edges or optical flows respectively.
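The view-synthesis objective driving this training can be illustrated with a toy example. This is a minimal sketch under simplifying assumptions (integer per-segment translations, a single-channel image), not the paper's model: a segmentation plus a per-segment flow reconstructs the second frame, and reconstruction error is the training signal.

```python
import numpy as np

def synthesize_view(image, segments, segment_flows):
    """Toy view synthesis from a region segmentation plus segment flow.

    image         : (H, W) first frame
    segments      : (H, W) integer segment labels
    segment_flows : dict mapping label -> (dy, dx) integer translation
    Each region is shifted rigidly by its segment's flow.
    """
    out = np.zeros_like(image)
    h, w = image.shape
    for label, (dy, dx) in segment_flows.items():
        ys, xs = np.nonzero(segments == label)
        ty, tx = ys + dy, xs + dx
        ok = (ty >= 0) & (ty < h) & (tx >= 0) & (tx < w)
        out[ty[ok], tx[ok]] = image[ys[ok], xs[ok]]
    return out

frame1 = np.zeros((4, 6))
frame1[1:3, 1:3] = 1.0                       # a 2x2 "object" on a static background
segments = (frame1 > 0).astype(int)          # object = 1, background = 0
frame2 = np.zeros((4, 6))
frame2[1:3, 3:5] = 1.0                       # same object shifted right by 2
pred = synthesize_view(frame1, segments, {0: (0, 0), 1: (0, 2)})
print(float(np.abs(pred - frame2).sum()))    # 0.0: perfect view synthesis
```

Only the correct region segmentation paired with the correct segment flow drives the synthesis error to zero, which is why minimizing it makes both pathways learn their tasks without direct supervision.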
arXiv Detail & Related papers (2021-11-11T18:59:11Z)
- Unsupervised Part Discovery from Contrastive Reconstruction [90.88501867321573]
The goal of self-supervised visual representation learning is to learn strong, transferable image representations.
We propose an unsupervised approach to object part discovery and segmentation.
Our method yields semantic parts consistent across fine-grained but visually distinct categories.
arXiv Detail & Related papers (2021-11-11T17:59:42Z)
- Self-supervised Video Object Segmentation by Motion Grouping [79.13206959575228]
We develop a computer vision system able to segment objects by exploiting motion cues.
We introduce a simple variant of the Transformer to segment optical flow frames into primary objects and the background.
We evaluate the proposed architecture on public benchmarks (DAVIS2016, SegTrackv2, and FBMS59).
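The core motion-grouping idea, splitting a flow field into a coherently moving primary object and background, can be sketched with simple 2-means clustering on flow vectors. This is a crude hedged stand-in (the paper uses a Transformer variant); the initialization and synthetic flow field below are illustrative assumptions.

```python
import numpy as np

def group_by_motion(flow, n_iters=10):
    """Toy motion grouping: 2-means on per-pixel flow vectors.

    flow : (N, 2) flow vectors. Returns (N,) labels.
    """
    # Init one center at the smallest flow, one at the largest.
    mags = np.linalg.norm(flow, axis=1)
    centers = np.stack([flow[mags.argmin()], flow[mags.argmax()]])
    for _ in range(n_iters):
        # Assign each pixel to its nearest motion center.
        d = np.linalg.norm(flow[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center from its assigned pixels.
        for c in range(2):
            if (labels == c).any():
                centers[c] = flow[labels == c].mean(axis=0)
    return labels

static = np.zeros((8, 2))                    # background barely moves
moving = np.tile([3.0, 1.0], (4, 1))         # object moves coherently
labels = group_by_motion(np.vstack([static, moving]))
print(labels.tolist())  # [0]*8 + [1]*4: background vs. moving object
```

Coherent motion alone separates object from background here; the paper's contribution is learning this grouping end-to-end from optical flow frames.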
arXiv Detail & Related papers (2021-04-15T17:59:32Z)
- "What's This?" -- Learning to Segment Unknown Objects from Manipulation Sequences [27.915309216800125]
We present a novel framework for self-supervised grasped object segmentation with a robotic manipulator.
We propose a single, end-to-end trainable architecture which jointly incorporates motion cues and semantic knowledge.
Our method depends neither on visual registration of a kinematic robot or 3D object models, nor on precise hand-eye calibration or any additional sensor data.
arXiv Detail & Related papers (2020-11-06T10:55:28Z)
- DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
arXiv Detail & Related papers (2020-08-16T22:05:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.