Multi-granularity Interaction Simulation for Unsupervised Interactive
Segmentation
- URL: http://arxiv.org/abs/2303.13399v1
- Date: Thu, 23 Mar 2023 16:19:43 GMT
- Title: Multi-granularity Interaction Simulation for Unsupervised Interactive
Segmentation
- Authors: Kehan Li, Yian Zhao, Zhennan Wang, Zesen Cheng, Peng Jin, Xiangyang
Ji, Li Yuan, Chang Liu, Jie Chen
- Abstract summary: We introduce a Multi-granularity Interaction Simulation (MIS) approach to open up a promising direction for unsupervised interactive segmentation.
Our MIS significantly outperforms non-deep-learning unsupervised methods and is even comparable to some previous supervised deep methods without using any annotation.
- Score: 38.08152990071453
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interactive segmentation enables users to segment as needed by providing cues
of objects, which introduces human-computer interaction for many fields, such
as image editing and medical image analysis. Typically, massive and expensive
pixel-level annotations are required to train deep models through object-oriented
interactions with manually labeled object masks. In this work, we reveal that
informative interactions can be made by simulation with semantic-consistent yet
diverse region exploration in an unsupervised paradigm. Concretely, we
introduce a Multi-granularity Interaction Simulation (MIS) approach to open up
a promising direction for unsupervised interactive segmentation. Drawing on the
high-quality dense features produced by recent self-supervised models, we
propose to gradually merge patches or regions with similar features into
larger regions; every merged region thus serves as a semantically meaningful
multi-granularity proposal. By randomly sampling these
proposals and simulating possible interactions based on them, we provide
meaningful interactions at multiple granularities that teach the model to
understand user interactions. Our MIS significantly outperforms
non-deep-learning unsupervised methods and is even comparable to some previous
supervised deep methods, without using any annotation.
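The pipeline described in the abstract, merging similar patch features into multi-granularity proposals and then simulating clicks from sampled proposals, can be sketched compactly. Below is a minimal, illustrative Python sketch, not the authors' released code: the greedy all-pairs merging, the random-click heuristic, and the DINO-style patch features are assumptions made for clarity.

```python
# Minimal sketch of multi-granularity proposal generation and click
# simulation, as described in the abstract. NOT the authors' code: the
# greedy all-pairs merging and the click heuristic are assumptions.
import numpy as np

def merge_proposals(patch_feats, num_merges):
    """Greedily merge the most similar pair of regions; every merge yields
    one multi-granularity proposal (a set of patch indices)."""
    regions = [{i} for i in range(len(patch_feats))]      # one patch each
    feats = [f / np.linalg.norm(f) for f in patch_feats]  # unit vectors
    proposals = []
    for _ in range(num_merges):
        if len(regions) < 2:
            break
        # Pick the region pair with the highest cosine similarity.
        # (A real implementation would likely restrict this to spatially
        # adjacent regions; adjacency is omitted here for brevity.)
        best, best_sim = None, -2.0
        for a in range(len(regions)):
            for b in range(a + 1, len(regions)):
                sim = float(feats[a] @ feats[b])
                if sim > best_sim:
                    best, best_sim = (a, b), sim
        a, b = best
        merged = regions[a] | regions[b]
        rep = feats[a] * len(regions[a]) + feats[b] * len(regions[b])
        rep /= np.linalg.norm(rep)                        # merged feature
        for idx in (b, a):                                # drop parents
            del regions[idx], feats[idx]
        regions.append(merged)
        feats.append(rep)
        proposals.append(frozenset(merged))               # keep proposal
    return proposals

def simulate_click(proposal, grid_w, rng):
    """Simulate a positive click: a random patch inside the sampled region,
    returned as (row, col) on the patch grid."""
    idx = int(rng.choice(sorted(proposal)))
    return divmod(idx, grid_w)

rng = np.random.default_rng(0)
# Stand-in for dense self-supervised features, e.g. a 14x14 grid of 384-d
# ViT patch embeddings (a DINO-style backbone is an assumption here).
patch_feats = list(rng.standard_normal((14 * 14, 384)))
proposals = merge_proposals(patch_feats, num_merges=50)
target = proposals[int(rng.integers(len(proposals)))]     # random granularity
print("proposal size:", len(target), "click:", simulate_click(target, 14, rng))
```

Each intermediate merge is kept as a proposal, so sampling uniformly over proposals naturally yields targets at many granularities, from small parts to whole objects.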
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms representative models in both objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models [14.130327598928778]
The framework combines large language models (LLMs) with hierarchical, motion-specific vector-quantized variational autoencoders (VQ-VAEs).
Our framework generates realistic and diverse collaborative human-object-human interactions, outperforming state-of-the-art methods.
Our work opens up new possibilities for modeling complex interactions in various domains, such as robotics, graphics and computer vision.
arXiv Detail & Related papers (2024-09-30T17:02:13Z)
- DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout.
DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.
Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
arXiv Detail & Related papers (2024-08-09T14:04:21Z)
- Improving Anomaly Segmentation with Multi-Granularity Cross-Domain Alignment [17.086123737443714]
Anomaly segmentation plays a pivotal role in identifying atypical objects in images, crucial for hazard detection in autonomous driving systems.
While existing methods demonstrate noteworthy results on synthetic data, they often fail to consider the disparity between synthetic and real-world data domains.
We introduce the Multi-Granularity Cross-Domain Alignment framework, tailored to harmonize features across domains at both the scene and individual sample levels.
arXiv Detail & Related papers (2023-08-16T22:54:49Z)
- Multi-Grained Multimodal Interaction Network for Entity Linking [65.30260033700338]
The multimodal entity linking (MEL) task aims to resolve ambiguous mentions against a multimodal knowledge graph.
We propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task.
arXiv Detail & Related papers (2023-07-19T02:11:19Z)
- Boundary-aware Supervoxel-level Iteratively Refined Interactive 3D Image Segmentation with Multi-agent Reinforcement Learning [33.181732857907384]
We propose to model interactive image segmentation as a Markov decision process (MDP) and solve it with reinforcement learning (RL); a toy sketch of the shared-policy idea appears after this list.
Given the large exploration space of voxel-wise prediction, multi-agent reinforcement learning is adopted, with the voxel-level policy shared among agents.
Experimental results on four benchmark datasets show that the proposed method significantly outperforms the state of the art.
arXiv Detail & Related papers (2023-03-19T15:52:56Z)
- A Variational Information Bottleneck Approach to Multi-Omics Data Integration [98.6475134630792]
We propose a deep variational information bottleneck (IB) approach for incomplete multi-view observations.
Our method applies the IB framework on marginal and joint representations of the observed views to focus on intra-view and inter-view interactions that are relevant for the target.
Experiments on real-world datasets show that our method consistently achieves gains from data integration and outperforms state-of-the-art benchmarks.
arXiv Detail & Related papers (2021-02-05T06:05:39Z)
- Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully designed human-centric relation features, these two modules work collaboratively toward effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
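To make the shared-policy idea in the multi-agent RL entry above concrete, here is a toy PyTorch sketch. A fully convolutional head applies one set of weights at every voxel, so each voxel-agent reuses the same policy; the state channels, action space, and network size are all assumptions for illustration, not the paper's design.

```python
# Toy sketch of a shared voxel-level policy for interactive 3D segmentation.
# All specifics (two-channel state, three actions, layer sizes) are
# illustrative assumptions, not the architecture of the paper above.
import torch
import torch.nn as nn

class SharedVoxelPolicy(nn.Module):
    """A fully convolutional policy head: the same kernels slide over
    every voxel, so all voxel-agents share one set of weights."""
    def __init__(self, in_ch=2, num_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(16, num_actions, kernel_size=1),  # per-voxel logits
        )

    def forward(self, state):            # state: (B, C, D, H, W)
        return self.net(state)           # logits: (B, A, D, H, W)

policy = SharedVoxelPolicy()
image = torch.randn(1, 1, 16, 32, 32)    # toy 3D image
seg = torch.zeros(1, 1, 16, 32, 32)      # current segmentation state
state = torch.cat([image, seg], dim=1)   # agents observe image + current seg

# Each voxel-agent samples an action from the shared policy.
logits = policy(state).movedim(1, -1)    # (B, D, H, W, A)
actions = torch.distributions.Categorical(logits=logits).sample()

# Illustrative action semantics: 0 = keep label, 1 = foreground, 2 = background.
seg = torch.where(actions == 1, 1.0,
                  torch.where(actions == 2, 0.0, seg.squeeze(1))).unsqueeze(1)
```

A training loop would then reward each agent for moving its voxel's label toward the ground truth and update the shared weights with a policy-gradient method.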
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.